ad-hockery: /ad·hok'@r·ee/, n.
Gratuitous assumptions... which lead to the appearance of semi-intelligent behavior but are in fact entirely arbitrary. Jargon File

Organizing functional tests

I posted a few days ago about functional testing & some of my frustrations, focusing mainly on the technical issues. I did touch on test organization in terms of modelling behaviour rather than page structure. As Luke Daley has pointed out, the two aren’t fundamentally in opposition, and I’ve been giving this some further thought.

I think the crux of the issue is that functional tests, end-to-end tests, acceptance tests, whatever you want to call them, are not the same thing as unit tests. They’re not really even just a higher level of the same thing. Yet on the projects I’ve worked on we’ve generally carried on as though they were.

Unit tests are associated almost on a one-to-one basis with units of code. When you execute grails create-controller Foo the framework generates a skeleton controller and a unit test for it. You certainly shouldn’t be slavishly beholden to the one unit test class per production class habit but it’s generally reasonable. Each unit test describes the behaviour of a single unit of the software. I don’t think the same association is appropriate for functional tests but we often end up using it anyway.

I’ve tried to distil some rules of thumb from all this.

1: Organize tests around features not units of code

When it comes to writing functional tests in Java or Groovy, the tendency is to follow the pattern of associating tests with some kind of unit of software. Typically it’s a test per web page or significant page component (e.g. ArticlePageSpec, NewsTickerSpec). When dealing with the content management side of apps rather than the public-facing side, it often defaults to a test per high-level domain object.

When work is started on new functionality the temptation is often to augment existing tests rather than creating new ones. Oh, this card deals with article pages on the site so let’s go find the ArticlePageSpec and add our coverage there.

The functional tests are supposed to be, among other things, the living, executable documentation for your product. So there should be a strong association between the tests and the features that have been implemented over the life of the project. In the teams I’ve worked on features have been tracked using index cards on a board but the same could apply to issue tracking systems. When using Spock there’s even a dedicated annotation for tying a feature method or specification class to an issue:

import spock.lang.*

@Issue("http://issues.example.org/PROXY-1") // placeholder issue URL, for illustration only
class HttpsProxySpec extends Specification {
}

2: Name tests according to features

I think part of the reason the habit of associating tests with units of code is hard to break is naming. Test classes have to be called something and that something is limited by Java’s class naming requirements and conventions. Naming things is often treated as an afterthought, but it’s actually hard. When I start working on a card with a description something like "A user can save articles to a list of favourites" what am I going to call the functional test class? I should probably go for UserSavesArticleToFavouritesSpec but if this is the first card played that’s to do with favourites maybe I’ll just be lazy and call it FavouritesSpec. Next week someone else plays a card "Users can remove articles from their list of favourites" and adds their coverage to my original FavouritesSpec which is now some hybrid beast testing two different bits of behaviour.
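As a rough sketch of what that split might look like in Spock, each of the two cards could get its own specification, named after the behaviour and tagged with its card. Everything here is hypothetical: the issue URLs are placeholders, and the tiny in-memory FavouritesList only stands in for the real application so the example runs with nothing but Spock on the classpath. A real functional test would drive the app through a browser.

```groovy
import spock.lang.Issue
import spock.lang.Specification

// Hypothetical in-memory stand-in for the application under test.
class FavouritesList {
    private final Set<String> articles = new LinkedHashSet<>()

    void save(String title) { articles << title }
    void remove(String title) { articles.remove(title) }
    boolean contains(String title) { title in articles }
}

@Issue("http://issues.example.org/CARD-101") // placeholder URL
class UserSavesArticleToFavouritesSpec extends Specification {

    def "a user can save an article to their favourites"() {
        given:
        def favourites = new FavouritesList()

        when:
        favourites.save("Organizing functional tests")

        then:
        favourites.contains("Organizing functional tests")
    }
}

@Issue("http://issues.example.org/CARD-102") // placeholder URL
class UserRemovesArticleFromFavouritesSpec extends Specification {

    def "a user can remove an article from their favourites"() {
        given:
        def favourites = new FavouritesList()
        favourites.save("Organizing functional tests")

        when:
        favourites.remove("Organizing functional tests")

        then:
        !favourites.contains("Organizing functional tests")
    }
}
```

With names like these, whoever picks up the next favourites card has an obvious prompt to create a third class rather than bolt more coverage onto one of the existing two.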

3: Put tests somewhere you can find them later

Over time something I’ve noticed is that it becomes difficult to identify where exactly a certain bit of functionality is covered in the functional tests. Team members fixing defects will sometimes try to figure out who implemented something in the first place and ask them where the tests are or even end up trawling through the Git log. This shouldn’t be necessary.

A corollary to that is that it also becomes difficult to identify redundancy or obsolescence in the functional tests. If a feature of the application is removed or replaced then obviously the tests around it should also be removed. The wrong way to do that is by running everything and deleting whatever fails.

I’ve also seen test cruft floating around (still passing somehow) from long-dead features. In some cases so long-dead that we’ve had to go rooting through the code because we can’t remember whether the functionality still exists.

Redundancy is a huge problem on some projects. One in particular had a vast suite of Selenese tests that repeatedly exercised bits of functionality that often had nothing to do with what the test was actually meant to cover. For example, the CMS used the Grails URL convention of /type/action/id (e.g. /article/edit/1234). Because such URLs are unpredictable, tests would do things like invoke a data fixture, then use the CMS’s search tool to find the thing the fixture created and navigate to its edit page.

4: Keep tests atomic

There’s a best practice that states each test should make a single logical assertion and I’d extend that to say that test classes should test a single feature (including variations and edge cases). If the new test you’re writing is nothing to do with the other tests in the class or is only related to them because it deals with some of the same parts of the application then it should be broken out into a new class.

Sometimes this comes at the cost of repeated initialization or data setup and developers' natural DRY instincts resist that. But in the long run well organized tests will pay dividends over pristine DRY-ness especially as the requirements of individual tests have a habit of mutating over time so that what is common now may not stay that way.

Selenese tests seem particularly prone to run-on testing, where what are in reality multiple tests are combined into one because it is easy for the resulting state of an earlier part of the test to feed into further assertions. For example, a test creates an article, makes sure the user can search for it, then uses the search result to navigate to an edit page where it makes sure the article can be updated and finally deleted. This is a TDD anti-pattern known as "The One". Spock’s @Stepwise annotation can be similarly abused.
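To make the contrast concrete, here is a minimal sketch of two steps from that kind of run-on test pulled apart into independent specifications. Each one creates the article it needs and goes straight to it by id instead of inheriting state from an earlier step. ArticleStore is a hypothetical in-memory stand-in for the CMS, not a real API, and the spec names are only illustrative.

```groovy
import spock.lang.Specification

// Hypothetical in-memory stand-in for the CMS under test.
class ArticleStore {
    private final Map<Long, String> titles = [:]
    private long nextId = 1

    long create(String title) {
        long id = nextId++
        titles[id] = title
        id
    }

    void update(long id, String title) { titles[id] = title }
    void delete(long id) { titles.remove(id) }
    String titleOf(long id) { titles[id] }

    List<Long> search(String text) {
        titles.findAll { id, title -> title.contains(text) }.keySet().toList()
    }
}

class EditorUpdatesArticleSpec extends Specification {

    // Instance fields are re-created for every feature method.
    def store = new ArticleStore()

    def "an editor can change an article's title"() {
        given: "an article created by this test, not by an earlier step"
        long id = store.create("Draft title")

        when:
        store.update(id, "Final title")

        then:
        store.titleOf(id) == "Final title"
    }
}

class EditorDeletesArticleSpec extends Specification {

    def store = new ArticleStore()

    def "a deleted article no longer shows up in search"() {
        given:
        long id = store.create("Obsolete article")

        when:
        store.delete(id)

        then:
        store.search("Obsolete").empty
    }
}
```

Either specification can fail, be rewritten or be deleted without affecting the other, which is exactly what the run-on version makes impossible.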

5: Isolate data fixtures for each test

Starting out with simple tests that create some data, we’ll often spot some commonality & move some of the data setup into setup or setupSpec methods (@Before or @BeforeClass in JUnit). When successive cards are played and developers augment that test, they’ll find the original data is insufficient or inappropriate. Before long there’s a test with some data created in setup, individual tests adding more, some tests ignoring the data from setup completely and, worse, some tests actually modifying or even deleting bits of it.

This is not a recipe for a maintainable test. The problems compound themselves. It may not be obvious that one of the test methods doesn’t actually need the data provided by the setup method, so it gets left there as the test evolves. Eventually it can reach a point where the test has changed so much that virtually none of the individual tests actually need the data provided by the setup method, or all of them make significant changes to it, but it’s still sitting there anyway because that fact isn’t apparent without a very close reading of the code.

I’m not saying don’t write tests that share data created in setup. But do it sparingly and don’t then add tests that don’t need that data; put them somewhere else.

I do prefer tests that set up their own data rather than relying on a common fixture. I think it makes the test read better: the fixture becomes part of the test’s preconditions (the given in the BDD given, when, then structure). That’s not to say that tests can’t share fixtures. Factory methods, fixture loaders and so on are appropriate ways of doing that.
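Here is a small sketch of the factory method idea, assuming a hypothetical Article class and ArticleFixtures helper (neither comes from a real plugin). The factory supplies defaults so that each test's given block states only the properties that matter to it, and filtering a list stands in for asking the application for its public listing.

```groovy
import spock.lang.Specification

// Hypothetical domain class; in a Grails app this would be a real domain object.
class Article {
    String title
    String author
    boolean published
}

// Hypothetical factory: sensible defaults, overridden per test via named arguments.
class ArticleFixtures {
    static Article anArticle(Map overrides = [:]) {
        new Article([title: "Untitled", author: "anonymous", published: false] + overrides)
    }
}

class PublishedArticleListingSpec extends Specification {

    def "only published articles appear in the listing"() {
        given: "the fixture is part of this test's preconditions"
        def draft = ArticleFixtures.anArticle(title: "Work in progress")
        def live = ArticleFixtures.anArticle(title: "Ready to read", published: true)

        when: "a stand-in for fetching the public listing from the application"
        def listing = [draft, live].findAll { it.published }

        then:
        listing*.title == ["Ready to read"]
    }
}
```

The author of each article is irrelevant to this test, so it never appears in the test body. If a later test does care about authors, it can pass author: "..." without touching anyone else's setup.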

Grails has a couple of plugins that are very useful here: Fixtures and Build Test Data. I’m actually pretty bad at using the Fixtures plugin effectively but am convinced it’s a good idea. Build Test Data is usually among the first plugins I install in any project, and it’s very useful for hiding irrelevant fixture detail from the test code.

TL;DR version

I firmly believe that small is good when it comes to functional tests. I want lots of tests each covering some well defined facet of the application’s behaviour, preferably traceable in some way to the original card. I think if I’ve understood anything of Behaviour Driven Development it’s that tests should be organized around product features and not units of code.

Still TL;DR version

I should probably start using Cucumber.
