Driving code design, through tests (IV)

This post is part of a series on TDD.

Welcome to the final article in my first four-part series!

So far we've talked about a lot of concepts, from the perennial Red, Green, Refactor to the more esoteric Transformation Priority Premise. And in all that chat we haven't touched on one of the more crucial aspects of test-driving the design of your code:

At what level should you write the tests?

Levels of testitude

When I first started writing automated tests for my code (i.e. before I was using tests to drive the design) I didn't really understand the "why" behind what I was doing. I would write one or more test cases per class method by rote, exercising the method with a small amount of variation in the inputs, and call it done. Private methods caused me a fair bit of confusion: do I make them public so I can test them? The answer turned out to be "no". Never do that.
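
To make that concrete, here's a minimal sketch (TypeScript with Jest, all names invented for illustration) of what "don't test private methods directly" looks like in practice: the private helper is only ever exercised through the public behavior that depends on it, so it stays free to change shape.

```typescript
// price-calculator.ts -- an invented example: the rounding rule stays private
// and is exercised only through the public API.
export class PriceCalculator {
  totalWithTax(subtotal: number, taxRate: number): number {
    return this.roundToCents(subtotal * (1 + taxRate));
  }

  // No test touches this directly; it can be renamed, inlined, or extracted
  // at any time without breaking the test below.
  private roundToCents(amount: number): number {
    return Math.round(amount * 100) / 100;
  }
}

// price-calculator.test.ts -- the behavior is asserted through the public method.
import { PriceCalculator } from "./price-calculator";

test("rounds the taxed total to the nearest cent", () => {
  const calculator = new PriceCalculator();
  expect(calculator.totalWithTax(19.99, 0.0825)).toBe(21.64);
});
```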

After getting a firmer grasp on testing as a concept, I started writing class-level tests. I got more comfortable about not testing private methods directly, and my tests focused more on behavior than implementation. But I was still writing one Test Class for each Implementation Class. I wrote a blog post about this test-class-to-implementation-class thing a while back, and why this approach is less than ideal from a design perspective.

Looking back, after years of automated testing at myriad levels of abstraction, I've come to the following:

In the context of TDD, tests that only exercise the unit level are some of the most restrictive tests you can write from a design perspective, especially if you are maintaining a 1-to-1 relationship between implementation code and test code.

Remember, the purpose of the tests is to set up a scaffolding to enable you to play around with the design of your code in a safe way: refactoring and reforming the code into a habitable shape while maintaining behavior. If your tests are written at the unit level, you're not going to have much wiggle room because it's way more difficult to move around outside the boundaries of the class. Instead of the tests driving the design, your tests are imposing the design.
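
Here's a hypothetical illustration of that imposition (TypeScript with Jest, every name invented). Because the test mocks OrderService's direct collaborator, the OrderService/OrderRepository boundary itself becomes part of the test's contract: merging or splitting those classes breaks the test even though the observable behavior is unchanged.

```typescript
// order-service.ts -- invented classes, just enough to show the coupling.
export interface OrderRepository {
  findTotal(orderId: string): number;
}

export class OrderService {
  constructor(private readonly orders: OrderRepository) {}

  discountedTotal(orderId: string): number {
    const total = this.orders.findTotal(orderId);
    return total > 100 ? total * 0.9 : total;
  }
}

// order-service.test.ts -- the 1-to-1 unit test. The repository is faked at
// the exact seam the current design happens to have, so any refactor that
// moves responsibility across that boundary fails here, even if every
// observable behavior stays identical.
import { OrderService } from "./order-service";

test("applies a 10% discount to orders over 100", () => {
  const repository = { findTotal: jest.fn().mockReturnValue(150) };
  const service = new OrderService(repository);

  expect(service.discountedTotal("order-42")).toBe(135);
  expect(repository.findTotal).toHaveBeenCalledWith("order-42");
});
```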

To jump to the other end of the spectrum, writing your tests from a systemic level (E2E or end-to-end tests) provides a ton of wiggle room. You're free to massage any part of the implementation, from the nuts-and-bolts to the overall architecture to the user interface. However, there are a few issues that need to be weighed:

  1. System level tests tend to be expensive to write (and run)
  2. System level testing frameworks tend to suuuuuuuuuuuck

If you think I'm exaggerating on point 2 ("there's no way there should be that many 'u's...") then find your local testing guru and ask her about Selenium (or TestComplete, or Watir, or WebDriver, or FitNesse, or ...) and how likely a given test is to fail for no reason (hint: some tests are just "flaky" and may fail randomly, causing second-guessing and a general lack of faith in the other tests). You may want to wear protective padding. In my experience I have never once used a system-level testing framework and thought "my, that was a pleasant experience"; mostly my post-system-test thoughts involve high-proof bourbon and how I can acquire it with the utmost rapidity.

So if we don't want to focus on the unit-level, and the system-level can be expensive to write, where should we focus?

Functional units

I tend to focus on Functional Units for testing, or "a single area of code that describes the largest possible unit that encompasses a single user flow or idea while still being small enough to lean on unit-level testing tools".

Which is a long way of saying "a vertical slice". In the context of, say, a web application with a REST API, this definition could mean "if I inject a known AJAX POST into an HTTP endpoint to perform some user operation, exercise all of my code that gets called as a result". If the code needs to reach out to an external service (like a database, a REST API, an LDAP server, etc) I will mock out those services.

By focusing on behavior at this level I can be 100% free to design the entire stack, from entry point to external dependency. And the tests are handled as large unit tests, so there's no mocking of code that I own and the tests are quick to write and quicker to execute.
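
Here's a minimal sketch of that kind of test, assuming an Express app, supertest, and Jest; the route, the repository interface, and every name in it are invented for illustration. The whole slice from the HTTP entry point down runs for real, and only the external dependency (the database) is faked.

```typescript
// app.ts -- an invented vertical slice: HTTP entry point down to a
// repository boundary that stands in for the real database.
import express from "express";

export interface UserRepository {
  insert(name: string): Promise<{ id: number; name: string }>;
}

export function buildApp(users: UserRepository) {
  const app = express();
  app.use(express.json());

  app.post("/users", async (req, res) => {
    if (!req.body.name) {
      return res.status(400).json({ error: "name is required" });
    }
    const created = await users.insert(req.body.name);
    return res.status(201).json(created);
  });

  return app;
}

// app.test.ts -- exercises everything from the endpoint inward; only the
// external dependency (the database) is replaced with a fake.
import request from "supertest";

test("POST /users creates a user", async () => {
  const fakeRepository = {
    insert: jest.fn().mockResolvedValue({ id: 1, name: "Ada" }),
  };
  const app = buildApp(fakeRepository);

  const response = await request(app).post("/users").send({ name: "Ada" });

  expect(response.status).toBe(201);
  expect(response.body).toEqual({ id: 1, name: "Ada" });
});
```

Note that nothing in the test mentions routing internals, validation helpers, or service classes: all of that is free to be reshaped while the test keeps passing.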

Some other examples of functional units:

  • In a React application, the largest Component (and all of its child components) that can represent a discrete user activity, while mocking out any server responses using Sinon's Fake XHR and Server functionality (a sketch follows this list).
  • For a Java Spring web service, using MockMvc to execute a Controller through the IoC mechanism, and mocking out the database or external REST calls that happen as a result.
  • In Scala, use the word "monad" and hope everyone just kind of goes with it. This applies for way more than using tests to drive design.
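
For the React item above, a sketch might look like the following (TypeScript/TSX, using React Testing Library alongside Sinon's fake server). The TodoList component and its endpoint are invented, and this assumes the component makes its request via XMLHttpRequest, since Sinon's fake server intercepts XHR rather than fetch.

```tsx
// todo-list.tsx -- an invented component that loads its data over
// XMLHttpRequest (Sinon's fake server intercepts XHR, not fetch).
import { useEffect, useState } from "react";

export function TodoList() {
  const [titles, setTitles] = useState<string[]>([]);

  useEffect(() => {
    const xhr = new XMLHttpRequest();
    xhr.open("GET", "/api/todos");
    xhr.onload = () => {
      const todos: Array<{ title: string }> = JSON.parse(xhr.responseText);
      setTitles(todos.map((todo) => todo.title));
    };
    xhr.send();
  }, []);

  return (
    <ul>
      {titles.map((title) => (
        <li key={title}>{title}</li>
      ))}
    </ul>
  );
}

// todo-list.test.tsx -- the functional-unit test: render the component,
// fake only the server, and assert on what the user would actually see.
import { render, screen } from "@testing-library/react";
import sinon from "sinon";

test("renders the todos returned by the server", async () => {
  const server = sinon.createFakeServer({ respondImmediately: true });
  server.respondWith("GET", "/api/todos", [
    200,
    { "Content-Type": "application/json" },
    JSON.stringify([{ id: 1, title: "Write the functional-unit test" }]),
  ]);

  render(<TodoList />);

  expect(await screen.findByText("Write the functional-unit test")).toBeTruthy();

  server.restore();
});
```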

The big takeaway here is that you want to find the competing forces that restrict your ability to design your code at different levels of testing (e.g. "units are too small to be useful" and "tools at the largest levels of testing tend to be expensive and/or unreliable") and find a balancing point that offers you optimal flexibility for the specific functional unit you need to design.

If that sounds complicated, don't worry: it totally is. And in a lot of cases the tooling to enable these ideas simply isn't there. I tried desperately to come up with an example of a testable functional unit for iOS development, but was unable to find one at the level I wanted. "Lack of tooling at a broader testing level" is certainly a restriction that you will have to consider in your search for optimal flexibility in many languages/frameworks. But, to be fair, if you're working in iOS the larger architectural pieces (Storyboard, ViewController, ViewModel, Model) are already more or less set in stone, so any design-driving tests should be written at a different altitude.

Wrapping up

This has been a fun and cathartic series for me. I hope that the ideas presented across these articles made sense and will stand the test of time (i.e. are valid for at least the next six months).

Quickly going over some of the things covered:

  1. You're not testing the code, you're driving its design
  2. Your TDD mantra: Red, Green, Refactor
  3. Don't DRY up your tests – keep them legible. Future You will thank you
  4. Keep your Time In Red as short as possible
  5. Follow the Transformation Priority Premise to help write the next test
  6. Test at the optimal level for the thing you're designing (and the context in which you're designing it)

And remember, TDD is a practice, and continuing to exercise it will make you better at it. When I first started out I could manage only one of the ideas above (specifically "Red, Green, Refactor") and only through years of effort have I managed to add a second one. I assume after another few decades I'll be able to TDD while using most of those ideas simultaneously.

If you feel like there are other facets I could cover, please let me know. And, as always: with TDD you're not testing your code, you're driving its design.