Unit tests should be large, not small

If a unit test does not catch a regression, especially after refactoring, it is pointless, and small unit tests typically don’t catch regressions. If the code base will not live for very long (e.g. a proof of concept), then test it manually.

When refactoring or adding new features, the structure of the project typically changes in non-trivial ways, and small unit tests typically break. If a unit test breaks, it serves little purpose, because you now need to test again (rewriting the test counts as testing again). The idea behind unit tests is that we can change code without breaking existing behaviour, so the tests themselves should not need changing.

What does large mean?

A large unit test does not mean the test code itself is large, but rather that the coverage of the test is large. The test should cover functions/classes working together from one external dependency to another. External dependencies define the upper bound on how large a unit test should be. Unit tests should not have any external dependencies.

External dependencies include APIs, databases, OS/phone APIs, etc. A library that transforms data, such as JSON conversion, is not an external dependency, since we could write that code ourselves. Such libraries should be considered internal.
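Here is a minimal F# sketch of that boundary. It is not from the original post, and the functions preview, formatNotes and loadAndFormat are hypothetical. The two pure functions can be covered together by one large unit test that uses plain data. The function that reads a file is the external dependency, so that is where the test stops.

```fsharp
open System.IO

// Pure: string in, string out, with no outside world involved.
let preview (maxLength: int) (message: string) : string =
    if message.Length <= maxLength then message
    else message.Substring(0, maxLength) + "..."

// Pure: composes preview (an internal dependency). A unit test of this
// function covers both, because neither one touches the outside world.
let formatNotes (messages: string list) : string list =
    messages |> List.mapi (fun i m -> sprintf "%d. %s" (i + 1) (preview 10 m))

// Edge: reads from the file system, which is an external dependency.
// A unit test does not call this. It calls formatNotes directly with plain data.
let loadAndFormat (path: string) : string list =
    File.ReadAllLines path |> List.ofArray |> formatNotes
```

A single test of formatNotes, such as `formatNotes ["Buy milk"; "Call the dentist tomorrow"]`, exercises preview as well. No mock of the file system is needed.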

A large unit test should cover a pure function

For unit tests to cover a large surface of our code base, the code itself needs to be reliable and predictable. Pure functions are predictable: given the same input, they produce the same output. For more on this, here is why we should use pure functions, not interfaces. Modelling our software from one external dependency to another means that we build pure functions. This pushes nasty things like state and mutation to the edge of the system, where they are easier to control.
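Here is a small sketch of pushing state to the edge. The function names are hypothetical, and the system clock stands in for "state". The impure version hides the clock inside the function. The pure version takes the time as an input, and only a thin wrapper at the edge reads the clock.

```fsharp
open System

type Note = { Created: DateTime; Message: string }

// Impure: reads the system clock, so calling it twice with the same message
// can give different results, and a test cannot assert on the exact Note.
let createNoteImpure (message: string) : Note =
    { Created = DateTime.UtcNow; Message = message }

// Pure: the current time is an input, so the same inputs always give the same Note.
let createNote (now: DateTime) (message: string) : Note =
    { Created = now; Message = message }

// The clock read is pushed to the edge. Only this thin wrapper is impure,
// and everything it delegates to can be unit tested with plain data.
let createNoteNow (message: string) : Note =
    createNote DateTime.UtcNow message
```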

Let’s make this concrete with an example we can walk through. Assuming we are building a notes app, here is a pure function that needs testing: parseDomainType. Sample code will be in F# (because it is awesome), though the concepts can be applied to any language, be that C# or Java.

open System

// A DTO object that has correct (structural) equality
type RestApiResponse = {
    StatusCode: int
    Body: string
    Headers: Map<string, string>
}

// An enum that can hold data on each case (called a discriminated union)
type RestApiError =
    | ServerError of response: string
    | ClientError of response: string
    | MalformedResponse of error: exn

// Our target type: yep, we're building a notes app :)
// A DTO object that has correct equality
type Note = {
    Created: DateTime
    Message: string
}

// This only shows the types. The input is a RestApiResponse, while the output
// is either a success of some type 'a, or an error of type RestApiError
// val parseDomainType<'a> : RestApiResponse -> Result<'a, RestApiError>

// usage (fetchNotesResponse is a hypothetical placeholder for the web call)
let response: RestApiResponse = fetchNotesResponse ()
let notes = parseDomainType<Note list> response
// show notes on mobile app UI

Because this is a pure function, how it works internally is not particularly important; links to a full implementation on GitHub are at the end of this post. To find the inputs and outputs of the function, all that was required was to think about the edges of the system. One edge was the API used to download the notes. The other edge was the UI. These are both external dependencies, and we simply model the correct types for them. An API call might fail, so the response may not contain valid data. This highlights that our pure function parseDomainType must return a result that captures the notion of failure.
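The real implementation is in the linked GitHub code. The sketch below covers only the failure handling. The function parseWith is hypothetical: it takes the body parser as a parameter instead of using a JSON library. It also assumes that 4xx codes are client errors and 5xx codes are server errors. Each way of failing becomes a value in the Result rather than an exception.

```fsharp
type RestApiResponse = { StatusCode: int; Body: string; Headers: Map<string, string> }

type RestApiError =
    | ServerError of response: string
    | ClientError of response: string
    | MalformedResponse of error: exn

// Sketch only: parseBody stands in for the JSON library, and the status-code
// ranges below are assumptions made for illustration.
let parseWith (parseBody: string -> 'a) (response: RestApiResponse) : Result<'a, RestApiError> =
    match response.StatusCode with
    | code when code >= 500 -> Error (ServerError response.Body)
    | code when code >= 400 -> Error (ClientError response.Body)
    | _ ->
        // A body that does not parse is turned into data, not left as an exception
        try Ok (parseBody response.Body)
        with ex -> Error (MalformedResponse ex)
```

For example, `parseWith (fun body -> int body) response` yields `Ok 42` for a 200 response with body `"42"`, and `Error (MalformedResponse ...)` when the body is not a number.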

Include internal dependencies

An internal dependency is simply a function, method, or class that does a data transformation. It could be in your code base, or it could be a library. As long as the code does not do anything with the outside world, the unit test should cover it. Said another way, internal dependencies should be pure functions, not interfaces (see /2019/02/22/interfaces-vs-pure-functions/). Pure functions can be glued together (that is why they are so awesome), so our main function under test, parseDomainType, is built out of small pure functions. Some of those functions are business logic, and others are libraries.
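Here is a sketch of that gluing. The names and the "message must not be blank" rule are hypothetical, and parseNote is a much smaller function than parseDomainType. One small function is business logic, and the other wraps a .NET library call (DateTime.TryParseExact). Result.bind and Result.map chain them together. A test of parseNote covers all three without mocking anything.

```fsharp
open System
open System.Globalization

type Note = { Created: DateTime; Message: string }

// Business logic: a message must not be blank (a rule assumed for illustration).
let validateMessage (message: string) : Result<string, string> =
    if String.IsNullOrWhiteSpace message then Error "Message is empty"
    else Ok (message.Trim())

// Library code used as an internal dependency: the .NET date parser is a pure
// data transform here, so a test does not need to mock it.
let parseCreated (text: string) : Result<DateTime, string> =
    match DateTime.TryParseExact(text, "yyyy-MM-dd", CultureInfo.InvariantCulture, DateTimeStyles.None) with
    | true, date -> Ok date
    | false, _ -> Error (sprintf "Bad date: %s" text)

// The function under test glues the small pure functions together.
let parseNote (created: string) (message: string) : Result<Note, string> =
    parseCreated created
    |> Result.bind (fun date ->
        validateMessage message
        |> Result.map (fun msg -> { Created = date; Message = msg }))
```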

The fact that parseDomainType may use a library is simply an implementation detail. Swapping it out should not change the behaviour. This ability to swap out pure functions (and have the types still line up) rests on referential transparency: a pure expression can be replaced by anything that produces the same value without changing the program’s behaviour. It’s what gives us the confidence to change our software and know (simply by compiling our code and using pure functions) that we haven’t broken anything.

Our unit tests confirm that our code meets some acceptance criteria. Just because our code compiles and doesn’t throw exceptions does not guarantee that it does the right thing according to those criteria.
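The two points above can be sketched together. All names here are hypothetical. There are two interchangeable implementations of the same pure function, one using the library function List.sumBy and one using a hand-written fold, plus a third function that compiles with the same type but is wrong. One behavioural check, written against the acceptance criteria rather than the internals, accepts the first two and rejects the third.

```fsharp
open System

// Counts the words in one note (splits on whitespace, ignoring empty entries).
let countWords (text: string) : int =
    text.Split([| ' '; '\n'; '\t' |], StringSplitOptions.RemoveEmptyEntries).Length

// Implementation 1: built on the library function List.sumBy.
let totalWordsWithSumBy (notes: string list) : int =
    notes |> List.sumBy countWords

// Implementation 2: a hand-written fold. Same type, same behaviour.
let totalWordsWithFold (notes: string list) : int =
    notes |> List.fold (fun total note -> total + countWords note) 0

// Compiles with the same type and never throws, but counts notes, not words.
let totalNotesByMistake (notes: string list) : int =
    notes.Length

// One behavioural check that works for any implementation, because it depends
// only on the function's type and the acceptance criteria.
let meetsAcceptanceCriteria (totalWords: string list -> int) : bool =
    totalWords [] = 0
    && totalWords [ "First note"; "second  note here" ] = 5
```

Swapping totalWordsWithSumBy for totalWordsWithFold leaves the check passing. The compiler accepts totalNotesByMistake, but the check does not.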

Small tests might help during development, not after

When writing the domain logic for my app, I prefer to use a script file and a REPL. Many languages have these, including F#, Python, C#, and Java 9+ (though some are easier to use than others). This makes it really easy to build code and check that it works. It’s a substitute for small tests.

If your language does not have these tools, small tests can be used to prove out the details of how a single function works. Once the small test passes, write a large test and delete the small test. Commit only the large tests.

Some time later, when the code needs to be changed, the small test will be a distraction: a false positive that fails even though the behaviour is still correct. Delete it now to improve productivity. A few small tests (over some very important logic) are fine, but too many small unit tests become noise and hide the large, useful unit tests.

Small tests are harder to name

Large unit tests (remember, large means the amount of code covered, not the number of lines in the test) are easier to name, as the path through the code base is more high level and more concrete. Small tests, however, focus on the details and are more low level. They test a tiny piece of code that needs a precise name, yet it is not clear what that piece does when considered outside the context of the larger function, e.g. parseDomainType.
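Here is a hedged illustration of the naming difference, written as plain F# functions rather than NUnit tests so that it runs on its own. The helper statusClass and the simplified checkStatus are hypothetical stand-ins, not part of the post's code. The small test needs a precise name that means little outside checkStatus. The large test is named after behaviour a reader of the app cares about.

```fsharp
// Simplified error type for this sketch only.
type RestApiError =
    | ServerError of string
    | ClientError of string

// Hypothetical internal helper: the first digit of an HTTP status code.
let statusClass (code: int) : int = code / 100

// Simplified stand-in for parseDomainType's status handling, built on the helper.
let checkStatus (code: int) (body: string) : Result<string, RestApiError> =
    match statusClass code with
    | 5 -> Error (ServerError body)
    | 4 -> Error (ClientError body)
    | _ -> Ok body

// Small test: a precise name about a detail. Inlining or renaming statusClass
// breaks it, even though the behaviour of checkStatus is unchanged.
let ``statusClass returns the first digit of the code`` () =
    statusClass 404 = 4

// Large test: named after observable behaviour, and it keeps passing however
// checkStatus is implemented internally.
let ``Given a 404 response the result is a ClientError`` () =
    checkStatus 404 "not found" = Error (ClientError "not found")
```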

A worked example

It’s time for an example to make this all a little more concrete. As already stated, we have our pure function that converts an HTTP response into a list of notes for our notes app.

let parseDomainType<'a> (response: RestApiResponse): Result<'a, RestApiError> =

Our first test might be the happy path:

open NUnit.Framework
open FsUnit
open System

[<TestFixture>]
type NoteTests() =

    [<Test>]
    member this.``Given a response parseDomainType can parse a list of notes`` () = 

        // Create the input data for our test
        let response = {
            NotesService.StatusCode = 200
            NotesService.Body = 
                """[{
                        "Created": "2019-02-02",
                        "Message": "First note"
                    }] """
            NotesService.Headers = Map.empty
        }

        // act - run the pure function with the pure data
        let notes = NotesService.parseDomainType<NotesService.Note list> response

        // assert on the result
        match notes with 
        | Ok notes -> 
            let expected = { NotesService.Created = DateTime(2019, 02, 02, 00, 00, 00, DateTimeKind.Utc); NotesService.Message = "First note" }
            notes |> List.head |> should equal expected
        | Error e -> failwithf "Expected Ok notes but got: %A" e

The code above uses NUnit. A class is defined to hold the tests, and the FsUnit library is used for assertions. The test method takes advantage of F#’s double-backtick identifiers, so the method name can contain spaces. The test then creates the input data and calls the function under test with that data. Finally, the test pattern matches on the result, since the function’s type says that it could fail. If the result is an error, the test fails by throwing an exception with an appropriate message. If the parse succeeds, the test then checks that the objects are correct. This is easy with F#’s record types, as they implement equality on all fields for us.

The test above is not very long, but it tests quite a lot of our application. First, it shows the developer reading the test what the raw JSON should look like. This is very helpful, as it clearly defines how any parsing logic should work. It’s also useful if the backend does not use the exact same tech stack and its developers need to see exactly what the output should be. Finally, it makes it easy to upgrade or change the parsing library, as the test will check that the behaviour has not changed. The response object requires all fields to be defined, which makes it clear when to expect success or failure.

What remains now is to add a set of tests that vary the input and check the expected output. These include changing the status code, the response body, and the headers. When all of these tests are added, there is quite a bit of repetition. A small amount of helper code cleans that up, so the test above could look as follows:

[<Test>]
member this.``Given a response parseDomainType can parse a list of notes`` () =
    successfulResponseWithBody """[{"Created":"2019-02-02","Message":"First note"}]"""
    |> NotesService.parseDomainType<NotesService.Note list>
    |> expectOk (fun result ->
        let expected = { NotesService.Created = DateTime(2019, 02, 02); NotesService.Message = "First note" }
        result |> List.head |> should equal expected)

For such a small amount of code, there is quite a lot being tested here. This test is very clear about what it is testing too.
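The post does not show the definitions of successfulResponseWithBody and expectOk. Here is a minimal sketch of what such helpers might look like, not the author's code. It also adds two hypothetical helpers, responseWithStatus and expectError, for the status-code variations mentioned earlier. To keep the sketch runnable on its own, parseStandIn replaces parseDomainType and parses the body as an int instead of JSON. Failures are raised with failwithf, which NUnit also reports as a test failure.

```fsharp
type RestApiResponse = { StatusCode: int; Body: string; Headers: Map<string, string> }

type RestApiError =
    | ServerError of response: string
    | ClientError of response: string
    | MalformedResponse of error: exn

// Builds a 200 response so a test only has to state the body.
let successfulResponseWithBody (body: string) : RestApiResponse =
    { StatusCode = 200; Body = body; Headers = Map.empty }

// The same response with a different status code, for the failure cases.
let responseWithStatus (statusCode: int) (body: string) : RestApiResponse =
    { successfulResponseWithBody body with StatusCode = statusCode }

// Fails the test (by throwing) unless the result is Ok, then runs the assertions.
let expectOk (check: 'a -> unit) (result: Result<'a, RestApiError>) : unit =
    match result with
    | Ok value -> check value
    | Error e -> failwithf "Expected Ok but got: %A" e

// The mirror image of expectOk.
let expectError (check: RestApiError -> unit) (result: Result<'a, RestApiError>) : unit =
    match result with
    | Ok value -> failwithf "Expected Error but got: %A" value
    | Error e -> check e

// Stand-in for parseDomainType so this sketch runs on its own.
let parseStandIn (response: RestApiResponse) : Result<int, RestApiError> =
    match response.StatusCode with
    | code when code >= 500 -> Error (ServerError response.Body)
    | code when code >= 400 -> Error (ClientError response.Body)
    | _ ->
        try Ok (int response.Body)
        with ex -> Error (MalformedResponse ex)

// Happy path, in the same shape as the refactored test above.
successfulResponseWithBody "42"
|> parseStandIn
|> expectOk (fun n -> if n <> 42 then failwithf "Expected 42 but got %d" n)

// Varying the status code: each case states its input and the expected error.
[ 404, ClientError "oops"; 500, ServerError "oops"; 503, ServerError "oops" ]
|> List.iter (fun (code, expected) ->
    responseWithStatus code "oops"
    |> parseStandIn
    |> expectError (fun e ->
        if e <> expected then failwithf "Status %d: expected %A but got %A" code expected e))
```

Because each case is a single line of data, adding another status code or body variation does not add another copy of the setup code.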

Bugs won’t be in the business logic

As the example above shows, it is very easy to create unit tests and assert that the code is correct. The code that causes bugs and exceptions (the external dependencies) is now pushed to the edge of the system. In an MVVM solution, that edge could be the view model. The search space is much smaller when problems do arise. We know that our business logic and associated libraries are well tested, so the search should begin at the edge, in the view models.
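Here is a sketch of such an edge in an MVVM-style app. It is an assumption about how the pieces could fit, not the post's implementation. NotesViewModel and fetch are hypothetical names, and parseNotes is a simplified stand-in for parseDomainType that treats each line of the body as one note. The view model is the only place that calls the external dependency, holds mutable state, and catches exceptions.

```fsharp
type RestApiResponse = { StatusCode: int; Body: string; Headers: Map<string, string> }

type RestApiError =
    | ServerError of response: string
    | ClientError of response: string

// Pure core: a simplified stand-in for parseDomainType (one note per line).
let parseNotes (response: RestApiResponse) : Result<string list, RestApiError> =
    if response.StatusCode >= 500 then Error (ServerError response.Body)
    elif response.StatusCode >= 400 then Error (ClientError response.Body)
    else Ok (response.Body.Split('\n') |> List.ofArray)

// Edge: the view model owns the external dependency (fetch, e.g. an HTTP call)
// and the mutable state the UI binds to.
type NotesViewModel(fetch: unit -> RestApiResponse) =
    member val Notes: string list = [] with get, set
    member val ErrorMessage: string option = None with get, set

    member this.Refresh() =
        try
            match parseNotes (fetch ()) with
            | Ok notes ->
                this.Notes <- notes
                this.ErrorMessage <- None
            | Error e -> this.ErrorMessage <- Some (sprintf "%A" e)
        with ex ->
            // parseNotes does not throw, so only fetch can end up here
            this.ErrorMessage <- Some ex.Message
```

When something goes wrong, the well-tested pure core is unlikely to be the cause, so Refresh and fetch are the first places to look.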

Refactoring

This is one of the biggest advantages of creating large tests over pure functions. The internals of the function under test can be changed without breaking the tests. There are no mocks, so they can’t break. The test is not aware of any internal libraries that may be changed or upgraded (they are an implementation detail).

Finally, because these tests are easy to create, there should be a good level of code coverage, so refactor fearlessly. With confidence in our tests, we can make frequent or large changes to the internals of the function, knowing that nothing will be broken. This means the code base will stay clean over time, and as we all know, clean code is much nicer than bad code.

Taking Action

If you are not familiar with pure functions, learn about them now:

Pure functions

pure functions not interfaces

Read up on the full implementation for this post here:

parseDomainType as a pure function

testing parseDomainType with large tests

Practice in a small demo app, using a domain that you understand well. These concepts can be hard to understand at first. By starting from a clean slate with a known problem, it is possible to make progress. Happy [type safe] coding!

CodingWithSam

This post is licensed under CC BY 4.0 by the author.