Effective Tests

A bunch of good things may come to mind when we think about automated tests, such as bug detection. But in fact, the reason to have tests is the same reason to have good design: reducing costs by having changeable code.

Changeability requires three correlated capabilities:

  • Refactoring
  • Flexible design
  • Effective tests

But how do automated tests enable changeability? I can see two main reasons, which branch into many others:

  • Confidence to change
  • Documentation

Confidence to change

Without tests, every change is a possible bug (Martin, 2008).

No matter how well conceived the infrastructure is, it won't help if the domain complexity is not handled in the design (Evans, 2003). Turning our lens to tests: no matter how flexible the architecture is or how nicely partitioned the design, the reluctance to change will always be there without tests (Martin, 2008). A fearful environment stifles refactoring.

Refactoring

Refactoring consists of improving existing code by restructuring its internal design, aiming to remove technical debt, improve readability, and accommodate new features, all while maintaining the current behavior.

Guarding the expected behavior with tests makes it possible to find and fix bugs earlier in the process; that's where refactoring confidence comes from. The more the coverage, the more the confidence; the more the confidence, the less the stress. It's safe to constantly improve the design of code that's surrounded by tests; in fact, refactoring is a crucial practice for achieving flexible code.

Flexible design

A good design preserves maximum flexibility at minimum cost by putting off decisions at every opportunity (Metz, 2018)

In addition to being strongly linked to refactoring, good design also depends on testing. When the design is bad, testing is hard.

  • A painful setup indicates that the code requires too much context, meaning the object under test needs to know too much about its dependencies.

  • The need for many objects suggests excessive dependencies. The more dependencies there are, the less stable the code becomes and the greater the change amplification.

  • If it's hard to write tests for the code, the code is probably also hard to reuse.

These "indicators" help to track how flexible, maintainable, and reusable the code is. A good design is fostered by refactoring and testing practices; at the same time, a good design supports refactoring and makes tests easier to create. They feed off each other.

Flexible code, effective tests, and refactoring linked to each other


The benefits of testing go beyond the codebase. Without a safety net of tests, a new requirement can cause apprehension, pressure, or even extra working hours. In addition, tests act as living documentation for the system (one of its main goals), contributing to lowering the learning curve, and reducing cognitive load.

Documentation

Cognitive load (a symptom of complexity) relates to how much a developer needs to know in order to complete a task. The higher the cognitive load, the more time is required to learn what needs to be changed and the greater the risk of bugs due to a missed detail (Ousterhout, 2021).

An elevated cognitive load comes from the obscurity caused by inadequate documentation, which stems from bad code that fails to express its intention. Static documents, such as comments, become a crutch when we fail to express ourselves in the code.

Static Documents

One of the more common motivations for writing comments is bad code (Martin, 2008)

In analogy to websites, comments and doc files are a form of static document: they remain the same even when the behavior changes. They are great for clarifying a piece of code, and they are also the best at propagating misinformation.

That doesn't mean that we should never write static documents. They are crucial to guide outsiders on how to use an SDK, providing use cases and code examples. For other purposes though, prioritize dynamic documents.

Dynamic Documents

In contrast to comments and doc files, the code itself can and should be used to dynamically document the system design and behavior, adapting to every requirement. But that doesn't come for free, as shown in the How to test section, the code must be expressive, simple, and structured.

Keep in mind that tests are also code, first-class citizens in the codebase. They are crucial to the system documentation, offering a practical perspective on how the production code should be used; highlighting design decisions, assumptions, and constraints. The story they tell remains true long after paper documents become obsolete and human memory fails (Metz, 2018).

Focusing solely on static documents can overshadow the importance of clean and well-tested design. The more obvious the design, the greater the collaboration within the team (developers and stakeholders) and the less the need for comments and documents.


Automated tests go hand in hand with agile development and are a key part of continuous delivery. Entire books could be written about their benefits and the problems they prevent. But, they must be well-written with a clear purpose regarding what and how to test. Without intentionality and clear strategies, testing can be a burden, not a benefit.

Test Strategy

Automated tests can leverage different strategies that offer valuable insights into the cost-benefit trade-off. Discussing test strategies is almost impossible without mentioning the test pyramid. The concept, introduced by Mike Cohn in the book Succeeding With Agile, helps to visualize the ideal structure for an automated test suite. It emphasizes a strong foundation of unit tests at the bottom, gradually transitioning to fewer but broader integrated tests at the top.

Test Pyramid

The Pyramid remains a valuable tool, but it falls a little short on closer inspection. It can be overly simplistic and misleading in its naming and in some conceptual aspects; e.g., Service Tests are specific to Service-Oriented Architecture, and UI Tests are not necessarily slow. But the main concepts remain:

  • Write tests with different granularity
  • The higher level you get, the fewer tests you should have

Beyond these core concepts, consider renaming the test layers using terms that align with the codebase. Here's an option for a consistent naming convention based on the integration level that matches industry standards: Unit Tests, Integration Tests (also called Component Tests), and End-to-End Tests.

Unfortunately, the terminology used to describe test strategies can differ significantly between authors. Don't get attached to naming conventions, though; concentrate on the core concepts and agree upon common terms within the team.

End to End

End to end tests

Starting from the top of the Pyramid, we have the end-to-end (E2E) tests. As the name suggests, the idea is to test some part of the application by simulating real-world scenarios, fully integrated, involving connections to the database, filesystem, network, etc. The user interface is often involved in E2E tests, but a physical client isn't required to create or execute them.

End-to-end tests sit at the top of the testing pyramid due to their inherent trade-offs. This kind of test requires an extensive setup and configuration, which makes it more time-consuming to develop and maintain. Additionally, E2E tests are susceptible to flakiness, meaning they can sometimes fail due to external factors or environmental changes, leading to unreliable results.

Despite their drawbacks, E2E tests provide invaluable confidence asserting that a complete user journey functions as expected after a change. However, it's important to use them strategically and sparingly. Smoke tests, a type of End-to-End test, are a great choice here. They focus on testing just the core functionalities of the system with a small set of test scenarios, which minimizes the downsides of this test strategy.

Integration Tests

Integration tests

Also known as Component Tests, integration tests offer more flexibility in test scope compared with E2E tests. They count on Test Doubles to replace some external components, such as the database and network calls, in order to isolate and verify interactions between application components without real-world dependencies. This controlled environment makes it possible to focus on one specific integration at a time, promoting efficient and targeted testing.

Integration tests can be effectively done with a narrower scope. All the network calls can be replaced by mocked responses and Contract Tests can be created to verify that the protocol with the server remains intact instead. In a layered architecture, for example, it's possible to swap out any layer component with a test double to achieve different levels of integration.

layered architecture

By focusing on a smaller portion of the system, integration tests run quicker and reduce the configuration cost, making them easier to create and maintain. This allows a large number of them in the test suite while keeping it healthy.
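As a sketch of this layer swapping (all names here are hypothetical, not taken from the coffee maker example), a service that depends on a repository interface can be integration-tested with an in-memory double standing in for the real database-backed layer:

```kotlin
// Hypothetical layered example: the service depends on an interface,
// so the database-backed implementation can be swapped for a test double.
data class Recipe(val name: String, val temperature: Int)

interface RecipeRepository {
    fun findByName(name: String): Recipe?
}

class RecipeService(private val repository: RecipeRepository) {
    // Business behavior under test, independent of the storage technology
    fun temperatureFor(name: String): Int =
        repository.findByName(name)?.temperature ?: error("Unknown recipe: $name")
}

// In-memory double replacing the persistence layer
class InMemoryRecipeRepository(private val recipes: List<Recipe>) : RecipeRepository {
    override fun findByName(name: String): Recipe? =
        recipes.firstOrNull { it.name == name }
}
```

An integration test can now instantiate RecipeService with the in-memory repository and exercise the real interaction between service and repository contract, with no database involved.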

Unit Tests

unit tests

Unit tests form the cornerstone of any effective test suite. They focus on testing small pieces of code at a time, isolated from external actors. Although the term Unit is not carved in stone in this context, it can range from a single function to an entire class. The isolation in unit tests can also vary depending on the sociability level: they can be solitary or sociable.

  • Sociable - While integrating with real collaborators, unit tests can gain more realistic behavior. But, this approach comes with trade-offs. First, setting up the test becomes more complex as all dependencies of the object under test also need to be instantiated. Second, these tests are susceptible to side effects. Any changes within the collaborators can cause multiple tests to fail, even if the object under test itself functions correctly.
  • Solitary - To maximize the isolation, unit tests can rely exclusively on test doubles to simulate dependencies, eliminating the issues associated with sociable tests but losing the realistic behavior.

The sociability level is not a big deal when defining the test strategy; take the chance to learn what works best for the team, using the codebase to experiment with different approaches. A mixed attitude between sociable and solitary is also valid, where only complex collaborators are replaced by test doubles, or only data objects are real collaborators.
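A minimal sketch of the two styles, with hypothetical names: the same Brewer unit is exercised once with its real Heater collaborator (sociable) and once with a hand-rolled stub (solitary):

```kotlin
interface Heater {
    fun heat(target: Int): Int
}

class ElectricHeater : Heater {
    // Real collaborator behavior: reaches the requested temperature
    override fun heat(target: Int): Int = target
}

class Brewer(private val heater: Heater) {
    fun canBrewAt(target: Int): Boolean = heater.heat(target) >= target
}

// Sociable: the object under test talks to its real collaborator
fun sociableCanBrew(): Boolean = Brewer(ElectricHeater()).canBrewAt(92)

// Solitary: an anonymous stub with a canned answer replaces the collaborator
fun solitaryCanBrew(): Boolean {
    val stubHeater = object : Heater {
        override fun heat(target: Int): Int = 92
    }
    return Brewer(stubHeater).canBrewAt(92)
}
```

Note the trade-off in miniature: the sociable version breaks if ElectricHeater changes, while the solitary version stays green even if the real heater is broken.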


An intentional test strategy is crucial for a robust test suite. Center on a focused set of End-to-End scenarios, complemented by a substantial set of Integration Tests, and a large number of Unit Tests as the pyramid's foundation. This balanced approach avoids the "ice cream cone" anti-pattern, where too many E2E tests slow down development and reduce test suite maintainability.

Now that we've explored test strategies and their trade-offs, let's delve into what to test and how to create them effectively.

Effective Test

It's tough to cover the nuances of each test strategy in a book, and it's impossible to do so in an article. The following sections explore some best practices that apply to all kinds of tests, but emphasize a solid foundation of Unit Tests from an Object-Oriented Programming perspective.

What to test?

You don’t send messages because you have objects, you have objects because you send messages. (Metz, 2018)

Traditionally, class-based design focuses on defining objects and their functionalities. By putting messages at the center of the design, the application revolves around the communication between objects rather than focusing on the objects themselves. This can be understood as a shift from "what" an object is to "what" it needs (the message).

Imagine saying, "I need this to be done" instead of dictating how to do it. This "blind trust" fosters collaboration between objects. Messages act as requests, allowing objects to fulfill their responsibilities without tight coupling or knowledge of each other's internal workings. This promotes loose coupling and modularity in the overall system design.

Answering the initial question: What to test? Messages are what we need to test.

Message passing through an object

Objects deal with two main kinds of messages in a conversation, the Incoming Messages and the Outgoing ones. Testing these messages ensures both sides of the conversation are working as expected.

Incoming messages

Incoming messages define the public interface of the receiving object, establishing its communication protocol. The receiving object is responsible for testing its own interface, which is done through tests of state, asserting expected results upon incoming messages.

Object message exchange
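As a minimal sketch (a hypothetical Boiler, not tied to the later examples), a state test sends the incoming message and asserts on the object's resulting state through its own public interface:

```kotlin
class Boiler {
    var temperature: Int = 20
        private set

    // Incoming message: part of the Boiler's public interface
    fun heatTo(target: Int) {
        temperature = target
    }
}

// State test: send the incoming message, assert the expected result
fun boilerReachesTargetTemperature(): Boolean {
    val boiler = Boiler()
    boiler.heatTo(92)
    return boiler.temperature == 92
}
```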

Outgoing messages

Outgoing messages are the ones that an object sends to another; they are naturally incoming messages for other objects. In a conversation that goes from object A to object B, the outgoing message from A becomes the incoming message to B.

There are two kinds of outgoing messages:

  • Queries: These messages retrieve information without causing lasting changes. Since only the sender cares about the response, queries typically don't require testing.
  • Commands: These messages trigger actions within the system, potentially affecting other components. Commands are crucial for system functionality and should be thoroughly tested.

It's important to note that the sending object should not assert on the receiving object's public interface. Instead, it should focus on ensuring the command is correct: sent with the right data and at the appropriate frequency. These tests verify the message's behavior, not the internal workings of the receiver.
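A sketch of such a command test using a hand-rolled spy (hypothetical, simplified names): the sender verifies that the command was sent once with the right data, without asserting on the receiver's state:

```kotlin
interface Cache {
    fun store(recipe: String)
}

// Hand-rolled spy: records the outgoing command for later verification
class SpyCache : Cache {
    val storedRecipes = mutableListOf<String>()
    override fun store(recipe: String) {
        storedRecipes += recipe
    }
}

class CoffeeMaker(private val cache: Cache) {
    fun prepare(recipe: String) {
        // Outgoing command: triggers an action in another component
        cache.store(recipe)
    }
}
```

A test instantiates CoffeeMaker with the spy, calls prepare("espresso"), and asserts that storedRecipes contains exactly one "espresso" entry: right data, right frequency.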

The coffee maker

Imagine owning a fully automated coffee maker. You don't need to worry about heating water, grinding beans, or any of the intricate details. All you care about is enjoying your perfect cup. With a simple selection, you hand over the entire coffee-making process to the machine, trusting it to deliver the desired result.

The coffee maker receives a recipe as the incoming message, specifying water temperature, grind size, and other specifications.

Coffee maker grinder sequence diagram

To fulfill the recipe, it coordinates with components like the grinder, sending some outgoing messages to them. However, the initial design coupled the coffee maker too tightly with the grinder's operations, requiring it to interact with the silo directly.

Coffee maker grinder enhanced sequence diagram

To enhance modularity and reduce dependencies, the coffee maker should delegate the bean acquisition process to the grinder. By providing the grinder with a desired powder profile, the coffee maker lets the grinder work autonomously: it just sends a simple query message, abstracting away the complexity of bean dispensing.

--

Coffee maker cache sequence diagram

The coffee maker can also cache the last selected recipe. When a coffee drink is chosen, the coffee maker stores the selected recipe by sending a command message to the cache, making it possible to quickly select the same coffee for the next brew.


A message-centric approach offers significant advantages. By focusing on messages, systems become more flexible due to looser coupling between objects, making maintenance and expansion easier. This perspective also aids in discovering new objects, as a message inherently requires a corresponding object to handle it, leading to more modular and reusable designs.

It's important to note that messages an object sends to itself are never directly tested, as they are private methods and not part of the public communication interface. If you believe some internal messages should be tested, ensure they are being sent to the correct place. This may indicate the need to create a new object to handle these messages, allowing for proper testing.
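For instance (a hypothetical sketch), a private granulometry calculation buried inside the coffee maker can be promoted to its own object, turning an untestable private message into a testable incoming one:

```kotlin
// Before: granulometryFor was a private method of CoffeeMaker and could not
// be tested directly. Extracted into its own object, it gains a public interface.
class GrindCalculator {
    fun granulometryFor(drink: String): Double =
        when (drink) {
            "espresso" -> 1.5
            else -> 4.0
        }
}
```

The calculation now receives its own incoming messages and can be state-tested like any other object.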

How to test?

As we discussed previously, if not well written, tests can increase costs instead of saving the company money. They require thought, design, and care, and readability is everything when it comes to clean tests. But how can we make tests readable? The same way we create readable code: with expressiveness, simplicity, and structure.

Expressiveness Use meaningful names for variables, methods, and classes, avoiding writing code that is overly clever or obscure. Concise code is often desirable, but it should never come at the cost of clarity.

// Throws an exception if the boiler is not ready for the recipe
fun prepare(recipe: Recipe) {
    if (boiler.temperature != recipe.temperature) throw Exception()
    // continue with the process
}

Initially, a comment vaguely indicated that an exception would be thrown if the boiler wasn't ready. However, the code lacked clarity as the exception type was generic and the boiler's readiness check was implicit.

fun prepare(recipe: Recipe){
    val boilerIsNotReady = boiler.temperature != recipe.temperature
    if(boilerIsNotReady) throw BoilerNotReadyException()
    //continue with the process
}

By introducing a variable to explicitly check the boiler's readiness and specifying a custom exception, we've transformed a vague comment into clear, actionable code. This approach enhances code readability and maintainability while providing valuable information about the system's behavior.

Simplicity Code should be as simple as possible while still fulfilling its requirements. Break down complex logic into smaller and manageable units.

fun listComponentsForMaintenance(): List<Component> {
    val componentsForMaintenance = mutableListOf<Component>()
    components.forEach { component ->
        val now = Clock.System.now()
        val days = component.lastMaintenance.daysUntil(now, TimeZone.UTC)
        if (days >= 30) {
            componentsForMaintenance.add(component)
        }
    }
    return componentsForMaintenance
}

In this example, a component is added to the componentsForMaintenance list if its last maintenance was 30 days ago or more. However, the code is difficult to read due to the chained logic within a for-each loop. There's no storytelling.

fun listComponentsForMaintenance(): List<Component> {
    return components.filter { component ->
        isEligibleForMaintenance(component)
    }
}

private fun isEligibleForMaintenance(component: Component): Boolean {
    val now = Clock.System.now()
    val lastMaintenance = component.lastMaintenance
    val daysSinceLastMaintenance = lastMaintenance.daysUntil(now, TimeZone.UTC)
    return daysSinceLastMaintenance >= 30
}

Instead of having a long chain of logic, identify smaller steps and create separate functions for each. This makes the code easier to understand. The days-since-last-maintenance logic could also be extracted into its own method. In addition, isEligibleForMaintenance could be a Kotlin extension function of the Component class. Even better, the Component itself could determine whether it is eligible for maintenance.
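A sketch of those alternatives, with the date arithmetic simplified to a plain day count so the example stays self-contained:

```kotlin
data class Component(val name: String, val daysSinceLastMaintenance: Int)

// Alternative 1: a Kotlin extension function on Component
fun Component.isEligibleForMaintenance(): Boolean = daysSinceLastMaintenance >= 30

// With the decision living next to the data, the list logic reads as a sentence
fun listComponentsForMaintenance(components: List<Component>): List<Component> =
    components.filter { it.isEligibleForMaintenance() }
```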

Structure Separate concerns and responsibilities. This makes the code easier to navigate and understand.

fun prepare(recipe: Recipe) : Coffee {
    // other steps here
    val beans = grinder.silo.dispense(recipe.weight)
    grinder.grind(beans, recipe.granulometry)
}

The coffee maker directly interacts with the silo to obtain beans, unnecessarily coupling the two components.

fun prepare(recipe: Recipe) : Coffee {
    // other steps here
    grinder.grind(powderProfile)
}

By transferring the responsibility of bean acquisition to the grinder, we enhance encapsulation and improve code structure, avoiding a violation of the Law of Demeter.


The small examples shown above may seem to cause no harm to the codebase, but remember: complexity is sedimentary; it accumulates in small chunks.

The coffee maker test

Before we start, note that the Coffee Maker's dependencies are not treated as abstractions here. While Robert Martin advocates for robust abstractions and recommends using the Impl suffix when there's just a single implementation for an interface, I prefer Sandi Metz’s perspective. She suggests that code can initially focus on functionality and evolve towards abstractions as needed. Modern IDEs make it easy to extract an object's public interface when necessary.

Remember, the language (Kotlin in this case) is just a tool. We can translate these ideas to any object-oriented programming language. MockK is also used as the mocking framework, but consider using stubs when testing against an abstract dependency. Unlike mocks, stubs focus on specific behaviors of the dependency while keeping the rest of its functionality intact. Check out this article about test doubles to learn more.

Let's "brew" some tests...

class CoffeeMakerTest {
    @MockK private lateinit var grinder: Grinder
    @MockK private lateinit var cache: Cache
    private lateinit var coffeeMaker: CoffeeMaker

    @Before
    fun setUp() {
        MockKAnnotations.init(this)
        coffeeMaker = CoffeeMaker(grinder, cache)
    }

    @Test
    fun `should return coffee when call the prepare function`() {
        val espressoRecipe = EspressoRecipe()
        every {
            grinder.grind(PowderProfile(weight = 15f, granulometry = 1.5))
        } returns Powder("espresso")
        val coffee = coffeeMaker.prepare(espressoRecipe)
        assertThat(coffee).isInstanceOf(Espresso::class.java)
        assertThat(coffee.ratio).isEqualTo("1:2")
        verify(exactly = 1) { cache.store(espressoRecipe) }
        // other assertions and verifications
    }
}

In this example, we ensure that a coffee is returned when an espresso recipe is prepared. We also verify that the selected recipe is stored in the cache. However, the test is only marginally readable because the example is not very complex and we are testing just one small interaction of the coffee maker, without any error cases. The code lacks expressiveness, simplicity, and structure. On the plus side, it at least uses real-world examples as test data.

Naming

Let's start with the test name: should return coffee when call the prepare function. This name doesn't convey anything about business rules; it focuses on function calls. Besides, every piece of code "should" do something, or it should be deleted.

When creating a test, try to answer the question: What behavior is expected from the code under test given a specific action? For example, brew an espresso when an espresso recipe is prepared.

Given, When, Then

"Given, when, then", "Build, Operate, Check" or "Arrange, Act, Assert" are structure patterns valid for every kind of test and consist of splitting the test function into three sections:

  • Given: This section sets the initial state, including data and preconditions for the test.
  • When: This part represents the action that triggers the behavior being tested, usually a function call.
  • Then: This section checks if the operations yielded the expected results.

class CoffeeMakerTest {
    // setup

    @Test
    fun `brew an espresso when an espresso recipe is prepared`() {
        val espressoRecipe = EspressoRecipe()
        every {
            grinder.grind(PowderProfile(weight = 15f, granulometry = 1.5))
        } returns Powder("espresso")

        val coffee = coffeeMaker.prepare(espressoRecipe)

        assertThat(coffee).isInstanceOf(Espresso::class.java)
        assertThat(coffee.ratio).isEqualTo("1:2")
        verify(exactly = 1) { cache.store(espressoRecipe) }
        // other assertions and verifications
    }
}

One assertion per test

Some testers suggest that each test case should include only one assertion. This is less about limiting the number of assert method calls and more about focusing on verifying a single behavior per test case.

In the current example, the test method checks both the incoming message to the coffee maker and the outgoing message sent to the cache. These represent two distinct scenarios and should be tested separately.

Instead, one test case should assert the state of the coffee maker when a prepare(recipe) message is sent. Another test case should be dedicated to ensuring that the cache is called to store the recipe.

class CoffeeMakerTest {
    // test setup

    @Test
    fun `brew an espresso when an espresso recipe is prepared`() {
        stubEspressoGrinding()

        val coffee = coffeeMaker.prepare(EspressoRecipe())

        assertThat(coffee).isInstanceOf(Espresso::class.java)
        assertThat(coffee.ratio).isEqualTo("1:2")
    }

    @Test
    fun `store espresso as the last chosen recipe on prepare an espresso`() {
        stubEspressoGrinding()
        val espressoRecipe = EspressoRecipe()

        coffeeMaker.prepare(espressoRecipe)

        verify(exactly = 1) { cache.store(espressoRecipe) }
    }

    private fun stubEspressoGrinding() {
        val powderProfile = espressoPowderFixture()
        every { grinder.grind(powderProfile) } returns Powder("espresso")
    }
}

Domain-Specific Languages

Domain-specific languages (DSLs) are a great approach to enhancing test writing. Instead of using production and test APIs directly, a set of functions is created to abstract them, increasing readability and maintainability. DSLs are also especially useful for managing complex data sets within test scenarios, enabling dynamic test data manipulation to meet intricate requirements and variations.

For example, in the previous test, we can improve the expressiveness of the cache command verification by abstracting the verify block into verifyRecipeHasBeenCached(espressoRecipe), just as we did with the espresso grinding stub abstraction (stubEspressoGrinding).

Test fixtures are also useful for creating simple test cases with well-defined data. In addition, they can be used alongside domain-specific languages, serving as building blocks for them.
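A minimal sketch of both ideas together, using a hand-rolled spy instead of MockK so the helpers stay self-contained (names are hypothetical):

```kotlin
data class Recipe(val name: String)

class SpyCache {
    val stored = mutableListOf<Recipe>()
    fun store(recipe: Recipe) {
        stored += recipe
    }
}

// Fixture: a well-known, reusable piece of test data
fun espressoRecipeFixture(): Recipe = Recipe("espresso")

// DSL-style helper: hides the raw verification plumbing behind domain language
fun SpyCache.recipeHasBeenCached(recipe: Recipe): Boolean =
    stored.count { it == recipe } == 1
```

A test then reads in domain terms: store the espresso fixture, then assert recipeHasBeenCached(espresso), with no test-double plumbing in sight.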

Keep it clean

Having dirty tests is equivalent to, if not worse than, having no tests (Martin, 2008)

Achieving clean and maintainable tests is a continuous effort. There's no single magic solution, but rather a combination of techniques. Clear naming conventions, proper test structure, and selecting the right test data management approach (fixtures or DSLs) depending on the scenario are all key factors.

Sometimes, improving the production code itself can significantly benefit test quality. Refactoring code for better separation of concerns or creating helper methods can lead to more expressive and maintainable tests.

There are situations where controlled code duplication actually improves test readability and maintainability, even if it violates the DRY (Don't Repeat Yourself) principle. Finding a balance between these two aspects is essential.

Unmaintained tests lose their value and can create a false sense of security. Regularly review your tests. If a test becomes difficult to maintain, consider refactoring it, improving the code it targets, or even removing it entirely if it no longer serves a purpose.

Avoid mixing multiple architectures and test patterns. While a specific solution may seem optimal for a single problem, it can increase cognitive load when considered in a broader context. "Perfect" is a personal thing; seek improvement instead.

TDD vs BDD

As mentioned, it's almost impossible to cover everything related to testing in a single article, but these two practices deserve special mention when discussing clean tests.

  • TDD (Test-Driven Development) relies on writing automated tests before writing any production code. This practice ensures code quality by forcing developers to think about the desired functionality and potential issues upfront.

  • BDD (Behavior-Driven Development) is a development approach that focuses on describing the desired behavior of a software system. It promotes collaboration between developers, testers, and non-technical stakeholders by using natural language (like the Gherkin syntax) to write test cases. This shared understanding ensures the final product aligns with business requirements.

Both Behavior Driven Development (BDD) and Test-Driven Development (TDD) are not mandatory practices, but they significantly enhance the software quality assurance process.

F.I.R.S.T

Last but not least, create tests F.I.R.S.T

  • Fast: Tests should execute quickly to provide rapid feedback and avoid slowing down development cycles.
  • Independent: Tests shouldn't rely on the outcome of other tests, allowing them to be run in any order or isolation.
  • Repeatable: Tests should produce the same results consistently regardless of the environment or previous test runs.
  • Self-Validating: Tests should clearly indicate success or failure without requiring manual interpretation.
  • Timely: Write tests before or alongside the code they are testing. This promotes Test-Driven Development (TDD) where tests guide the development process.
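As a sketch of the Repeatable property (hypothetical names), injecting the notion of "today" instead of reading the system clock makes the outcome deterministic in any environment:

```kotlin
// The current day is injected as a function rather than read from the system
// clock, so tests control time explicitly and always produce the same result.
class MaintenanceChecker(private val today: () -> Long) {
    fun isDue(lastMaintenanceDay: Long): Boolean =
        today() - lastMaintenanceDay >= 30
}
```

A test fixes today to a constant and asserts both sides of the 30-day boundary; run order and environment no longer matter.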

Conclusion

Test automation offers numerous benefits and is key to continuous delivery, with several practices impacting IT performance. Reliable automated tests are essential: when tests pass, teams can trust their software is ready for release, and test failures accurately reflect defects. Flaky or unreliable test suites lead to false positives or negatives, so it's important to invest in making tests dependable.

Developers should primarily create and maintain acceptance tests, as they can reproduce and fix these tests on their own workstations. Tests created and maintained by QA or outsourced parties do not correlate with better IT performance. Involving developers in the creation and maintenance of tests improves testability and aligns with test-driven development (TDD), which encourages more testable designs (Forsgren, Humble, & Kim, 2018).

Ultimately, embracing a culture of testing is about more than just checking boxes or reaching a certain code coverage percentage. It's about fostering a mindset of quality throughout the entire development process. It's about building systems that are not only functional but also flexible, maintainable, and resilient to change. By adopting a thoughtful and intentional approach to testing, we can create software that truly stands the test of time.

References