Architectural regression testing
Unit tests help ensure that the system does what it should, but how do we ensure that the system has the correct architecture?
I’ve been in love with functional programming for a while, and among other things, I love a type system that makes invalid states unrepresentable. No tool can guarantee that the system does what it should, but various tools can help us get closer.
A great type system can ensure that certain bugs won’t even compile. Unit tests can minimize the risk of regression and ensure that the system behaves as expected. Formatting tools can enforce tabs vs. spaces, the number of newlines, line length, or whatever else. Linters can ensure that certain code smells are avoided (at least not introduced blindly). And putting all of that in a pipeline ensures that only acceptable code is integrated.
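As a small illustration of making invalid states unrepresentable, consider modeling a session as a union of two explicit states (a hypothetical example, not from any particular codebase):

from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Anonymous:
    pass

@dataclass(frozen=True)
class LoggedIn:
    user_id: str

# A session is either anonymous or logged in; a "logged-in user without
# a user id" simply cannot be constructed, so a type checker rejects it.
Session = Union[Anonymous, LoggedIn]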
But what about the structure of the code?
Protecting architecture
Systems are built in a variety of ways, and no silver bullet exists. Certain principles minimize coupling, improve cohesion, and ensure that the system is easy to extend or change. Nevertheless, each system will have different principles and conventions. One might follow the MVVM pattern, MVC pattern, Hexagonal architecture, Onion Architecture, or maybe even actively enforce a big ball of mud. One might also have other principles, like splitting data classes from behavioral classes.
No matter which principles you have, they can be difficult to enforce, and one slippery pull request later, the system is slowly regressing.
Ensuring architecture via automated tests
A former colleague (and overall great person) introduced me to utilizing automated tests to assert certain idioms about the code. This was mind-blowing to me. As with everything, it’s a balance; I tend to overdo any new practice when I first learn it. Done right, such tests can be a huge help.
One principle I like is a pure domain model. If you want to enforce a pure domain, it makes sense to limit which packages the domain module can import, e.g. critical frameworks and possibly utility functions, but not database integrations. This could be implemented as follows in Python (skipping the actual implementation of get_imports):
import re
import pydantic
from myproject import domain, utils

def test_domain_is_pure():
    # get_imports is assumed to return the modules imported by the given
    # module; its implementation is skipped here (a sketch follows below).
    whitelist = [utils, pydantic, re]
    assert all(
        imported_module in whitelist
        for imported_module in get_imports(domain)
    )
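For completeness, one possible (simplified) sketch of get_imports uses the standard-library ast module to collect a module’s imports. The details here are my assumption, not the original implementation:

import ast
import importlib
import inspect

def get_imports(module):
    # Parse the module's source and collect the names of everything it
    # imports. Simplified: ignores relative imports and submodules.
    tree = ast.parse(inspect.getsource(module))
    names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            names.add(node.module)
    return [importlib.import_module(name) for name in names]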
The AutoFixture.Idioms package for .NET can ensure that a class adheres to certain rules, e.g. that constructors have guard clauses protecting against null values:
[Test]
public void Constructor_is_guarded_against_nulls()
{
    // ARRANGE
    var fixture = new Fixture();
    var assertion = fixture.Create<GuardClauseAssertion>();

    // ACT & ASSERT
    assertion.Verify(typeof(TestClass).GetConstructors());
}
One might want to ensure that equality works as expected:
[Test]
public void Equality_is_correctly_implemented()
{
    // ARRANGE
    var fixture = new Fixture();
    var assertion = fixture.Create<EqualityAssertion>();

    // ACT & ASSERT
    assertion.Verify(typeof(SampleValueObject));
}
These assertions are great for enforcing certain practices on individual classes. Generally speaking, this is a particular class of property-based testing.
Utilizing reflection & introspection
Reflection, however, can make the cases above considerably stronger by applying them across an entire codebase. By utilizing System.Reflection, one could ensure that all Data Transfer Objects (DTOs) adhere to certain rules:
[TestCaseSource(nameof(AllDtos))]
public void Equality_is_correctly_implemented(Type dto)
{
    ...
}

// Depending on codebase; IsDto is a predicate identifying DTO types
public static IEnumerable<Type> AllDtos =>
    Assembly.GetExecutingAssembly().GetTypes().Where(IsDto);
A clear benefit of TestCases (as above) is that a separate test run is created and shown for each class, making it quite transparent exactly which classes fail and which don’t.
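The same technique works in Python via introspection. As a hypothetical sketch (the dtos package, and the rule that every DTO must override the default identity-based equality, are assumptions for illustration):

import inspect
import pytest
from myproject import dtos  # hypothetical module containing the DTOs

all_dtos = [
    cls
    for _, cls in inspect.getmembers(dtos, inspect.isclass)
    if cls.__module__ == dtos.__name__
]

@pytest.mark.parametrize("dto", all_dtos, ids=lambda cls: cls.__name__)
def test_dto_defines_equality(dto):
    # Each DTO must override object's identity-based equality.
    assert dto.__eq__ is not object.__eq__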
In a project, we utilized Dagster to represent our data assets. In Dagster, each asset has an asset key (a list of strings, basically representing a unique identity/address). We have certain conventions regarding these keys, which not only ensure that a correlation exists between the key and the location of the underlying data, but also enable us to have certain utility functions based on these asset keys. But as with all conventions, they can easily slip past a review, and suddenly you have regression… unless you have a test like this:
import pytest
from dagster import AssetKey

# `dataplatform` exposes the project's asset graph; `rules` holds the
# conventions (a sketch of these follows below).
@pytest.mark.parametrize(
    "asset_key",
    list(dataplatform.asset_graph.all_asset_keys),
)
def test_asset_keys_follow_conventions(asset_key: AssetKey):
    broken_rules = [
        rule.name
        for rule in rules
        if not rule.followed_by(asset_key)
    ]
    assert not broken_rules, (
        f"asset key '{asset_key.to_user_string()}' breaks {broken_rules}"
    )
Utilizing pytest’s parametrize, similar to C#’s TestCaseSource, we create individual tests. This enables test explorers to easily highlight broken cases. One could potentially split the above even further, creating a separate test for each rule; the above was just easier to compose in this example.
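The rules referenced above could be as simple as named predicates over the key’s path. This is a hypothetical sketch, not the actual rules from the project:

from dataclasses import dataclass
from typing import Callable
from dagster import AssetKey

@dataclass(frozen=True)
class Rule:
    name: str
    followed_by: Callable[[AssetKey], bool]

rules = [
    Rule(
        name="all parts are lowercase",
        followed_by=lambda key: all(part == part.lower() for part in key.path),
    ),
    Rule(
        name="key has at least two parts",
        followed_by=lambda key: len(key.path) >= 2,
    ),
]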
Architecture in the larger scope
The examples above focus on a specific application and the architecture within it; any architecture between systems is not discussed. However, what we are doing is merely validating code, here both in C# and Python, and luckily any code-driven system can essentially be tested with similar practices. This is one of many reasons infrastructure-as-code is great. Kief Morris describes how to write such tests in his book on infrastructure as code. When your code is declarative and has no logic, all you can really test are certain properties of the system. Pulumi exposes this via Policy as Code, and each system will have its own set of tools. The principles are, however, the same; most call this property testing.
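As a minimal sketch of what this looks like with Pulumi’s pulumi-policy package (the public-bucket rule is a common documentation example, not from the original project):

from pulumi_policy import (
    EnforcementLevel,
    PolicyPack,
    ResourceValidationArgs,
    ResourceValidationPolicy,
)

def no_public_s3_buckets(args: ResourceValidationArgs, report_violation):
    # Flag any S3 bucket configured to be publicly readable.
    if args.resource_type == "aws:s3/bucket:Bucket":
        if args.props.get("acl") == "public-read":
            report_violation("S3 buckets may not be publicly readable.")

PolicyPack(
    name="infra-conventions",
    enforcement_level=EnforcementLevel.MANDATORY,
    policies=[
        ResourceValidationPolicy(
            name="no-public-s3-buckets",
            description="Prohibits publicly readable S3 buckets.",
            validate=no_public_s3_buckets,
        ),
    ],
)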
This can be constructed in a variety of ways; however, enforcing policies and constraints via code minimizes both human errors and the cost of peer review.
Placing it at the code level, however, enables you to catch issues before they are deployed, which is great. You want to be notified if any resources in your infrastructure are not properly protected; even better is to ensure they cannot be deployed in the first place.
Takeaways
Creating tests like this probably seems quite excessive if you aren’t used to writing tests in general. Nevertheless, if you want to enforce certain conventions and principles, replacing the human gatekeeper with an automated one is great. I’ve seen many teams that have certain principles and practices, but these are easily forgotten or skipped when the heat is on. I personally love code that is predictable and “clean”, but I forget which conventions apply and how, both when writing code and when reviewing it. I love automating as much as possible, so I can focus on the fun part of software engineering: solving interesting problems, without being bogged down by things that a machine can do better than I can. To err is human, but a CI pipeline is luckily not human.