Using Docker for integration tests

Davi de Castro Reis
Jul 28, 2024


A robust testing pipeline is surprisingly hard to achieve. One crucial property of tests is that they need to be reproducible. At odds with that property is another important one: tests have dependencies. In fact, most bugs are found in the interaction of the systems, which is why integration tests are paramount.

Insightful advice from Vercel's CEO, though easier said than done.

The problem with integration tests is that they are hard to set up. With unit tests, you mock everything but the small piece of code you are testing, and then you are in stateless nirvana, where start-up time is fast and everything is reproducible. But with integration tests, you need to prepare the surrounding state, bring up a database, spin up other services your test depends upon, write results to disk, and clean up everything after you are done. You have state, side effects, and a non-negligible amount of non-determinism, as requests flow in parallel with no predictable order, networks fail, and disks fill up. It is a launch-and-pray situation.

You want to increase the reproducibility of your integration tests. For that, you need isolation. Luckily, the industry has settled on containers as the de facto isolation layer for running computations. They have more shared state than virtual machines or distinct bare metal servers, but they are fast to boot and reasonably secure. Nowadays, if you want to launch a server in any environment, you write a Dockerfile and build a container image, which will run in its own filesystem with its own network adapters. And if containers are good for production, they should also be suitable for setting up the environment for your integration tests.

And lots of people write their integration tests that way. They write a compose.yaml file describing the dependencies of the test, bring it up with some scripts, run the integration test, and shut down the environment at the end (roughly as sketched below). It works well for one test, or even for many tests that can share the testing environment, but it stops scaling once different integration tests need their own specific setups. Enter testcontainers, a nice project that offers an embeddable library in several programming languages, allowing you to describe your dependencies inside your testing code and leveraging dependency injection frameworks, tight integration with testing frameworks and IDEs, and many other goodies. The project also provides several tuned containers for common dependencies, like Redis or Postgres.
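
A rough sketch of that script-driven lifecycle (the test script name here is a placeholder for whatever runner you use):

# Bring up the dependencies described in compose.yaml, in the background
docker compose up -d
# Run the integration tests against them
./run-integration-tests.sh
# Tear everything down, including named volumes, so no state lingers
docker compose down -v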

But this scenario still needs improvement. You are running dependencies as containers, but your code is being built locally and runs locally. That means you need to put in a lot of effort to achieve hermetic builds and to ensure that no side effects from the integration tests linger once the tests are complete. Build hermeticity is a challenging problem, and our industry has split into two directions to solve it. Either you use out-of-tree build systems, like Bazel and its doppelgangers, or you rely on CI/CD pipelines that spin up fresh VMs for your builds. Both have downsides: Bazel is an engineering marvel but has a very steep learning curve and integrates poorly with IDEs and open-source projects in general. On the other hand, systems like GitHub Actions or CircleCI are expensive and complicated to simulate locally. If you could build your code with the idiomatic tools of your language, like Cargo or Gradle, but inside a container, and run it inside a container as you do with the dependencies, you could solve both of these problems. So, let us see how we can do it.

Building the code for your integration tests in Docker is easy. It is similar to how people compile code for releases, although releases are usually focused on image size. Here, we want to focus on the development cycle instead, as with GitHub devcontainers: a somewhat featureful base image that you can use for interactive, incremental development. Running the tests, however, gets more complicated, because you need Docker to spin up the dependencies, and Docker needs some help running inside Docker itself. There are some popular patterns to work around this, like docker-in-docker (aka DinD) or docker-outside-docker (aka DoD), or even more native but harder-to-set-up solutions like Sysbox. They all come with trade-offs regarding start-up time, setup pain, and capabilities.

What none of these solutions lets you do is leverage the “docker build” system as a record of your tests. You can only access Docker in the “docker run” steps, and then you lose all the incremental, remote-caching magic from Docker, a capability that Bazel also developed to the limit. The almost paranoid stance of Bazel and, to a lesser extent, Docker regarding hermeticity is what enables large-scale remote caching, in contrast to regular build systems, which have trouble even supporting local incremental builds.

Incremental builds are important even in tiny codebases, since they are part of the programmer's inner loop. In large codebases, the lack of them can kill productivity.

The docker-outside-docker trick is to mount the docker socket as a file inside the container. But that is not a regular file; it is a disguised HTTP server exposed as a unix socket. Because we cannot mount external volumes during the build step, only copy regular files, we cannot inject the socket into the environment where the build steps are executed. We can do docker-in-docker, and Earthly, for example, uses that technique with syntax sugar on top, but moving caching from the external Docker into the inner Docker becomes very expensive. And although Earthly is a project full of benefits, much of the open-source ecosystem only integrates well with Dockerfiles (I am looking at you, Skaffold).
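
Concretely, the docker-outside-docker trick looks like this at run time; a minimal sketch using the official docker:cli image:

# Bind-mount the host daemon's socket so the CLI inside the container
# talks to the Docker daemon outside of it
docker run --rm -v /var/run/docker.sock:/var/run/docker.sock docker:cli docker ps
# There is no equivalent for build steps: COPY only handles regular files,
# and RUN lines cannot see volumes from the host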

There is, however, something else that we can inject into a docker build environment besides files: a network. If we expose our unix socket as a port on the local system instead, we can reach it from the Docker build steps. Here is a Dockerfile that uses this technique. We are using the Wolfi Linux base image, an excellent choice as it has the benefits of Alpine without its problems.

# Wolfi base image: small, apk-based, glibc-compatible
FROM cgr.dev/chainguard/wolfi-base
# Point the Docker CLI at a TCP endpoint on the host instead of the default unix socket
ENV DOCKER_HOST=host.docker.internal:2375
RUN apk add docker-cli
# Sanity check: talk to the host daemon during the build
RUN docker ps

If you try to build this image, it will fail at the last command. When docker ps runs, it will try to reach an insecure HTTP server on host.docker.internal:2375, and unless you have manually configured your Docker Desktop to serve that, the connection will be refused. Also, note the usage of the host.docker.internal domain, which Docker creates for you. Let us use the network Swiss Army knife socat to expose the unix socket over the network, and get the right incantation for docker build to guarantee the build step can communicate with the host on all platforms.

# Expose the local Docker unix socket as TCP port 2375
socat -v TCP-LISTEN:2375,fork UNIX-CONNECT:/var/run/docker.sock &
# Make host.docker.internal resolve to the host's gateway during build steps
docker build --add-host=host.docker.internal:host-gateway .

And now the build should succeed. However, suppose your build steps contain an actual invocation of an integration test that uses testcontainers. In that case, you must modify the Dockerfile to run socat in the inverse direction, exposing the network socket as a unix socket. Also, testcontainers uses Ryuk to make sure the containers it spawns are collected after you are done with them, so you will need to tell testcontainers where Docker is running by setting the TESTCONTAINERS_HOST_OVERRIDE variable.

FROM cgr.dev/chainguard/wolfi-base
# Tell testcontainers where the Docker daemon actually lives
ENV TESTCONTAINERS_HOST_OVERRIDE=host.docker.internal
RUN apk add docker-cli socat
# Re-expose the host's TCP endpoint as a local unix socket, then run the tests
RUN socat -v UNIX-LISTEN:/var/run/docker.sock,fork TCP:host.docker.internal:2375 & ./gradlew integrationTest

Now you have integration tests that pass as part of a Docker build. This is super useful, as you can run it in your CI/CD and use Docker's remote caching capabilities, like the GitHub Actions (GHA) cache backend, to get something closer to the large-scale Bazel experience without abandoning the toolset you are familiar with.
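
For example, buildx can import and export layer cache through the GitHub Actions cache backend; a sketch, assuming buildx is available and the GHA runtime token is exposed to the build (docker/build-push-action normally wires this up for you):

# Reuse and publish layer cache through the GitHub Actions cache backend
docker buildx build \
  --cache-from type=gha \
  --cache-to type=gha,mode=max \
  --add-host=host.docker.internal:host-gateway \
  .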

This setup may look like the holy grail of Docker inception, but there is more. Although you are no longer leaking state into the host filesystem and are nicely capturing your test reports in Docker’s overlayfs, you are still leaking port 2375 to the host, breaking some of the isolation. We can improve this by running the initial socat call inside Docker itself, which also makes the solution more portable across Windows and Mac.

Running docker-inside-docker-inside-docker can get as confusing as a Nolan movie.

For that, we will create a devcontainer from which we will kick off the build of the image that runs the integration tests. For the devcontainer, we will use pkgx.sh as the primary package manager, since it can generate small, lazy images through its caching mechanisms. The integrate image will set up the build environment and use socat to expose the docker socket to the RUN line where we call the Python testing library unittest, which will pick up our integration test (despite its name).

FROM cgr.dev/chainguard/wolfi-base AS devcontainer
RUN apk add curl libgcc docker-cli docker-compose
# Make docker-compose visible to the docker CLI as the "compose" plugin
RUN install -D /usr/bin/docker-compose /usr/local/libexec/docker/cli-plugins/docker-compose
# pkgx provides the remaining tools lazily
RUN curl -Ssf https://pkgx.sh | sh
RUN pkgx install socat python
RUN mkdir -p /var/run/
RUN python -m pip install 'testcontainers[postgres]' psycopg2-binary
WORKDIR /usr/src/app
COPY Dockerfile compose.yaml test.py ./

FROM devcontainer AS integrate
ENV DOCKER_HOST=host.docker.internal:2375
ENV TESTCONTAINERS_HOST_OVERRIDE=gateway.docker.internal
# Re-expose the TCP endpoint as a unix socket, then run the tests in the same RUN step
RUN socat UNIX-LISTEN:/var/run/docker.sock,fork TCP:host.docker.internal:2375 & \
python -m unittest discover

And here is the toy Python integration test we will run in our proof of concept.

import unittest
import psycopg2
from testcontainers.postgres import PostgresContainer

class MyIntegrationTest(unittest.TestCase):
    def test_postgres_version(self):
        pin = "sha256:36ed71227ae36305d26382657c0b96cbaf298427b3f1eaeb10d77a6dea3eec41"
        with PostgresContainer("postgres:16-alpine@" + pin, driver=None) as postgres:
            cursor = psycopg2.connect(postgres.get_connection_url()).cursor()
            cursor.execute('SELECT version()')
            version = cursor.fetchone()
            self.assertEqual(version[0][:13], "PostgreSQL 16")

Finally, we will use compose to launch the devcontainer and invoke the proper build command to build the integrate image from within the devcontainer. Notice that this time we need to distinguish host.docker.internal from gateway.docker.internal, since we are exposing the docker socket on an IP that Docker decides dynamically when it brings up the devcontainer.

services:
  integrate:
    build:
      target: devcontainer
    network_mode: host
    volumes:
      - //var/run/docker.sock:/var/run/docker.sock
    command: |
      sh -c '\
      socat TCP-LISTEN:2375,fork UNIX-CONNECT:/var/run/docker.sock& \
      DOCKER_HOST_IP=$(hostname -i) docker compose build do-integrate'
  do-integrate:
    build:
      target: integrate
      network: host
      extra_hosts:
        - gateway.docker.internal:host-gateway
        - host.docker.internal:${DOCKER_HOST_IP:-host-gateway}
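
With that in place, a single command should kick the whole thing off (a sketch, assuming the file above is saved as compose.yaml next to the Dockerfile and test.py):

# Start the devcontainer service; its command launches socat and then builds
# the integrate image, which runs the integration tests as part of the build
docker compose run --rm integrate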

And voilà. You have now made your integration tests into a hermetic, portable, incremental, and remotely cacheable computation that you can trivially use in CI/CD setups, local debugging sessions, or whatever you prefer. By moving everything inside the docker build step, we bypassed a big conundrum of build-time cache mounts versus run-time volume mounts in Docker. It is worth saying that Podman has build-time volume mounts, but they lack the GC capabilities of the build cache mounts. We can also use the build-time cache mounts to save on image size, and hence remote cache space, at the cost of extra downloads. This is a sweet deal for GitHub Actions, including the native GHA docker remote caching, or even CircleCI with its upcoming Docker cache S3 support. But if you want the best of both worlds, you can do remote builds on a platform that supports affinity, like Depot or Earthly Satellites.

There is still a small gotcha: the devcontainer stage will be built twice, once as a standalone image and a second time as the base stage for integrate. Caching fails because DOCKER_HOST_IP is a dynamic value injected into the /etc/hosts file, busting the cache. That can be fixed with yet another hack, injecting the value through a build-time secrets mount, but it will be left as an exercise for the reader (and if you are on a Mac, note that extended attributes will also break the cache, and that xattr does not work in modern terminals).

To wrap it up, here is one last iteration of the Dockerfile, pinning all external dependencies for improved reproducibility and using cache mounts. Distributing computations is always subject to trade-offs involving the cost of hardware, the speed of networks, and multiple other factors. As the rule of thumb du jour, aim to minimize uploads, followed by CPU usage, and freely use cheap local storage, followed by downloads. Using pkgx.sh as the package manager is quite clever, as you end up with small images that save a lot of space in your Docker cache: you pay nothing on a cache hit and only download dependencies on a cache miss.

FROM cgr.dev/chainguard/wolfi-base:latest@sha256:3eff851ab805966c768d2a8107545a96218426cee1e5cc805865505edbe6ce92 AS devcontainer
RUN apk add curl libgcc docker-cli docker-compose
RUN install -D /usr/bin/docker-compose /usr/local/libexec/docker/cli-plugins/docker-compose
RUN mkdir -p /var/run/ && mkdir -p /usr/local/bin/
RUN curl -L https://github.com/pkgxdev/pkgx/releases/download/v1.1.6/pkgx-1.1.6+$(uname)+$(uname -m).tar.xz | tar xvJ -C /usr/local/bin/ -f-
RUN --mount=type=cache,target=/root/.cache,id=root-dot-cache --mount=type=cache,target=/root/.pkgx,id=root-dot-pkgx \
pkgx install socat@1.8.0.0 python@3.11.9
RUN --mount=type=cache,target=/root/.cache,id=root-dot-cache --mount=type=cache,target=/root/.pkgx,id=root-dot-pkgx \
python -m pip install 'testcontainers[postgres]==4.7.2' psycopg2-binary==2.9.9
WORKDIR /usr/src/app
COPY Dockerfile compose.yaml test.py ./

FROM devcontainer AS integrate
ENV DOCKER_HOST=host.docker.internal:2375
ENV TESTCONTAINERS_HOST_OVERRIDE=gateway.docker.internal
RUN --mount=type=cache,target=/root/.cache,id=root-dot-cache --mount=type=cache,target=/root/.pkgx,id=root-dot-pkgx \
socat UNIX-LISTEN:/var/run/docker.sock,fork TCP:host.docker.internal:2375 & \
python -m unittest discover

In retrospect, this was incredibly hard to get right, due to all the tiny details one must pay attention to. And you cannot easily test cache hits other than by inspecting logging output or timing runs. But it works in the end.
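
A crude but workable check is timing two consecutive runs; the second one should be dominated by cached steps:

# First run warms the cache; the second should be close to a no-op rebuild
time docker compose run --rm integrate
time docker compose run --rm integrate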

Hitting Postgres in your integration tests with a half-second rerun is nice.
