A Database Cloning Strategy for Testing with Alembic

The other day I came across this blog post by Philippe Gaultier. He shows how he ended up speeding up a suit of test cases by x2 (based on his situation). In short, the implementation goes by creating a golden database, that acts as a source of truth, and then it gets cloned on each test that requires it. I suggest reading that post first.

Usually, there are multiple methods to test the latests db changes, the most common are in-memory dbs (with Sqlite3) or rolling back transactions on each test (PostgreSQL). These strategies lacks debugging tools and applies migrations test by test. So applying this strategy seems reasonable.

On this post, I’ll show an Alembic implementation for managing migrations on each test by using this golden database approach. I used SQLAlchemy 2.x and Sqllite3. Why not PostgreSQL? Well, this database has a client-server architecture so this strategy won’t work, because we need the .db files on disk.

I created this repository with the code snippet so you can check out everything about the approach and how to run it. The code includes a basic structure for a book entity model, repository, service and schema.

First let’s take a look at the db/config_test.py file. The class GoldenDb is our testing db driver and has the corresponding methods to create a new db with acquire_test_database_path(), which returns it’s path. For this to happen, we collect all the current migrations in order to generate a unique hash that will correspond to the name of the golden database. And when cloning on each test, they get stored at the /tmp directory, so when we restart our computer, it gets cleaned up.

def __init__(self, auto_rm: bool = False) -> None:
    self.alembic_config = Config(str(get_root_path() / "alembic.ini"))
    self.alembic_script_dir = ScriptDirectory.from_config(self.alembic_config)
    self.migration_revisions = self._collect_migrations()
    self.migrations_hash = self._compute_migrations_hash()
    self.auto_rm = auto_rm

Looking at the GoldenDb constructor, the first step is configuring Alembic using the project’s alembic.ini file. This configuration allows the class to retrieve the latest migrations through _collect_migrations() and generate a unique database identifier with _compute_migrations_hash(). Since the hash is derived from the migration state, it changes whenever a new Alembic migration is added, ensuring that each schema version produces a distinct database.

The constructor also accepts auto_rm=True, which automatically removes temporary database files from /tmp after each test run, helping keep the test environment clean and preventing stale databases from accumulating over time.

def create_golden_database(self) -> Path:
    print("Creating new golden database...")
    random_name = hashlib.sha256(os.urandom(16)).hexdigest()
    temp_path = self.TMP_DIR / f"{random_name}.db"

    self.alembic_config.set_main_option("sqlalchemy.url", f"sqlite:///{temp_path}")

    print("Setting up migrations...")
    try:
        command.upgrade(self.alembic_config, "head")
    except Exception:
        temp_path.unlink(missing_ok=True)
        raise

    with sqlite3.connect(temp_path) as test_db:
        test_db.execute("SELECT 1")

    print("Renaming database...")
    golden_path = temp_path.rename(self.TMP_DIR / f"{self.migrations_hash}.db")

    print("Golden database created!!")
    with sqlite3.connect(golden_path) as golden_db_conn:
        golden_db_conn.execute("DELETE FROM book")
        golden_db_conn.commit()

    return golden_path

In more detail, create_golden_database() initializes a new golden database and applies all migrations using command.upgrade(). The target database is configured beforehand through alembic_config.set_main_option(), which is why the constructor first locates the alembic.ini file. After the migrations complete, the function verifies that a connection can be established, renames the database file using the generated hash, and performs a final validation to ensure the expected tables exist before returning the golden database path.

It is worth noting that this process runs only once. Subsequent calls to acquire_test_data_path() first check whether the golden database already exists and reuse it if available, avoiding the cost of recreating and migrating the database for every test run.

def acquire_test_database_path(self) -> Path:
    golden_path = self.TMP_DIR / f"{self.migrations_hash}.db"

    if not golden_path.exists():
        self.create_golden_database()

    clone_path = self._copy_db(golden_path)
    self.alembic_config.set_main_option("sqlalchemy.url", f"sqlite:///{clone_path}")

    return clone_path

When golden_path exists, then we just create a copy of the db with _copy_db() which creates a copy of the db file with shutil.copyfile(). We don’t use copy2() since that method also copies date metadata and we can get confused on the output timestamp of the copies at /tmp.

Finally we create a contextmanager to get a db session with the cloned path on the future tests.

@contextlib.contextmanager
def open_test_session():
    golden_db = GoldenDb(auto_rm=True)
    database_path = golden_db.acquire_test_database_path()

    engine = create_engine(f"sqlite:///{database_path}")
    SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)
    db = SessionLocal()

    try:
        yield db
    except Exception as e:
        db.rollback()
        raise e
    finally:
        db.close()
        engine.dispose() # important for testing; cleans the connection pool

        try:
            if golden_db.auto_rm:
                database_path.unlink()
        except OSError:
            pass

Note how we create the db engine with the database_path of a unique db. It’s good practice to close always the connection db.close() and use engine.dispose() to clean the whole connection pool, so each test starts fresh. Finally, as said before, auto_rm=True will delete the current cloned db that is being used by the test with database_path.unlink().

On the project there are currently 14 tests that uses a db connection. Let’s take a look at one test as example.

def test_create_book_success():
    with open_test_session() as db:
        service = BookService(db)

        result = service.register_new_book(BookCreate(title="Dune", author="Frank Herbert"))

        assert isinstance(result.id, uuid.UUID)
        assert result.title == "Dune"
        assert result.author == "Frank Herbert"

The test test_create_book_success() demonstrates how a new book is registered using BookService(). Before executing the test logic itself, our contextmanager is used via with open_test_session() as db, which provides a fresh session connected to a cloned test database. This session is then injected into the service as BookService(db), allowing all operations to run against an isolated database instance and making it possible to safely validate the behavior of the code.

We run the tests with uv run pytest app/tests.

Consider we are testing with auto_rm=False.

========================================================== test session starts ===========================================================
platform darwin -- Python 3.12.13, pytest-9.1.1, pluggy-1.6.0
rootdir: /Users/asmithz/Projects/alembic-migration-snippet
configfile: pyproject.toml
collected 16 items

app/tests/test_create_book.py .....                                                                                                [ 31%]
app/tests/test_delete_book.py ...                                                                                                  [ 50%]
app/tests/test_get_book.py ..                                                                                                      [ 62%]
app/tests/test_list_books.py ..                                                                                                    [ 75%]
app/tests/test_update_book.py ....                                                                                                 [100%]

=========================================================== 16 passed in 0.22s ==========================================================

Cool al tests passed. Now let’s check whats inside /tmp. You could get the latest as ls -lstr | tail -n 15.

40 -rw-r--r--@ 1 asmithz  wheel  20480 Jun 22 23:51 af11ccd8cbce5c91cfc6e2517225c8cd577c32dddb2629f0ce7354b9a057d5b8.db
40 -rw-r--r--@ 1 asmithz  wheel  20480 Jun 22 23:51 71653e046a08fcd487337858fa446c251ce7c590325efbd786bb3e417523a67a.db
40 -rw-r--r--@ 1 asmithz  wheel  20480 Jun 22 23:51 d105c5e0a9f99802ed31ad562be388a63e8dc37620b923e37fac40b38b00853a.db
40 -rw-r--r--@ 1 asmithz  wheel  20480 Jun 22 23:51 e5ec9b8ca3687b65694e2ec202afa346fd833dc3699e48a059bdc1e42c5594d6.db
40 -rw-r--r--@ 1 asmithz  wheel  20480 Jun 22 23:51 10fa63ec93f292500f6fa5aaeb971a10d53de7a2fe6fba4f3ebbc11174cdd92c.db
40 -rw-r--r--@ 1 asmithz  wheel  20480 Jun 22 23:51 9576fcf05b471c4229679892e1fc1594a9dc42e86534a6999d8d29d10497ad86.db
40 -rw-r--r--@ 1 asmithz  wheel  20480 Jun 22 23:51 a741bd2761cfe32c224b7922d1c80144185ce666d4fd5ac06ea7b998b75dd7d1.db
40 -rw-r--r--@ 1 asmithz  wheel  20480 Jun 22 23:51 af336d0fef664060eeaf8619a4687df8f0cf8c45d33d7a1c3b1b146fd738d180.db
40 -rw-r--r--@ 1 asmithz  wheel  20480 Jun 22 23:51 7b64338ce3bc483e2d74bf6ef5860e5c4536d73a4cf7a5a49a469ff1f897a81b.db
40 -rw-r--r--@ 1 asmithz  wheel  20480 Jun 22 23:51 c0aff7f8fd12485441bdd28aca68dee4a8883d18c74250d53749756627952f06.db
40 -rw-r--r--@ 1 asmithz  wheel  20480 Jun 22 23:51 133f1495b2e7b5a64f0122bdc26e1f81c0d52fa1108cdff9818c19899483ad8e.db
40 -rw-r--r--@ 1 asmithz  wheel  20480 Jun 22 23:51 3e5ff8dc321a2bf0f965cc16fba0f92430796cac114fda1e353f003cc5b29dcf.db
40 -rw-r--r--@ 1 asmithz  wheel  20480 Jun 22 23:51 de6dd638f0aa699030ede083883a75b3df0c8866360dec0d59304572e13e011c.db
40 -rw-r--r--@ 1 asmithz  wheel  20480 Jun 22 23:51 085426eb9201ea00b064420ee77c44a6c353b5a756b2e269049f4a6262c580be.db
40 -rw-r--r--@ 1 asmithz  wheel  20480 Jun 22 23:51 8500812cc22921ef1ca34507d3d09e77b26200a3e12854b57b2efee84edda706.db

As we can see, on the first run we end up with 15 .db files. One of them is the golden database, while the remaining files are copies derived from it. Since there are 14 tests, the initial execution produces n + 1 database files, where n corresponds to the number of tests that require a database connection.

If we run the tests again, the behavior changes: we only get n .db files. This is because the golden database has already been created and reused, avoiding the need to regenerate it on subsequent runs!!.

This approach is quite practical, but it is important to note that it is mostly suited for SQLite due to how it handles file-based databases and lightweight connections. It offers an alternative pattern for managing database state during testing, especially in scenarios where isolation and reproducibility are required. This is a first iteration, so feel free to create a pull request.

Overall, I found the original idea quite useful for specific cases and implementing it with Alembic and SQLAlchemy, a stack I frequently work with, was a rewarding exercise. I plan to continue exploring similar blog posts, adapting their ideas into tools I use, and evaluating how they translate in practice. Even small variations in approach can be insightful, and they may also be useful to others working with similar stack!!.

— asz