warehouse icon indicating copy to clipboard operation
warehouse copied to clipboard

Squash Alembic migrations to improve test execution time

Open DarkaMaul opened this issue 10 months ago • 2 comments

warehouse test suite currently executes 246 Alembic migrations sequentially during database setup, adding significant overhead to test execution times. I investigated squashing these migrations into a single migration that represents the current database state.

Using alembic's automatic migration feature provides a good starting point, but requires manual review to address limitations:

  1. SQL functions are not automatically exported
  2. Triggers are not included in the export
Configuration Full Tests API Tests
With Squash 19.69s 3.12s
Without Squash 21.75s 3.66s

Note: Currently experiencing 128 failures out of 4402 tests in the full test suite, while API tests are passing successfully.

Proposed Implementation:

  1. Preserve historical migrations by moving them to migrations/old/
  2. Replace current migrations with a single consolidated migration in migrations/versions/

Primary Consideration:

The main tradeoff is losing the ability to easily roll back to intermediate database states. We could mitigate this by choosing an earlier cutoff date for the consolidated migration if needed.

I'm opening this issue to discuss if we should proceed forward before trying to fix every edge case in the migration.

/cc @miketheman @woodruffw

DarkaMaul avatar Feb 11 '25 16:02 DarkaMaul

I am generally 👍 on things that go faster, so yay!

When it comes to alembic and initial DB state, that's where it's a little tricky.

For the functions and triggers not represented - is that due to an alembic limitation, or some other reason? Do you have a sense of which those are? I'm curious if they predate warehouse or exist for some other reason.

Considering that all migrations live in git history, do we need to preserve them at all? I routinely search git history with git log -p -S <somestring> to find the commits with that text, and then can use other techniques to browse from there.

The main tradeoff is losing the ability to easily roll back to intermediate database states. We could mitigate this by choosing an earlier cutoff date for the consolidated migration if needed.

Possibly, however it's rare that we roll back the db more than a single migration, usually due to some deployment issue or other error.

I think the Most Important Part of this kind of effort is that running the new "base" migration should be a no-op in production, and since the dev db doesn't fully represent the production database, that's a risk we'll have to consider and mitigate..

miketheman avatar Feb 11 '25 20:02 miketheman

I finished running the migration today and fixed the tests - everything is green now, but there is a slight coverage difference I still need to investigate.

There are 11 triggers in the database:

  • projects_update_sitemap_bucket
  • projects_update_normalized_name
  • accounts_user_update_sitemap_bucket
  • update_user_password_date
  • update_project_last_serial
  • release_files_requires_python
  • releases_requires_python
  • update_project_total_size_release_files
  • releases_update_is_prerelease
  • normalize_prohibited_project_names (this one was named normalize_blacklist but I renamed it to keep it consistent with the new table name)

They are used to call functions before / after events on certain table:

CREATE TRIGGER normalize_prohibited_project_names
            BEFORE INSERT OR UPDATE ON prohibited_project_names
            FOR EACH ROW EXECUTE PROCEDURE ensure_normalized_prohibited_project_names();
    )

Alembic does not support detecting triggers or procedures. alembic-utils does but you need to register them using their functions (which was not the case in warehouse)

What was needed:

  • adding procedures and triggers
  • adding extensions
  • adding certain tables values ( admin_flags, prohibited_project_names)
  • the Python script generated was slightly invalid (missing imports)
  • adding a constraint that had not been detected (ck_disallow_private_top_level_classifier)

With the newly squashed migration all green, the tests are now even faster (~17.2s) :tada:

I think the Most Important Part of this kind of effort is that running the new "base" migration should be a no-op in production, and since the dev db doesn't fully represent the production database, that's a risk we'll have to consider and mitigate..

This should definitely be taken into account - but I don't know how I can help there.

DarkaMaul avatar Feb 14 '25 14:02 DarkaMaul