warehouse
Squash Alembic migrations to improve test execution time
The warehouse test suite currently executes 246 Alembic migrations sequentially during database setup, adding significant overhead to test execution times. I investigated squashing these migrations into a single migration that represents the current database state.
Using Alembic's autogenerate feature provides a good starting point, but it requires manual review to address some limitations:
- SQL functions are not automatically exported
- Triggers are not included in the export
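Restoring those objects means hand-editing the generated revision. A minimal sketch of what that looks like, using the `normalize_prohibited_project_names` trigger mentioned later in this thread (the function body here is illustrative, not warehouse's actual definition):

```python
from alembic import op


def upgrade():
    # Autogenerate does not export SQL functions, so the trigger
    # function has to be re-created by hand. The body below is
    # illustrative only.
    op.execute("""
        CREATE FUNCTION ensure_normalized_prohibited_project_names()
        RETURNS trigger AS $$
        BEGIN
            NEW.normalized_name := normalize_pep426_name(NEW.name);
            RETURN NEW;
        END;
        $$ LANGUAGE plpgsql
    """)
    # Triggers are not exported either; this statement mirrors the one
    # in the live database.
    op.execute("""
        CREATE TRIGGER normalize_prohibited_project_names
        BEFORE INSERT OR UPDATE ON prohibited_project_names
        FOR EACH ROW EXECUTE PROCEDURE
            ensure_normalized_prohibited_project_names()
    """)
```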
| Configuration | Full Tests | API Tests |
|---|---|---|
| With Squash | 19.69s | 3.12s |
| Without Squash | 21.75s | 3.66s |
Note: the full test suite currently has 128 failures out of 4402 tests, while the API tests all pass.
Proposed Implementation:
- Preserve historical migrations by moving them to `migrations/old/`
- Replace the current migrations with a single consolidated migration in `migrations/versions/`
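The archival step could be scripted roughly like this (a sketch; the directory layout is the one proposed above):

```python
import shutil
from pathlib import Path


def archive_migrations(versions_dir: Path, old_dir: Path) -> list[str]:
    """Move every migration module out of versions/ into old/.

    Returns the names of the moved files so the caller can log them.
    """
    old_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(versions_dir.glob("*.py")):
        shutil.move(str(path), str(old_dir / path.name))
        moved.append(path.name)
    return moved
```

After this runs, a single consolidated revision would be added back under `migrations/versions/`.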
Primary Consideration:
The main tradeoff is losing the ability to easily roll back to intermediate database states. We could mitigate this by choosing an earlier cutoff date for the consolidated migration if needed.
I'm opening this issue to discuss whether we should proceed before trying to fix every edge case in the migration.
/cc @miketheman @woodruffw
I am generally 👍 on things that go faster, so yay!
When it comes to alembic and initial DB state, that's where it's a little tricky.
For the functions and triggers not represented - is that due to an alembic limitation, or some other reason? Do you have a sense of which those are? I'm curious if they predate warehouse or exist for some other reason.
Considering that all migrations live in git history, do we need to preserve them at all? I routinely search git history with `git log -p -S <somestring>` to find the commits with that text, and then can use other techniques to browse from there.
> The main tradeoff is losing the ability to easily roll back to intermediate database states. We could mitigate this by choosing an earlier cutoff date for the consolidated migration if needed.
Possibly; however, it's rare that we roll back the db more than a single migration, usually due to some deployment issue or other error.
I think the Most Important Part of this kind of effort is that running the new "base" migration should be a no-op in production, and since the dev db doesn't fully represent the production database, that's a risk we'll have to consider and mitigate.
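One way to check that no-op property before shipping would be to diff the schema produced by the full historical chain against the one produced by the squashed migration (a sketch; both database names are hypothetical):

```shell
# Apply the full historical chain to one database and the squashed
# migration to another, then compare the resulting schemas.
pg_dump --schema-only --no-owner warehouse_via_chain > chain.sql
pg_dump --schema-only --no-owner warehouse_via_squash > squash.sql

# Any diff output is schema drift the squash would introduce.
diff -u chain.sql squash.sql
```

This still only covers what the dev db knows about; objects that exist only in production would need a separate dump to compare against.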
I finished running the migration today and fixed the tests - everything is green now, but there is a slight coverage difference I still need to investigate.
There are 11 triggers in the database:
- projects_update_sitemap_bucket
- projects_update_normalized_name
- accounts_user_update_sitemap_bucket
- update_user_password_date
- update_project_last_serial
- release_files_requires_python
- releases_requires_python
- update_project_total_size_release_files
- releases_update_is_prerelease
- normalize_prohibited_project_names (this one was named `normalize_blacklist` but I renamed it to keep it consistent with the new table name)
They are used to call functions before / after events on certain tables:

```sql
CREATE TRIGGER normalize_prohibited_project_names
BEFORE INSERT OR UPDATE ON prohibited_project_names
FOR EACH ROW EXECUTE PROCEDURE ensure_normalized_prohibited_project_names();
```
Alembic does not support detecting triggers or procedures. alembic-utils does, but it requires registering them through its own entity classes (which warehouse does not use).
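For reference, alembic-utils expects the objects to be declared and registered up front, roughly like this (a sketch based on alembic-utils' documented API; the function body is illustrative, not warehouse's actual definition):

```python
from alembic_utils.pg_function import PGFunction
from alembic_utils.pg_trigger import PGTrigger
from alembic_utils.replaceable_entity import register_entities

ensure_normalized = PGFunction(
    schema="public",
    signature="ensure_normalized_prohibited_project_names()",
    definition="""
        RETURNS trigger AS $$
        BEGIN
            NEW.normalized_name := normalize_pep426_name(NEW.name);
            RETURN NEW;
        END;
        $$ LANGUAGE plpgsql
    """,
)

normalize_trigger = PGTrigger(
    schema="public",
    signature="normalize_prohibited_project_names",
    on_entity="public.prohibited_project_names",
    definition="""
        BEFORE INSERT OR UPDATE ON public.prohibited_project_names
        FOR EACH ROW EXECUTE PROCEDURE
            ensure_normalized_prohibited_project_names()
    """,
)

# Called from migrations/env.py so autogenerate can diff these entities
# against the database.
register_entities([ensure_normalized, normalize_trigger])
```

Adopting this would be a separate effort, since none of warehouse's existing functions or triggers are declared this way.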
What was needed:
- adding procedures and triggers
- adding extensions
- adding initial values for certain tables (`admin_flags`, `prohibited_project_names`)
- fixing the generated Python script, which was slightly invalid (missing imports)
- adding a constraint that had not been detected (`ck_disallow_private_top_level_classifier`)
With the newly squashed migration all green, the tests are now even faster (~17.2s) :tada:
> I think the Most Important Part of this kind of effort is that running the new "base" migration should be a no-op in production, and since the dev db doesn't fully represent the production database, that's a risk we'll have to consider and mitigate.
This should definitely be taken into account - but I don't know how I can help there.