RFC: Better migration architecture
From my experience using aerich in a production system, the current migration approach leaves much to be desired.
Here are my main complaints about the current architecture:
- Using raw SQL is not database-independent, so you are tied to using the same database type for testing and production.
- Sometimes the SQL commands are out of order.
- You can't do custom migration logic. All you can do is tweak a generated SQL script.
This is what I've come up with as a better approach:
I think a migration file should consist of the following items:
- The current and previous migration ID
- The old schema as a dict literal
- The new schema as a dict literal
- A list of migration operations such as field renames, table creation, etc. with database-specific information only where absolutely unavoidable.
- A function which can be used to do manual data migrations in Python or SQL, and its complement for downgrades
Benefits of this approach:
- Database-agnostic
- A clear way of describing what is going to happen to the database without having to mess with SQL.
- Custom upgrade and downgrade logic
- The aerich tool can lint the operations so that the schema stays consistent even if a user edits the migration file.
- You could ask aerich to generate a null migration file and then implement some data-only (non-schema) updates in the custom migration logic.
Conceptual example:
# This is a migration generated by the aerich tool.
# You may edit this file, but be warned that your changes will be overwritten
# if you run aerich migrate again without upgrading the database to this migration.
from tortoise import BaseDBAsyncClient
# This is the ID of the migration previous to this one, or None if this is the first.
PREVIOUS_MIGRATION_ID = '20241121101234'
# This is the ID of the migration, used to determine if the migration has been run.
MIGRATION_ID = '20241121142100'
# This is the schema of the database as of the last migration (maybe could be imported directly from previous migration?)
OLD_SCHEMA = {
    'MyModel': {
        'table_name': 'my_models',
        'fields': {
            'id': {
                'type': 'IntField',
                'pk': True,
                'null': False,
            },
            'name': {
                'type': 'CharField',
                'pk': False,
                'null': False,
                'max_len': 255,
            },
            ...
        },
    },
    ...
}
# This is the schema of the database after this migration has been run.
NEW_SCHEMA = {
    'MyModel': {
        'table_name': 'my_models',
        'fields': {
            'id': {
                'type': 'IntField',
                'pk': True,
                'null': False,
            },
            'display_name': {
                'type': 'CharField',
                'pk': False,
                'null': False,
                'max_len': 255,
            },
            ...
        },
    },
    ...
}
# These are directives to the migration executor about what the migration consists of.
MIGRATION_OPERATIONS = [
    {'type': 'rename_field', 'table': 'my_models', 'old_name': 'name', 'new_name': 'display_name'},
]
# This is where you can implement fancy migration logic if the simple cases don't meet your needs.
async def CUSTOM_MIGRATION_SCRIPT_UPGRADE(dbconn: BaseDBAsyncClient):
    # Example (pseudo-code):
    names = [(id, f"Hi, I'm {name}") for id, name in dbconn.execute('SELECT id, name FROM my_models').all()]
    # Wait for the migration operations to be executed
    yield
    dbconn.executemany('UPDATE my_models SET display_name=? WHERE id=?', names)

# This is where you can implement the reverse logic in the same manner if you need to downgrade.
async def CUSTOM_MIGRATION_SCRIPT_DOWNGRADE(dbconn: BaseDBAsyncClient):
    yield
Migration pseudo-code:
# Evaluate the migration file
import migration_file
# Connect to the database
dbconn = get_db_conn()
# Check the previous migration ID against the database
if migration_file.PREVIOUS_MIGRATION_ID is not None and get_last_migration(dbconn) != migration_file.PREVIOUS_MIGRATION_ID:
    raise ValueError('You are running migrations out of order!')
# Make sure the operations update the schema correctly in case the user edited them
schema = copy.deepcopy(migration_file.OLD_SCHEMA)  # copy so the check does not mutate the file's dict
for operation in migration_file.MIGRATION_OPERATIONS:
    modify_schema(schema, operation)
if schema != migration_file.NEW_SCHEMA:
    raise ValueError('The operations do not modify the schema correctly!')
# Start the first part of the user's upgrade logic
# (calling an async generator function returns the generator without awaiting)
asyncgen = migration_file.CUSTOM_MIGRATION_SCRIPT_UPGRADE(dbconn)
await anext(asyncgen)
# Run the migration operations
for operation in migration_file.MIGRATION_OPERATIONS:
    execute_migration_operation(operation, dbconn, downgrade=False)
# Run the second part of the user's upgrade logic
try:
    await anext(asyncgen)
except StopAsyncIteration:
    pass
else:
    raise RuntimeError('Only one yield statement is allowed!')
# Record that this migration has been applied
add_migration_to_migrations_table(dbconn, migration_file.MIGRATION_ID)
# Done
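To make the linting step concrete, here is a minimal sketch of the `modify_schema` helper used in the pseudo-code, handling only the `rename_field` operation from the example. A real implementation would cover every operation type; the structure follows the schema dicts above.

```python
def modify_schema(schema: dict, operation: dict) -> None:
    """Apply one migration operation to an in-memory schema dict, in place."""
    if operation['type'] == 'rename_field':
        for model in schema.values():
            if model['table_name'] == operation['table']:
                fields = model['fields']
                # Move the field description under its new name.
                fields[operation['new_name']] = fields.pop(operation['old_name'])
                return
        raise ValueError(f"No model maps to table {operation['table']!r}")
    raise NotImplementedError(f"Unknown operation type {operation['type']!r}")
```

Because each operation is applied to a plain dict, the tool can verify `schema == NEW_SCHEMA` without touching the database.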
Linking #76 #148 #243
@merlinz01 thank you for sharing your ideas!
A few comments/questions:
- Having both `OLD_SCHEMA` and `NEW_SCHEMA` would make the migration files quite large
- How would you populate `NEW_SCHEMA` in the case where you are starting from an empty (null) migration?
- Have you looked into other migration tools? For instance, Django's migrations or alembic? What do you think about their approach to solving migrations?
In answer to your questions:
- Yes, having both old and new schemas would make the migration files significantly larger. The idea here is to reduce ambiguity for the migration tool, to eliminate the need for a database connection when generating a migration file, and to be able to check the operations against the new schema in case the user modifies one or the other. However, it should be workable to "import" the schema from the previous migration, because there is going to be trouble anyway if that file doesn't exist. So instead of `OLD_SCHEMA = { ... }` we should have something like `OLD_SCHEMA = import_schema_from_migration_file(PREVIOUS_MIGRATION_ID)`.
- If a user is not making schema changes, aerich could just generate `NEW_SCHEMA = None` or `MIGRATION_OPERATIONS = None` and use that as a signal to skip the schema migration steps.
- Django seems to employ a similar concept as far as defining operations instead of generating SQL. Sample Django migration files:
# Generated by Django 5.1.3 on 2024-11-25 20:41
from django.db import migrations, models
class Migration(migrations.Migration):
initial = True
dependencies = []
operations = [
migrations.CreateModel(
name="TestModel",
fields=[
(
"id",
models.BigAutoField(
auto_created=True,
primary_key=True,
serialize=False,
verbose_name="ID",
),
),
("name", models.CharField(max_length=100)),
("age", models.IntegerField()),
],
),
]
# Generated by Django 5.1.3 on 2024-11-25 20:42
from django.db import migrations
class Migration(migrations.Migration):
dependencies = [
("testmigrations", "0001_initial"),
]
operations = [
migrations.RenameField(
model_name="testmodel",
old_name="age",
new_name="age2",
),
]
Alembic is fairly similar to what I came up with. I like the fact that it enables type-checking on the operations.
"""test
Revision ID: f0f068244d58
Revises: e8fa71e392c3
Create Date: 2024-11-25 16:08:05.689241
"""
from typing import Sequence, Union
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision: str = 'f0f068244d58'
down_revision: Union[str, None] = 'e8fa71e392c3'
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
# ### commands auto generated by Alembic - please adjust! ###
op.add_column('test_models', sa.Column('test', sa.String(length=20), nullable=False))
# ### end Alembic commands ###
def downgrade() -> None:
# ### commands auto generated by Alembic - please adjust! ###
op.drop_column('test_models', 'test')
# ### end Alembic commands ###
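The schema "import" suggested in the first answer above could be as simple as loading the previous migration module and reusing its `NEW_SCHEMA` as this migration's `OLD_SCHEMA`. A sketch, assuming migration files live in an importable `migrations` package named by migration ID (both are assumptions, not existing aerich behavior):

```python
import importlib

def import_schema_from_migration_file(migration_id: str) -> dict:
    """Return NEW_SCHEMA from the migration module with the given ID.

    Assumes migration files are importable as migrations.<migration_id>.
    """
    module = importlib.import_module(f"migrations.{migration_id}")
    return module.NEW_SCHEMA
```

This keeps each file's `NEW_SCHEMA` as the single source of truth, and a missing previous file fails loudly at import time.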
Dropping this link here mostly for interest's sake: https://github.com/gobuffalo/fizz/blob/main/README.md
https://github.com/tortoise/aerich/issues/324#issuecomment-1794581022
Hello everyone. I have a few ideas that I want to share.
I support the idea of eliminating the need to store the schema in the database and instead building the schema state from migration files, similar to Django’s approach. This means there’s no need to store the old schema in the database or in the migration files, as we can reconstruct it from the list of applied migrations.
For migration operations, I think Django’s approach is excellent. We can store a list of operations, each operation with methods like upgrade and downgrade (the reverse operation). For migrations generated by previous versions, we can wrap them with a RawSQL operation to ensure a smooth transition to the new version. This approach would make migrations more flexible and easier to manage.
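The RawSQL wrapper could look like the following sketch. The `Operation` base class and the `execute_script` call are illustrative assumptions for this discussion, not an existing aerich API:

```python
# Sketch of an operation-based design with a RawSQL escape hatch for
# migrations generated by previous aerich versions.

class Operation:
    """Base class: each operation knows how to apply and reverse itself."""
    async def upgrade(self, dbconn): ...
    async def downgrade(self, dbconn): ...

class RawSQL(Operation):
    """Wraps the forward and reverse SQL of a legacy raw-SQL migration."""
    def __init__(self, upgrade_sql: str, downgrade_sql: str = ""):
        self.upgrade_sql = upgrade_sql
        self.downgrade_sql = downgrade_sql

    async def upgrade(self, dbconn):
        await dbconn.execute_script(self.upgrade_sql)

    async def downgrade(self, dbconn):
        if not self.downgrade_sql:
            raise RuntimeError("This raw-SQL migration is not reversible")
        await dbconn.execute_script(self.downgrade_sql)
```

Old migrations would then become a one-element `operations` list containing a `RawSQL`, while new migrations use structured, database-agnostic operations.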
@Abdeldjalil-H I actually worked for a few days on trying to port django.db.migrations to work with Tortoise models. I got it to the state where it was able to generate a migration file for a bunch of Tortoise models using Django's operations like CreateModel, AddField, etc. I basically copied the related code from Django, and iterated on it by running and fixing issues until it was able to generate a migration.
The code there is pretty raw; I mostly made ad-hoc fixes to Django's code, and only migration generation for new models is working, so it requires more effort before multiple people can work on it simultaneously, but this is a path we can take.
Interesting, @henadzit. I also greatly prefer the Django approach. I find Django's system, where the migration files describe the evolution of the model class, much cleaner. You can generate the required engine-specific SQL statements at run-time depending on the engine you find yourself connected to in each environment, e.g. we favour sqlite for devs and postgres in prod. I would be willing to contribute or help maintain.
hey @shuckc, sorry, I missed your message. I spent a bit more time trying to convert django's migrations framework but realized that it is A LOT of work. If someone wants to pick it up and continue, I can publish what I got but it isn't going to be easy.
If someone would be willing to go down that road - you can start from my attempt at it
https://github.com/tortoise/tortoise-orm/pull/406 - I managed to do a lot of work there, but even more work was left to be done
Making a Django-like migration system is a pretty hard task, and I'll be happy to help anyone who thinks they can make it.
I'm working on another schema migration tool for Tortoise. It is in early development and is by no means ready for production, but I think the foundation is right.
Features:
- Generate schema migrations from Tortoise models
- Apply and revert migrations
- No need for a database connection to generate migrations
- If you have used Django migrations, you will feel at home
- SQLite and Postgres seem to work, but expect a lot of bugs
A migration example:
class AutoMigration(Migration):
"""
Auto-generated migration based on model changes.
"""
dependencies = []
operations = [
CreateModel(
model="models.Post",
fields={
"id": IntField(primary_key=True, unique=True),
"title": CharField(max_length=255),
"content": TextField(),
"author": ForeignKeyFieldInstance(
"models.User",
related_name="posts",
on_delete="CASCADE",
source_field="author_id",
),
"created_at": DatetimeField(auto_now_add=True),
"updated_at": DatetimeField(auto_now_add=True),
},
),
CreateModel(
model="models.User",
fields={
"id": IntField(primary_key=True, unique=True),
"name": CharField(max_length=255),
"email": CharField(unique=True, max_length=255),
},
),
]
Code contributions and assistance with testing are welcome and appreciated!
@henadzit Personally, I prefer to use makemigrations/migrate rather than make/migrate or migrate/upgrade.
#441 introduces (almost) offline migration creation, which is one part of the goal of this proposal. 👍