
document what include_schemas does and include notes about the significance of default schema name

Open bfmcneill opened this issue 4 years ago • 7 comments

I am using MSSQL.

For each "dbo" table I have a mirrored design for a "stage" table.

Due to the workflow I have, I bulk insert data into a "stage" table and then trigger a merge process on sql server wrapped in a stored procedure I call from sqlalchemy.

Everything is working well, but the schema migrations continuously detect changes in the foreign keys, generating operations that shouldn't be generated after the initial migration.

I read a bunch of forums and the workarounds seem a little dated and really complex. I was curious if there is a better way to work with schema migrations?

What I have done is just use all dbo schema and attach "stage" to the end of the stage table name. I'd prefer to have two schemas but I struggle to understand the workaround.

https://groups.google.com/forum/#!topic/sqlalchemy-alembic/FA8wpmPJc7U

https://stackoverflow.com/questions/41150956/alembic-not-generating-correct-changes/41193382#41193382

bfmcneill avatar Aug 19 '20 22:08 bfmcneill

Those two issues you linked both deal with a feature specific to the PostgreSQL database called the "search path", which does not apply to SQL Server. There are no known issues with SQL Server detection of foreign key changes, so you would need to provide specifics. To the degree that your stage / trigger / stored procedure architecture is somehow related to how your foreign keys are represented, feel free to mention that as well; otherwise I'm not sure how these elements are related to your actual problem. It looks like you perhaps have some kind of naming convention thing going on.

zzzeek avatar Aug 20 '20 04:08 zzzeek

Here is a little more info.

I have an area model that when alembic runs autogenerate everything works as expected.

I upgrade head

I then run the routes model migration and autogenerate creates the foreign key constraint in the migration script (I think it is expected behavior)

I upgrade head

I then run the personnel model migration, which has no interaction with the route or area models at this point. The migration script autogenerates a drop and then a re-create of the route table's foreign key constraint.

Sorry about putting the Postgres links in there; they seemed to describe an issue similar to the one I am experiencing.

FYI I have tried to resolve this with and without include_schemas=True in the env.py

Also, not sure if this matters, but I can only get the migrations to work without error by placing the full schema.table.column in the mapping.

areas model

# areas.py
import sqlalchemy as sa
from sqlalchemy import orm
from oil_sale_db.data.sqlalchemybase import SqlAlchemyBase


class Area(SqlAlchemyBase):
    __tablename__ = "areas_tb"
    area_id = sa.Column(sa.String(50), primary_key=True)
    area_name = sa.Column(sa.String(50))
    routes = orm.relation("Route", back_populates="area")
    __table_args__ = {"schema": "dbo"}


class AreaStaging(SqlAlchemyBase):
    __tablename__ = "areas_tb"
    area_id = sa.Column(sa.String(50), primary_key=True)
    area_name = sa.Column(sa.String(50))
    __table_args__ = {"schema": "stage"}

routes model

#routes.py
import sqlalchemy as sa
from sqlalchemy import orm
from oil_sale_db.data.sqlalchemybase import SqlAlchemyBase


class Route(SqlAlchemyBase):
    __tablename__ = "routes_tb"
    route_id = sa.Column(sa.String(50), primary_key=True)
    route_number = sa.Column(sa.String(50))
    route_name = sa.Column(sa.String(50))
    area_id = sa.Column(sa.String(50))
    area = orm.relation("Area")
    __table_args__ = (
        sa.ForeignKeyConstraint(
            ["area_id"], ["dbo.areas_tb.area_id"], name="fk_route_area"
        ),
        {"schema": "dbo"},
    )


class RouteStaging(SqlAlchemyBase):
    __tablename__ = "routes_tb"
    route_id = sa.Column(sa.String(50), primary_key=True)
    route_number = sa.Column(sa.String(50))
    route_name = sa.Column(sa.String(50))
    area_id = sa.Column(sa.String(50))
    __table_args__ = {"schema": "stage"}

routes autogen migration script

#create_routes_table.py

def upgrade():
    # ### commands auto generated by Alembic - please adjust! ###
    op.create_table('routes_tb',
    sa.Column('route_id', sa.String(length=50), nullable=False),
    sa.Column('route_number', sa.String(length=50), nullable=True),
    sa.Column('route_name', sa.String(length=50), nullable=True),
    sa.Column('area_id', sa.String(length=50), nullable=True),
    sa.PrimaryKeyConstraint('route_id'),
    schema='stage'
    )
    op.create_table('routes_tb',
    sa.Column('route_id', sa.String(length=50), nullable=False),
    sa.Column('route_number', sa.String(length=50), nullable=True),
    sa.Column('route_name', sa.String(length=50), nullable=True),
    sa.Column('area_id', sa.String(length=50), nullable=True),
    sa.ForeignKeyConstraint(['area_id'], ['dbo.areas_tb.area_id'], name='fk_route_area'),
    sa.PrimaryKeyConstraint('route_id'),
    schema='dbo'
    )
    # ### end Alembic commands ###


def downgrade():
    # ### commands auto generated by Alembic - please adjust! ###
    op.drop_table('routes_tb', schema='dbo')
    op.drop_table('routes_tb', schema='stage')
    # ### end Alembic commands ###

personnel model

#personnel.py

import sqlalchemy as sa
from sqlalchemy import orm
from oil_sale_db.data.sqlalchemybase import SqlAlchemyBase


class Personnel(SqlAlchemyBase):
    __tablename__ = "personnel_tb"
    personnel_id = sa.Column(sa.String(50), primary_key=True)
    name = sa.Column(sa.String(50))
    title = sa.Column(sa.String(50))
    role = sa.Column(sa.String(50))
    primary_email = sa.Column(sa.String(50))
    primary_phone = sa.Column(sa.String(50))
    __table_args__ = {"schema": "dbo"}


class PersonnelStaging(SqlAlchemyBase):
    __tablename__ = "personnel_tb"
    personnel_id = sa.Column(sa.String(50), primary_key=True)
    name = sa.Column(sa.String(255))
    title = sa.Column(sa.String(255))
    role = sa.Column(sa.String(255))
    primary_email = sa.Column(sa.String(50))
    primary_phone = sa.Column(sa.String(50))
    __table_args__ = {"schema": "stage"}

personnel migration script (for some reason it is also picking up route changes I did not make)

#create_personnel_tb.py

def upgrade():
    # ### commands auto generated by Alembic - please adjust! ###
    op.create_table('personnel_tb',
    sa.Column('personnel_id', sa.String(length=50), nullable=False),
    sa.Column('name', sa.String(length=50), nullable=True),
    sa.Column('title', sa.String(length=50), nullable=True),
    sa.Column('role', sa.String(length=50), nullable=True),
    sa.Column('primary_email', sa.String(length=50), nullable=True),
    sa.Column('primary_phone', sa.String(length=50), nullable=True),
    sa.PrimaryKeyConstraint('personnel_id'),
    schema='dbo'
    )
    op.create_table('personnel_tb',
    sa.Column('personnel_id', sa.String(length=50), nullable=False),
    sa.Column('name', sa.String(length=255), nullable=True),
    sa.Column('title', sa.String(length=255), nullable=True),
    sa.Column('role', sa.String(length=255), nullable=True),
    sa.Column('primary_email', sa.String(length=50), nullable=True),
    sa.Column('primary_phone', sa.String(length=50), nullable=True),
    sa.PrimaryKeyConstraint('personnel_id'),
    schema='stage'
    )
    op.drop_constraint('fk_route_area', 'routes_tb', type_='foreignkey')
    op.create_foreign_key('fk_route_area', 'routes_tb', 'areas_tb', ['area_id'], ['area_id'], source_schema='dbo', referent_schema='dbo')
    # ### end Alembic commands ###


def downgrade():
    # ### commands auto generated by Alembic - please adjust! ###
    op.drop_constraint('fk_route_area', 'routes_tb', schema='dbo', type_='foreignkey')
    op.create_foreign_key('fk_route_area', 'routes_tb', 'areas_tb', ['area_id'], ['area_id'])
    op.drop_table('personnel_tb', schema='stage')
    op.drop_table('personnel_tb', schema='dbo')
    # ### end Alembic commands ###

bfmcneill avatar Aug 20 '20 15:08 bfmcneill

OK, so the first thing we want to try: "dbo" is often the default schema name for SQL Server, so it is likely that the dialect is considering foreign key constraints that come from "dbo" as not having an explicit schema. You can see this because the script contains op.drop_constraint('fk_route_area', 'routes_tb', type_='foreignkey') - notice there's no schema. So one way (really the main way it will work simply for now) to fix that is to omit the "dbo" symbol from your Table and ForeignKey definitions, because it is implicit.
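
As a minimal sketch of what "omit the dbo symbol" looks like (using a standalone declarative base here instead of the project's SqlAlchemyBase, purely for illustration):

```python
import sqlalchemy as sa
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class Area(Base):
    __tablename__ = "areas_tb"
    # no schema in __table_args__: "dbo" is the default schema, so leave it implicit
    area_id = sa.Column(sa.String(50), primary_key=True)
    area_name = sa.Column(sa.String(50))


class Route(Base):
    __tablename__ = "routes_tb"
    route_id = sa.Column(sa.String(50), primary_key=True)
    area_id = sa.Column(sa.String(50))
    # the FK target likewise drops the "dbo." prefix
    __table_args__ = (
        sa.ForeignKeyConstraint(
            ["area_id"], ["areas_tb.area_id"], name="fk_route_area"
        ),
    )


class AreaStaging(Base):
    __tablename__ = "areas_tb"
    area_id = sa.Column(sa.String(50), primary_key=True)
    area_name = sa.Column(sa.String(50))
    # non-default schemas still need to be spelled out
    __table_args__ = {"schema": "stage"}
```

The "stage" tables keep their explicit schema; only the default schema is left implicit, so the declared foreign keys compare equal to what reflection reports.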

Next, Alembic will not attempt to find any tables outside of the default schema unless you specify include_schemas=True. So if you have model / Table definitions with an explicit schema name, you do have to have include_schemas=True if you want those picked up.

Next, it might be very useful to use an include_object hook to check every Table object coming in and make sure its .schema is one you care about. This can be used to distinguish between tables that have "dbo" as the schema name vs. tables that report None as the schema name, which is what happens when the reported schema name of the table matches the default schema name.

This hook can be used to skip Table objects, ForeignKeyConstraint objects, etc., whether they originate from your model or from the database being reflected. But also, you can use the pdb tool to give yourself a command prompt right inside that hook, look at each object coming in, and figure out why it's there.
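
A sketch of such a hook (the schema names here are examples, and this is not the project's actual env.py):

```python
# include_object hook for env.py; schema names are illustrative.
# None covers the default schema ("dbo" in the reporter's case),
# because reflected objects in the default schema report schema=None.
KNOWN_SCHEMAS = {None, "stage"}


def include_object(object, name, type_, reflected, compare_to):
    # Uncomment to poke at each incoming object interactively:
    # import pdb; pdb.set_trace()
    if type_ == "table":
        return object.schema in KNOWN_SCHEMAS
    return True


# In env.py this hook is passed to context.configure() alongside
# include_schemas=True, roughly:
#
# context.configure(
#     connection=connection,
#     target_metadata=target_metadata,
#     include_schemas=True,
#     include_object=include_object,
# )
```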

Another thing you should do is use the inspect interface to see exactly what SQLAlchemy is going to see in your database:

[classic@photon3 sqlalchemy]$ python
Python 3.8.3 (default, May 23 2020, 16:34:37) 
[GCC 9.3.1 20200408 (Red Hat 9.3.1-2)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from sqlalchemy import create_engine, inspect
>>> e = create_engine("mssql+pyodbc://scott:tiger^5HHH@mssql2017:1433/test?driver=ODBC+Driver+13+for+SQL+Server")
>>> ins = inspect(e)
>>> ins.get_table_names()
['t1', 't2', 't3']
>>> ins.get_table_names(schema="test_schema")
[]
>>> ins.get_foreign_keys("t1")
[]
>>> ins.get_foreign_keys("t2")
[{'name': 'FK__t2__t1_id__4119A21D', 'constrained_columns': ['t1_id'], 'referred_schema': None, 'referred_table': 't1', 'referred_columns': ['t1_id']}]
>>> 

Alembic is not doing anything more complicated than using that inspector object and comparing what it sees to what is in your Python-declared models. We can look at what Alembic is going to do when you have include_schemas=True:

First, Alembic will get the list of all schema names:
>>> ins.get_schema_names()
['db_accessadmin', 'db_backupoperator', 'db_datareader', 'db_datawriter', 'db_ddladmin', 'db_denydatareader', 'db_denydatawriter', 'db_owner', 'db_securityadmin', 'dbo', 'guest', 'INFORMATION_SCHEMA', 'scott', 'sys', 'test_schema']

Above, we can see it reports "dbo" as well as "scott" for me; "scott" is my default schema:

>>> e.dialect.default_schema_name
'scott'

How does it get that name? Using SQL Server's schema_name():

>>> with e.connect() as conn:
...     conn.scalar("SELECT schema_name()")
... 
'scott'

The next thing Alembic does is loop through every schema name, as well as None, and get the names of all the tables. So first None:

>>> ins.get_table_names()
['t1', 't2', 't3']

Next let's try "dbo"; mine has nothing:

>>> ins.get_table_names(schema="dbo")
[]

For the schema that's named "scott", alembic knows this is the default schema name, so it skips it. If we did check for the tables in "scott", we'd get the same tables a second time:


# alembic won't do this one since 'scott' is the default schema
>>> ins.get_table_names(schema="scott")
['t1', 't2', 't3']

We can observe that when Alembic looks at the foreign key constraints, whether it asks via None or via "scott" changes the result. This is likely what is causing your autogenerate to do what it's doing:

# notice the second one has "referred_schema='scott'", the first one None

>>> ins.get_foreign_keys("t2")
[{'name': 'FK__t2__t1_id__4119A21D', 'constrained_columns': ['t1_id'], 'referred_schema': None, 'referred_table': 't1', 'referred_columns': ['t1_id']}]
>>> ins.get_foreign_keys("t2", schema="scott")
[{'name': 'FK__t2__t1_id__4119A21D', 'constrained_columns': ['t1_id'], 'referred_schema': 'scott', 'referred_table': 't1', 'referred_columns': ['t1_id']}]
>>> 

So I think basically you need to confirm "dbo" is coming out as your "default" schema name for your particular database connection, and then omit it from your metadata models.
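
The comparison Alembic effectively relies on can be sketched as a tiny helper (a simplified illustration of the idea, not Alembic's actual code):

```python
def normalize_schema(schema, default_schema_name):
    """Collapse the connection's default schema to None, the way
    reflected objects report it (simplified illustration)."""
    if schema == default_schema_name:
        return None
    return schema


# With "dbo" as the default schema, an FK declared against "dbo.areas_tb"
# and one reflected with no schema normalize to the same thing:
assert normalize_schema("dbo", "dbo") is None
assert normalize_schema(None, "dbo") is None
assert normalize_schema("stage", "dbo") == "stage"
```

If the model hardcodes "dbo" while reflection reports None (or vice versa), the two sides never line up, which is the constant drop/create loop seen above.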

zzzeek avatar Aug 20 '20 18:08 zzzeek

After applying what you wrote, I was able to get my super powers back! Thanks Mike!

Thanks again for the really detailed explanation of how things are working under the hood.

bfmcneill avatar Aug 20 '20 20:08 bfmcneill

Great. I think the stuff I wrote above should probably be in the docs, because there's otherwise no real explanation of what "include_schemas" does, and I had to read the code to get the description just right. I'll tag this as docs.

zzzeek avatar Aug 20 '20 20:08 zzzeek

Hi

I was hitting a similar wall as the author of this issue. I'm also using MSSQL. I don't explicitly declare the schema in the mapper if it refers to a table in the default schema (which is "dbo", also recognised by SQLAlchemy). In all other cases (I have some more schemas) I explicitly use __table_args__ to declare the schema.

__table_args__ = {'schema': 'assets'}

I mostly declare my foreign keys inline and without the schema if it is the default, like:

project_id = Column('ProjectID', Integer, ForeignKey('Projects.ProjectID'))

Everything is working as expected (in terms of DB operations) except that I also get many foreign keys "recreated" in the migration script. I also use the include_schemas flag so that all schemas get inspected. For me it seems that all foreign keys from a table in a schema other than the default that refer to a table in the default schema lead to constant recreation of migration commands.

I tried to use include_object as described in the documentation but that didn't really help, or at least I didn't get it to work properly. Should I exclude the foreign keys there or the tables? How would I do that?

Thanks for any advice and cheers for the great answer above. That made many things clearer for me!

sebarti avatar Dec 12 '20 08:12 sebarti

hey there -

Short answer is I would try to exclude the tables entirely with include_object. It all depends on what is explicitly present in your defined models vs. what isn't. Anything that is defined in your model should be compared and would not generate a migration if things are working correctly, but it is very possible SQL Server with multiple schemas has issues, not sure. Things that are not in your model at all would need to be excluded.
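
One hypothetical sketch of "exclude the tables entirely": derive the allowed schemas from the model metadata itself and drop every table from any other schema (the names below are examples, not the poster's actual models):

```python
from sqlalchemy import MetaData

# In a real env.py this would be Base.metadata imported from the models;
# an empty MetaData stands in for it here.
target_metadata = MetaData()


def include_object(object, name, type_, reflected, compare_to):
    if type_ == "table":
        # Schemas the models actually declare; None covers the default
        # schema, which reflected tables report without a name.
        model_schemas = {t.schema for t in target_metadata.tables.values()}
        model_schemas.add(None)
        return object.schema in model_schemas
    return True
```

Keeping the allow-list derived from the metadata means new schemas added to the models are picked up without touching env.py.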

zzzeek avatar Dec 12 '20 14:12 zzzeek