
support CHECK constraints for autogenerate

Open sqlalchemy-bot opened this issue 7 years ago • 44 comments

Migrated issue, originally created by Alex Rothberg

If I start with the following and then uncomment the CheckConstraint, alembic neither generates a migration adding it nor prints a warning. If I instead have alembic add the entire class from scratch, the generated migration DOES include the CheckConstraint.

class TitleDepartmentYear(db.Model):
    id = db.Column(UUID, default=uuid.uuid4, primary_key=True)
    name = db.Column(db.String(), nullable=False)

    __table_args__ = (
        # db.CheckConstraint(
        #     sqa.func.char_length(name) > 0,
        #     name="title_department_year_name_min_length",
        # ),
    )

sqlalchemy-bot avatar Sep 25 '18 18:09 sqlalchemy-bot

Michael Bayer (@zzzeek) wrote:

note this is a documented limitation:

http://alembic.zzzcomputing.com/en/latest/autogenerate.html#what-does-autogenerate-detect-and-what-does-it-not-detect

sqlalchemy-bot avatar Sep 25 '18 20:09 sqlalchemy-bot

Alex Rothberg wrote:

I started taking a stab at this. Does it look like I am on the right track? Is there some missing feature in sqla, etc., that will lead to a dead end here?

@comparators.dispatch_for("table")
def compare_table_level(autogen_context, modify_ops,
                        schemaname, tablename, conn_table, metadata_table):

    metadata_check_constraints = {
        ck for ck in metadata_table.constraints
        if isinstance(ck, sa_schema.CheckConstraint)
    }
    conn_check_constraint_names = {
        ck.name for ck in conn_table.constraints
        if isinstance(ck, sa_schema.CheckConstraint)
    }

    for metadata_check_constraint in metadata_check_constraints:
        # barfs on something about quoting:
        metadata_check_constraint_name = sqla_compat._get_index_final_name(
            autogen_context.dialect, metadata_check_constraint)
        if metadata_check_constraint_name not in conn_check_constraint_names:
            modify_ops.ops.append(
                ops.CreateCheckConstraintOp.from_constraint(
                    metadata_check_constraint))

# needed since I am writing this as a plugin and a renderer is already registered:
del renderers._registry[(ops.CreateCheckConstraintOp, 'default')]

@renderers.dispatch_for(ops.CreateCheckConstraintOp)
def _add_check_constraint(autogen_context, constraint):
    # not quite right, as I still need to sub in query params:
    args = [
        '"%s"' % constraint.constraint_name,
        '"%s"' % constraint.table_name,
        '"%s"' % constraint.condition,
    ]
    return "%(prefix)screate_check_constraint(%(args)s)" % {
        'prefix': _alembic_autogenerate_prefix(autogen_context),
        'args': ", ".join(args),
    }

sqlalchemy-bot avatar Oct 10 '18 05:10 sqlalchemy-bot

Michael Bayer (@zzzeek) wrote:

That's a fragment of a beginning, sure; it would need to be in the context of the source code in compare.py to really be reviewable.

With this feature, the much bigger job is writing out the tests, because the constraint autogen use case has an enormous number of edge cases that are showstoppers if they are not working correctly: within autogen, a failed comparison means false positives. A lot of applications use Alembic's comparison logic to generate schema diffs, which need to come up clean if nothing has changed, so any area where a backend either doesn't provide CHECK constraint reflection, or where the content of the reflected constraint doesn't exactly match what is in the Python model, is a show-stopping bug. Additionally, SQLAlchemy has datatypes Enum and Boolean which automatically generate CHECK constraints, and I'm not sure how we will handle accommodating for this, i.e. detecting which CHECK constraints need to be explicit and which do not.
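To illustrate the Enum point, here is a minimal sketch (the table and column names are invented, not from this thread) showing that SQLAlchemy renders a CHECK constraint into the DDL when create_constraint is enabled on such a type, which an autogen comparison would have to recognize as implicit:

```python
from sqlalchemy import Column, Enum, MetaData, Table, create_engine
from sqlalchemy.schema import CreateTable

metadata = MetaData()
# hypothetical table; create_constraint=True asks SQLAlchemy to emit a CHECK
# constraint restricting the column to the enum's values on backends that
# have no native enum type
t = Table(
    "ticket",
    metadata,
    Column("status", Enum("open", "closed", name="status_enum",
                          create_constraint=True)),
)

engine = create_engine("sqlite://")
ddl = str(CreateTable(t).compile(engine))
print(ddl)  # the rendered CREATE TABLE contains a CHECK constraint
assert "CHECK" in ddl
```

An autogen comparator would have to know that this constraint belongs to the type rather than being declared explicitly in __table_args__.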

When the compare indexes and uniques feature was introduced, I spent literally years fixing bugs of this nature, which is why you will see a very large number of tests in https://github.com/zzzeek/alembic/blob/master/tests/test_autogen_indexes.py. I just did a grep and found 31 distinct issues involving unique and index autogenerate in changelog.rst; see https://gist.github.com/zzzeek/530634270e5631b973fa1d70885aba5a.

So for this feature, writing a new test suite test_autogen_check.py is a big part of getting this to work. If the "CHECK" constraint autogen feature is added as an optional feature that's enabled in the environment, that would make it easier to release without as much risk of regressions.

sqlalchemy-bot avatar Oct 10 '18 15:10 sqlalchemy-bot

Alex Rothberg wrote:

Fair. In the meantime I might start working on this as a plugin with the plan to get it mature enough that it can be submitted for addition to alembic proper.

I was wondering if there is a workaround for:

del renderers._registry[(ops.CreateCheckConstraintOp, 'default')]

and what is the best way to get the "real" name for the check constraint? (i.e. sqla_compat._get_index_final_name doesn't seem to work for check constraints).

sqlalchemy-bot avatar Oct 10 '18 16:10 sqlalchemy-bot

Michael Bayer (@zzzeek) wrote:

The rendering of a CHECK constraint object is separate from the detection, which is the hard part; you can submit a rendering PR right now. Additionally, renderers.dispatch_for should be able to replace the previous entry; if it doesn't do that right now, I can take a PR for that too.

sqlalchemy-bot avatar Oct 10 '18 16:10 sqlalchemy-bot

Alex Rothberg wrote:

https://bitbucket.org/zzzeek/alembic/src/37ef35fbe673d34b5f8d1a0697cf01d10b0abe05/alembic/util/langhelpers.py#lines-285

sqlalchemy-bot avatar Oct 10 '18 16:10 sqlalchemy-bot

Michael Bayer (@zzzeek) wrote:

sure we can reverse that

sqlalchemy-bot avatar Oct 10 '18 16:10 sqlalchemy-bot

Changes by Michael Bayer (@zzzeek):

  • added labels: low priority

sqlalchemy-bot avatar Sep 25 '18 20:09 sqlalchemy-bot

Changes by Michael Bayer (@zzzeek):

  • removed labels: bug
  • added labels: feature

sqlalchemy-bot avatar Sep 25 '18 20:09 sqlalchemy-bot

Changes by Michael Bayer (@zzzeek):

  • added labels: autogenerate - detection

sqlalchemy-bot avatar Sep 25 '18 20:09 sqlalchemy-bot

Changes by Michael Bayer (@zzzeek):

  • changed title from "CheckConstraint not detected" to "support CHECK constraints for autogenerate"

sqlalchemy-bot avatar Sep 25 '18 20:09 sqlalchemy-bot

There would be one way to detect a changed check constraint that would be quite easy to implement and would work in most environments. As far as I can see, the problem is that, for example in Postgres, 'consrc' is not available anymore. Instead, you could add a further alembic table, or extend schema_version, to hold the source of the check constraint. Then you can simply compare the source stored in the database to the one in the SQLAlchemy metadata, and update (drop and recreate) the constraint when it changes. I'm currently doing that in a project. However, it would require the developers/admins not to mess with that table, so I think this should be an optional behaviour.
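A minimal sketch of that side-table idea (the table and constraint names here are made up, and a real implementation would live alongside alembic_version): store the CHECK source next to the version info, then compare it against what the model metadata currently declares.

```python
import sqlite3

# hypothetical bookkeeping table holding the CHECK source for each constraint
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE alembic_check_sources (name TEXT PRIMARY KEY, source TEXT)"
)
conn.execute(
    "INSERT INTO alembic_check_sources VALUES (?, ?)",
    ("title_name_min_length", "char_length(name) > 0"),
)

def check_constraint_changed(conn, name, model_source):
    """Compare the stored constraint source with the model's current source."""
    row = conn.execute(
        "SELECT source FROM alembic_check_sources WHERE name = ?", (name,)
    ).fetchone()
    # a missing row means the constraint is new; a differing source means it
    # must be dropped and recreated
    return row is None or row[0] != model_source

assert not check_constraint_changed(
    conn, "title_name_min_length", "char_length(name) > 0")
assert check_constraint_changed(
    conn, "title_name_min_length", "char_length(name) > 1")
```

This sidesteps reflection entirely, which is exactly why it carries the "don't mess with that table" caveat mentioned above.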

adrianschneider94 avatar May 14 '20 18:05 adrianschneider94

I think you have a good idea there; however, that's the beginning of a Django-like system that serializes the entire model being built so that it doesn't even have to use database reflection for autogenerate. I'd rather someone implement the real thing than a tiny piece of it for some specific use case.

zzzeek avatar May 14 '20 19:05 zzzeek

I totally see that point.

I looked a bit further into this; for Postgres it should be easy to implement in the 'alembic' way:

select pgc.conname as constraint_name,
       ccu.table_schema as table_schema,
       ccu.table_name,
       ccu.column_name,
       pg_get_constraintdef(pgc.oid)
from pg_constraint pgc
join pg_namespace nsp on nsp.oid = pgc.connamespace
join pg_class cls on pgc.conrelid = cls.oid
left join information_schema.constraint_column_usage ccu
          on pgc.conname = ccu.constraint_name
          and nsp.nspname = ccu.constraint_schema
where pgc.contype = 'c'
order by pgc.conname;

gives the check constraint definitions associated with tables and columns.
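Before comparing what pg_get_constraintdef() returns against the expression in the model, the CHECK wrapper and the redundant outer parentheses Postgres adds would have to be stripped. A rough sketch (the normalization rules here are illustrative, not complete, and plain text comparison still won't cope with keyword casing differences):

```python
def normalize_check_def(defn: str) -> str:
    """Strip the CHECK wrapper and redundant outer parentheses from a
    pg_get_constraintdef() result so it can be compared textually."""
    s = defn.strip()
    if s.upper().startswith("CHECK"):
        s = s[len("CHECK"):].strip()
    # peel outer parentheses only while they span the whole expression
    while s.startswith("(") and s.endswith(")"):
        depth = 0
        spans_whole = True
        for i, ch in enumerate(s):
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth == 0 and i < len(s) - 1:
                    spans_whole = False
                    break
        if not spans_whole:
            break
        s = s[1:-1].strip()
    return s

assert normalize_check_def(
    "CHECK (((article_number IS NULL) OR (company_id IS NOT NULL)))"
) == "(article_number IS NULL) OR (company_id IS NOT NULL)"
```

The limits of this kind of string massaging are what motivate comparing parse trees instead, as below.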

I don't know about the internals of alembic. Could you tell me where to dig into it, if I wanted to implement it (for postgres)?

adrianschneider94 avatar May 14 '20 19:05 adrianschneider94

And for the comparison there is also an easy approach if you use an external library (pglast):

from pglast import parse_sql, _remove_stmt_len_and_location

statement_1 = "check article_number is null or company_id is not null" # what I wrote in SQLAlchemy
statement_2 = "CHECK (((article_number IS NULL) OR (company_id IS NOT NULL)))" # what I got from the query above

ast1 = parse_sql("SELECT " + statement_1[6:])
ast2 = parse_sql("SELECT " + statement_2[6:])

_remove_stmt_len_and_location(ast1)
_remove_stmt_len_and_location(ast2)

assert ast1 == ast2

adrianschneider94 avatar May 14 '20 20:05 adrianschneider94

sqlalchemy does reflect the constraint definition already; it's just that it doesn't necessarily match the model letter for letter, so if the pglast thing is useful, that may be an option. Seems like it is only for PostgreSQL?
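For reference, the reflection side is exposed through the SQLAlchemy inspector; a small sketch against an in-memory SQLite database (the table and constraint names are invented):

```python
from sqlalchemy import create_engine, inspect, text

engine = create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(text(
        "CREATE TABLE title (name TEXT, "
        "CONSTRAINT name_min_length CHECK (name <> ''))"
    ))

# the inspector returns a list of dicts describing each CHECK constraint
constraints = inspect(engine).get_check_constraints("title")
assert constraints[0]["name"] == "name_min_length"
```

The dicts carry the reflected constraint text, and that text is what won't necessarily match the model's spelling letter for letter.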

zzzeek avatar May 14 '20 20:05 zzzeek

Yes, unfortunately it only targets Postgres. But I guess that:

  • one could find AST parsers for the other dialects, at least in C
  • pglast is compatible with most statements that would be used in a check constraint; otherwise it would raise an exception and let the user implement the comparison manually

adrianschneider94 avatar May 14 '20 20:05 adrianschneider94

Oh, @lelit wrote this; friend of the show. We could try sqlparse here as well, with pglast as an optional plugin?

zzzeek avatar May 14 '20 20:05 zzzeek

I looked into sqlparse for a few minutes and it does not build a proper AST, so I'm sceptical that it would work. @lelit started pglast because of the problems with sqlparse, as far as I can see from the README, so I think a Postgres-dedicated solution (probably as a plugin) that works for other dialects in most cases (as long as the queries are easily accessible) would be the route I would personally take.

adrianschneider94 avatar May 14 '20 21:05 adrianschneider94

well if it works it's a start and maybe @lelit wants to expand the scope of it :)

zzzeek avatar May 14 '20 21:05 zzzeek

I just quickly read the above, but sure, I'm willing to expand its scope, if I can... obviously once I have a better idea of what you mean by that :smile: Feel free to ask/explain/suggest anything.

lelit avatar May 14 '20 21:05 lelit

Hi @lelit,

what I have seen so far from pglast looks great, thanks for that!

@zzzeek is talking about extending the scope from postgres to other dialects. I guess that's quite a big task, but it would solve the check constraint issue completely if it works.

Do you see any problems ahead (for postgres only)? The task is to compare SQL statements to see if they are identical. The few examples I tried worked flawlessly.

adrianschneider94 avatar May 14 '20 21:05 adrianschneider94

I spent considerable time trying to use sqlparse for my needs, but although it does quite a good job considering it handles a wide spectrum of SQL syntaxes, I didn't find it "precise" enough, and as Adrian said, the internal model its "parser" builds is not malleable enough. OTOH, yes, pglast is very PG-specific: the underlying libpg_query is basically the very same parser used by PG itself, so it has access to everything that comes out of it.

@zzzeek is talking about extending the scope from postgres to other dialects. I guess that's quite a big task, but it would solve the check constraint issue completely if it works.

No, that's definitely not doable with (current) pglast: one would need to implement the equivalent of libpg_query for other dialects and, once there, implement a new set of "printers" (in pglast parlance, a function that takes an AST and serializes it into a corresponding SQL statement). While I can easily do the latter, I'm afraid that "wrapping" other dialects' parsers into something usable is beyond my current cycles availability, sorry.

Do you see any problems ahead (for postgres only)? The task is to compare SQL statements to see if they are identical. The few examples I tried worked flawlessly.

No, I would say that's totally doable: pglast already supports a lot of PG syntax, and expanding it to cover what is currently missing is quite easy.

lelit avatar May 14 '20 21:05 lelit

alembic_utils handles the "comparing sql stored in strings" problem by rendering the entity in a dummy schema, and comparing the reflected definition from the dummy schema to the reflected definition in the target schema.

It avoids having to parse every dialect of SQL and would work on any database that supports constraint inspection. It is kinda slow for larger projects, but I'm not aware of a use case where autogenerating migrations is done in a tight loop :)

olirice avatar May 27 '20 17:05 olirice

Ahh, that's smart 👌🏼

adrianschneider94 avatar May 27 '20 17:05 adrianschneider94

that is an interesting idea.

zzzeek avatar May 28 '20 13:05 zzzeek

We sort of do a bit of that for datatypes now: rather than comparing each element of a reflected datatype with what the model has, we compare the rendered strings. But that doesn't involve adding an extra round trip to the DB. The "dummy schema" part makes me a little nervous: if you are running on your company's giant DB2 staging system, it's likely not that simple to create a new schema on the fly and then drop it.
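The rendered-string comparison for datatypes looks roughly like this (a sketch of the idea, not the actual compare.py logic):

```python
from sqlalchemy import Numeric, String, create_engine

engine = create_engine("sqlite://")

def types_match(metadata_type, reflected_type) -> bool:
    """Compare two types by their rendered DDL strings rather than
    attribute by attribute."""
    def render(t):
        return str(t.compile(dialect=engine.dialect))
    return render(metadata_type) == render(reflected_type)

assert types_match(String(50), String(50))       # both render as VARCHAR(50)
assert not types_match(String(50), String(60))
assert not types_match(Numeric(10, 2), String(50))
```

No round trip to the database is needed here because the dialect can render both sides locally, which is the difference from the dummy-schema approach.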

zzzeek avatar May 28 '20 13:05 zzzeek

I'm not familiar with DB2, but if the dummy schema is the only part that makes you squeamish, a rolled back transaction would work too.

The example below tests if the local definition of a function matches the database's definition.

###
# SETUP
###

def get_sql_function_def(sess, schema, name):
    """Look up the SQL definition of a function from the database"""

    return sess.execute("""
        select
            pg_get_functiondef(p.oid)
        from
            pg_proc p
            join pg_namespace n
                on p.pronamespace = n.oid
        where
            n.nspname = :schema
            and p.proname = :name
        limit 1
    """, {"schema": schema, "name": name}).fetchone()[0]

# Create an "existing" function
sess.execute("""
    create or replace function to_upper(text)
    returns text
    as
        $$ select upper($1); $$
    language sql;
""")
sess.commit()


###
# CHECK IF UPDATE IS NECESSARY
###

sess.begin(subtransactions=True)

try:
    # Definition of the entity in the database 
    existing_definition = get_sql_function_def(sess, 'public', 'to_upper') 

    # Render the local definition and execute it in the transaction.
    # Note that the function's body was changed
    sess.execute("""
        create or replace function to_upper(text)
        returns text
        as
            $$ select upper($1) || 'abc' ; $$
        language sql;
    """)

    # Get the new definition
    local_definition = get_sql_function_def(sess, 'public', 'to_upper') 

    # Check if the existing definition and local definition match
    requires_update: bool = existing_definition != local_definition

finally:
    sess.rollback()

print(requires_update) # True

It does require that the target database support a read-your-own-writes transaction isolation level.

olirice avatar May 28 '20 18:05 olirice

I'm not familiar with DB2, but if the dummy schema is the only part that makes you squeamish, a rolled back transaction would work too.

Only if there's transactional DDL available, which is also not a given; Oracle, for example, does not support it.

zzzeek avatar May 28 '20 22:05 zzzeek

wrt your comment

if you are running on your company's giant DB2 staging system it's likely not that simple to create a new schema on the fly and then drop it

Did you mean that a company wouldn't be happy that it was happening, that the end user likely wouldn't have the required permissions, or that there's something specific to DB2 that makes short-lived schemas particularly dangerous?

olirice avatar May 28 '20 22:05 olirice