support CHECK constraints for autogenerate
Migrated issue, originally created by Alex Rothberg
If I start with the following model and then uncomment the CheckConstraint, Alembic neither generates a migration adding it nor prints a warning. If instead I have Alembic add the entire class from scratch, the generated migration DOES include the CheckConstraint.
class TitleDepartmentYear(db.Model):
    id = db.Column(UUID, default=uuid.uuid4, primary_key=True)
    name = db.Column(db.String(), nullable=False)

    __table_args__ = (
        # db.CheckConstraint(
        #     sqa.func.char_length(name) > 0,
        #     name="title_department_year_name_min_length",
        # ),
    )
Michael Bayer (@zzzeek) wrote:
note this is a documented limitation:
http://alembic.zzzcomputing.com/en/latest/autogenerate.html#what-does-autogenerate-detect-and-what-does-it-not-detect
Alex Rothberg wrote:
I started taking a stab at this. Does it look like I am on the right track? Is there some missing feature in SQLAlchemy, etc. that will lead to a dead end here?
from sqlalchemy import schema as sa_schema

from alembic.autogenerate import comparators, renderers
from alembic.autogenerate.render import _alembic_autogenerate_prefix
from alembic.operations import ops
from alembic.util import sqla_compat


@comparators.dispatch_for("table")
def compare_table_level(autogen_context, modify_ops,
                        schemaname, tablename, conn_table, metadata_table):
    metadata_check_constraints = set(
        ck for ck in metadata_table.constraints
        if isinstance(ck, sa_schema.CheckConstraint)
    )
    conn_check_constraints = set(
        ck for ck in conn_table.constraints
        if isinstance(ck, sa_schema.CheckConstraint)
    )
    for metadata_check_constraint in metadata_check_constraints:
        # barfs on something about quoting:
        metadata_check_constraint_name = sqla_compat._get_index_final_name(
            autogen_context.dialect, metadata_check_constraint)
        # i'm lazy; clearly this is not efficient:
        if metadata_check_constraint_name not in (
            conn_check_constraint.name
            for conn_check_constraint in conn_check_constraints
        ):
            modify_ops.ops.append(
                ops.CreateCheckConstraintOp.from_constraint(
                    metadata_check_constraint))
# needed to do this since I am writing this as a plugin
# and there is already a method registered:
del renderers._registry[(ops.CreateCheckConstraintOp, 'default')]


@renderers.dispatch_for(ops.CreateCheckConstraintOp)
def _add_check_constraint(autogen_context, constraint):
    # not quite right as I need to sub in query params:
    args = [
        '"%s"' % constraint.constraint_name,
        '"%s"' % constraint.table_name,
        '"%s"' % constraint.condition,
    ]
    return "%(prefix)screate_check_constraint(%(args)s)" % {
        'prefix': _alembic_autogenerate_prefix(autogen_context),
        'args': ", ".join(args),
    }
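(Aside: hooks like these only take effect if the plugin module is imported before autogenerate runs, e.g. from env.py. The module name below is hypothetical.)

# env.py
# importing the plugin module registers the comparator and renderer hooks above
import check_constraint_autogen_plugin  # hypothetical module name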
Michael Bayer (@zzzeek) wrote:
That's a fragment of a beginning, sure; it would need to be in the context of the source code in compare.py to really be reviewable.
With this feature, the much bigger job is writing the tests, because the constraint autogen use case has an enormous number of edge cases that are also showstoppers if they are not handled correctly: within autogen, a failed comparison means false positives. A lot of applications use Alembic's comparison logic to generate schema diffs which need to come up clean if nothing has changed, so any case where a backend doesn't provide CHECK constraint reflection, or where the content of the constraint doesn't exactly match what is in the Python model, is a show-stopping bug. Additionally, SQLAlchemy has the datatypes Enum and Boolean which automatically generate CHECK constraints, and I'm not sure how we will handle accommodating for this, i.e. detecting which CHECK constraints need to be explicit and which do not.
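To illustrate that last point (a sketch; table and column names are made up, and the exact DDL depends on backend and SQLAlchemy version): a plain sa.Enum column silently produces a CHECK constraint on backends without a native enum type, so a naive comparator would flag a constraint the model never declared explicitly.

import sqlalchemy as sa

metadata = sa.MetaData()
report = sa.Table(
    "report",
    metadata,
    # Enum falls back to VARCHAR + CHECK on backends without a native ENUM type
    sa.Column("status", sa.Enum("draft", "final", name="report_status")),
)
# e.g. on SQLite, CREATE TABLE emits something like:
#   status VARCHAR(5),
#   CONSTRAINT report_status CHECK (status IN ('draft', 'final'))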
When the compare-indexes-and-uniques feature was introduced, I spent literally years fixing bugs of this nature, which is why you will see a very large number of tests in https://github.com/zzzeek/alembic/blob/master/tests/test_autogen_indexes.py. I just did a grep and found 31 distinct issues involving unique and index autogenerate in changelog.rst; see https://gist.github.com/zzzeek/530634270e5631b973fa1d70885aba5a.
So for this feature, writing a new test suite test_autogen_check.py is a big part of getting this to work. If the "CHECK" constraint autogen feature is added as an optional feature that's enabled in the environment, that would make it easier to release without as much risk of regressions.
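A sketch of what "enabled in the environment" could look like; the compare_check_constraints flag here is hypothetical and does not exist at the time of this discussion:

# env.py
from alembic import context
from sqlalchemy import engine_from_config, pool

def run_migrations_online():
    connectable = engine_from_config(
        context.config.get_section(context.config.config_ini_section),
        prefix="sqlalchemy.",
        poolclass=pool.NullPool,
    )
    with connectable.connect() as connection:
        context.configure(
            connection=connection,
            target_metadata=target_metadata,  # assumed defined elsewhere in env.py
            compare_check_constraints=True,   # hypothetical opt-in flag
        )
        with context.begin_transaction():
            context.run_migrations()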
Alex Rothberg wrote:
Fair. In the meantime I might start working on this as a plugin, with the plan to get it mature enough that it can be submitted for addition to Alembic proper.
I was wondering if there is a workaround to:
del renderers._registry[(ops.CreateCheckConstraintOp, 'default')]
and what is the best way to get the "real" name for the check constraint? (i.e. sqla_compat._get_index_final_name doesn't seem to work for check constraints)
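(One possible route for the naming question, as an untested sketch: SQLAlchemy's dialects carry an identifier preparer with a format_constraint() method; whether it resolves naming-convention placeholders here the same way DDL compilation does is an assumption.)

# sketch: render a constraint's final quoted name via the dialect's preparer
def constraint_final_name(autogen_context, constraint):
    preparer = autogen_context.dialect.identifier_preparer
    # assumption: format_constraint applies quoting and naming conventions
    return preparer.format_constraint(constraint)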
Michael Bayer (@zzzeek) wrote:
The rendering of a CHECK constraint object is separate from the detection, which is the hard part; you can submit a rendering PR right now. Additionally, renderers.dispatch_for should be able to replace the previous entry; if it doesn't do that right now, I can take a PR for that too.
Alex Rothberg wrote:
https://bitbucket.org/zzzeek/alembic/src/37ef35fbe673d34b5f8d1a0697cf01d10b0abe05/alembic/util/langhelpers.py#lines-285
Changes by Michael Bayer (@zzzeek):
- changed title from "CheckConstraint not detected" to "support CHECK constraints for autogenerate"
There would be one way to detect a changed check constraint that would be quite easy to implement and would work in most environments. As far as I can see, the problem is that, for example in Postgres, 'consrc' is not available anymore. But instead, you could add a further Alembic table (or extend schema_version) to hold the source of each check constraint. Then you can just compare the source in the database to the one in the SQLAlchemy metadata and update (drop and recreate) the constraint when it changes. I'm currently doing that in a project. However, it would require the developers/admins not to mess with that table, so I think this should be an optional behaviour.
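A minimal sketch of that bookkeeping idea, assuming a dedicated side table (all names here are made up for illustration):

import sqlalchemy as sa

# hypothetical side table recording the source text of each CHECK constraint
constraint_registry = sa.Table(
    "alembic_check_constraints",
    sa.MetaData(),
    sa.Column("constraint_name", sa.String(255), primary_key=True),
    sa.Column("sqltext", sa.Text, nullable=False),
)

def check_constraint_changed(connection, constraint):
    """Return True if the model's CHECK source differs from the recorded one."""
    row = connection.execute(
        constraint_registry.select().where(
            constraint_registry.c.constraint_name == constraint.name
        )
    ).first()
    return row is not None and row.sqltext != str(constraint.sqltext)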
I think you have a good idea there; however, that's the beginning of a Django-like system that serializes the entire model being built so that it doesn't even have to use database reflection for autogenerate. I'd rather someone implement the real thing rather than a tiny piece of it for some specific use case.
I totally see that point.
I looked a bit further into this; for Postgres it should be easy to implement in the "alembic" way:
select
    pgc.conname as constraint_name,
    ccu.table_schema as table_schema,
    ccu.table_name,
    ccu.column_name,
    pg_get_constraintdef(pgc.oid)
from pg_constraint pgc
join pg_namespace nsp on nsp.oid = pgc.connamespace
join pg_class cls on pgc.conrelid = cls.oid
left join information_schema.constraint_column_usage ccu
    on pgc.conname = ccu.constraint_name
    and nsp.nspname = ccu.constraint_schema
where pgc.contype = 'c'
order by pgc.conname;
gives the check constraint definitions associated with tables and columns.
I don't know about the internals of Alembic. Could you tell me where to dig in, if I wanted to implement this (for Postgres)?
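For reference, SQLAlchemy's inspection API already exposes reflected CHECK constraints on PostgreSQL via Inspector.get_check_constraints(), which is presumably the layer an Alembic comparator would consume (the connection string is a placeholder, and the table name is assumed from the model above):

from sqlalchemy import create_engine, inspect

engine = create_engine("postgresql://user:pass@localhost/mydb")  # placeholder DSN
inspector = inspect(engine)

# each entry is a dict with at least "name" and "sqltext" keys
for ck in inspector.get_check_constraints("title_department_year"):
    print(ck["name"], ck["sqltext"])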
And for the comparison there is also an easy approach if you use an external library (pglast):
from pglast import parse_sql, _remove_stmt_len_and_location

statement_1 = "check article_number is null or company_id is not null"  # what I wrote in SQLAlchemy
statement_2 = "CHECK (((article_number IS NULL) OR (company_id IS NOT NULL)))"  # what I got from the query above

# strip the leading "check " (6 characters) and parse the remainder as an expression
ast1 = parse_sql("SELECT " + statement_1[6:])
ast2 = parse_sql("SELECT " + statement_2[6:])

# drop statement offsets/locations so structurally equal ASTs compare equal
_remove_stmt_len_and_location(ast1)
_remove_stmt_len_and_location(ast2)

assert ast1 == ast2
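Wrapped up as a helper a comparator could call, with a plain string comparison as a fallback for anything pglast cannot parse (a sketch using only the pglast calls shown above):

from pglast import parse_sql, _remove_stmt_len_and_location

def check_clauses_equivalent(local_sql, reflected_sql):
    """Compare two CHECK clauses structurally, ignoring whitespace and parens."""
    def normalize(clause):
        # drop the leading "CHECK " (6 characters) and parse as an expression
        ast = parse_sql("SELECT " + clause[6:])
        _remove_stmt_len_and_location(ast)
        return ast
    try:
        return normalize(local_sql) == normalize(reflected_sql)
    except Exception:
        # fall back to a case-insensitive exact comparison
        return local_sql.lower() == reflected_sql.lower()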
SQLAlchemy does reflect the constraint definition already; it's just that it doesn't necessarily match letter for letter, so if the pglast thing is useful that may be an option. Seems like it is only for PostgreSQL?
Yes, unfortunately it only targets Postgres. But I guess that:
- one could find AST parsers for the other dialects, at least in C
- pglast is compatible with most statements that would be used in a check constraint, and otherwise it would raise an exception and let the user implement the comparison manually
Oh, @lelit wrote this. Friend of the show. We could try sqlparse here as well, with pglast as an optional plugin?
I looked into sqlparse for some minutes and it does not build a proper AST, so I'm sceptical that it would work. @lelit started pglast because of the problems with sqlparse, as far as I can see from the README, so a Postgres-dedicated solution (probably as a plugin) that works for other dialects in most cases (as long as the queries are easily accessible) would be the route I would personally take.
Well, if it works it's a start, and maybe @lelit wants to expand the scope of it :)
I just quickly read the above, but sure, I'm willing to expand its scope, if I can... obviously once I have a better idea of what you mean by that :smile: Feel free to ask/explain/suggest anything.
Hi @lelit,
what I have seen so far from pglast looks great, thanks for that!
@zzzeek is talking about extending the scope from postgres to other dialects. I guess that's quite a big task, but it would solve the check constraint issue completely if it works.
Do you see any problems ahead (for postgres only)? The task is to compare SQL statements to see if they are identical. The few examples I tried worked flawlessly.
I spent considerable time trying to use sqlparse for my needs, but although it does quite a good job considering it handles a wide spectrum of SQL syntaxes, I didn't find it "precise" enough, and as Adrian said, the internal model its "parser" builds is not malleable enough. OTOH, yes, pglast is very PG-specific: the underlying libpg_query is basically the very same parser used by PG itself, so it has access to everything that comes out of that.
> @zzzeek is talking about extending the scope from postgres to other dialects. I guess that's quite a big task, but it would solve the check constraint issue completely if it works.
No, that's definitely not doable with (current) pglast: one would need to implement the equivalent of libpg_query for other dialects, and once there, implement a new set of "printers" (in pglast parlance, functions that take an AST and serialize it into a corresponding SQL statement). While I can easily do the latter, I'm afraid that "wrapping" other dialects' parsers into something usable is beyond my current cycles availability, sorry.
> Do you see any problems ahead (for postgres only)? The task is to compare SQL statements to see if they are identical. The few examples I tried worked flawlessly.
No, I would say that's totally doable: pglast already supports a lot of PG syntax, and expanding it to cover what is currently missing is quite easy.
alembic_utils handles the "comparing SQL stored in strings" problem by rendering the entity in a dummy schema and comparing the reflected definition from the dummy schema to the reflected definition in the target schema.
It avoids having to parse every dialect of SQL and would work on any database that supports constraint inspection. It is kinda slow for larger projects, but I'm not aware of a use case where autogenerating migrations is done in a tight loop :)
Ahh, that's smart 👌🏼
That is an interesting idea.
We sort of do a bit of that for datatypes now: rather than comparing each element of a reflected datatype with what the model has, we compare the rendered strings. But that doesn't involve adding an extra round trip to the DB. The "dummy schema" part makes me a little nervous: if you are running on your company's giant DB2 staging system, it's likely not that simple to create a new schema on the fly and then drop it.
I'm not familiar with DB2, but if the dummy schema is the only part that makes you squeamish, a rolled back transaction would work too.
The example below tests whether the local definition of a function matches the database's definition (here sess is an existing SQLAlchemy session bound to a Postgres database).
###
# SETUP
###

def get_sql_function_def(sess, schema, name):
    """Look up the SQL definition of a function from a database"""
    return sess.execute(
        """
        select
            pg_get_functiondef(p.oid)
        from
            pg_proc p
            join pg_namespace n
                on p.pronamespace = n.oid
        where
            n.nspname = :schema
            and p.proname = :name
        limit 1
        """,
        {"schema": schema, "name": name},
    ).fetchone()[0]

# Create an "existing" function
sess.execute("""
    create or replace function to_upper(text)
    returns text
    as
    $$ select upper($1); $$
    language sql;
""")
sess.commit()

###
# CHECK IF UPDATE IS NECESSARY
###

sess.begin(subtransactions=True)
try:
    # Definition of the entity in the database
    existing_definition = get_sql_function_def(sess, 'public', 'to_upper')

    # Render the local definition and execute it inside the transaction.
    # Note that the function's body has changed.
    sess.execute("""
        create or replace function to_upper(text)
        returns text
        as
        $$ select upper($1) || 'abc' ; $$
        language sql;
    """)

    # Get the new definition
    local_definition = get_sql_function_def(sess, 'public', 'to_upper')

    # Check if the existing definition and local definition match
    requires_update: bool = existing_definition != local_definition
finally:
    sess.rollback()

print(requires_update)  # True
It does require that the target database supports a read-your-own-writes transaction isolation level.
> I'm not familiar with DB2, but if the dummy schema is the only part that makes you squeamish, a rolled back transaction would work too.
Only if transactional DDL is available, which is also not a given; Oracle, for example, does not support it.
wrt your comment:
> if you are running on your company's giant DB2 staging system it's likely not that simple to create a new schema on the fly and then drop it
Did you mean that a company wouldn't be happy that it was happening, that the end user likely wouldn't have the required permissions, or that there's something specific to DB2 that makes short-lived schemas particularly dangerous?