
Support PostgreSQL partitioned tables

Open decaz opened this issue 6 years ago • 27 comments

I use PostgreSQL, and if a table is partitioned then autogenerate repeats the creation of this table even though it already exists. Alembic also generates the deletion of all partitions of the partitioned table.

Is this a bug, or is there some way to work around it?

decaz avatar Feb 18 '19 12:02 decaz

this question has no context. please share your env.py and table structures, thanks.

zzzeek avatar Feb 18 '19 14:02 zzzeek

Installed dependencies:

sqlalchemy==1.2.18
alembic==1.0.7
psycopg2==2.7.7

Here is the table definition (test_partitioning/test.py):

from sqlalchemy import Column, Integer, MetaData, Table, func
from sqlalchemy.dialects.postgresql import TIMESTAMP

metadata = MetaData()

test = Table('test', metadata, 
    Column('id', Integer, primary_key=True),
    Column('created_at', TIMESTAMP(timezone=True), server_default=func.now(), primary_key=True),
    postgresql_partition_by='RANGE (created_at)',
)

Alembic environment configuration (migrations/env.py):

from test_partitioning.test import metadata
target_metadata = metadata

Alembic database configuration (alembic.ini):

sqlalchemy.url = postgresql+psycopg2://postgres@localhost

These are the only changes to the default alembic setup after running $ alembic init migrations.

Generation of the initial migration and upgrading of the database:

$ alembic revision --autogenerate --rev-id 0001
INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
INFO  [alembic.autogenerate.compare] Detected added table 'test'
  Generating /home/decaz/workspace/test-partitioning/migrations/versions/0001_.py ... done

$ alembic upgrade 0001
INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
INFO  [alembic.runtime.migration] Running upgrade  -> 0001, empty message

$ alembic revision --autogenerate --rev-id 0002
INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
INFO  [alembic.autogenerate.compare] Detected added table 'test'
  Generating /home/decaz/workspace/test-partitioning/migrations/versions/0002_.py ... done

# Migration 0002 was generated here, absolutely identical to migration 0001

$ alembic upgrade 0002
INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
INFO  [alembic.runtime.migration] Running upgrade 0001 -> 0002, empty message
Traceback (most recent call last):
  File "/home/decaz/.virtualenvs/test-partitioning/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1236, in _execute_context
    cursor, statement, parameters, context
  File "/home/decaz/.virtualenvs/test-partitioning/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 536, in do_execute
    cursor.execute(statement, parameters)
psycopg2.ProgrammingError: relation "test" already exists
...

decaz avatar Feb 23 '19 16:02 decaz

After that I removed migration 0002 and created a couple of partitions manually:

postgres=# create table test_y2019m02d23 partition of test for values from ('2019-02-23') to ('2019-02-24');
CREATE TABLE
postgres=# create table test_default partition of test default;
CREATE TABLE

Then I tried to generate a new migration:

$ alembic revision --autogenerate --rev-id 0002
INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
INFO  [alembic.autogenerate.compare] Detected added table 'test'
INFO  [alembic.autogenerate.compare] Detected removed table 'test_y2019m02d23'
INFO  [alembic.autogenerate.compare] Detected removed table 'test_default'
  Generating /home/marat/workspace/test-partitioning/migrations/versions/0002_.py ... done

Here is the upgrade step that was generated:

def upgrade():
    # ### commands auto generated by Alembic - please adjust! ###
    op.create_table('test',
    sa.Column('id', sa.Integer(), nullable=False),
    sa.Column('created_at', postgresql.TIMESTAMP(timezone=True), server_default=sa.text('now()'), nullable=False),
    sa.PrimaryKeyConstraint('id', 'created_at'),
    postgresql_partition_by='RANGE (created_at)'
    )
    op.drop_table('test_y2019m02d23')
    op.drop_table('test_default')
    # ### end Alembic commands ###

@zzzeek how can I tell alembic not to recreate the partitioned table and not to drop the existing partition tables?

decaz avatar Feb 23 '19 16:02 decaz

reflection for partitioned tables is not supported in 1.2, it's in 1.3: https://docs.sqlalchemy.org/en/latest/changelog/migration_13.html#added-basic-reflection-support-for-postgresql-partitioned-tables so try that? the alembic logging should indicate this: when it says "Detected added table 'test'", it means autogenerate is not seeing your table in the database.

zzzeek avatar Feb 23 '19 18:02 zzzeek

although support for the sub-partitions likely won't work right now, you will need to use include_object to filter those out. built in, fully tested support for partitioned tables would be a new feature add.

zzzeek avatar Feb 23 '19 18:02 zzzeek

mostly because there's no "create partition" schema element in SQLAlchemy that would correspond to anything. you say you are creating those tables manually at the command line so by definition alembic is going to say they are "removed" because they are not in your model.

zzzeek avatar Feb 23 '19 18:02 zzzeek

so... the likely plan is a cookbook recipe telling folks to use include_object to filter out their partitioned sub-tables for the time being.

zzzeek avatar Feb 23 '19 18:02 zzzeek
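For reference, a minimal sketch of such an include_object recipe. The naming pattern is only an assumption based on the partition names created earlier in this thread; adjust it to your own scheme, and pass the hook to context.configure() in env.py.

import re

# Assumed naming convention for the manually created partitions shown above,
# e.g. "test_y2019m02d23" and "test_default"; adjust to your own scheme.
PARTITION_NAME_RE = re.compile(r"^test_(y\d{4}m\d{2}d\d{2}|default)$")


def include_object(obj, name, type_, reflected, compare_to):
    # Skip reflected tables that are really partitions of "test", so
    # autogenerate does not emit drop_table() for them.
    if type_ == "table" and reflected and PARTITION_NAME_RE.match(name):
        return False
    return True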

@zzzeek thanks for the help! Would it be possible to make alembic understand that a table is a sub-partition and automatically exclude it from generation (maybe as a future feature)? And one more question: 1.3 is in beta now; is there any ETA for the release? Thanks!

decaz avatar Feb 24 '19 14:02 decaz

@zzzeek thanks for the help! Would it be possible to make alembic understand that a table is a sub-partition and automatically exclude it from generation (maybe as a future feature)?

I try not to make partial design decisions that will get in the way of an eventual feature someday (edit: see my Aug 2 2019 comment below; we already omit PG partial indexes, so there is precedent to omit table partitions at least for now). Alembic's basic operation with autogenerate is: compare model to database; whatever is in the database and not in the model is added as a removal. The include_object hook is where this decisionmaking is altered to accommodate exceptions. A real feature here would be that you create your "partition of" tables using Table or similar and that Alembic would see those in the model.

And one more question: 1.3 is in beta now; is there any ETA for the release? Thanks!

I hope to push b3 into 1.3.0 final without any additional big changes so maybe this week barring any surprises.

zzzeek avatar Feb 25 '19 15:02 zzzeek

@zzzeek ok, I understand. I just tested 1.3.0b3 again with the partitioned table and found that the partitioned table is not included in the new migration, but the indices of that table are:

...
    Index('test_created_at_idx', 'created_at'),
    postgresql_partition_by='RANGE (created_at)',
)

Migration 0002:

def upgrade():
    # ### commands auto generated by Alembic - please adjust! ###
    op.create_index('test_created_at_idx', 'test', ['created_at'], unique=False)
    # ### end Alembic commands ###


def downgrade():
    # ### commands auto generated by Alembic - please adjust! ###
    op.drop_index('test_created_at_idx', table_name='test')
    # ### end Alembic commands ###

Upgrading of the database:

$ alembic upgrade 0002
INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.                                                          
INFO  [alembic.runtime.migration] Will assume transactional DDL.
INFO  [alembic.runtime.migration] Running upgrade 0001 -> 0002, empty message
Traceback (most recent call last):
  File "/home/decaz/.virtualenvs/test-partitioning/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1244, in _execute_context
    cursor, statement, parameters, context
  File "/home/decaz/.virtualenvs/test-partitioning/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 546, in do_execute
    cursor.execute(statement, parameters)
psycopg2.ProgrammingError: relation "test_created_at_idx" already exists

decaz avatar Feb 25 '19 17:02 decaz

I guess it's because of the following: https://github.com/sqlalchemy/sqlalchemy/blob/53c7fc4690ead7a58947ad982c0a47e1af18e154/lib/sqlalchemy/dialects/postgresql/base.py#L3277-L3282

@zzzeek what is the reason for such behavior?

decaz avatar Feb 25 '19 18:02 decaz

reflection of PG partial indexes is not implemented, so they are skipped until they can be implemented.

zzzeek avatar Feb 25 '19 18:02 zzzeek

actually I am wrong, that line doesn't skip anything, it just emits a warning that the WHERE clause of an index is being ignored. it still reflects the index and includes an Index object.

zzzeek avatar Feb 25 '19 18:02 zzzeek

@zzzeek but I don't have any postgresql_where parameters specified. So should I also include all partial indices in the include_object hook logic to make it work correctly?

decaz avatar Feb 25 '19 19:02 decaz

@zzzeek but I don't have any postgresql_where parameters specified.

I haven't tried any of this code so I'm not sure why you have referred to those lines in question in the first place. if you remove them, and then your code works, that would indicate they are the cause.

So should I also include all partial indices in the include_object hook logic to make it work correctly?

Whatever is being put into your alembic migration files that you don't want to be there, you should be able to filter out using include_object.

from my pov the thing you are trying to do should be supported. but it should be done with an earnest effort that plans ahead of time what the feature will look like and implements / tests it fully rather than poking little bits of code here and there. For Alembic, big new features are on hold pending contributors that are willing to work on them.

zzzeek avatar Feb 25 '19 19:02 zzzeek

@zzzeek you are right, PG partial indices are not related to the subject of this issue, my bad. For now I've arrived at the following workaround for the indices of partitioned tables (it should be added after the first migration with these indices has been generated):

def include_object(obj, name, type_, reflected, compare_to):
    if type_ == 'index':
        try:
            # We have to appeal to the private attribute "_non_defaults"
            postgresql_partition_by = obj.table.dialect_options['postgresql']._non_defaults['partition_by']
        except (AttributeError, KeyError):
            postgresql_partition_by = None
        if postgresql_partition_by:
            return False
    return True

Does this lookup of postgresql_partition_by look good, or can it be written more simply?

decaz avatar Feb 25 '19 23:02 decaz

table.dialect_options["postgresql"] returns a dictionary with the options in it:

table.dialect_options["postgresql"]["partition_by"]

zzzeek avatar Feb 27 '19 16:02 zzzeek

or you might want to use get:

table.dialect_options["postgresql"].get("partition_by", None)

zzzeek avatar Feb 27 '19 16:02 zzzeek
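With that, the earlier workaround no longer needs the private "_non_defaults" attribute; a possible rewrite (untested sketch):

def include_object(obj, name, type_, reflected, compare_to):
    if type_ == 'index':
        table = getattr(obj, 'table', None)
        if table is not None:
            # dialect_options["postgresql"] acts like a dict of the
            # postgresql_* arguments given to Table()
            if table.dialect_options['postgresql'].get('partition_by', None):
                # the index belongs to a partitioned table
                return False
    return True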

@zzzeek I guess my workaround with include_object for PG partial indices isn't necessary with sqlalchemy>=1.3.6 due to https://github.com/sqlalchemy/sqlalchemy/issues/4771 ?

decaz avatar Jul 26 '19 12:07 decaz

I don't work with PG partitioned tables and I don't really follow the issue you're having. What happens if you just try it? Is your initial example 5 months ago enough to test it?

zzzeek avatar Jul 26 '19 23:07 zzzeek

As far as the issue with indices, that is. I would think that the newer behavior makes things worse because it is reporting on indexes in the database that aren't in your metadata since PG creates these implicitly.

zzzeek avatar Jul 26 '19 23:07 zzzeek

@zzzeek it works for me now without the workaround in include_object. But I still need to exclude the partition tables from migrations within include_object (the subject of this issue).

decaz avatar Jul 29 '19 14:07 decaz

so to sum up:

  1. the original partitioned table in your MetaData, as long as you are on SQLAlchemy 1.3, works, e.g. alembic generates initial revision, then from there it's fine, subsequent revisions do not try to re-generate this table

  2. the indexes you set up on that base table also work, e.g. subsequent revisions do not try to regenerate the base index

  3. when you later go into the database and manually emit "CREATE TABLE .. PARTITION OF ..", those table names come out in your autogenerate as "DROP TABLE", because they are not in your model.

  4. there should be indexes created by PG on the partitions also that come out as "DROP INDEX".

can you confirm 1,2,3,4 above please thanks

zzzeek avatar Jul 29 '19 14:07 zzzeek

@zzzeek

  1. Yes
  2. Yes
  3. Yes
  4. Yes

Currently, 3 and 4 can be worked around with an include_object recipe (filtering out the partition tables by their names).

decaz avatar Aug 02 '19 12:08 decaz

so there's a point to be made that we already ignore some indexes in the "compare" step, and this is in the PG dialect, e.g. so-called "functional" indexes (really partial indexes): it warns, and that's what happens. That occurs before the indexes are passed to the include_object hook. So there might be precedent to add a built-in ignore for sub-partitions rather than saying to use a recipe.

zzzeek avatar Aug 02 '19 13:08 zzzeek

I am also trying to achieve a similar goal. @decaz could you please include a full recipe of your env.py?

finete avatar Apr 06 '20 20:04 finete

I am also trying to achieve a similar goal. @decaz could you please include a full recipe of your env.py?

As shown in https://gist.github.com/utek/6163250:

env.py

import re


def get_excludes_from_config(config_, type_="tables"):
    # Read a comma-separated list of regex patterns from the [exclude]
    # section of alembic.ini; return an empty list if the option is missing.
    excludes = config_.get(type_, None)
    if excludes is None:
        return []
    return [pattern.strip() for pattern in excludes.split(",")]


excluded_tables = get_excludes_from_config(config.get_section('exclude'), "tables")
excluded_indices = get_excludes_from_config(config.get_section('exclude'), "indices")


def include_object(obj, name, type_, reflected, compare_to):
    # Skip partition tables and their indexes that match the configured
    # patterns, so autogenerate does not emit drop_table()/drop_index().
    if type_ == "table":
        for table_pat in excluded_tables:
            if re.match(table_pat, name):
                return False
        return True

    elif type_ == "index":
        for index_pat in excluded_indices:
            if re.match(index_pat, name):
                return False
        return True

    else:
        return True

alembic.ini

[exclude]
tables = test_y\d{4}m\d{2}d\d{2}
indices = test_y\d{4}m\d{2}d\d{2}_created_at_idx

shabykov avatar Mar 26 '21 13:03 shabykov
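Note that for either recipe to take effect, the hook has to be passed to context.configure() in env.py. A minimal sketch of that wiring, following the layout of the default template generated by alembic init (run_migrations_online and the engine_from_config / pool imports come from that template):

def run_migrations_online():
    connectable = engine_from_config(
        config.get_section(config.config_ini_section),
        prefix='sqlalchemy.',
        poolclass=pool.NullPool,
    )

    with connectable.connect() as connection:
        context.configure(
            connection=connection,
            target_metadata=target_metadata,
            include_object=include_object,  # filter out partition tables/indexes
        )

        with context.begin_transaction():
            context.run_migrations()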