efcore.pg icon indicating copy to clipboard operation
efcore.pg copied to clipboard

Support for table partitioning

Open mrjrt opened this issue 5 years ago • 73 comments

I'm a member of a project that is going to be using code-first EFC with Postgres and need to utilise table partitioning. It would be great if the data connector supported this natively so we don't have to resort to hacking SQL into the migrations manually or switching to schema-first.

I've taken a quick gander at your codebase and have some ideas, but I suspect quite strongly I'll not get anywhere so thought I'd ask politely here as well as with the improvements in PG12 (i.e. foreign key support) more people may decide to utilise partitioning.

Thanks!

mrjrt avatar Sep 22 '19 19:09 mrjrt

PostgreSQL docs: https://www.postgresql.org/docs/current/ddl-partitioning.html

roji avatar Sep 23 '19 12:09 roji

@mrjrt this is definitely a good feature to have in the provider. If you want to give this a stab I can provide some guidance and pointers.

roji avatar Sep 23 '19 12:09 roji

@roji - We're looking to use Partitions for efficient deletes using EF Core. Do you have any guidance on either

  1. How this could be done using EF Core today
  2. If it cannot be, how we could help contribute to help make it possible?

jonmill avatar Dec 09 '19 21:12 jonmill

@jonmill using partitions today could be done by editing the generated migrations files for table creation, and replacing the appropriate lines with raw SQL that would create the partitions as desired (according to https://www.postgresql.org/docs/current/ddl-partitioning.html). I'm not familiar enough with PostgreSQL partitions to foresee particular problems or details, but in principle partitions should only ever affect table (and index) creation, and not querying or inserting. That means it's purely a migrations thing.

Contributing to this would mean designing model building extensions on EntityType, which would allow to model partitioning as supported by PostgreSQL. Users would be able to call C# extension methods when configuring their model, to define partitions for each entity etc. The Npgsql provider already exposes quite a few such PostgreSQL-specific extensions, although partitioning does seem to be slightly richer than the average case.

roji avatar Dec 09 '19 23:12 roji

@roji - Hmm manually editing the migrations file seems sketchy. I did some research and it doesn't look like you can alter a table to generate partitioning after it's created, so it looks like updating the EF extensions might be the way to go.

Any guidance on where to start looking in the source for ways to begin adding support for partitions?

jonmill avatar Dec 11 '19 02:12 jonmill

Hmm manually editing the migrations file seems sketchy.

In general, there's no problem at all with editing migration files and adding raw SQL (or replacing existing directives) - the EF Core team explicitly encourages this where it is necessary. A good example is database triggers; EF Core doesn't provide any way to model triggers and produce migrations for them, and there wouldn't be much value in it. So if you need them you can always edit migrations and add them there.

Adding support for partitions isn't going to be trivial. But if you want to go into it, take a look at the existing entity builder extensions, which are PostgreSQL model configuration extensions. These add metadata to entities (in the form of annotations), which are later picked up by the migrations SQL generator when it generates the CREATE TABLE DDL, for example. As I wrote above, the description of how partitions are specified/configured isn't trivial, so we'd have to think about how to represent everything in metadata etc.

roji avatar Dec 11 '19 10:12 roji

In general, there's no problem at all with editing migration files and adding raw SQL (or replacing existing directives) - the EF Core team explicitly encourages this where it is necessary. A good example is database triggers; EF Core doesn't provide any way to model triggers and produce migrations for them, and there wouldn't be much value in it. So if you need them you can always edit migrations and add them there.

Interesting, I didn't know that...thanks for clarifying that. So if we were to go this route since the addition of partitions to the code base sounds like it would be non-trivial and time-consuming, especially for someone not intimately familiar with the code base, would this involve completely replacing the CREATE TABLE call for the partitioned table? Or is there a way we can update the existing CREATE TABLE call to add our partitioning scheme?

jonmill avatar Dec 11 '19 17:12 jonmill

@roji - I found a potentially better way that's simpler; I use table inheritance to create the master table using EF Core, then create inherited tables as necessary using raw sql with a Rule to redirect inserts into the inherited table. The one problem I've found is inserting into the master table; EF Core throws a concurrency error since rows weren't actually inserted into the master table.

I'm guessing this is due to how postgres rules change the actual query...for instance

CREATE TABLE Inherited (
    CHECK (ColA = 'Foo')
) INHERITS (Master);

CREATE RULE Inherited_Insert AS
ON INSERT TO Master WHERE
(ColA = 'Foo')
DO INSTEAD
INSERT INTO Inhertied VALUES (NEW.*);

Inserting data into Master causes a concurrency exception since EF tries to get values back for tracking but the insert rule necessary for inherited tables doesn't return one.

Do you know of any way around this outside of writing manual SQL to insert?

jonmill avatar Jan 17 '20 08:01 jonmill

Hmm... If you examine the INSERTs sent by EF Core, you should see a RETURNING clause which instructs PostgreSQL to send back the generated columns in the new row. As long as you replicate the same clause in your INSTEAD rule, things should be transparent to EF Core and everything should work.

Hope I've understood your issue, note that there's an issue open on having EF Core support PostgreSQL table inheritance, although it's unlikely we'll get around to doing anything soon.

roji avatar Jan 17 '20 10:01 roji

I don't think a DO INSTEAD rule with a WHERE clause can have a RETURNING statement...I tried that and Postgres complained. Is there a way to ignore the RETURNING clause on insert?

jonmill avatar Jan 17 '20 17:01 jonmill

@jonmill no, EF Core really does require those keys in order to work correctly (in at least some scenarios). For example, if you add a new entity and then update it, EF Core needs to have the (generated) key from the earlier INSERT statement to properly generate the later UPDATE statement.

Note this paragraph in the PG docs which seems to indicate that RETURNING is supported in INSERT rules, but apparently on unconditional ones (e.g. no WHERE clause)? I'm guessing this is the problem you're running into?

In a rule for INSERT, UPDATE, or DELETE on a view, you can add a RETURNING clause that emits the view's columns. This clause will be used to compute the outputs if the rule is triggered by an INSERT RETURNING, UPDATE RETURNING, or DELETE RETURNING command respectively. When the rule is triggered by a command without RETURNING, the rule's RETURNING clause will be ignored. The current implementation allows only unconditional INSTEAD rules to contain RETURNING; furthermore there can be at most one RETURNING clause among all the rules for the same event. (This ensures that there is only one candidate RETURNING clause to be used to compute the results.) RETURNING queries on the view will be rejected if there is no RETURNING clause in any available rule.

roji avatar Jan 20 '20 10:01 roji

@roji - gotcha; yes, that is the issue I'm running into. I got around it by just doing a manual INSERT statement instead of using EF. The items I'm inserting won't be tracked or (immediately) used after insertion, so the RETURNING clause is extra work that's unnecessary for me.

It'd be great if this was possible without writing a manual INSERT clause, but I couldn't find a way

jonmill avatar Jan 21 '20 16:01 jonmill

The items I'm inserting won't be tracked or (immediately) used after insertion, so the RETURNING clause is extra work that's unnecessary for me.

Unfortunately EF Core doesn't know that you don't need tracking (and so don't need the RETURNING) - what you're asking for is https://github.com/dotnet/efcore/issues/9118. Right now I don't think there's a solution for what you're trying to do.

roji avatar Jan 23 '20 15:01 roji

Gotcha; thanks for all your help!

jonmill avatar Jan 23 '20 17:01 jonmill

Hi all! Have found some workaround for table partitioning using ef core and npgsql.

  1. Create shadow property e.g. "Partitioned" for entity
  2. Create BEFORE INSERT TRIGGER function that create partition (if not exists) and inserts row in this partition. And returns NEW row (it's important for ef core, otherwise there will be exception)
  3. But this returned row will be inserted in master table too.
  4. In BEFORE INSERT TRIGGER I've made next steps:
  • BEFORE inserting in partition set Partitioned column to true

NEW."Partitioned" = true;

  • BEFORE returning row set set Partitioned column to false

NEW."Partitioned" = false;

  1. Create AFTER INSERT TRIGGER function that deletes from master table rows by condition "Partitioned = false"

Hope it will help!

agabidullin avatar Apr 08 '20 16:04 agabidullin

Hi @roji, we got all the tables migration happening using EF Core. For enabling the partition by range, I'm not sure how I can change the Migration File? ( to achieve something like below )

CREATE TABLE people ( id int not null, birth_date date not null, country_code character(2) not null, name text ) PARTITION BY RANGE (birth_date);

reg child partition table, I can add by migrationBuilder.Sql(sql);. but not sure how I can create the master table with the partition definition using the EF Core Migration.

mathavanmani avatar Aug 04 '20 19:08 mathavanmani

@mathavanmani EF Core (and the PostgreSQL provider) doesn't currently support table partitioning. However, it's perfectly fine to generate migrations, and then edit the generated files and introduce raw SQL. So you can use EF Core migrations to generate a plain vanilla CreateTable operation, then edit the migration and replace that with a raw SQL operation that contains your PARTITION BY clauses etc. See these docs for more info on raw SQL in migrations.

roji avatar Aug 05 '20 16:08 roji

"returning xmin" does not work in partitioned tables in Postgres 12 (works ok in 11), so we can't use npgsql EF Core with partitioned tables... There is a discussion but I don't know if they are going to fix it: https://www.postgresql.org/message-id/[email protected]

freerider7777 avatar Sep 12 '20 18:09 freerider7777

Thanks for posting - the PG maintainers usually do fix regressions quite diligently, I'd be surprised if this isn't addressed. In any case, the EF Core PG provider doesn't do table partitioning on its own in any way.

roji avatar Sep 12 '20 22:09 roji

Hope they will... There is a strange discussion "whether this a bug", I pointed them npgsql ef core provider uses this construction https://www.postgresql.org/message-id/CA%2BTgmoaN0kQa3nxToHtD8nRymSnUYaj4XL3zKq8Q_0AUJcQNPQ%40mail.gmail.com

freerider7777 avatar Sep 13 '20 07:09 freerider7777

The required relational model infrastructure is done on the EF Core side, so this should no be doable.

/cc @AndriySvyryd

roji avatar Oct 28 '20 19:10 roji

12.5, 13.1 release of postgres, RETURNING xmin still not working for partitioned tables... any suggestions how to work around?

freerider7777 avatar Nov 13 '20 11:11 freerider7777

It's actual now, are you going to add PartitionBy for ef core provider? Sorry english is it not my native language.

Sin333 avatar Nov 17 '20 21:11 Sin333

It's important for my team. I can help you for this, I have custom realisation for 'FOR UPDATE' 'FOR SHARE', but I can't realisation 'Partition By' without change ef core provider for PostgreSQL

Sin333 avatar Nov 17 '20 21:11 Sin333

@Sin333 the intention is indeed to support this in the provider - there just hasn't been enough time (or user requests) up to now, and this isn't as trivial as some other features.

Some quick thoughts:

  • We'd basically exposed APIs to define declarative partitioning as shown in these PG docs.
  • We'd add metadata which allows migrations to generate the PARTITION BY RANGE (logdate) clause on CREATE TABLE.
  • We could also allow specifying the partitions themselves, although that seems a bit more complicated (not fully compatible with the general CREATE TABLE syntax, no columns), and may be of lower value (people can put raw SQL into migrations instead).

roji avatar Nov 18 '20 06:11 roji

@Sin333 I think you're talking about creation of tables, but there is a problem with using partitioned tables, when you use xmin as concurrency token (it does not work): [Timestamp] [DatabaseGenerated(DatabaseGeneratedOption.Computed)] [Column("xmin", TypeName = "xid")] public uint Rowversion { get; set; }

freerider7777 avatar Nov 18 '20 07:11 freerider7777

@freerider7777 even if there's a PostgreSQL limitation around the use of xmin with partitioned tables, that doesn't invalidate using partitioned tables in general (users don't necessarily always need xmin). I suggest we keep this issue about creation of partitioned tables.

roji avatar Nov 18 '20 07:11 roji

@roji Ok, just warning :)

freerider7777 avatar Nov 18 '20 08:11 freerider7777

@freerider7777 For use Partition By (timestamp) in PostgreSQL with ef core you can use pg extensions: 'timescaledb', 'pg_pathman', 'pg_partman', it's good and simple way.

But it's not working in my case. I need use Partition By with 'Enum', 'Guid' column

Sin333 avatar Nov 18 '20 08:11 Sin333

@Sin333 I wrote you details by email yesterday

freerider7777 avatar Nov 19 '20 08:11 freerider7777