sqlmesh icon indicating copy to clipboard operation
sqlmesh copied to clipboard

Postgres: Add support for partitioned tables

Open mingfang opened this issue 10 months ago • 6 comments

Postgres natively supports partitioned tables

https://www.postgresql.org/docs/current/ddl-partitioning.html

mingfang avatar Apr 17 '24 20:04 mingfang

How large of a lift would it be to support partitioned tables? Is that something I could contribute to? I'm guessing some involved logic in the create stage, given you have to declare the partitions manually in PostgreSQL.

I have an incremental_by_time model with 1.7B rows, and maintaining the (necessary) indexes has caused things to slow dramatically. Partitioning by month, so at least the insert by day only has to reference one month's worth of data, would likely help tremendously.

@izeigerman

jrhorne avatar Sep 20 '24 14:09 jrhorne

It would be a new concept for SQLMesh, right now its concept of partitioning is "tell the underlying engine what columns to partition on and let it handle it transparently".

This would be active manual partitioning where SQLMesh itself has to create, drop and keep track of partitions. On the plus side, adding this for Postgres would also benefit Hive/Athena which has the same problem as Postgres

erindru avatar Sep 22 '24 20:09 erindru

Postgres supports Declarative Partitioning, as described in the link above, 5.11.2. Declarative Partitioning.

mingfang avatar Sep 23 '24 00:09 mingfang

Indeed it does, and even with Declarative Partitioning you still have to manually create the partitions (you just don't have to manually attach and detach them).

For example:

create table foo (id int, day date) partition by range (day);

insert into foo (id, day) values (1, '2023-01-01'::date);
--FAIL: ERROR: no partition of relation "foo" found for row.  Detail: Partition key of the failing row contains (day) = (2023-01-01).

create table foo_2023 partition of foo for values from ('2023-01-01') to ('2024-01-01');

insert into foo (id, day) values (1, '2023-01-01'::date); --succeeds

insert into foo (id, day) values (1, '2024-01-01'::date);
--FAIL: ERROR: no partition of relation "foo" found for row  Detail: Partition key of the failing row contains (day) = (2024-01-01).

Today, SQLMesh assumes that the DB will automatically create partitions when you start trying to insert data, which isnt the case for Postgres. So we would have to track partitions manually and issue the correct commands to create + drop them so that we can successfully insert data into the table.

erindru avatar Sep 23 '24 01:09 erindru

Would it simplify things to tie back to snapshot intervals? So forcing partitions to align (but contain possibly multiple of) with the interval unit?

Given the way SQLMesh also creates all the physical tables, then evaluates models, making it inherit from the parent table with pg_partman would be possible, right?

Then as snapshots are evaluated, it first checks if the physical table has a partition available for the interval.

I’m not sure if incremental by time and range based partitioning have to go together in SQLMesh, but to me, it seems simpler to implement if both are used in conjunction.

jrhorne avatar Sep 25 '24 02:09 jrhorne

Yes, for incremental models partitioned by time, the plan is to use RANGE-based partitioning. I'm debating whether to make it multiples of the interval unit specifically or copy what Iceberg did which is nice and simple (make year(time_col), month(time_col), day(time_col) or hour(time_col) the available options).

I'm also looking at how we can add LIST and HASH based partitioning because this lines up with partitioning strategies of other engines (identity and bucket in Iceberg).

I'm not planning to add a dependency on pg_partman at this stage. The SQLMesh Postgres adapter will be responsible for ensuring partitions exist on the target table prior to inserting data and should work out of the box on any Postgres instance without special configuration.

erindru avatar Sep 25 '24 04:09 erindru

is there any update related to this feature?

maulanaady avatar Jan 09 '25 06:01 maulanaady

It's still on our roadmap but we aren't actively working on it right now

erindru avatar Jan 09 '25 20:01 erindru