pg_analytics icon indicating copy to clipboard operation
pg_analytics copied to clipboard

Support custom partitioning schemes based on "patterns" in the OPTIONS files parameter of CREATE FOREIGN TABLE

Open pdpark opened this issue 6 months ago • 4 comments

What feature are you requesting?

The ability to specify a custom partitioning scheme through the use of a pattern in the files option when creating foreign tables, like this:

CREATE FOREIGN TABLE my_table ()
SERVER parquet_server
OPTIONS (files 's3://bucket/data_{id_1}_{id_2}.parquet')

Why are you requesting this feature?

To support existing custom partitioning scheme.

What is your proposed implementation for this feature?

Foreign tables could be created like this:

CREATE FOREIGN TABLE my_table ()
SERVER parquet_server
OPTIONS (files 's3://bucket/data_{id_1}_{id_2}.parquet')

...or this:

CREATE FOREIGN TABLE my_table ()
SERVER parquet_server
OPTIONS (files 's3://bucket/data_{id_1}_*.parquet')

The values in brackets must correspond with column names defined in the referenced parquet files or the statement will fail.

When running a query like this on the first table defined above:

select *
from my_table
where id_1 = '1234'
and id_2 = '0987'

...the id_1 and id_2 column values from the sql where clause will be substituted into the files pattern producing a string that must correspond with an actual parquet file at the specified s3 location:

s3://bucket/data_1234_0987.parquet

A query on the second table table defined above:

select *
from my_table
where id_1 = '1234'

...will produce a files pattern after substitution that looks like this:

s3://bucket/data_1234_*.parquet

Full Name:

Patrick Park

Affiliation:

Payzer

pdpark avatar Aug 14 '24 21:08 pdpark