pgstac icon indicating copy to clipboard operation
pgstac copied to clipboard

Setting up time partitioning after initial data load

Open drnextgis opened this issue 2 years ago • 1 comments

The documentation states that Partitioning needs to be set up before data is loaded. I'm curious if it's possible to set up time partitioning after the initial data load. I've noticed that PostgreSQL provides the ATTACH PARTITION sub-command, which has led me to consider the following steps:

  1. Generate tables (partitions) for a specific collection and transfer the corresponding records from _items_1.
  2. Remove items from the _items_1 partition.
  3. Utilize the ATTACH command to link the partitions created in the first step to items.
  4. Define partition_trunc for the collection.

Is there something crucial that I might be overlooking?

drnextgis avatar Sep 30 '23 05:09 drnextgis

@drnextgis it looks like that section of the documentation needs to be updated. There are now triggers on the collections table that will rewrite a full collection into new partitions if the partition strategy is changed.

https://github.com/stac-utils/pgstac/blob/634122f7736da84c4168c2512498766f87c96ad2/src/pgstac/sql/003b_partitions.sql#L455

I will note that on a large collection, this could take a very long time and take up some aggressive locks! So as much as possible, you should still try to set the partition strategy before loading any data.

bitner avatar Oct 05 '23 14:10 bitner