Support partition evolution (old files with different partitioning schemes than new files)
Is your feature request related to a problem? Please describe.
Currently, Daft assumes that all files retrieved from a given Iceberg table share the same partitioning:
- Retrieve the current partition spec from the table
- Translate any predicates into partition filters (e.g. `dt > 1970-02-01` becomes `day(dt) > 30`)
- Apply this partition filter naively to all ScanTasks
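The current behavior described above can be modeled roughly as follows. This is a hypothetical sketch, not Daft's actual code: `ScanTask`, `day_transform`, `project_gt_to_day` and `naive_prune` are illustrative names, `dt` is assumed to be a timestamp column, partition values are assumed to be stored positionally per file, and the day transform is assumed to count days since the Unix epoch (as Iceberg's does).

```python
from dataclasses import dataclass
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


@dataclass
class ScanTask:
    # Hypothetical stand-in for a ScanTask: one data file plus its partition values.
    path: str
    partition: tuple  # values laid out by whatever spec the file was written with


def day_transform(ts: datetime) -> int:
    # Day transform: whole days since 1970-01-01 (UTC).
    return (ts - EPOCH).days


def project_gt_to_day(cutoff: datetime) -> int:
    # Rows with `dt > cutoff` can still fall on cutoff's own day, so the partition
    # filter is `day(dt) >= day(cutoff)`, i.e. `day(dt) > day(cutoff) - 1`.
    return day_transform(cutoff)


def naive_prune(tasks: list[ScanTask], cutoff: datetime) -> list[ScanTask]:
    # Project the predicate with the *current* spec only, then apply the same
    # bound to every task regardless of the spec its file was written with.
    lower = project_gt_to_day(cutoff)
    return [t for t in tasks if t.partition[0] >= lower]


cutoff = datetime(1970, 2, 1, tzinfo=timezone.utc)
tasks = [
    ScanTask("new.parquet", (40,)),  # day(dt) = 40, i.e. 1970-02-10
    ScanTask("old.parquet", (1,)),   # month(dt) = 1, i.e. February 1970, written under an older spec
]
kept = naive_prune(tasks, cutoff)
# old.parquet is dropped: its month value 1 is compared against the day bound 31,
# even though the file may contain rows after 1970-02-01.
print([t.path for t in kept])
```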
However, in some cases the partitioning of old data may differ from the current partition spec as a result of "partition evolution". For example, if the table used to be partitioned by `month(dt)`, the predicate above should be translated to `day(dt) > 30` for new files but to `month(dt) > 0` for old files.
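One possible direction, sketched here under a hypothetical model, is to project the predicate separately for each file using the partition spec that file was written with. The spec ids, the `SPECS` table, `ScanTask` and the function names are illustrative assumptions, not Daft or PyIceberg APIs. The month transform is assumed to count months since 1970-01 (as Iceberg's does), so the boundary month for 1970-02-01 is 1. Projecting `dt > cutoff` to `transform(dt) >= transform(cutoff)` is only valid because both transforms are monotonically non-decreasing.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def day_transform(ts: datetime) -> int:
    # Whole days since 1970-01-01 (UTC).
    return (ts - EPOCH).days


def month_transform(ts: datetime) -> int:
    # Whole months since 1970-01.
    return (ts.year - 1970) * 12 + (ts.month - 1)


# Hypothetical spec history: spec 0 (old files) used month(dt), spec 1 (current) uses day(dt).
SPECS: dict[int, Callable[[datetime], int]] = {0: month_transform, 1: day_transform}


@dataclass
class ScanTask:
    path: str
    spec_id: int      # id of the spec the file was written with
    partition: tuple  # partition values laid out according to that spec


def spec_aware_prune(tasks: list[ScanTask], cutoff: datetime) -> list[ScanTask]:
    kept = []
    for task in tasks:
        transform = SPECS[task.spec_id]
        # `dt > cutoff` projects to `transform(dt) >= transform(cutoff)` under this
        # file's own spec; this holds because both transforms are non-decreasing.
        if task.partition[0] >= transform(cutoff):
            kept.append(task)
    return kept


cutoff = datetime(1970, 2, 1, tzinfo=timezone.utc)
tasks = [
    ScanTask("new.parquet", 1, (40,)),     # day partition 40 (1970-02-10)
    ScanTask("old_feb.parquet", 0, (1,)),  # month partition 1 (February 1970)
    ScanTask("old_jan.parquet", 0, (0,)),  # month partition 0 (January 1970)
]
print([t.path for t in spec_aware_prune(tasks, cutoff)])
```

Files written under a spec with no partition field on `dt` could not be pruned by this predicate at all and would need to be kept.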
See #2084 for tests.
@jaychia can you merge in the tests behind a pytest skip? I'll take a look after that!
Sounds good, pending merge: https://github.com/Eventual-Inc/Daft/pull/2084