iceberg-python Support data files compaction

Introduce an API to compact data files. The first version of the API will do the following:

- Take a predicate expression as an input parameter to find the data files matching the filter; those files will be rewritten.
- Group the data files by partition and rewrite them using the same bin-packing constraints as the writer.
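
To make the two steps above concrete, here is a minimal planning sketch in plain Python. It is not PyIceberg's API: `DataFileInfo`, `plan_compaction`, the `matches` callback, and `target_size_bytes` are hypothetical names chosen for illustration, and first-fit-decreasing packing is used as a stand-in for whatever bin-packing the writer actually applies.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class DataFileInfo:
    """Hypothetical stand-in for a data file entry (not a PyIceberg class)."""
    path: str
    partition: Tuple       # partition values, e.g. ("2024-08-22",)
    size_bytes: int


def plan_compaction(
    files: Iterable[DataFileInfo],
    matches: Callable[[DataFileInfo], bool],
    target_size_bytes: int,
) -> Dict[Tuple, List[List[DataFileInfo]]]:
    """Return, per partition, the groups of files to rewrite together."""
    # Step 1: keep only the files selected by the predicate.
    selected = [f for f in files if matches(f)]

    # Step 2: group the selected files by partition.
    by_partition: Dict[Tuple, List[DataFileInfo]] = defaultdict(list)
    for f in selected:
        by_partition[f.partition].append(f)

    # Step 3: pack each partition's files into bins of at most
    # target_size_bytes (first-fit decreasing; an assumption, see above).
    plan: Dict[Tuple, List[List[DataFileInfo]]] = {}
    for partition, group in by_partition.items():
        bins: List[List[DataFileInfo]] = []
        for f in sorted(group, key=lambda x: x.size_bytes, reverse=True):
            for b in bins:
                if sum(x.size_bytes for x in b) + f.size_bytes <= target_size_bytes:
                    b.append(f)
                    break
            else:
                bins.append([f])
        # A bin holding a single file gains nothing from a rewrite, so skip it.
        rewrite_groups = [b for b in bins if len(b) > 1]
        if rewrite_groups:
            plan[partition] = rewrite_groups
    return plan
```

Each inner list in the result is a set of files that would be read and rewritten as one output file. Committing the rewrite (replacing the old files with the new one in a snapshot) is out of scope for this sketch.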

Aug 22 '24 18:08 sungwy

Unassigning to work on other near-term priorities

Sep 24 '24 17:09 sungwy

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

Mar 24 '25 00:03 github-actions[bot]

Is there any way to trigger compaction? The literature says that it's optimal to compact delete files back into data files to improve read performance, and AFAICT there's no way to do this in PyIceberg.

Incidentally, is there a way to control whether your catalog uses copy-on-write vs. merge-on-read?

Jun 01 '25 05:06 zbs

@sungwy

My task https://github.com/apache/iceberg-python/issues/1931#issuecomment-3002159502 depends on the DeleteFileIndex, so I am not going to work on it until the DeleteFileIndex task is complete.

In the meantime, I'm wondering if I can take this task, if you haven't started working on it. Thanks!

Jun 26 '25 21:06 yingjianwu98