iceberg-python

[feature request] Support reading equality delete files

Status: Open. kevinjqliu opened this issue 1 year ago • 4 comments

Feature Request / Improvement

Only position deletes are supported right now: https://github.com/apache/iceberg-python/blob/e5a58b34dd830c6ffea11649613b693f70f7cbb4/pyiceberg/table/__init__.py#L1418

Let's also add support for reading equality deletes.
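For context, reading an equality delete means dropping every data row whose values in the delete file's equality columns match some row of that delete file, effectively an anti-join on those columns. A minimal pure-Python sketch, assuming rows are plain dicts and that the delete file's `equality_ids` have already been resolved to column names; `apply_equality_deletes` is a hypothetical helper, not PyIceberg API, and it ignores which delete files apply to which data files (that is decided during scan planning).

```python
from typing import Any, Dict, Iterable, List, Sequence

Row = Dict[str, Any]


def apply_equality_deletes(
    data_rows: Iterable[Row],
    delete_rows: Iterable[Row],
    equality_columns: Sequence[str],
) -> List[Row]:
    """Hypothetical helper: keep only data rows whose values in
    `equality_columns` do not match any row of the equality delete file."""
    # Collect the delete file's key tuples into a set for O(1) membership checks.
    # None compares equal to None, so a null key in a delete row matches a null data value.
    deleted_keys = {tuple(row[col] for col in equality_columns) for row in delete_rows}
    # Anti-join: drop every data row whose key tuple appears in the delete set.
    return [
        row
        for row in data_rows
        if tuple(row[col] for col in equality_columns) not in deleted_keys
    ]


data = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 3, "name": "c"},
]
# In this example the equality delete file stores only the equality column `id`.
deletes = [{"id": 2}]
print(apply_equality_deletes(data, deletes, ["id"]))
# → [{'id': 1, 'name': 'a'}, {'id': 3, 'name': 'c'}]
```

A set of key tuples keeps the check linear in the number of data rows; a real reader would more likely do the same anti-join on Arrow tables.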

Position delete PR: https://github.com/apache/iceberg/pull/6775

kevinjqliu, Sep 26 '24

Thanks @kevinjqliu, I can work on this issue.

Zyiqin-Miranda, Sep 26 '24

This will be a fantastic addition to PyIceberg! Thank you for raising this issue @kevinjqliu and @Zyiqin-Miranda 🎉

sungwy, Sep 27 '24

Thanks @kevinjqliu and @sungwy. I'm starting to add equality delete support to the current `plan_files` function, but I'm not sure whether the current `_InclusiveMetricsEvaluator` can be used directly to determine whether an equality delete file is relevant to a data file. Iceberg Java seems to use `canContainEqDeletesForFile` instead. My understanding is that position deletes can use `lower_bound == upper_bound` on the `file_path` column to quickly filter out irrelevant files, but equality deletes don't have this advantage, so an equality delete can potentially be relevant to any data file within the same partition. Thanks in advance for any insights!
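To illustrate the difference described above, here is a simplified sketch of the two pruning checks. `FileStats`, `position_delete_may_apply`, and `equality_delete_may_apply` are hypothetical names, not PyIceberg or Iceberg Java APIs. The equality check only loosely follows the idea behind Java's `canContainEqDeletesForFile` (every equality column's value ranges must overlap) and is not a port of it; it treats nulls conservatively, assuming a null in a delete row can match a null data value.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

# Hypothetical, simplified per-file stats: column name -> (lower_bound, upper_bound),
# plus per-column null counts (a missing entry means "unknown").
Bounds = Dict[str, Tuple[Any, Any]]


@dataclass
class FileStats:
    file_path: str
    bounds: Bounds = field(default_factory=dict)
    null_counts: Dict[str, int] = field(default_factory=dict)


def position_delete_may_apply(data_file: FileStats, delete_file: FileStats) -> bool:
    """Position delete rows store the target data file's path, so the bounds of
    the delete file's `file_path` column can rule out unrelated data files."""
    path_bounds = delete_file.bounds.get("file_path")
    if path_bounds is None:
        return True  # No stats: conservatively assume the delete may apply.
    lower, upper = path_bounds
    # When lower == upper, this reduces to comparing against a single referenced path.
    return lower <= data_file.file_path <= upper


def _may_contain_nulls(stats: FileStats, col: str) -> bool:
    count = stats.null_counts.get(col)
    return count is None or count > 0  # Unknown counts are treated conservatively.


def equality_delete_may_apply(
    data_file: FileStats, delete_file: FileStats, equality_columns: List[str]
) -> bool:
    """Equality delete rows carry no file path, so the cheap test is whether
    every equality column could hold a matching value in both files."""
    for col in equality_columns:
        data_range = data_file.bounds.get(col)
        delete_range = delete_file.bounds.get(col)
        if data_range is None or delete_range is None:
            continue  # Missing bounds for this column: it cannot rule anything out.
        data_lo, data_hi = data_range
        del_lo, del_hi = delete_range
        ranges_disjoint = data_hi < del_lo or del_hi < data_lo
        # Bounds ignore nulls, so null-to-null matches must be considered separately.
        both_may_be_null = _may_contain_nulls(data_file, col) and _may_contain_nulls(
            delete_file, col
        )
        if ranges_disjoint and not both_may_be_null:
            return False  # No row can match on this column, so the delete is irrelevant.
    return True


data = FileStats("s3://bucket/data/a.parquet", {"id": (1, 100)}, {"id": 0})
pos = FileStats(
    "s3://bucket/deletes/pos.parquet",
    {"file_path": ("s3://bucket/data/b.parquet", "s3://bucket/data/b.parquet")},
)
far = FileStats("s3://bucket/deletes/eq1.parquet", {"id": (200, 300)}, {"id": 0})
near = FileStats("s3://bucket/deletes/eq2.parquet", {"id": (50, 60)}, {"id": 0})
print(position_delete_may_apply(data, pos))  # False: the delete targets b.parquet only
print(equality_delete_may_apply(data, far, ["id"]))  # False: id ranges are disjoint
print(equality_delete_may_apply(data, near, ["id"]))  # True: id ranges overlap
```

The position check can be exact because each delete row names its data file; the equality check can only say "cannot match", never "does match", which is why equality deletes stay relevant to many more data files.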

Zyiqin-Miranda, Sep 30 '24

The "Equality Delete Files" and "Scan Planning" sections of the Iceberg table spec are good docs for this.

My general understanding is that equality deletes are applied to all data files (across all partitions, if partitioned).

Position delete files must be applied to data files from the same commit, when the data and delete file data sequence numbers are equal. This allows deleting rows that were added in the same commit.
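For reference, the scan-planning section of the Iceberg table spec scopes these rules by data sequence number and partition: a position delete applies when the data file's data sequence number is less than or equal to the delete's and the partitions match, while an equality delete requires a strictly smaller data sequence number and applies only to the same partition unless the delete file was written with an unpartitioned spec. A minimal sketch of just those two conditions, assuming a hypothetical `FileInfo` record (not a PyIceberg class); the `file_path` match for position deletes and deletion vectors are deliberately left out.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FileInfo:
    """Hypothetical minimal view of a manifest entry's planning fields."""

    data_sequence_number: int
    spec_id: int
    partition: Tuple  # Partition values, in partition-spec order.
    spec_is_unpartitioned: bool = False


def _same_partition(data_file: FileInfo, delete_file: FileInfo) -> bool:
    # Compare both the partition spec id and the partition values.
    return (data_file.spec_id, data_file.partition) == (
        delete_file.spec_id,
        delete_file.partition,
    )


def position_delete_applies(data_file: FileInfo, delete_file: FileInfo) -> bool:
    # "Less than or equal": a delete can remove rows added in the same commit.
    return (
        _same_partition(data_file, delete_file)
        and data_file.data_sequence_number <= delete_file.data_sequence_number
    )


def equality_delete_applies(data_file: FileInfo, delete_file: FileInfo) -> bool:
    # "Strictly less than": equality deletes never apply to rows from the same commit.
    # A delete written with an unpartitioned spec is global and applies to every partition.
    partition_ok = (
        _same_partition(data_file, delete_file) or delete_file.spec_is_unpartitioned
    )
    return partition_ok and data_file.data_sequence_number < delete_file.data_sequence_number


data = FileInfo(data_sequence_number=5, spec_id=0, partition=("2024-09-26",))
pos_same_commit = FileInfo(5, 0, ("2024-09-26",))
eq_same_commit = FileInfo(5, 0, ("2024-09-26",))
eq_global_later = FileInfo(6, 1, (), spec_is_unpartitioned=True)
print(position_delete_applies(data, pos_same_commit))  # True
print(equality_delete_applies(data, eq_same_commit))  # False
print(equality_delete_applies(data, eq_global_later))  # True
```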

kevinjqliu, Sep 30 '24

@Zyiqin-Miranda, is there any progress on supporting equality deletes in PyIceberg?

sfc-gh-mrojas, Feb 14 '25

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

github-actions[bot], Aug 14 '25

@sfc-gh-mrojas https://github.com/apache/iceberg-python/pull/2255

francocalvo, Aug 14 '25