iceberg-python

[bug] partial `OVERWRITE` operation writes the wrong snapshot summary metrics

kevinjqliu opened this issue on Mar 25 '25

Apache Iceberg version

main (development)

Please describe the bug 🐞

A snapshot produced by an `OVERWRITE` operation can record the wrong summary fields when only part of the table is overwritten.

`update_snapshot_summaries` assumes that every `OVERWRITE` operation is a full-table overwrite:
https://github.com/apache/iceberg-python/blob/322ebdd1a6e4870e7f0bdbdf74ca2a04b0ce5d7f/pyiceberg/table/update/snapshot.py#L239
https://github.com/apache/iceberg-python/blob/322ebdd1a6e4870e7f0bdbdf74ca2a04b0ce5d7f/pyiceberg/table/snapshots.py#L358-L359
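To illustrate the effect, here is a minimal sketch, assuming a snapshot's `total-records` is derived from the previous snapshot as previous total + `added-records` - `deleted-records`, and assuming the full-table assumption amounts to counting every previous record as deleted. `next_total_records` is a hypothetical helper, not a PyIceberg function.

```python
def next_total_records(
    previous_total: int,
    added_records: int,
    deleted_records: int,
    assume_full_overwrite: bool,
) -> int:
    """Hypothetical helper: derive a snapshot's total-records from the previous snapshot.

    `added_records` / `deleted_records` mirror the `added-records` / `deleted-records`
    summary fields; the return value corresponds to `total-records`.
    """
    if assume_full_overwrite:
        # Full-table assumption: every record in the previous snapshot is counted
        # as deleted, no matter what the operation actually removed.
        deleted_records = previous_total
    return previous_total + added_records - deleted_records


# Partial overwrite: a 100-row table where one 50-row data file is rewritten
# into a 40-row file (10 rows removed); the other 50 rows are untouched.
correct = next_total_records(100, added_records=40, deleted_records=50, assume_full_overwrite=False)
buggy = next_total_records(100, added_records=40, deleted_records=50, assume_full_overwrite=True)
print(correct, buggy)  # 90 40
```

Under the full-table assumption, the 50 untouched rows silently drop out of the totals.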

This is likely an oversight from when we implemented partial writes.

Fortunately, the table/transaction `overwrite` function is currently implemented as a delete followed by an append.

The only place where the `OVERWRITE` operation is currently used is in partial deletes:
https://github.com/apache/iceberg-python/blob/322ebdd1a6e4870e7f0bdbdf74ca2a04b0ce5d7f/pyiceberg/table/__init__.py#L678
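To see why partial deletes hit this path, here is a minimal in-memory model of a copy-on-write delete. It assumes that a delete which must rewrite partially matching data files is recorded as an `OVERWRITE`, while one that only drops whole files is a `DELETE`; `SummaryCounts` and `copy_on_write_delete` are hypothetical names, not PyIceberg APIs, and "data files" are plain lists of rows.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, int]


@dataclass
class SummaryCounts:
    operation: str
    added_data_files: int = 0
    deleted_data_files: int = 0
    added_records: int = 0
    deleted_records: int = 0


def copy_on_write_delete(files: List[List[Row]], predicate: Callable[[Row], bool]) -> SummaryCounts:
    """Hypothetical model of a copy-on-write delete over in-memory "data files"."""
    counts = SummaryCounts(operation="delete")
    for rows in files:
        kept = [row for row in rows if not predicate(row)]
        if len(kept) == len(rows):
            continue  # No matching rows: the file is left untouched.
        counts.deleted_data_files += 1
        counts.deleted_records += len(rows)
        if kept:
            # Only some rows matched: surviving rows are rewritten into a new file.
            counts.added_data_files += 1
            counts.added_records += len(kept)
    if counts.added_data_files:
        # Files were both removed and added, so the snapshot is an overwrite.
        counts.operation = "overwrite"
    return counts


files = [[{"id": 1}, {"id": 2}], [{"id": 3}], [{"id": 4}, {"id": 5}]]
summary = copy_on_write_delete(files, lambda row: row["id"] in (1, 3))
new_total = 5 + summary.added_records - summary.deleted_records  # ids 2, 4, 5 remain
```

Here the operation is `overwrite`, yet the table keeps 3 of its 5 rows; treating it as a full-table overwrite would instead report a total of 1 (only the rewritten row).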

Original thread: https://github.com/apache/iceberg-go/pull/356#issuecomment-2746317666 (thanks @arnaudbriche and @zeroshade)

A partial overwrite is reproduced in #1840.

Willingness to contribute

  • [x] I can contribute a fix for this bug independently
  • [ ] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • [ ] I cannot contribute a fix for this bug at this time

kevinjqliu · Mar 25 '25 01:03

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

github-actions[bot] · Nov 13 '25 00:11

This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'.

github-actions[bot] · Nov 27 '25 00:11