[bug] partial `OVERWRITE` operation writes the wrong snapshot summary metrics
Apache Iceberg version
main (development)
Please describe the bug 🐞
The snapshot summary written for an `OVERWRITE` operation can contain the wrong metrics when the table is only partially overwritten.
`update_snapshot_summaries` assumes that every `OVERWRITE` operation is a full-table overwrite:
https://github.com/apache/iceberg-python/blob/322ebdd1a6e4870e7f0bdbdf74ca2a04b0ce5d7f/pyiceberg/table/update/snapshot.py#L239
https://github.com/apache/iceberg-python/blob/322ebdd1a6e4870e7f0bdbdf74ca2a04b0ce5d7f/pyiceberg/table/snapshots.py#L358-L359
This is likely an oversight from when we implemented partial writes.
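Here is a minimal sketch of why the full-table assumption produces wrong totals. It assumes a simplified model where a snapshot's `total-records` equals the previous total minus deleted records plus added records. The real summary tracks many more properties (files, delete files, sizes, and so on). Both function names are hypothetical and are not pyiceberg APIs. The first models the full-table assumption described above, and the second models the expected behaviour for a partial overwrite.

```python
def total_records_full_truncate(previous_total: int, added_records: int) -> int:
    """Hypothetical model of the current behaviour: an OVERWRITE is treated as
    truncating the whole table, so every previous record counts as deleted."""
    deleted_records = previous_total  # full-table assumption baked in
    return previous_total - deleted_records + added_records


def total_records_partial(previous_total: int, added_records: int, deleted_records: int) -> int:
    """Hypothetical model of the expected behaviour: only records in the files
    actually removed by this snapshot are subtracted."""
    return previous_total - deleted_records + added_records


# A partial overwrite on a 100-row table that replaces one 10-row file
# with a 7-row file.
wrong = total_records_full_truncate(previous_total=100, added_records=7)
right = total_records_partial(previous_total=100, added_records=7, deleted_records=10)
print(wrong, right)  # prints "7 97"
```

Under the full-table assumption, the 90 rows in untouched files disappear from the totals even though they are still in the table.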
Thankfully, the table/transaction `overwrite` function is currently implemented as a delete followed by an append.
The only place where the `OVERWRITE` operation is used is during partial deletes: https://github.com/apache/iceberg-python/blob/322ebdd1a6e4870e7f0bdbdf74ca2a04b0ce5d7f/pyiceberg/table/__init__.py#L678
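To make the partial-delete path concrete, the following is a hypothetical model, not pyiceberg code, of how rewriting only the partially matched data files yields an `OVERWRITE` snapshot whose deleted and added counts cover just a fraction of the table. `DataFile` and `partial_delete_as_overwrite` are illustrative names. The model also assumes that fully matched files are removed by a separate delete snapshot, so it skips them.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class DataFile:
    # Hypothetical stand-in for an Iceberg data file; only the row count matters here.
    path: str
    record_count: int


def partial_delete_as_overwrite(files: List[DataFile], matched: Dict[str, int]) -> Tuple[int, int]:
    """Return (deleted_records, added_records) for an OVERWRITE that rewrites
    every partially matched file.

    `matched` maps a file path to the number of rows hit by the delete predicate.
    Files with no matches are untouched; fully matched files are assumed to be
    dropped by a separate delete snapshot and are skipped here.
    """
    deleted_records = 0
    added_records = 0
    for f in files:
        hits = matched.get(f.path, 0)
        if 0 < hits < f.record_count:
            deleted_records += f.record_count       # the old file is removed whole
            added_records += f.record_count - hits  # a filtered copy is written
    return deleted_records, added_records


files = [DataFile("a.parquet", 50), DataFile("b.parquet", 40), DataFile("c.parquet", 10)]
deleted, added = partial_delete_as_overwrite(files, {"c.parquet": 3})
previous_total = sum(f.record_count for f in files)
expected_total = previous_total - deleted + added
print(deleted, added, expected_total)  # prints "10 7 97"
```

Only `c.parquet` is rewritten, so a correct summary should report 97 total records. A summary that treats the `OVERWRITE` as a full-table truncate would report 7.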
Original thread: https://github.com/apache/iceberg-go/pull/356#issuecomment-2746317666 (thanks @arnaudbriche and @zeroshade)
Partial overwrite reproduced in #1840
Willingness to contribute
- [x] I can contribute a fix for this bug independently
- [ ] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- [ ] I cannot contribute a fix for this bug at this time
This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'.