iceberg-python

Support snapshot_properties in upsert operation

Open · ksoullpwk opened this issue 2 months ago

Feature Request / Improvement

The other write operations (such as append, overwrite, and delete) already accept a snapshot_properties argument. The upsert operation should accept it too, since it internally calls append and overwrite, which already support it.
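To make the request concrete, here is a toy in-memory model. It is not PyIceberg code: `ToyTable`, `Snapshot`, and the list-of-dicts row format are hypothetical. In the model, `upsert` splits incoming rows into updates and inserts, runs an overwrite and an append, and forwards `snapshot_properties` to both internal writes, so every snapshot the upsert creates carries the caller's properties. In PyIceberg itself the existing operations already take the keyword, e.g. `tbl.append(df, snapshot_properties={"job": "nightly"})`. The request is for `upsert` to accept the same keyword.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


@dataclass
class Snapshot:
    # Hypothetical stand-in for an Iceberg snapshot: an operation label plus a
    # summary dict into which caller-supplied snapshot_properties are copied.
    operation: str
    summary: Dict[str, str]


@dataclass
class ToyTable:
    # Hypothetical in-memory table keyed on a single join column (not PyIceberg).
    key: str
    rows: List[Row] = field(default_factory=list)
    snapshots: List[Snapshot] = field(default_factory=list)

    def _commit(self, operation: str, props: Dict[str, str]) -> None:
        # Every write records exactly one snapshot carrying a copy of props.
        self.snapshots.append(Snapshot(operation, dict(props)))

    def overwrite(self, new_rows: List[Row], snapshot_properties: Optional[Dict[str, str]] = None) -> None:
        # Replace existing rows whose key appears in new_rows.
        keys = {r[self.key] for r in new_rows}
        self.rows = [r for r in self.rows if r[self.key] not in keys] + list(new_rows)
        self._commit("overwrite", snapshot_properties or {})

    def append(self, new_rows: List[Row], snapshot_properties: Optional[Dict[str, str]] = None) -> None:
        self.rows.extend(new_rows)
        self._commit("append", snapshot_properties or {})

    def upsert(self, df: List[Row], snapshot_properties: Optional[Dict[str, str]] = None) -> None:
        # Split incoming rows into updates (key exists) and inserts (key is new),
        # then forward the same snapshot_properties to BOTH internal writes.
        existing = {r[self.key] for r in self.rows}
        updates = [r for r in df if r[self.key] in existing]
        inserts = [r for r in df if r[self.key] not in existing]
        if updates:
            self.overwrite(updates, snapshot_properties=snapshot_properties)
        if inserts:
            self.append(inserts, snapshot_properties=snapshot_properties)


tbl = ToyTable(key="id", rows=[{"id": 1, "v": "a"}])
tbl.upsert([{"id": 1, "v": "b"}, {"id": 2, "v": "c"}], snapshot_properties={"job": "nightly"})
print([(s.operation, s.summary) for s in tbl.snapshots])
# [('overwrite', {'job': 'nightly'}), ('append', {'job': 'nightly'})]
```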

ksoullpwk · Oct 24 '25 03:10

One question after looking into the code: should the upsert operation produce one snapshot or two?

I looked at the Spark integration tests in the iceberg repo and found that Spark produces only one snapshot when running MERGE INTO. In iceberg-python, however, the upsert operation may run both an overwrite and an append, producing two snapshots.
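A toy comparison of the two behaviors, in pure Python on lists of dicts. `plan_upsert`, `upsert_two_commits`, and `upsert_one_commit` are hypothetical names, not iceberg-python or Spark APIs, and the snapshot operation labels are illustrative only. Both paths reach the same final rows. The first records one snapshot for the overwrite and another for the append. The second computes the final state first and publishes it once, which matches the single snapshot observed above for Spark's MERGE INTO.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]
Snap = Tuple[str, List[Row]]  # (illustrative operation label, rows visible in that snapshot)


def plan_upsert(existing: List[Row], incoming: List[Row], key: str) -> Tuple[List[Row], List[Row]]:
    """Split incoming rows into (updates, inserts) relative to existing rows."""
    existing_keys = {r[key] for r in existing}
    updates = [r for r in incoming if r[key] in existing_keys]
    inserts = [r for r in incoming if r[key] not in existing_keys]
    return updates, inserts


def upsert_two_commits(existing: List[Row], incoming: List[Row], key: str) -> Tuple[List[Row], List[Snap]]:
    # Overwrite for matched rows, then a separate append for new rows:
    # each write records its own snapshot.
    updates, inserts = plan_upsert(existing, incoming, key)
    rows, snapshots = list(existing), []
    if updates:
        upd_keys = {r[key] for r in updates}
        rows = [r for r in rows if r[key] not in upd_keys] + updates
        snapshots.append(("overwrite", list(rows)))
    if inserts:
        rows = rows + inserts
        snapshots.append(("append", list(rows)))
    return rows, snapshots


def upsert_one_commit(existing: List[Row], incoming: List[Row], key: str) -> Tuple[List[Row], List[Snap]]:
    # Hypothetical single-commit variant: build the final row set first,
    # then publish it as exactly one snapshot.
    updates, inserts = plan_upsert(existing, incoming, key)
    upd_keys = {r[key] for r in updates}
    rows = [r for r in existing if r[key] not in upd_keys] + updates + inserts
    return rows, [("overwrite", list(rows))]


existing = [{"id": 1, "v": "a"}]
incoming = [{"id": 1, "v": "b"}, {"id": 2, "v": "c"}]
rows2, snaps2 = upsert_two_commits(existing, incoming, "id")
rows1, snaps1 = upsert_one_commit(existing, incoming, "id")
print(len(snaps2), len(snaps1))  # 2 1
print(rows1 == rows2)  # True
```

In this model the first of the two snapshots holds an intermediate state (the update applied, the new row not yet present). Collapsing the write into one commit is what keeps that state out of the history.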

ksoullpwk · Oct 29 '25 05:10

Just ran into this as well. The missing snapshot_properties argument on upsert looks like an oversight, and it shouldn't be too difficult to add.

Regarding upsert performance: yes, I agree it's not ideal that it produces two snapshots. It's also currently quite slow for large tables or for a large number of upsert rows. I think a separate ticket touches on both of those issues: https://github.com/apache/iceberg-python/issues/2159. I'm considering implementing my own upsert operation on top of some of the lower-level APIs to work around the performance issues, and to support upsert + delete in a single operation, which currently requires two separate operations and generates three snapshots.
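A minimal sketch of the single-pass planning step such a custom operation might use, assuming rows are plain dicts with a single join column. `merge_plan` is a hypothetical helper, not an iceberg-python API. Incoming rows are indexed in a dict keyed on the join column, so each existing row is matched with an average O(1) lookup, and upserts and deletes are resolved together (a delete wins if the same key appears in both). The resulting row set could then be committed as one snapshot rather than the three produced by a separate delete and upsert.

```python
from typing import Any, Dict, Iterable, List, Tuple

Row = Dict[str, Any]


def merge_plan(
    existing: Iterable[Row], upserts: Iterable[Row], delete_keys: Iterable[Any], key: str
) -> Tuple[List[Row], Dict[str, int]]:
    """Apply upserts and deletes in one pass; return (final_rows, stats)."""
    incoming = {r[key]: r for r in upserts}  # hash index; last row wins on duplicate keys
    to_delete = set(delete_keys)
    final: List[Row] = []
    stats = {"deleted": 0, "updated": 0, "inserted": 0}
    matched = set()
    for row in existing:
        k = row[key]
        if k in to_delete:  # delete takes precedence over an upsert of the same key
            stats["deleted"] += 1
            continue
        if k in incoming:
            matched.add(k)
            if incoming[k] != row:  # count only rows whose values actually change
                stats["updated"] += 1
            final.append(incoming[k])
        else:
            final.append(row)
    for k, row in incoming.items():  # keys not present before become inserts
        if k not in matched and k not in to_delete:
            stats["inserted"] += 1
            final.append(row)
    return final, stats


existing = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 3, "v": "c"}]
final, stats = merge_plan(existing, [{"id": 2, "v": "B"}, {"id": 4, "v": "d"}], [3], "id")
print(final)  # [{'id': 1, 'v': 'a'}, {'id': 2, 'v': 'B'}, {'id': 4, 'v': 'd'}]
print(stats)  # {'deleted': 1, 'updated': 1, 'inserted': 1}
```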

greenlaw · Nov 25 '25 20:11