Added ExpireSnapshots Feature
Summary
This PR Closes issue #516 by implementing support for the ExpireSnapshot table metadata action.
Rationale
The ExpireSnapshot action is a core part of Iceberg’s table maintenance APIs. Adding support for this action in PyIceberg helps ensure feature parity with other language implementations (e.g., Java) and supports users who want to programmatically manage snapshot retention using PyIceberg’s public API.
Testing
- Unit tests have been added to cover the initial expected usage paths.
- Additional feedback on edge cases, missing test scenarios or corrections to the setup test logic is greatly welcome during the review process.
User-facing changes
- This change introduces a new public API:
ExpireSnapshot. - No breaking changes or modifications to existing APIs were made.
After looking at the way the action here was implemented, I refined the changes. Let me know if these make sense :)
@ForeverAngry Could you see if you can get the linters/tests passing? Thanks!
@ForeverAngry Could you see if you can get the linters/tests passing? Thanks!
I was able to get the linting figured out. I dont have docker on this machine, so im going to give the integration tests a try in codespaces to see if i can get away doing the checks there.
@ForeverAngry Sorry for the late reply, it looks like that there is a test failing now 👀
@ForeverAngry Sorry for the late reply, it looks like that there is a test failing now 👀
I think this commit should fix the test error, i also added additional tests. All passed - and appear to be in-good-order. 🤞 this time is the charm.
this would be great - whats the status of this?
I see that in https://github.com/apache/iceberg-python/pull/1958, for orphaned file removal, we decided to have a table.maintenance API returning a MaintenanceTable. As a user, if I have to do orphaned file removal via that table.maintenance API, but snapshot expiration instead via table.expire_snapshots, I might be confused (they both feel like table maintenance to me, though snapshot expiration is admittedly a bit stronger). Curious about people's thoughts here.
Well, right now this pr doesn't do anything with the newly orphaned files. It just handles the metadata operation.
I see that in https://github.com/apache/iceberg-python/pull/1958, for orphaned file removal, we decided to have a table.maintenance API returning a MaintenanceTable. As a user, if I have to do orphaned file removal via that table.maintenance API, but snapshot expiration instead via table.expire_snapshots, I might be confused (they both feel like table maintenance to me, though snapshot expiration is admittedly a bit stronger). Curious about people's thoughts here.
Yes, I agree with you there. Before doing the 0.10.0 release, we need to ensure we align on this and make proper docs. I have a slight preference towards .mainenance to have a clear distinction between maintenance and the regular operations (such as creating a tag or branch).
I'm happy to follow the '.maintenance' api design if there is a strong preference toward it.
Thanks @ForeverAngry for working on this, and I think it is ready to go 👍
Great! I think @kevinjqliu is still listed as needing approval. @kevinjqliu can you put your stamp on this as well?
@ForeverAngry I think we can move this one forward. Before the release, we need to follow up on two things:
- Add a new Maintenance doc section with a subsection that explains the expire snapshots operation.
- Move the expire snapshots operation under maintenance:
tbl.maintenance.expire_snapshots()
Thanks again @ForeverAngry for working on this 🚀
Thanks again @ForeverAngry for working on this 🚀
Thank you, for being such a supportive and inspiring member to work with!
@ForeverAngry I think we can move this one forward. Before the release, we need to follow up on two things:
- Add a new Maintenance doc section with a subsection that explains the expire snapshots operation.
- Move the expire snapshots operation under maintenance:
tbl.maintenance.expire_snapshots()
I'll work on this, this weekend!
@ForeverAngry Appreciate that, thanks! 🙌
@ForeverAngry Thank you for this feature ❤️
Just one question/comment: It seems this only supports expiration time/age, and does not support other retention policies. For example, the Java API's ExpireSnapshots supports retainLast, and ManageSnapshots supports setMinSnapshotsToKeep. Any plans to add support for these features, by chance?
@ForeverAngry Thank you for this feature ❤️
Just one question/comment: It seems this only supports expiration time/age, and does not support other retention policies. For example, the Java API's ExpireSnapshots supports retainLast, and ManageSnapshots supports setMinSnapshotsToKeep. Any plans to add support for these features, by chance?
Yeah, those slipped my mind when I originally did it. I'd be happy to implement those. :)