iceberg-python
iceberg-python copied to clipboard
Support creating tags by adding `set_ref_snapshot` API
Closes #573
@syun64 thank you for the review!
In regards to testing, I see that the tests are all performed on the classes that are called by the APIs. None of the tests are directly on the APIs.
For example, there's already a test test_update_metadata_set_snapshot_ref
that calls SetSnapshotRefUpdate
which is also the class that we would call to test set_ref_snapshot
.
Do I still need to create unit tests for set_ref_snapshot
API in Transaction
and Table
?
Yes - even if its small, I think it would still be good to have a unit test that verifies the behavior of the proposed table and transaction API
There are some tests in https://github.com/apache/iceberg-python/blob/main/tests/table/test_init.py#L592 that should serve as good examples of API unit tests
I've added the tests based on your suggestions, please review again whenever possible :)
Can anyone with write access review?
@HonahX @Fokko
Thanks @chinmay-bhat for taking this and @syun64 for reviewing! Linking my comment to here, I actually prefer to make this an internal method behind other APIs. However, given that set_ref_snapshot
and add_snapshot
exists in 0.6.x, we may still wants to add them back and add deprecation notice on them: https://github.com/apache/iceberg-python/blob/4ed5a144ecad69878cf47292557b17df31ffe5a8/pyiceberg/utils/deprecated.py#L22
Hi @HonahX , I've updated the PR based on your suggestions. I've made set_ref_snapshot()
protected and created an inner class that takes care of snapshot management operations. In this PR, the operations include create_tag
and create_branch
.
While testing, I realised that we(PyIceberg) only support the main branch, ie. while using create_tag
or create_branch
, if the branch/tag is "main", the snapshot_log and current_snapshot_id are updated, otherwise they don't reflect any changes. I'm assuming this is because we always display the main branch snapshot_log.
I know the create_tag functions as expected because the metadata.refs[<branch_not_main>] show the expected snapshot after running create_tag. The test function is basicaly doing the above check to verify if it works.
Am I missing something here, or do we need to create a PR to get the snapshot_log, current_snapshot and other details for every branch/tag?
That said, since this PR holds the ManageSnapshots
grouping which other PRs depend on, I would like to focus on getting this and the other PRs merged before exploring the branch/tag history problem.
Thank you Honah for the review! :)
I've updated the tests!
Hi @chinmay-bhat this looks almost ready to merge - I responded to the thread above to continue the discussion on the TableRequirements.
In the mean time, could we add a new section to the mkdocs/api.md file named "Snapshot Management" and port over the examples you already have in your docstrings so we can have it on the official docsite?
And let's update the name of this PR as well to reflect its current state
Removed AssertTableUUID from PR and rebased onto latest main!
@Fokko @HonahX I believe all suggestions have now been added to the PR. Can we re-trigger the tests?
@Fokko @HonahX Can we merge this in? Other draft PRs depend on this one 🙌
Merged! Thanks @chinmay-bhat for the great work. Thanks @Fokko @syun64 for the review.
Thanks @HonahX @Fokko @syun64 for the discussions and reviews! 🫡 🙌
This is awesome! @chinmay-bhat You are on a roll with these PRs right now, thank you for contributing these much needed features to PyIceberg! Looking forward to releasing these with the upcoming 0.7.0 release