iceberg-python icon indicating copy to clipboard operation
iceberg-python copied to clipboard

Support creating tags by adding `set_ref_snapshot` API

Open chinmay-bhat opened this issue 9 months ago • 5 comments

Closes #573

chinmay-bhat avatar May 12 '24 16:05 chinmay-bhat

@syun64 thank you for the review! In regards to testing, I see that the tests are all performed on the classes that are called by the APIs. None of the tests are directly on the APIs. For example, there's already a test test_update_metadata_set_snapshot_ref that calls SetSnapshotRefUpdate which is also the class that we would call to test set_ref_snapshot.

Do I still need to create unit tests for set_ref_snapshot API in Transaction and Table?

chinmay-bhat avatar May 13 '24 02:05 chinmay-bhat

Yes - even if its small, I think it would still be good to have a unit test that verifies the behavior of the proposed table and transaction API

There are some tests in https://github.com/apache/iceberg-python/blob/main/tests/table/test_init.py#L592 that should serve as good examples of API unit tests

sungwy avatar May 13 '24 14:05 sungwy

I've added the tests based on your suggestions, please review again whenever possible :)

chinmay-bhat avatar May 14 '24 14:05 chinmay-bhat

Can anyone with write access review?

chinmay-bhat avatar May 15 '24 16:05 chinmay-bhat

@HonahX @Fokko

sungwy avatar May 15 '24 17:05 sungwy

Thanks @chinmay-bhat for taking this and @syun64 for reviewing! Linking my comment to here, I actually prefer to make this an internal method behind other APIs. However, given that set_ref_snapshot and add_snapshot exists in 0.6.x, we may still wants to add them back and add deprecation notice on them: https://github.com/apache/iceberg-python/blob/4ed5a144ecad69878cf47292557b17df31ffe5a8/pyiceberg/utils/deprecated.py#L22

HonahX avatar May 22 '24 02:05 HonahX

Hi @HonahX , I've updated the PR based on your suggestions. I've made set_ref_snapshot() protected and created an inner class that takes care of snapshot management operations. In this PR, the operations include create_tag and create_branch.

While testing, I realised that we(PyIceberg) only support the main branch, ie. while using create_tag or create_branch, if the branch/tag is "main", the snapshot_log and current_snapshot_id are updated, otherwise they don't reflect any changes. I'm assuming this is because we always display the main branch snapshot_log. I know the create_tag functions as expected because the metadata.refs[<branch_not_main>] show the expected snapshot after running create_tag. The test function is basicaly doing the above check to verify if it works.

Am I missing something here, or do we need to create a PR to get the snapshot_log, current_snapshot and other details for every branch/tag?

That said, since this PR holds the ManageSnapshots grouping which other PRs depend on, I would like to focus on getting this and the other PRs merged before exploring the branch/tag history problem.

chinmay-bhat avatar May 22 '24 12:05 chinmay-bhat

Thank you Honah for the review! :)

I've updated the tests!

chinmay-bhat avatar May 31 '24 09:05 chinmay-bhat

Hi @chinmay-bhat this looks almost ready to merge - I responded to the thread above to continue the discussion on the TableRequirements.

In the mean time, could we add a new section to the mkdocs/api.md file named "Snapshot Management" and port over the examples you already have in your docstrings so we can have it on the official docsite?

sungwy avatar Jun 09 '24 13:06 sungwy

And let's update the name of this PR as well to reflect its current state

sungwy avatar Jun 09 '24 13:06 sungwy

Removed AssertTableUUID from PR and rebased onto latest main!

chinmay-bhat avatar Jun 11 '24 02:06 chinmay-bhat

@Fokko @HonahX I believe all suggestions have now been added to the PR. Can we re-trigger the tests?

chinmay-bhat avatar Jun 11 '24 15:06 chinmay-bhat

@Fokko @HonahX Can we merge this in? Other draft PRs depend on this one 🙌

chinmay-bhat avatar Jun 15 '24 13:06 chinmay-bhat

Merged! Thanks @chinmay-bhat for the great work. Thanks @Fokko @syun64 for the review.

HonahX avatar Jun 15 '24 16:06 HonahX

Thanks @HonahX @Fokko @syun64 for the discussions and reviews! 🫡 🙌

chinmay-bhat avatar Jun 15 '24 16:06 chinmay-bhat

This is awesome! @chinmay-bhat You are on a roll with these PRs right now, thank you for contributing these much needed features to PyIceberg! Looking forward to releasing these with the upcoming 0.7.0 release

sungwy avatar Jun 15 '24 22:06 sungwy