iceberg
iceberg copied to clipboard
Core, API: Support scanning from refs
This change adds support for scanning from refs, where the scan can either be a time travel from a tag or a time travel from the tip of a branch.
@namrathamyske @rdblue @jackye1995 I've updated the PR based on our discussion on this thread . Let me know what you think! I still need to add tests.
Overall looks good to me, can we add some tests?
@jackye1995 Definitely, I'm still working on adding tests, wanted to get a discussion on the code in parallel.
Java CI build check and revapi check both fail with:
- What went wrong: Plugin [id: 'nebula.dependency-recommender', version: '11.0.0'] was not found in any of the following sources:
https://plugins.gradle.org/ is down for me (get a 1016 error from cloudflare) so most likely can't resolve any plugins. I'll retrigger the build when it becomes available again.
Updated with more tests @namrathamyske @jackye1995 @hililiwei @rdblue let me know what you think!
I should have time to look today. Thanks, @amogh-jahagirdar!
@rdblue I considered this when I changed the implementation to use snapshot timestamps instead of History entries. To me the timestamps on the history entries and snapshot log are 1:1. Just for my understanding, what are the cases where the timestamp on the history entry as a result of a new snapshot, is different than the timestamp on the snapshot itself? It seems like they would always be the same.
Edit: Okay I see in the code here: SnapshotLogEntry's timestamp would be different, if it wasn't considered as an "added" snapshot. But I'm still not sure what case that is tbh.
Discussed offline with @rdblue , I got an understanding of how history entries are different than the snapshot ancestor lineage, posting the information here so folks referring to the PR have context:
"History is the state of main at any given time. That can be changed, much like master in git can be changed. Ancestors can't be changed."
The current main can change when rollback is performed. The snapshot which gets resolved for a given time T should remain constant regardless of what is the current main ancestor lineage. History entries capture all the changes on the main table state and are what enable this ability to time travel to a snapshot which are outside the current table ancestors which is ultimately desired behavior. It also doesn't make sense to treat main as a separate case, and have a different semantic for non-main branches when it comes to time travel.
Maybe we could maintain separate history metadata for branches, but most likely it would need to be opt-in by users for specific branches and would need to not add too much metadata to manage.
So for now, it makes sense just to do time travel to a branch tip or time travel to a given tag. Updated the PR @rdblue @namrathamyske @jackye1995
Looks great. Thanks, @amogh-jahagirdar! I'll merge when tests are passing. I had to fix a conflict in revapi.
Thanks @rdblue @yzhaoqin @namrathamyske @hililiwei for the review!