iceberg-python
Support getting a snapshot right before the given timestamp
Adds support for retrieving the snapshot right before a given timestamp, which is needed to implement Spark procedures such as rollback_to_timestamp.
- [x] create ancestors_of() (relevant spark procedure)
- [x] create latest_snapshot_before_timestamp()
- [x] add tests
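
For readers skimming the checklist, here is a minimal, self-contained sketch of how the two helpers could fit together. It is not the code from this PR: the `Snapshot` dataclass, the `snapshots_by_id` lookup, and both function signatures are simplified stand-ins chosen for illustration, and the strict `<` comparison is an assumption about what "before" means.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical, pared-down snapshot record (not PyIceberg's class)."""
    snapshot_id: int
    parent_snapshot_id: Optional[int]
    timestamp_ms: int


def ancestors_of(
    snapshot: Optional[Snapshot], snapshots_by_id: Dict[int, Snapshot]
) -> Iterable[Snapshot]:
    """Yield the snapshot itself, then each parent, until the chain ends."""
    while snapshot is not None:
        yield snapshot
        parent_id = snapshot.parent_snapshot_id
        # Stop when there is no parent or the parent is unknown.
        snapshot = snapshots_by_id.get(parent_id) if parent_id is not None else None


def latest_snapshot_before_timestamp(
    current: Optional[Snapshot],
    snapshots_by_id: Dict[int, Snapshot],
    timestamp_ms: int,
) -> Optional[Snapshot]:
    """Return the newest ancestor committed strictly before timestamp_ms."""
    result: Optional[Snapshot] = None
    for candidate in ancestors_of(current, snapshots_by_id):
        if candidate.timestamp_ms < timestamp_ms and (
            result is None or candidate.timestamp_ms > result.timestamp_ms
        ):
            result = candidate
    return result


# Illustrative history: 1 <- 2 <- 3 (3 is the current snapshot).
snapshots_by_id = {
    1: Snapshot(snapshot_id=1, parent_snapshot_id=None, timestamp_ms=100),
    2: Snapshot(snapshot_id=2, parent_snapshot_id=1, timestamp_ms=200),
    3: Snapshot(snapshot_id=3, parent_snapshot_id=2, timestamp_ms=300),
}
rollback_target = latest_snapshot_before_timestamp(
    snapshots_by_id[3], snapshots_by_id, timestamp_ms=250
)
```

In this sketch the search scans every ancestor and keeps the newest match, so it does not rely on timestamps decreasing monotonically along the parent chain.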
See comment in issue
Hello @chinmay-bhat,
I noticed that you are implementing the ancestors_of method, and we have another pull request (#533) that implements the same behavior in another place, as a function with a different output (Iterable[Snapshot] instead of a List[tuple[int, int]]) and a different signature (it expects a Snapshot instead of a Snapshot ID).
I believe that we need to discuss and choose which one we want to have in the codebase.
cc/ @HonahX @Fokko @syun64
Hi @ndrluis, thank you for flagging this! That PR flew under my radar, and I'm excited to see an incremental scanning feature already being implemented in PyIceberg.
As for the question on the output type, I'm +1 for using Iterable[Snapshot] because I prefer a class with set attributes over a tuple.
I'm also +1 for introducing the feature in this separate PR, since it's a much simpler feature on its own that we can introduce quickly. WDYT?
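
To make the output-type trade-off discussed above concrete, the following sketch contrasts the two shapes. The `Snapshot` dataclass is a hypothetical stand-in, and reading the tuple as (snapshot ID, timestamp in ms) is an assumption for illustration; the thread does not spell out what the two integers are.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical stand-in with named attributes."""
    snapshot_id: int
    timestamp_ms: int


# Tuple form: the caller has to remember which position holds which value.
# Assumed layout here: (snapshot_id, timestamp_ms).
ancestors_as_tuples: List[Tuple[int, int]] = [(3, 300), (2, 200), (1, 100)]
newest_ts_from_tuples = ancestors_as_tuples[0][1]

# Object form: the attribute name documents the meaning at the call site.
ancestors_as_objects: List[Snapshot] = [
    Snapshot(snapshot_id=sid, timestamp_ms=ts) for sid, ts in ancestors_as_tuples
]
newest_ts_from_objects = ancestors_as_objects[0].timestamp_ms
```

Both forms carry the same data; the object form simply keeps working unchanged if more fields are exposed later, whereas positional tuples would need every caller updated.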
Happy to update the output type to Iterable[Snapshot]! Also I really like how concise the ancestors_of function is in the other PR.
Thank you for the review @Fokko, @HonahX, @syun64 and @ndrluis ! 🚀
Merged! Thanks @chinmay-bhat for the great work! Thanks @Fokko @syun64 @ndrluis for the review and discussions!
Congrats on your first PR @chinmay-bhat !