ibis
feat(flink): Support temporal join on Iceberg table
Is your feature request related to a problem?
Temporal join against Iceberg tables would be a "killer feature" [@zhenzhongxu] for training dataset generation using Ibis.
Iceberg tables support time travel. Temporal join together with time travel would enable a myriad of "data enrichment" use cases.
Depends on:
- https://github.com/ibis-project/ibis/issues/7712.
- https://github.com/ibis-project/ibis/issues/8247
Describe the solution you'd like
Roughly, the new API for temporal join against an Iceberg table would look like:
    table_left = con.create_table(...)
    iceberg_table = con.create_table(...)
    expr = table_left.temporal_join(
        iceberg_table,
        predicates=[
            # Match rows on the join key.
            table_left["key"] == iceberg_table["key"],
            # Only consider Iceberg rows at or before the left row's time.
            table_left["time"] >= iceberg_table["time"],
        ],
    )
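To make the intended semantics a bit more concrete, here is a minimal pure-Python sketch of what such a join could compute. It is not Ibis or Flink code: the `temporal_join` function, the row dictionaries, and the `value` column are all hypothetical, and it assumes left-join behavior (unmatched rows are kept with `None`), which is one possible design choice. For each left row it picks the latest right-side row with the same key whose time is at or before the left row's time, similar in spirit to Flink SQL's `FOR SYSTEM_TIME AS OF`.

```python
from bisect import bisect_right
from collections import defaultdict


def temporal_join(left_rows, right_rows, key="key", time="time"):
    """Pair each left row with the latest right row that has the same key
    and a time <= the left row's time (None when no such row exists)."""
    # Group right-side versions by key, ordered by time.
    versions = defaultdict(list)
    for row in sorted(right_rows, key=lambda r: r[time]):
        versions[row[key]].append(row)

    joined = []
    for left in left_rows:
        candidates = versions.get(left[key], [])
        times = [r[time] for r in candidates]
        # Index of the first version strictly after the left row's time.
        idx = bisect_right(times, left[time])
        match = candidates[idx - 1] if idx > 0 else None
        joined.append((left, match))
    return joined


# Hypothetical data: events to enrich, and versioned feature rows.
events = [
    {"key": "a", "time": 5},
    {"key": "a", "time": 12},
    {"key": "b", "time": 3},
]
features = [
    {"key": "a", "time": 1, "value": 10},
    {"key": "a", "time": 10, "value": 20},
    {"key": "b", "time": 4, "value": 99},
]

result = temporal_join(events, features)
```

Here the event for key `a` at time 5 is paired with the feature row from time 1, the event at time 12 with the row from time 10, and the event for key `b` at time 3 has no match because its only feature row arrives later.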
References
- https://iceberg.apache.org/docs/latest/flink/
- Getting Started with Flink SQL and Apache Iceberg
What version of ibis are you running?
7.2.0
What backend(s) are you using, if any?
Flink
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
Thanks for the issue!
We don't support Iceberg yet, because its Python integration isn't mature enough for us to use.
Additionally, there are two distinct issues here:
- Iceberg support (e.g., read_iceberg).
- A temporal join API.
The first is captured in #7712, including details about why the Python integration isn't mature enough for us to use.
The temporal join API needs to be fleshed out with more detail about its semantics, ideally with a couple of use cases to help folks grok the API design.
I just noticed you linked both the relevant issues 😅
Thank you!