meta: increase Flink's streaming backend coverage
Is your feature request related to a problem?
While we have implemented support for the basic operators in the Flink backend, several operators that are common in streaming workloads are not yet supported. Some of these are pending the sqlglot refactoring work.
Describe the solution you'd like
- Alternative syntax for top k (follow-up to #7407)
- Deduplication using `distinct()` (pending #7556)
- Array expansion (pending sqlglot refactoring)
- `ops.ArrayCollect` (pending UDF support)
- Window joins with more complex syntax (`ANTI`/`SEMI`) (follow-up to #7966)
- Temporal join (pending sqlglot refactoring) (follow-up to #7921)
- Time travel query (pending sqlglot refactoring) (#8203)
- Pattern recognition (#8252)
- `MAP` support (covered by `ibis/backends/tests/test_map.py`)
- https://github.com/ibis-project/ibis/issues/8254 (testing)
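For context on the first two items: Flink SQL documents both Top-N and deduplication as a `ROW_NUMBER()` filter over a partitioned window, with deduplication being the N = 1 case ordered by a time attribute. The sketch below only renders that SQL shape as strings in plain Python. The helper names (`top_n_sql`, `dedup_sql`) and the table and column names are hypothetical, and this is not a description of how the ibis Flink backend compiles these expressions.

```python
def top_n_sql(table, partition_by, order_by, n, descending=True):
    """Render a Flink-style Top-N query using ROW_NUMBER() (illustrative helper)."""
    direction = "DESC" if descending else "ASC"
    return (
        "SELECT *\n"
        "FROM (\n"
        "  SELECT *,\n"
        f"    ROW_NUMBER() OVER (PARTITION BY {partition_by} "
        f"ORDER BY {order_by} {direction}) AS rownum\n"
        f"  FROM {table}\n"
        ")\n"
        f"WHERE rownum <= {n}"
    )


def dedup_sql(table, key, time_attr, keep_first=True):
    """Deduplication is the Top-N shape with N = 1, ordered by a time attribute."""
    # keep_first orders ascending (earliest row wins); otherwise the latest row wins.
    sql = top_n_sql(table, key, time_attr, 1, descending=not keep_first)
    return sql.replace("rownum <= 1", "rownum = 1")


if __name__ == "__main__":
    # Hypothetical table and column names, for illustration only.
    print(top_n_sql("shop_sales", "category", "sales", 5))
    print(dedup_sql("orders", "order_id", "proctime"))
```

Sharing one query shape between the two helpers mirrors why top k and `distinct()`-based deduplication are tracked side by side here.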
What version of ibis are you running?
8.0
What backend(s) are you using, if any?
Flink
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
I propose including MAP support as item 9 in this issue; it is covered by ibis/backends/tests/test_map.py and is useful for getting UDFs with maps working in https://github.com/ibis-project/ibis/pull/8142.
Temporal join issue: https://github.com/ibis-project/ibis/issues/8247
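As a rough illustration of the semantics usually meant by an event-time temporal join, each left-side row is matched with the right-side version that was valid at the row's timestamp. The following plain-Python sketch models only that lookup. The function name `temporal_join`, the tuple layouts, and the sample data are assumptions made for illustration; it ignores watermarks, retractions, and everything else a streaming engine has to handle, and it is not the Flink or ibis implementation.

```python
from bisect import bisect_right


def temporal_join(left_rows, versions):
    """Attach to each left row the latest right-side version at or before its time.

    left_rows: list of (key, event_time, payload)
    versions:  list of (key, valid_from, value), in any order
    """
    # Index versions per key, sorted by the time from which each version is valid.
    by_key = {}
    for key, valid_from, value in sorted(versions, key=lambda v: (v[0], v[1])):
        times, values = by_key.setdefault(key, ([], []))
        times.append(valid_from)
        values.append(value)

    joined = []
    for key, event_time, payload in left_rows:
        times, values = by_key.get(key, ([], []))
        # Position of the last version with valid_from <= event_time.
        idx = bisect_right(times, event_time) - 1
        if idx >= 0:  # inner-join behavior: drop rows with no valid version
            joined.append((key, event_time, payload, values[idx]))
    return joined


if __name__ == "__main__":
    # Hypothetical data: orders keyed by currency, and versioned exchange rates.
    orders = [("EUR", 5, 100), ("EUR", 12, 200), ("USD", 1, 50)]
    rates = [("EUR", 0, 1.10), ("EUR", 10, 1.20), ("USD", 3, 1.00)]
    for row in temporal_join(orders, rates):
        print(row)
```

In this sample the USD order is dropped because no rate version exists at or before its timestamp, which matches inner-join behavior.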
Weekly update [2/29/24]
[3-Array expansion]: issue opened (#8457), feature implementation WIP.
[6-Temporal join]: issue opened (#8247), draft PR #8412 in review.
[7-Time travel query]: issue opened (#8203), exploratory work on implementation (currently blocked by catalog not supporting time travel: https://issues.apache.org/jira/browse/FLINK-34553).
[9-MAP support]: Done (#8425).
[10-Temporal join on Iceberg table]: depends on #7712, draft PR #8343 to address #7712. This work is paused for now because of significant blockers in pyiceberg. See the discussion thread in #8343 for more context.
No update on remaining items.
Weekly update [3/6/24]
P0
[3 - Array expansion] - PR in review #8511
[4 - ops.ArrayCollect] - issue opened #8555, pausing implementation until Flink 1.20 release
[6 - Temporal join] - issue opened #8247, implementation in progress #8412, which is currently blocked by #8537
[9 - MAP support] - DONE
P1
[1 - Alternative syntax for topk] - planned
[2 - Deduplication using distinct()] - waiting on #7556, which is currently blocked by #8509
P2
[5 - Complex window join] - not started
[7 - Time travel] - issue opened #8203, draft PR in review #8517
[8 - Pattern recognition] - not started
[10 - Temporal join on Iceberg table] - blocked and paused
Ad hoc
We have raised a few additional issues regarding
- [DONE] How to set up the Flink backend / spin up the docker containers for Flink on arm64
- [DONE] Issues with running tests for the Flink backend locally
- [P2] Compiling memtables with nested data: #8516
Weekly update [3/14/24]
P0
[3 - Array expansion] - PR #8511, blocked by #8516 (WIP)
[4 - ops.ArrayCollect] - paused
[6 - Temporal join] - issue opened #8247, implementation in progress #8412, which is currently blocked by #8537 (WIP)
[9 - MAP support] - DONE
P1
[1 - Alternative syntax for topk] - under investigation
[2 - Deduplication using distinct()] - blocked by #8509 (WIP)
P2
[5 - Complex window join] - not started
[7 - Time travel] - issue opened #8203, draft PR in review #8517
[8 - Pattern recognition] - WIP
[10 - Temporal join on Iceberg table] - blocked and paused
Weekly update [3/20/24]
P0
[3 - Array expansion] - PR #8511, blocked by #8516 (WIP)
[4 - ops.ArrayCollect] - paused
[6 - Temporal join] - issue opened #8247, implementation in progress #8412, which is currently blocked by #8537 (WIP)
[9 - MAP support] - DONE
P1
[1 - Alternative syntax for topk] - under investigation
[2 - Deduplication using distinct()] - blocked by #8509 (WIP)
P2
[5 - Complex window join] - work planned under issue #8710
[7 - Time travel] - issue opened #8203, draft PR in review #8517
[8 - Pattern recognition] - WIP #8692
[10 - Temporal join on Iceberg table] - blocked and paused
Weekly update [4/11/24]
P0
[3 - Array expansion] - PR in review #8511
[4 - ops.ArrayCollect] - paused
[6 - Temporal join] - no update
[9 - MAP support] - DONE
P1
[1 - Alternative syntax for topk] - DONE
[2 - Deduplication using distinct()] - no update
P2
[5 - Complex window join] - PR in review #8745
[7 - Time travel] - PR in review #8517
[8 - Pattern recognition] - WIP #8692
[10 - Temporal join on Iceberg table] - no update
closing out from last quarter