meta: increase Flink's streaming backend coverage

Open chloeh13q opened this issue 1 year ago • 1 comments

Is your feature request related to a problem?

The Flink backend supports the basic operators, but several operators that are common in streaming workloads are not yet implemented. Some of these are pending the sqlglot refactoring work.

Describe the solution you'd like

  1. Alternative syntax for top k (follow-up to #7407)
  2. Deduplication using distinct() (pending #7556)
  3. Array expansion (pending sqlglot refactoring)
  4. ops.ArrayCollect (pending UDF support)
  5. Window joins with more complex syntax (ANTI/SEMI) (follow-up to #7966)
  6. Temporal join (pending sqlglot refactoring) (follow-up to #7921)
  7. Time travel query (pending sqlglot refactoring) (#8203)
  8. Pattern recognition (#8252)
  9. MAP support (covered by ibis/backends/tests/test_map.py)
  10. Temporal join on Iceberg table (testing) (https://github.com/ibis-project/ibis/issues/8254)

What version of ibis are you running?

8.0

What backend(s) are you using, if any?

Flink

Code of Conduct

  • [X] I agree to follow this project's Code of Conduct

chloeh13q avatar Feb 06 '24 20:02 chloeh13q

Propose to include 9. MAP support in this issue; covered by ibis/backends/tests/test_map.py and useful to get UDFs with map working in https://github.com/ibis-project/ibis/pull/8142.

deepyaman avatar Feb 08 '24 21:02 deepyaman

Temporal join issue: https://github.com/ibis-project/ibis/issues/8247

mfatihaktas avatar Feb 27 '24 20:02 mfatihaktas

Weekly update [2/29/24]

[3-Array expansion]: issue opened (#8457), feature implementation WIP.

[6-Temporal join]: issue opened (#8247), draft PR #8412 in review.

[7-Time travel query]: issue opened (#8203), exploratory work on implementation (currently blocked by catalog not supporting time travel: https://issues.apache.org/jira/browse/FLINK-34553).

[9-MAP support]: Done (#8425).

[10-Temporal join on Iceberg table]: depends on #7712, draft PR #8343 to address #7712. This work is paused for now because of significant blockers in pyiceberg. See the discussion thread in #8343 for more context.

No update on remaining items.

chloeh13q avatar Feb 29 '24 20:02 chloeh13q

Weekly update [3/6/24]

P0

  • [3 - Array expansion] - PR in review #8511
  • [4 - ops.ArrayCollect] - issue opened #8555, pausing implementation until Flink 1.20 release
  • [6 - Temporal join] - issue opened #8247, implementation in progress #8412, which is currently blocked by #8537
  • [9 - MAP support] - DONE

P1

  • [1 - Alternative syntax for topk] - planned
  • [2 - Deduplication using distinct()] - waiting on #7556, which is currently blocked by #8509

P2

  • [5 - Complex window join] - not started
  • [7 - Time travel] - issue opened #8203, draft PR in review #8517
  • [8 - Pattern recognition] - not started
  • [10 - Temporal join on Iceberg table] - blocked and paused

Ad hoc

We have raised a few additional issues regarding

  • [DONE] How to set up the Flink backend / spin up the docker containers for Flink on arm64
  • [DONE] Issues with running tests for the Flink backend locally
  • [P2] Compiling memtables with nested data: #8516

chloeh13q avatar Mar 06 '24 20:03 chloeh13q

Weekly update [3/14/24]

P0

  • [3 - Array expansion] - PR #8511, blocked by #8516 (WIP)
  • [4 - ops.ArrayCollect] - paused
  • [6 - Temporal join] - issue opened #8247, implementation in progress #8412, which is currently blocked by #8537 (WIP)
  • [9 - MAP support] - DONE

P1

  • [1 - Alternative syntax for topk] - under investigation
  • [2 - Deduplication using distinct()] - blocked by #8509 (WIP)

P2

  • [5 - Complex window join] - not started
  • [7 - Time travel] - issue opened #8203, draft PR in review #8517
  • [8 - Pattern recognition] - WIP
  • [10 - Temporal join on Iceberg table] - blocked and paused

chloeh13q avatar Mar 15 '24 15:03 chloeh13q

Weekly update [3/20/24]

P0

  • [3 - Array expansion] - PR #8511, blocked by #8516 (WIP)
  • [4 - ops.ArrayCollect] - paused
  • [6 - Temporal join] - issue opened #8247, implementation in progress #8412, which is currently blocked by #8537 (WIP)
  • [9 - MAP support] - DONE

P1

  • [1 - Alternative syntax for topk] - under investigation
  • [2 - Deduplication using distinct()] - blocked by #8509 (WIP)

P2

  • [5 - Complex window join] - work planned under issue #8710
  • [7 - Time travel] - issue opened #8203, draft PR in review #8517
  • [8 - Pattern recognition] - WIP #8692
  • [10 - Temporal join on Iceberg table] - blocked and paused

chloeh13q avatar Mar 20 '24 23:03 chloeh13q

Weekly update [4/11/24]

P0

  • [3 - Array expansion] - PR in review #8511
  • [4 - ops.ArrayCollect] - paused
  • [6 - Temporal join] - no update
  • [9 - MAP support] - DONE

P1

  • [1 - Alternative syntax for topk] - DONE
  • [2 - Deduplication using distinct()] - no update

P2

  • [5 - Complex window join] - PR in review #8745
  • [7 - Time travel] - PR in review #8517
  • [8 - Pattern recognition] - WIP #8692
  • [10 - Temporal join on Iceberg table] - no update

chloeh13q avatar Apr 11 '24 16:04 chloeh13q

closing out from last quarter

lostmygithubaccount avatar Apr 17 '24 21:04 lostmygithubaccount