flink

[FLINK-20281][table] support consuming cdc stream about window tvf aggregate

Open xuyangzhong opened this issue 1 year ago • 2 comments

What is the purpose of the change

Currently, window aggregation does not support consuming a changelog stream and throws an exception when given one.

This PR adds support for consuming a CDC stream in window aggregation.

See FLINK-20281 and FLINK-27539 for more details.

Brief change log

  • Update the logic for inferring ModifyKind and UpdateKind for the window aggregate node
  • Introduce a record count in window aggregation; when a window contains no data, no aggregate result is emitted
  • Add tests for TUMBLE, HOP and CUMULATE window TVFs consuming a CDC source
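
To illustrate the second item, here is a minimal sketch of the idea, not the code in this PR. It assumes a simplified SUM aggregate; the names `WindowCountSketch`, `Kind` and `WindowAcc` are hypothetical, and `Kind` is only a stand-in for Flink's row kinds. Retraction rows decrement a per-window count, and the window emits a result only while that count is positive.

```java
import java.util.Optional;

/** Hypothetical sketch: a per-window SUM accumulator that also tracks a row count. */
public class WindowCountSketch {

    /** Simplified stand-in for changelog row kinds; not Flink's actual class. */
    public enum Kind { INSERT, UPDATE_BEFORE, UPDATE_AFTER, DELETE }

    public static final class WindowAcc {
        private long sum;
        private long count; // number of rows currently live in this window

        /** Accumulate additions, retract UPDATE_BEFORE and DELETE rows. */
        public void apply(Kind kind, long value) {
            switch (kind) {
                case INSERT:
                case UPDATE_AFTER:
                    sum += value;
                    count++;
                    break;
                case UPDATE_BEFORE:
                case DELETE:
                    sum -= value;
                    count--;
                    break;
            }
        }

        /** Result to emit when the window fires; empty if no live rows remain. */
        public Optional<Long> result() {
            return count > 0 ? Optional.of(sum) : Optional.empty();
        }
    }

    public static void main(String[] args) {
        WindowAcc acc = new WindowAcc();
        acc.apply(Kind.INSERT, 10);
        acc.apply(Kind.DELETE, 10);
        // Every row was retracted, so nothing should be emitted for this window.
        System.out.println(acc.result().isPresent());

        acc.apply(Kind.INSERT, 5);
        acc.apply(Kind.UPDATE_BEFORE, 5);
        acc.apply(Kind.UPDATE_AFTER, 7);
        // One live row with value 7 remains.
        System.out.println(acc.result());
    }
}
```

Without the count, a window whose rows were all deleted would still emit a sum of 0, which is indistinguishable from a window that genuinely sums to zero.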

Verifying this change

Tests are added for TUMBLE, HOP and CUMULATE window TVFs consuming a CDC source.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? yes
  • If yes, how is the feature documented? no need to update doc

xuyangzhong avatar Jan 05 '24 10:01 xuyangzhong

CI report:

  • ca13da2a16a3a02212c3eacfa198a512068cdf99 Azure: SUCCESS

Bot commands

The @flinkbot bot supports the following commands:

  • @flinkbot run azure: re-run the last Azure build

flinkbot avatar Jan 05 '24 10:01 flinkbot

Hi @LadyForest, thanks for your patient review.

"One suggestion is to split table-runtime and table-planner changes into two smaller commits to ease the review."

Great suggestion! The reasons I didn't split this big PR into two commits, runtime and planner, are as follows:

  1. There are not many changes to the pure runtime part. This PR looks big mainly because of the files auto-generated by the JSON plan tests, which account for nearly 3000 lines. Those large files would all land in the planner commit.
  2. If we split the commits, the runtime commit would come first, then the planner commit. However, the harness tests for the runtime operator depend on the planner.

Anyway, I appreciate your advice.

xuyangzhong avatar Jan 11 '24 09:01 xuyangzhong