flink icon indicating copy to clipboard operation
flink copied to clipboard

[FLINK-37890] [table-planner] Add commonJoinKey check inside JoinToMultiJoinRule

Open SteveStevenpoor opened this issue 6 months ago • 5 comments

What is the purpose of the change

We should have the optimizer check if it's possible to use a StreamingMultiJoinOperator according the common key restrictions.

Brief change log

  • Modify canCombine method to check whether two nodes have common join keys

Verifying this change

This change is already covered by existing tests, such as MultiJoinTest introduced in this pr. I added two more cases which cover no common join key behavior.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? not applicable

SteveStevenpoor avatar Jun 19 '25 08:06 SteveStevenpoor

CI report:

  • 28f59cde55eddc3806f79573d0958f0c84696d1b Azure: SUCCESS
Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

flinkbot avatar Jun 19 '25 08:06 flinkbot

Hey, @gustavodemorais! Let me know what you think about this PR. Also I think we need to add a ticket for right joins support with respect to RightToLeftJoinRule. I guess it's not necessary for the deadline so I can do it without a rush.

SteveStevenpoor avatar Jun 19 '25 08:06 SteveStevenpoor

Hey @SteveStevenpoor, thanks for the PR! I'll take a look and review briefly. I saw you comment about telegram, could you instead join the Community Slack and drop me a message? https://flink.apache.org/what-is-flink/community/#slack

gustavodemorais avatar Jun 19 '25 10:06 gustavodemorais

Hey @SteveStevenpoor, thanks for the PR! I'll take a look and review briefly. I saw you comment about telegram, could you instead join the Community Slack and drop me a message? https://flink.apache.org/what-is-flink/community/#slack

Could you please provide me an invitation link? This one https://flink.apache.org/what-is-flink/community/#slack is amiss

SteveStevenpoor avatar Jun 19 '25 11:06 SteveStevenpoor

Hey @SteveStevenpoor, thanks for the PR! I'll take a look and review briefly. I saw you comment about telegram, could you instead join the Community Slack and drop me a message? https://flink.apache.org/what-is-flink/community/#slack

Could you please provide me an invitation link? This one https://flink.apache.org/what-is-flink/community/#slack is amiss

Check out if this works: https://join.slack.com/t/apache-flink/shared_invite/zt-37sy2ayrm-oUApLhv~BLWFtTjgOiTlQA. If it doesn't, please share your email with me

gustavodemorais avatar Jun 19 '25 13:06 gustavodemorais

I think we could simplify the joinKeys to be Set of < int > instead of Set < String > by just using the operand.index. Each join key has an absolute unique index if you take in consideration all inputs and maybe the code would be shorter. Wouldn't that work?

In the first version I compared only indexes as you suggest, but if there is projection that changes the order of fields (like the one from RightToLeftJoinRule) the result could be wrong. Set of strings is more complex codewise but it can guarantee projection independence and it's good for debug. I think I could rethink using set of strings when working on RightToLeftJoin integration

SteveStevenpoor avatar Jun 20 '25 03:06 SteveStevenpoor