
[Improvement]: Refactor shuffle policy construction in the Flink sink connector to use SPI, making it more flexible and pluggable

Open nicochen opened this issue 1 year ago • 2 comments

Search before asking

  • [X] I have searched in the issues and found no similar issues.

What would you like to be improved?

At present, the Amoro Flink connectors use a single class, `RoundRobinShuffleRulePolicy`, to implement several different shuffle policies (none, keyed hash, and partitioned hash). The code could be refactored to improve readability. More importantly, other (customised) shuffle policies, e.g. range or load-balanced, could then be supported easily in the future through configuration rather than hard coding.

How should we improve?

We can use Java SPI to load shuffle policies dynamically. This is achieved by adding the shuffle policy implementation to the application and registering its shuffle policy factory in the service configuration files, following the same approach as Flink's table factory.

Developers can then contribute their own shuffle policies on top of this service framework. In my case, that would probably be a load-balanced shuffle policy.
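As a rough illustration of the SPI idea above, here is a minimal sketch in plain Java. It is not Amoro's or Flink's actual API: `ShufflePolicy`, `ShufflePolicyFactory`, `KeyedHashShufflePolicyFactory`, `ShufflePolicies`, and the `"keyed-hash"` identifier are all hypothetical names chosen for this example, and the policy contract is simplified to "pick a channel for a key" so that the block has no Flink dependency.

```java
import java.util.ServiceLoader;

/** Hypothetical policy contract: picks a downstream channel for a record key. */
interface ShufflePolicy {
    int selectChannel(Object key, int numChannels);
}

/**
 * Hypothetical SPI entry point. In a real project this would be a public
 * interface, and each implementation would be listed in a file named
 * META-INF/services/<fully qualified name of ShufflePolicyFactory>.
 */
interface ShufflePolicyFactory {
    /** Name used in configuration to select this policy. */
    String identifier();

    ShufflePolicy create();
}

/** Example implementation (assumption): hash the key onto a channel. */
class KeyedHashShufflePolicyFactory implements ShufflePolicyFactory {
    @Override
    public String identifier() {
        return "keyed-hash";
    }

    @Override
    public ShufflePolicy create() {
        // floorMod keeps the result in [0, numChannels) for negative hash codes.
        return (key, numChannels) -> Math.floorMod(key.hashCode(), numChannels);
    }
}

final class ShufflePolicies {
    private ShufflePolicies() {}

    /** Returns the first candidate whose identifier equals the configured name. */
    static ShufflePolicyFactory find(Iterable<ShufflePolicyFactory> candidates, String name) {
        for (ShufflePolicyFactory factory : candidates) {
            if (factory.identifier().equals(name)) {
                return factory;
            }
        }
        throw new IllegalArgumentException("No shuffle policy factory named '" + name + "'");
    }

    /** Looks up factories registered on the classpath via java.util.ServiceLoader. */
    static ShufflePolicyFactory discover(String name) {
        return find(ServiceLoader.load(ShufflePolicyFactory.class), name);
    }
}
```

With something like this in place, the connector would call `ShufflePolicies.discover(configuredName)` instead of branching inside one class, and a new policy (range, load-balanced) would only need a factory class plus a line in the service file.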

Are you willing to submit PR?

  • [X] Yes I am willing to submit a PR!

Subtasks

No response

Code of Conduct

nicochen, Feb 06 '24 08:02

@zhoujinsong any idea on this topic?

nicochen, Feb 06 '24 08:02

In fact, the Iceberg community is already working on shuffle improvements for its Flink sink. Doing this through an SPI mechanism may be difficult: in Iceberg's design, an additional Flink operator has to be added to collect statistics. See https://github.com/apache/iceberg/issues/6303

huyuanfeng2018, Feb 07 '24 01:02

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

github-actions[bot], Aug 22 '24 00:08

This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

github-actions[bot], Sep 06 '24 00:09