[Improvement]: Refactor shuffle policy building in the Flink sink connector with SPI, making it more flexible and pluggable
Search before asking
- [X] I have searched in the issues and found no similar issues.
What would you like to be improved?
At present, the Amoro Flink connectors use only one class, `RoundRobinShuffleRulePolicy`, to implement the different shuffle policies (such as none, keyed hash, and partitioned hash). The code could be refactored to improve readability. Most importantly, other (customized) shuffle policies (e.g. range, load-balanced) could then be supported easily in the future, for example through configuration instead of hard coding.
How should we improve?
We can use the Java SPI to dynamically load shuffle policies: a shuffle policy implementation is simply added to the application and its shuffle policy factory is registered in the service configuration files. This follows the approach of Flink's table factory mechanism.
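A minimal sketch of what the discovery side could look like, assuming a hypothetical `ShufflePolicyFactory` interface (the names and signatures below are illustrative, not existing Amoro or Flink APIs):

```java
import java.util.ServiceLoader;
import java.util.stream.StreamSupport;

// Hypothetical SPI-based loader; ShufflePolicyFactory/ShufflePolicy are placeholders
// for whatever abstraction the refactored sink would define.
public final class ShufflePolicyLoader {

  /** Marker for the policy abstraction itself (placeholder). */
  public interface ShufflePolicy {}

  /** Contract that every pluggable shuffle policy would ship with. */
  public interface ShufflePolicyFactory {
    /** Identifier referenced from the connector configuration, e.g. "round-robin". */
    String identifier();

    /** Builds the policy; the real signature would carry schema/parallelism info. */
    ShufflePolicy create();
  }

  /**
   * Discovers every factory registered under META-INF/services and returns the one
   * matching the configured identifier, similar to how Flink resolves table factories.
   */
  public static ShufflePolicy load(String configuredIdentifier) {
    ServiceLoader<ShufflePolicyFactory> factories = ServiceLoader.load(ShufflePolicyFactory.class);
    return StreamSupport.stream(factories.spliterator(), false)
        .filter(factory -> factory.identifier().equals(configuredIdentifier))
        .findFirst()
        .map(ShufflePolicyFactory::create)
        .orElseThrow(() -> new IllegalArgumentException(
            "No shuffle policy factory registered for: " + configuredIdentifier));
  }
}
```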
Developers could then contribute their own shuffle policies on top of this service framework; in my case, that would likely be a load-balanced shuffle policy, as sketched below.
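As an illustration of how a contributed policy could plug in (reusing the hypothetical interfaces above; the class name and behavior are only illustrative):

```java
// Registered via a META-INF/services entry for ShufflePolicyLoader.ShufflePolicyFactory
// in the contributor's jar, so the connector can discover it at runtime.
public class LoadBalancedShufflePolicyFactory implements ShufflePolicyLoader.ShufflePolicyFactory {

  @Override
  public String identifier() {
    // The value a user would set in the connector configuration to pick this policy.
    return "load-balanced";
  }

  @Override
  public ShufflePolicyLoader.ShufflePolicy create() {
    // A real implementation would route records according to observed subtask load;
    // this anonymous instance is only a placeholder.
    return new ShufflePolicyLoader.ShufflePolicy() {};
  }
}
```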
Are you willing to submit PR?
- [X] Yes I am willing to submit a PR!
Subtasks
No response
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
@zhoujinsong any idea on this topic?
In fact, the sink in the Iceberg community already has improvements for shuffle. Using an SPI mechanism for this may be difficult: in Iceberg's design, an additional Flink operator needs to be added to collect statistics. https://github.com/apache/iceberg/issues/6303
This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'.