hudi
[HUDI-4373] Flink Consistent hashing bucket index write path code
Change Logs
Implement the consistent hashing bucket index for Flink. This PR covers only the core write path of the index; resizing will be implemented in a separate PR.
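For readers new to the technique, the sketch below shows the basic idea behind a consistent hashing bucket index. Record keys are hashed onto a range, and each bucket (file group) owns the slice of the range ending at its node. Adding a node therefore moves only the keys of the bucket it splits. The class name, methods, bucket ids and the CRC32 hash are hypothetical choices for illustration, not Hudi's actual implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

// A minimal sketch of consistent-hashing bucket lookup. ConsistentBucketSketch,
// addNode, bucketFor and the CRC32 hash are hypothetical, not Hudi's API.
public class ConsistentBucketSketch {
  // Ring position -> id of the bucket (file group) owning the range that ends there.
  private final TreeMap<Integer, String> ring = new TreeMap<>();

  public void addNode(int position, String bucketId) {
    ring.put(position, bucketId);
  }

  // Map a record key onto the hash range [0, Integer.MAX_VALUE].
  static int hash(String recordKey) {
    CRC32 crc = new CRC32();
    crc.update(recordKey.getBytes(StandardCharsets.UTF_8));
    return (int) (crc.getValue() & Integer.MAX_VALUE);
  }

  // A key belongs to the first node at or after its hash; wrap around if none.
  public String bucketFor(String recordKey) {
    if (ring.isEmpty()) {
      throw new IllegalStateException("ring has no buckets");
    }
    Map.Entry<Integer, String> owner = ring.ceilingEntry(hash(recordKey));
    return (owner != null ? owner : ring.firstEntry()).getValue();
  }

  // Start with n buckets that split the hash range evenly.
  public static ConsistentBucketSketch evenlySplit(int n) {
    ConsistentBucketSketch idx = new ConsistentBucketSketch();
    int step = Integer.MAX_VALUE / n;
    for (int i = 0; i < n; i++) {
      int position = (i == n - 1) ? Integer.MAX_VALUE : step * (i + 1);
      idx.addNode(position, "bucket-" + i);
    }
    return idx;
  }

  public static void main(String[] args) {
    ConsistentBucketSketch idx = evenlySplit(4);
    System.out.println("uuid-1 -> " + idx.bucketFor("uuid-1"));
    // Splitting bucket-0 only moves keys whose hash falls in its lower half.
    idx.addNode(Integer.MAX_VALUE / 8, "bucket-4");
    System.out.println("uuid-1 -> " + idx.bucketFor("uuid-1"));
  }
}
```

The last node sits at Integer.MAX_VALUE, so every hash has an owner. The wrap-around branch only matters for rings built differently.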
There are three main changes:
- Extract the common code of the consistent hashing bucket index so that it serves both the Spark and Flink engines.
- Adapt the Flink write path to the consistent hashing bucket index, e.g., by introducing ConsistentBucketStreamWriteOperator.
- Introduce the basic framework of UpdateStrategy for Flink, to handle conflicts between concurrent clustering and updates.
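The role of an update strategy can be sketched as follows. Given a batch of records already tagged with their target file groups, plus the set of file groups under pending clustering, the strategy separates updates that are safe to write from updates that conflict with clustering. All type and method names here are hypothetical and do not come from Hudi's UpdateStrategy. What happens to the conflicting records is a policy decision, assumed here to be left to the caller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: none of these types exist in Hudi under these names.
public class UpdateStrategySketch {
  // A record already tagged with the file group its bucket maps to.
  record TaggedRecord(String recordKey, String fileId) {}

  // Records safe to write now vs. records hitting a file group under clustering.
  record Partitioned(List<TaggedRecord> writable, List<TaggedRecord> conflicting) {}

  interface UpdateStrategy {
    Partitioned handle(List<TaggedRecord> records, Set<String> fileIdsInPendingClustering);
  }

  // Only separates the conflicting updates; failing the write, holding them
  // back, or another resolution is left to the caller in this sketch.
  static class SeparatingUpdateStrategy implements UpdateStrategy {
    @Override
    public Partitioned handle(List<TaggedRecord> records, Set<String> fileIdsInPendingClustering) {
      List<TaggedRecord> writable = new ArrayList<>();
      List<TaggedRecord> conflicting = new ArrayList<>();
      for (TaggedRecord r : records) {
        if (fileIdsInPendingClustering.contains(r.fileId())) {
          conflicting.add(r);
        } else {
          writable.add(r);
        }
      }
      return new Partitioned(writable, conflicting);
    }
  }

  public static void main(String[] args) {
    List<TaggedRecord> batch = List.of(
        new TaggedRecord("k1", "fg-1"),
        new TaggedRecord("k2", "fg-2"),
        new TaggedRecord("k3", "fg-1"));
    Partitioned p = new SeparatingUpdateStrategy().handle(batch, Set.of("fg-1"));
    System.out.println("writable=" + p.writable() + " conflicting=" + p.conflicting());
  }
}
```

Keeping the strategy behind an interface lets different conflict policies be swapped in without touching the write path that calls it.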
Impact
No public API change.
Risk level: low
Contributor's checklist
- [x] Read through contributor's guide
- [x] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
CI report:
- 5de4d1e47173545289d97627fd1c97d2d9da5059 Azure: SUCCESS
4373.patch.zip
Thanks, I have reviewed this and attached a patch. Let's move the clustering update strategy logic into subclasses of StreamWriteFunction first.
Thanks for the patch, Danny! Moving the update strategy into subclasses will introduce some duplicated code (e.g., the flushing logic). Is that OK?
Moving the update strategy logic down into the consistent hashing subclasses limits its scope of impact. We can bring it into the standard stream write pipeline once we are confident it is stable.
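One way to keep the update strategy confined to a subclass while avoiding duplicated flushing logic is a template-method hook. The base write function owns the flush and calls a protected hook that subclasses may override. The sketch below is a heavily simplified, hypothetical illustration of that design choice. The classes only stand in for StreamWriteFunction and its consistent hashing subclass, and do not reflect the actual code in this PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical, heavily simplified stand-ins; these are not Hudi's
// StreamWriteFunction or its consistent hashing subclass.
public class SubclassHookSketch {
  static class BaseWriteFunction {
    final List<String> buffer = new ArrayList<>();        // buffered file ids
    final List<List<String>> flushed = new ArrayList<>(); // one entry per flush

    void processElement(String fileId) {
      buffer.add(fileId);
    }

    // Hook: the standard pipeline writes everything it buffered.
    protected List<String> beforeFlush(List<String> records) {
      return records;
    }

    // Shared flushing logic lives only here, so subclasses do not duplicate it.
    void flush() {
      flushed.add(new ArrayList<>(beforeFlush(buffer)));
      buffer.clear();
    }
  }

  // Only this subclass applies an update strategy; the base pipeline is untouched.
  static class ConsistentBucketWriteFunction extends BaseWriteFunction {
    private final Set<String> fileIdsInPendingClustering;
    final List<String> deferred = new ArrayList<>();

    ConsistentBucketWriteFunction(Set<String> fileIdsInPendingClustering) {
      this.fileIdsInPendingClustering = fileIdsInPendingClustering;
    }

    @Override
    protected List<String> beforeFlush(List<String> records) {
      List<String> writable = new ArrayList<>();
      for (String fileId : records) {
        if (fileIdsInPendingClustering.contains(fileId)) {
          deferred.add(fileId); // hold back updates that conflict with clustering
        } else {
          writable.add(fileId);
        }
      }
      return writable;
    }
  }

  public static void main(String[] args) {
    BaseWriteFunction fn = new ConsistentBucketWriteFunction(Set.of("fg-1"));
    fn.processElement("fg-1");
    fn.processElement("fg-2");
    fn.flush();
    System.out.println("flushed=" + fn.flushed);
  }
}
```

Because the base class's hook is a no-op, the standard pipeline behaves exactly as before. Promoting the logic later would amount to moving the override into the base class.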