hudi [HUDI-4373] Flink Consistent hashing bucket index write path code

Open YuweiXiao opened this issue 3 years ago • 1 comments

Change Logs

Implement the consistent hashing bucket index for Flink. This PR covers only the write core of the index; the resizing implementation will follow in a separate PR.

There are three main changes:

1. Extract the common code of the consistent hashing bucket index so that it serves both the Spark and Flink engines.
2. Adapt the Flink write path to the consistent hashing bucket index, e.g., by introducing ConsistentBucketStreamWriteOperator.
3. Introduce the basic framework of UpdateStrategy for Flink to handle conflicts between concurrent clustering and updates.
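
For readers unfamiliar with the index, here is a minimal sketch of how a consistent hashing bucket lookup can work. It is not the code in this PR: the class name, the method names, and the size of the hash space are all assumptions made for illustration. Each ring node stores the inclusive upper bound of a hash range together with the bucket (file group) that owns it, so splitting a bucket only remaps keys inside that one range.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: names and constants are illustrative, not Hudi's API.
final class ConsistentBucketSketch {
    // Assumed hash space [0, HASH_SPACE).
    static final int HASH_SPACE = 1 << 16;

    // Ring node: inclusive upper bound of a hash range -> owning bucket id.
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    void addBucket(int upperBound, String bucketId) {
        ring.put(upperBound, bucketId);
    }

    static int hashOf(String recordKey) {
        // floorMod keeps the result non-negative for negative hashCode values.
        return Math.floorMod(recordKey.hashCode(), HASH_SPACE);
    }

    String bucketFor(String recordKey) {
        return bucketForHash(hashOf(recordKey));
    }

    String bucketForHash(int hash) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no buckets registered");
        }
        // The owner is the first node whose upper bound is >= hash.
        Map.Entry<Integer, String> owner = ring.ceilingEntry(hash);
        // Wrap around to the first node when hash is past the last bound.
        return (owner != null ? owner : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        ConsistentBucketSketch index = new ConsistentBucketSketch();
        index.addBucket(16383, "b0");
        index.addBucket(32767, "b1");
        index.addBucket(49151, "b2");
        index.addBucket(65535, "b3");
        System.out.println(index.bucketForHash(20000)); // prints "b1"

        // Splitting b1 adds one node; only hashes 16384..24575 move.
        index.addBucket(24575, "b1a");
        System.out.println(index.bucketForHash(20000)); // prints "b1a"
        System.out.println(index.bucketForHash(30000)); // prints "b1"
        System.out.println(index.bucketForHash(10000)); // prints "b0"
    }
}
```

The sorted-map lookup is what keeps resizing local: adding or removing one node leaves every other range, and therefore every other bucket assignment, untouched.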

Impact

No public API change.

Risk level: none | low | medium | high

Low

Contributor's checklist

[x] Read through contributor's guide
[x] Change Logs and Impact were stated clearly
[ ] Adequate tests were added if applicable
[ ] CI passed

Sep 22 '22 02:09 YuweiXiao

CI report:

5de4d1e47173545289d97627fd1c97d2d9da5059 Azure: SUCCESS

Bot commands

@hudi-bot supports the following commands:

@hudi-bot run azure: re-run the last Azure build

Oct 12 '22 11:10 hudi-bot

4373.patch.zip Thanks, I have reviewed and applied a patch. Let's move the clustering update strategy logic into subclasses of StreamWriteFunction first.

Oct 14 '22 03:10 danny0405

> 4373.patch.zip Thanks, I have reviewed and applied a patch. Let's move the clustering update strategy logic into subclasses of StreamWriteFunction first.

Thanks for the patch, Danny! Moving the update strategy into subclasses will introduce some duplicate code (e.g., the flushing logic). Is that OK?

Moving the update strategy logic down into the consistent hashing subclasses limits its scope of influence. We can bring it into the standard stream write pipeline once we are certain it is stable.
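
One possible way to keep the strategy in a subclass without copying the flushing logic is a protected hook in the base class. The sketch below only illustrates that idea. Every name in it is hypothetical, and it is not Hudi's StreamWriteFunction. The "also write to the clustering target" behaviour is an assumed strategy, used here only so the override has something to do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical base writer: the flushing logic lives here exactly once.
class WriteFunctionSketch {
    protected final List<String> flushed = new ArrayList<>();

    final void flush(List<String> bufferedBucketIds) {
        // Subclasses may adjust the targets; the write itself is shared.
        List<String> targets = resolveTargets(bufferedBucketIds);
        flushed.addAll(targets); // stands in for the real write
    }

    // Hook: the standard pipeline writes to the buckets unchanged.
    protected List<String> resolveTargets(List<String> bucketIds) {
        return bucketIds;
    }
}

// Hypothetical consistent hashing subclass that owns the update strategy.
class ConsistentBucketWriteFunctionSketch extends WriteFunctionSketch {
    private final Set<String> bucketsUnderClustering;

    ConsistentBucketWriteFunctionSketch(Set<String> bucketsUnderClustering) {
        this.bucketsUnderClustering = bucketsUnderClustering;
    }

    @Override
    protected List<String> resolveTargets(List<String> bucketIds) {
        List<String> out = new ArrayList<>();
        for (String id : bucketIds) {
            out.add(id);
            if (bucketsUnderClustering.contains(id)) {
                // Assumed strategy: also write to the clustering target.
                out.add(id + "-pending");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        ConsistentBucketWriteFunctionSketch fn =
            new ConsistentBucketWriteFunctionSketch(new HashSet<>(Arrays.asList("b1")));
        fn.flush(Arrays.asList("b0", "b1"));
        System.out.println(fn.flushed); // prints "[b0, b1, b1-pending]"
    }
}
```

With this shape the standard pipeline is unaffected, which matches the goal of limiting the scope of influence, and the hook could later move into the base class if the strategy proves stable.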

Oct 14 '22 03:10 YuweiXiao
