[Feature][Connector-Paimon] Support dynamic bucket splitting to improve Paimon writing efficiency

Open hawk9821 opened this issue 1 year ago • 6 comments

Purpose of this pull request

Does this PR introduce any user-facing change?

How was this patch tested?

Check list

  • [ ] If any new Jar binary package is added in your PR, please add a License Notice according to the New License Guide
  • [ ] If necessary, please update the documentation to describe the new feature. https://github.com/apache/seatunnel/tree/dev/docs
  • [ ] If you are contributing connector code, please check that the following files are updated:
    1. Update the change log in the connector document. For more details, refer to connector-v2
    2. Update plugin-mapping.properties and add the new connector information to it
    3. Update the pom file of seatunnel-dist
  • [ ] Update the release-note.

hawk9821 avatar Aug 07 '24 08:08 hawk9821

cc @dailai and @TaoZex

Hisoka-X avatar Aug 07 '24 08:08 Hisoka-X

Please retrigger the ci.

dailai avatar Aug 21 '24 08:08 dailai

Thanks @hawk9821. Good job. I think your e2e tests need a multi-parallelism case; the current cases all run with single parallelism. That way, we can effectively verify whether dynamic bucketing changes with the job's degree of parallelism. Also, I think you should check the bucket count in every case instead of in a separate case. In addition, each of your cases should verify that the dynamic-bucket.target-row-num option works as expected.

dailai avatar Aug 26 '24 00:08 dailai

Why are there so many file changes? Maybe you're having some problems with your git operations. Please open a new PR that contains only your commits. Then you can link this PR in the new PR and close this one.

dailai avatar Aug 29 '24 06:08 dailai

> Why are there so many file changes? Maybe you're having some problems with your git operations. Please open a new PR that contains only your commits. Then you can link this PR in the new PR and close this one.

The problems were caused by a rebase; they are resolved now.

hawk9821 avatar Aug 29 '24 07:08 hawk9821

> Thanks @hawk9821. Good job. I think your e2e tests need a multi-parallelism case; the current cases all run with single parallelism. That way, we can effectively verify whether dynamic bucketing changes with the job's degree of parallelism. Also, I think you should check the bucket count in every case instead of in a separate case. In addition, each of your cases should verify that the dynamic-bucket.target-row-num option works as expected.

Got it. I added the e2e cases PaimonSinkDynamicBucketIT.testParallelismBucketCount and PaimonSinkDynamicBucketIT.testCDCParallelismBucketCount.

hawk9821 avatar Aug 29 '24 07:08 hawk9821

LGTM @Hisoka-X PTAL

dailai avatar Sep 02 '24 00:09 dailai