[core] Introduce the BucketFunction interface
Purpose
Linked issue: close #5444
Tests
API and Format
Documentation
CC @JingsongLi please take a look when you are free.
BTW, I'm confused that the CI compile error does not occur in my local build.
This seems to be a bug in Maven 3.9.9. I added the paimon-common dependency to work around it.
This PR is also useful for me.
@Pandas886 May I ask what's your use case?
Currently, Paimon is being integrated into our internal data pipeline tool. When writing to Paimon, supporting multiple parallel writers for fixed-bucket tables requires shuffling by bucket key, with each writer handling the data for its own buckets. However, within the pipeline, the transform phase has already converted the data to an internal format, making it impossible to call the Paimon SDK to retrieve the bucket key.
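For illustration, here is a minimal sketch of the shuffle step described above, assuming some public hook for computing a record's bucket. The `computeBucket` callback stands in for whatever this PR's `BucketFunction` ends up exposing; all names here are hypothetical:

```java
import java.util.function.ToIntFunction;

// A minimal sketch of routing each record to the writer that owns its
// bucket. The computeBucket hook is a stand-in for the public
// BucketFunction; names are assumptions for illustration.
final class BucketShuffle<R> {
    private final ToIntFunction<R> computeBucket;
    private final int numWriters;

    BucketShuffle(ToIntFunction<R> computeBucket, int numWriters) {
        this.computeBucket = computeBucket;
        this.numWriters = numWriters;
    }

    /** Pick the writer for a record so each writer only sees its own buckets. */
    int writerFor(R record) {
        int bucket = computeBucket.applyAsInt(record);
        return bucket % numWriters;
    }
}
```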
Resolved the conflict. Please take another look. @JingsongLi
cc @Zouxxyy @YannByron This PR also adds a new parameter (hashType) for the Spark bucket function.
I prefer to provide a BUCKET FUNCTION instead of a HASH FUNCTION. Currently, computing a bucket is: `Math.abs(hashcode % numBuckets)`. I'm not sure if that is universal enough, but a BucketFunction is definitely universal enough.
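To make the suggestion concrete, here is a rough sketch in which the quoted rule becomes just one implementation behind a pluggable interface, so other hash types or bucketing strategies fit the same contract. The names are illustrative, not Paimon's actual classes:

```java
// Sketch only: the quoted rule expressed as one implementation of a
// pluggable interface. All names are illustrative, not Paimon's actual API.
interface BucketFunction {
    int bucket(int hashcode, int numBuckets);
}

// The current default rule: Math.abs(hashcode % numBuckets).
class DefaultBucketFunction implements BucketFunction {
    @Override
    public int bucket(int hashcode, int numBuckets) {
        // Taking the modulo first bounds the value by numBuckets - 1,
        // so Math.abs never sees Integer.MIN_VALUE here.
        return Math.abs(hashcode % numBuckets);
    }
}
```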
Thanks, +1 for your suggestion.
cc @Zouxxyy to take a look at the Spark part.
Maybe you should change `PaimonScan` too?
@JingsongLi Yes, I think we should distinguish between different bucket functions, so we have to introduce a new bucket transformer for this. I have disabled the bucket scan for the other bucket function types for now.
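As a rough sketch of that gating, assuming a type tag on the table's bucket function (the enum values and method name below are hypothetical):

```java
// Sketch: bucket-aware scan stays enabled only for the default bucket
// function type; other types fall back to a full scan because the reader
// cannot reproduce their bucket computation. Names are hypothetical.
enum BucketFunctionType {
    DEFAULT, // Math.abs(hashcode % numBuckets)
    MOD      // e.g. a plain modulo variant
}

final class ScanPlanner {
    private final BucketFunctionType type;

    ScanPlanner(BucketFunctionType type) {
        this.type = type;
    }

    /** Bucket push-down is only valid when the scan can recompute buckets. */
    boolean supportsBucketFilterPushDown() {
        return type == BucketFunctionType.DEFAULT;
    }
}
```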
Please take a look again. CC @JingsongLi @Zouxxyy
@luoyuxia can you also help take a look?
@Zouxxyy I have addressed your comments, please take a look again.