paimon [core] Introduce the BucketFunction interface

Purpose

Linked issue: close #5444

Tests

API and Format

Documentation

Apr 11 '25 01:04 Aitozi

CC @JingsongLi, please take a look when you are free.

BTW, I'm confused: the CI compile error does not occur in my local environment.

Apr 11 '25 07:04 Aitozi

This seems to be a bug in Maven 3.9.9. I added the paimon-common dependency to work around it.

Apr 12 '25 07:04 Aitozi

This PR is also useful for me.

Apr 22 '25 02:04 Pandas886

@Pandas886 May I ask what your use case is?

Apr 22 '25 03:04 Aitozi

Currently, we are integrating Paimon into our internal data pipeline tool. To support multiple parallel writers for a fixed-bucket table, the data must be shuffled by bucket key so that each writer writes only the data for its own buckets. However, inside the pipeline the transform phase has already converted the data to our internal format, which makes it impossible to call the Paimon SDK to retrieve the bucket key.
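The shuffle described above only needs the bucket to be computable from the bucket key outside the Paimon SDK. A minimal Java sketch under stated assumptions: `SimpleBucketFunction` and `BucketShuffle` are hypothetical names, not Paimon's API; `String.hashCode()` stands in for the table's real bucket-key hash; and buckets are assigned to writers with `bucket % numWriters`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical pluggable bucket function (illustration only, not Paimon's API):
// maps the hash of a row's bucket key to a bucket id in [0, numBuckets).
interface SimpleBucketFunction {
    int bucket(int bucketKeyHash, int numBuckets);
}

// Pipeline-side shuffle for a fixed-bucket table: every row of a given bucket
// is routed to the same writer, so each writer owns a disjoint set of buckets.
final class BucketShuffle {
    private final SimpleBucketFunction bucketFunction;
    private final int numBuckets;
    private final int numWriters;

    BucketShuffle(SimpleBucketFunction bucketFunction, int numBuckets, int numWriters) {
        this.bucketFunction = bucketFunction;
        this.numBuckets = numBuckets;
        this.numWriters = numWriters;
    }

    // String.hashCode() stands in for the table's real bucket-key hash (an assumption).
    int bucketOf(String bucketKey) {
        return bucketFunction.bucket(bucketKey.hashCode(), numBuckets);
    }

    // Static bucket-to-writer assignment (an assumption of this sketch).
    int writerOf(int bucket) {
        return bucket % numWriters;
    }

    // Groups bucket keys by the index of the writer that should receive them.
    Map<Integer, List<String>> route(List<String> bucketKeys) {
        Map<Integer, List<String>> byWriter = new TreeMap<>();
        for (String key : bucketKeys) {
            int writer = writerOf(bucketOf(key));
            byWriter.computeIfAbsent(writer, w -> new ArrayList<>()).add(key);
        }
        return byWriter;
    }
}
```

For example, `new BucketShuffle((h, n) -> Math.abs(h % n), 4, 2)` sends keys `"b"` and `"d"` (buckets 2 and 0) to writer 0 and keys `"a"` and `"c"` (buckets 1 and 3) to writer 1.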

Apr 22 '25 06:04 Pandas886

Resolved the conflict. Please take another look, @JingsongLi.

cc @Zouxxyy @YannByron This PR also adds a new parameter (hashType) to the Spark bucket function.

Apr 23 '25 02:04 Aitozi

> I prefer to provide a BUCKET FUNCTION instead of a HASH FUNCTION. Currently, a bucket is computed as Math.abs(hashcode % numBuckets). I'm not sure that is universal enough, but a BucketFunction definitely is.

Thanks, +1 for your suggestion.
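To make the difference between the two designs concrete, here is a simplified Java sketch. All types are hypothetical stand-ins (Paimon's actual `BucketFunction` signature may differ): `DefaultBucketFunction` reproduces the `Math.abs(hashcode % numBuckets)` formula quoted above, and `ModBucketFunction` is an invented alternative used only to show a mapping outside that formula.

```java
import java.util.Objects;

// Option A (hypothetical): only the hash is pluggable; abs(hash % numBuckets) stays fixed.
interface HashFunction {
    int hash(Object bucketKey);
}

final class HashBasedBucketing {
    static int bucket(HashFunction hashFunction, Object bucketKey, int numBuckets) {
        return Math.abs(hashFunction.hash(bucketKey) % numBuckets);
    }
}

// Option B (hypothetical): the whole key -> bucket mapping is pluggable.
interface BucketFunction {
    int bucket(Object bucketKey, int numBuckets);
}

// Reproduces the current formula: Math.abs(hashcode % numBuckets).
final class DefaultBucketFunction implements BucketFunction {
    @Override
    public int bucket(Object bucketKey, int numBuckets) {
        return Math.abs(Objects.hashCode(bucketKey) % numBuckets);
    }
}

// Invented alternative: floor-mod on an integral key, so a negative key maps to
// floorMod(key, n) instead of abs(key % n).
final class ModBucketFunction implements BucketFunction {
    @Override
    public int bucket(Object bucketKey, int numBuckets) {
        long key = ((Number) bucketKey).longValue();
        return (int) Math.floorMod(key, (long) numBuckets);
    }
}
```

With Option A, an implementation can only change what is fed into the fixed `abs(hash % numBuckets)` step (an identity hash puts key `-7` into bucket 3 of 4); with Option B it owns the whole mapping (`ModBucketFunction` puts `-7` into bucket 1), which is the generality argued for in the comment.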

Apr 23 '25 13:04 Aitozi

cc @Zouxxyy to take a look at the Spark part.

Apr 24 '25 06:04 JingsongLi

> Maybe you should change PaimonScan too?

@JingsongLi Yes, I think we should distinguish between the different bucket functions, so we have to introduce a new bucket transformer for this. For now, I have disabled the bucket scan for the other bucket function types.
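A sketch of the gating described in the reply, assuming a hypothetical `BucketFunctionType` enum and `BucketScanPlanner` helper (not Paimon's or Spark's actual classes; the transform name `"bucket"` is also illustrative): bucketing is reported to the engine only when an engine-side bucket transform reproduces the table's bucket function, and the bucketed scan is disabled otherwise.

```java
import java.util.Optional;

// Hypothetical bucket function types; Paimon's actual option values may differ.
enum BucketFunctionType { DEFAULT, MOD }

final class BucketScanPlanner {
    // Returns the engine-side transform that reproduces the table's bucket function, if any.
    static Optional<String> bucketTransformFor(BucketFunctionType type) {
        switch (type) {
            case DEFAULT:
                return Optional.of("bucket");
            default:
                // No matching engine-side transform exists yet for this function.
                return Optional.empty();
        }
    }

    // Bucketed scan is only safe for fixed-bucket tables with a matching transform.
    static boolean canReportBucketing(BucketFunctionType type, boolean fixedBucketTable) {
        return fixedBucketTable && bucketTransformFor(type).isPresent();
    }
}
```

Reporting bucketing for a function the engine cannot reproduce would presumably let it assume a data layout that differs from how rows were actually written, so disabling the bucketed scan is the conservative choice until a matching transformer is added.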

Apr 27 '25 14:04 Aitozi

Please take another look. cc @JingsongLi @Zouxxyy

Jun 10 '25 00:06 Aitozi

@luoyuxia, could you also help take a look?

Jun 10 '25 03:06 Aitozi

@Zouxxyy I have addressed your comments, please take a look again.

Jun 17 '25 16:06 Aitozi