Manu

Results 28 comments of Manu

set "hoodie.storage.layout.partitioner.class" = "org.apache.hudi.table.action.commit.SparkBucketIndexPartitioner", and try again?

Hi @xushiyan @yihua @15663671003, I created a PR to add a default partitioner for the SIMPLE BUCKET index; please have a look.

> @xicm we have to shade the HBase classes to be compatible with Hive query engine which introduces HBase classes as well. Does changing all relevant class names with shading...

This page https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hudi-considerations.html provides a workaround, but the problem in the Spark bundle still exists.

Adding a partition field means more tasks. And since the index is BUCKET, the number of tasks could be bucket_num * partitions in some cases.
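To make the arithmetic concrete, a back-of-the-envelope sketch with illustrative numbers (the bucket and partition counts below are made up, not from this thread):

```scala
// Worst case under a BUCKET index: each partition touched by a write
// can open up to bucket_num file groups, i.e. bucket_num * partitions tasks.
val bucketNum  = 256   // e.g. hoodie.bucket.index.num.buckets
val partitions = 100   // partitions touched by one write
val worstCase  = bucketNum * partitions
println(s"up to $worstCase file groups / tasks per write")  // 25600
```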

Not sure if this is the cause. Can you check the number of file groups after the partition field changed, and reduce the bucket number to see the time cost?

> can you tell me how to check number of filegroup?

Via the CLI or Spark SQL: run `show_commits` and pay attention to `total_files_added` and `total_files_updated`.

> it is still taking 45-50 min to...
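For example, via Hudi's `show_commits` SQL procedure (a sketch; the table name is a placeholder, and the call requires Hudi's Spark SQL extensions to be enabled):

```scala
// Inspect recent commits; the output includes per-commit file stats
// such as total_files_added and total_files_updated.
spark.sql("call show_commits(table => 'my_hudi_table', limit => 10)").show(false)
```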

A small bucket number will not fit growing data. Generally we estimate the data size to determine the number of buckets. I think your problem is that the data is too...
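A rough sketch of that estimate; the 2 GB-per-bucket target below is an illustrative assumption, not an official recommendation, so tune it for your workload:

```scala
// Estimate the bucket count from projected data size, since the bucket
// number is fixed at table creation and cannot grow with the data.
val estimatedSizeGb = 1024.0  // projected size of the table/partition
val targetBucketGb  = 2.0     // assumed comfortable size per bucket
val bucketNum       = math.ceil(estimatedSizeGb / targetBucketGb).toInt
println(s"hoodie.bucket.index.num.buckets = $bucketNum")  // 512
```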

Can you check the *SubTasks* of bucket_assigner in the Flink UI? This tells us how many tasks are in a write operation.