Willie
Hi team. I am also facing this now. The relevant configurations are as follows:
"hoodie.datasource.write.partitionpath.field" = "region:SIMPLE"
"hoodie.datasource.write.keygenerator.class" = "org.apache.hudi.keygen.CustomKeyGenerator"
When we used 0.14.0 for the first write, there was no problem....
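For context, here is a minimal sketch of how we pass these options on a batch write; the table name, record key field, and path are placeholders, not from this issue:

```scala
// Minimal sketch of the first 0.14.0 write, assuming a DataFrame `df`
// with a `region` column; table name, key field, and path are placeholders.
import org.apache.spark.sql.SaveMode

df.write.format("hudi")
  .option("hoodie.table.name", "my_table")                 // placeholder
  .option("hoodie.datasource.write.recordkey.field", "id") // placeholder
  .option("hoodie.datasource.write.partitionpath.field", "region:SIMPLE")
  .option("hoodie.datasource.write.keygenerator.class",
          "org.apache.hudi.keygen.CustomKeyGenerator")
  .mode(SaveMode.Append)
  .save("s3://bucket/warehouse/my_table")                  // placeholder
```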
CC: @nsivabalan @xushiyan @codope
Thank you @ad1happy2go
Hi @danny0405, we don't need the metadata table, so as I mentioned, we set metadata.enable=false. We are using Hudi on AWS EMR, so we don't have a chance to use...
Hi @danny0405 @xushiyan, we are using Spark 3.4.1 and Hudi 0.14.0. I have updated the context; please help look into this. Thank you.
The reason we do not use the metadata table is that in Spark Structured Streaming, enabling the metadata table hurts micro-batch efficiency, as there will be additional...
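To be concrete, this is roughly how we disable it on the streaming sink; the checkpoint and table paths below are placeholders, not from this issue:

```scala
// Sketch: streaming sink with the metadata table disabled, assuming a
// streaming DataFrame `df`; the paths below are placeholders.
val query = df.writeStream
  .format("hudi")
  .option("hoodie.metadata.enable", "false") // skip the extra metadata-table work per micro-batch
  .option("hoodie.table.name", "smart_event")
  .option("checkpointLocation", "s3://bucket/checkpoints/smart_event") // placeholder
  .outputMode("append")
  .start("s3://bucket/tables/smart_event") // placeholder
```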
Hi @danny0405, I didn't understand your point. This is a job that writes data using Spark Structured Streaming + Hudi, and it's an MOR INSERT operation. It seems unrelated to...
@ad1happy2go @danny0405 I'm sorry, I have re-uploaded a new trace log 😅
@ad1happy2go Sure, here it is.
"hoodie.datasource.write.table.type" = "MERGE_ON_READ"
"hoodie.table.name" = "smart_event"
"hoodie.datasource.write.recordkey.field" = "rowkey"
"hoodie.datasource.write.operation" = "insert"
"hoodie.datasource.write.hive_style_partitioning" = "true"
"hoodie.datasource.hive_sync.partition_fields" = "dt"
"hoodie.datasource.hive_sync.partition_extractor_class" = "org.apache.hudi.hive.MultiPartKeysValueExtractor"
"hoodie.datasource.write.precombine.field" = "log_time"
"hoodie.upsert.shuffle.parallelism"...
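If it helps, here are the same options collected into one map the way we wire them into the writer (the truncated entry is omitted; checkpoint and table paths are placeholders):

```scala
// Sketch: the options above as a single map for the Hudi streaming writer.
val hudiOptions = Map(
  "hoodie.datasource.write.table.type" -> "MERGE_ON_READ",
  "hoodie.table.name" -> "smart_event",
  "hoodie.datasource.write.recordkey.field" -> "rowkey",
  "hoodie.datasource.write.operation" -> "insert",
  "hoodie.datasource.write.hive_style_partitioning" -> "true",
  "hoodie.datasource.hive_sync.partition_fields" -> "dt",
  "hoodie.datasource.hive_sync.partition_extractor_class" ->
    "org.apache.hudi.hive.MultiPartKeysValueExtractor",
  "hoodie.datasource.write.precombine.field" -> "log_time"
)

df.writeStream.format("hudi")
  .options(hudiOptions)
  .option("checkpointLocation", "s3://bucket/checkpoints/smart_event") // placeholder
  .start("s3://bucket/tables/smart_event") // placeholder
```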
@ad1happy2go You can follow these steps to reproduce it (a rough sketch of step 2 follows the list).
Steps to reproduce the behavior:
1. The file group has only one data file.
2. Delete the deltacommit corresponding to this file group (only delete...
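A rough sketch of what step 2 could look like, assuming the instant file is removed directly from the timeline; the instant time and table path below are hypothetical, and the exact file to delete follows the (truncated) step above:

```scala
// Hypothetical sketch: deleting a deltacommit instant file from the timeline.
// The instant time and table path are made up for illustration.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val deltaCommit = new Path(
  "s3://bucket/tables/smart_event/.hoodie/20240101000000000.deltacommit")
val fs = deltaCommit.getFileSystem(new Configuration())
fs.delete(deltaCommit, false) // non-recursive: remove only this instant file
```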