Willie
Hi team. I am also facing this now. The relevant configurations are as follows:
"hoodie.datasource.write.partitionpath.field" = "region:SIMPLE"
"hoodie.datasource.write.keygenerator.class" = "org.apache.hudi.keygen.CustomKeyGenerator"
When we used 0.14.0 for the first write, there was no problem....
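For context, here is a minimal sketch of how we pass these options on a batch write; the table name, record key field, and path are placeholders, not from this issue:

```scala
// Minimal sketch of the first 0.14.0 write, assuming a DataFrame `df`
// with a `region` column; table name, key field, and path are placeholders.
import org.apache.spark.sql.SaveMode

df.write.format("hudi")
  .option("hoodie.table.name", "my_table")                 // placeholder
  .option("hoodie.datasource.write.recordkey.field", "id") // placeholder
  .option("hoodie.datasource.write.partitionpath.field", "region:SIMPLE")
  .option("hoodie.datasource.write.keygenerator.class",
          "org.apache.hudi.keygen.CustomKeyGenerator")
  .mode(SaveMode.Append)
  .save("s3://bucket/warehouse/my_table")                  // placeholder
```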
CC: @nsivabalan @xushiyan @codope
Thank you @ad1happy2go
Hi @danny0405, we don't need the metadata table, so as I mentioned, we set metadata.enable=false. We are using Hudi on AWS EMR, so we don't have a chance to use...
Hi @danny0405 @xushiyan, we are using Spark 3.4.1 and Hudi 0.14.0. I have updated the context; please help look into this. Thank you.
The reason we do not use the metadata table is that in Spark Structured Streaming, enabling the metadata table hurts micro-batch efficiency, as there will be additional...
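To be concrete, this is roughly how we disable it on the streaming sink; the checkpoint and table paths below are placeholders, not from this issue:

```scala
// Sketch: streaming sink with the metadata table disabled, assuming a
// streaming DataFrame `df`; the paths below are placeholders.
val query = df.writeStream
  .format("hudi")
  .option("hoodie.metadata.enable", "false") // skip the extra metadata-table work per micro-batch
  .option("hoodie.table.name", "smart_event")
  .option("checkpointLocation", "s3://bucket/checkpoints/smart_event") // placeholder
  .outputMode("append")
  .start("s3://bucket/tables/smart_event") // placeholder
```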
Hi @danny0405, I didn't understand your point. This is a job that writes data using Spark Structured Streaming + Hudi, and it's an MOR INSERT operation. It seems unrelated to...
@ad1happy2go @danny0405 I'm sorry, I have re-uploaded a new trace log 😅
@ad1happy2go Sure, here it is.
"hoodie.datasource.write.table.type" = "MERGE_ON_READ"
"hoodie.table.name" = "smart_event"
"hoodie.datasource.write.recordkey.field" = "rowkey"
"hoodie.datasource.write.operation" = "insert"
"hoodie.datasource.write.hive_style_partitioning" = "true"
"hoodie.datasource.hive_sync.partition_fields" = "dt"
"hoodie.datasource.hive_sync.partition_extractor_class" = "org.apache.hudi.hive.MultiPartKeysValueExtractor"
"hoodie.datasource.write.precombine.field" = "log_time"
"hoodie.upsert.shuffle.parallelism"...
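If it helps, here are the same options collected into one map the way we wire them into the writer (the truncated entry is omitted; checkpoint and table paths are placeholders):

```scala
// Sketch: the options above as a single map for the Hudi streaming writer.
val hudiOptions = Map(
  "hoodie.datasource.write.table.type" -> "MERGE_ON_READ",
  "hoodie.table.name" -> "smart_event",
  "hoodie.datasource.write.recordkey.field" -> "rowkey",
  "hoodie.datasource.write.operation" -> "insert",
  "hoodie.datasource.write.hive_style_partitioning" -> "true",
  "hoodie.datasource.hive_sync.partition_fields" -> "dt",
  "hoodie.datasource.hive_sync.partition_extractor_class" ->
    "org.apache.hudi.hive.MultiPartKeysValueExtractor",
  "hoodie.datasource.write.precombine.field" -> "log_time"
)

df.writeStream.format("hudi")
  .options(hudiOptions)
  .option("checkpointLocation", "s3://bucket/checkpoints/smart_event") // placeholder
  .start("s3://bucket/tables/smart_event") // placeholder
```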
@ad1happy2go You can follow these steps to reproduce it (a rough sketch of step 2 follows the list).
Steps to reproduce the behavior:
1. The file group has only one data file.
2. Delete the deltacommit corresponding to this file group (only delete...
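A rough sketch of what step 2 could look like, assuming the instant file is removed directly from the timeline; the instant time and table path below are hypothetical, and the exact file to delete follows the (truncated) step above:

```scala
// Hypothetical sketch: deleting a deltacommit instant file from the timeline.
// The instant time and table path are made up for illustration.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val deltaCommit = new Path(
  "s3://bucket/tables/smart_event/.hoodie/20240101000000000.deltacommit")
val fs = deltaCommit.getFileSystem(new Configuration())
fs.delete(deltaCommit, false) // non-recursive: remove only this instant file
```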