[SUPPORT] Duplicate data in Hudi after Flink recovers from a savepoint
Tips before filing an issue
- Have you gone through our FAQs?
- Join the mailing list to engage in conversations and get faster support at [email protected].
- If you have triaged this as a bug, then file an issue directly.
Describe the problem you faced
A clear and concise description of the problem.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
A clear and concise description of what you expected to happen.
Environment Description
- Hudi version : 1.11.1
- Flink version : 1.14.5
- Spark version :
- Hive version :
- Hadoop version :
- Storage (HDFS/S3/GCS..) :
- Running on Docker? (yes/no) : no
Additional context
Add any other context about the problem here.
Stacktrace
Add the stacktrace of the error.
The job is a Flink SQL job. I stop it with a savepoint using `flink stop -p id yid` and later restart it from that savepoint.
The Hudi table options are:

HUDI WITH (
  'hoodie.table.type' = 'MERGE_ON_READ',
  'hoodie.datasource.write.recordkey.field' = 'id',
  'hoodie.datasource.write.precombine.field' = 'ts',
  'hoodie.datasource.write.partitionpath.field' = 'date',
  'hoodie.parquet.compression.codec' = 'snappy',
  'connector' = 'hudi',
  'path' = '$hdfsPath',
  'hive_sync.partition_fields' = 'date',
  'hive_sync.metastore.uris' = '$thrift://xxx',
  'hive_sync.db' = '$hiveDatabaseName',
  'hive_sync.table' = '$hiveTableName',
  'hive_sync.enable' = 'true',
  'hive_sync.use_jdbc' = 'false',
  'hive_sync.mode' = 'hms',
  'write.tasks' = '20',
  'compaction.async.enabled' = 'true',
  'compaction.trigger.strategy' = 'num_commits',
  'compaction.delta_commits' = '2',
  'write.precombine.field' = 'ts',
  'hoodie.datasource.write.partitionpath.field' = 'date',
  'hive_sync.partition_extractor_class' = 'org.apache.hudi.hive.MultiPartKeysValueExtractor'
);

The workload consists of late-arriving updates to fact tables. After recovering from the savepoint, the current day's data is fine, but historical data is duplicated.

Am I doing something wrong?
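To make the duplication easier to confirm, a query along the following lines could list the affected record keys. This is only a sketch under assumptions: `hudi_table` is a placeholder (the actual table name is not shown in this issue), and the query is meant to be run as a batch snapshot read rather than a streaming read. It uses the record key field `id` and the partition field `date` from the options above.

```sql
-- Sketch only: hudi_table is a hypothetical name; substitute the real table.
-- `date` is quoted with backticks because DATE is a reserved word in Flink SQL.
SELECT
  id,
  COUNT(*)               AS row_count,       -- how many rows share this record key
  COUNT(DISTINCT `date`) AS partition_count  -- how many partitions contain the key
FROM hudi_table
GROUP BY id
HAVING COUNT(*) > 1;
```

If `partition_count` is greater than 1 for the returned keys, the same `id` exists in more than one `date` partition. If it is 1, the duplicates sit inside a single partition. Which of the two cases applies may help narrow down where the duplicate rows come from.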