[SUPPORT] Using a MOR table synced to Hive, Flink checkpoint failures prevent log files from rolling over into parquet files
Tips before filing an issue

- Have you gone through our FAQs? Yes
- Join the mailing list to engage in conversations and get faster support at [email protected].
- If you have triaged this as a bug, then file an issue directly.
Describe the problem you faced
When using a MOR table synced to Hive, Flink checkpoint failures prevent log files from rolling over into parquet files.

We use Flink to write a large volume of data to a Hudi table and synchronize it to Hive. Will occasional checkpoint failures prevent the log files from rolling over into parquet files, so that Hive queries return less data than the Hudi table actually contains?
To Reproduce
Steps to reproduce the behavior:
1. Set hive_sync.enabled=true.
2. Start the Flink job with checkpointing enabled (see the sketch below).
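For reference, a minimal sketch of such a setup, assuming the Hudi Flink connector; the table schema, path, and metastore URI are hypothetical, and the exact option keys should be verified against the docs for your Hudi version:

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class HudiHiveSyncSketch {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // The Hudi Flink writer commits (and rolls log files) on successful checkpoints.
        env.enableCheckpointing(150_000L);
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        // Hypothetical MOR table with Hive sync enabled.
        tEnv.executeSql(
            "CREATE TABLE hudi_sink ("
                + " id STRING,"
                + " ts TIMESTAMP(3),"
                + " PRIMARY KEY (id) NOT ENFORCED"
                + ") WITH ("
                + " 'connector' = 'hudi',"
                + " 'path' = 'hdfs:///tmp/hudi/hudi_sink',"  // hypothetical path
                + " 'table.type' = 'MERGE_ON_READ',"
                + " 'hive_sync.enabled' = 'true',"
                + " 'hive_sync.mode' = 'hms',"
                + " 'hive_sync.metastore.uris' = 'thrift://hms-host:9083'" // hypothetical URI
                + ")");
    }
}
```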
Expected behavior

Log files should roll over into parquet files even when an occasional checkpoint fails, so that Hive queries return the same data as the Hudi table.
Environment Description

- Hudi version: 1.0
- Flink version: 1.15.2
- Hive version: 3.1.3
- Hadoop version:
- Storage (HDFS/S3/GCS..):
- Running on Docker? (yes/no):
Additional context
Add any other context about the problem here.
Stacktrace
Add the stacktrace of the error.
Are you using append mode or upsert mode?
We use upsert mode. The data is incremental change data written by flink-cdc into Kafka; Flink then reads the Kafka data and writes it into a Hudi MOR table. Checking the checkpoints, we found some failures, which caused the Hive external table that Hudi syncs to to miss data. Specifically, the checkpoint failures prevented the log files from rolling over into parquet files. Is there a good way to handle this situation? Currently the checkpoint interval is 150 seconds, the timeout is 450 seconds, and the maximum number of concurrent checkpoints is 3.
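For context, a minimal sketch of the checkpoint settings described above, using the standard Flink API; the tolerable-failure setting at the end is an assumption, not something from the original report:

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointConfigSketch {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Settings from the report: 150 s interval, 450 s timeout,
        // up to 3 concurrent checkpoints.
        env.enableCheckpointing(150_000L);
        env.getCheckpointConfig().setCheckpointTimeout(450_000L);
        env.getCheckpointConfig().setMaxConcurrentCheckpoints(3);

        // Assumption: tolerating a few failed checkpoints keeps the job running,
        // so the Hudi writer can still commit (and roll log files into parquet)
        // on the next successful checkpoint.
        env.getCheckpointConfig().setTolerableCheckpointFailureNumber(3);
    }
}
```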
Exception found in the logs:
cc @danny0405