
[SUPPORT] Using a MOR table and synchronizing to Hive, Flink checkpoint failures result in log files not rolling over into parquet files

Toroidals opened this issue on Feb 04 '24 • 4 comments

Tips before filing an issue

  • Have you gone through our FAQs? Yes

  • Join the mailing list to engage in conversations and get faster support at [email protected].

  • If you have triaged this as a bug, then file an issue directly.

Describe the problem you faced

Using a MOR table and synchronizing to Hive, Flink checkpoint failures result in log files not rolling over into parquet files.

When Flink writes a large amount of data to the Hudi table and synchronizes it to Hive, will occasional checkpoint failures prevent the log files from rolling over into parquet files, so that Hive queries return less data than the Hudi table contains?

To Reproduce

Steps to reproduce the behavior:

  1. Set hive_sync.enabled=true
  2. Start Flink checkpointing

A minimal sketch of such a job is shown below.
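This is a minimal sketch, not the reporter's exact job: a Flink Table API program writing to a Hudi MERGE_ON_READ table with Hive sync enabled. The table name, schema, path, and metastore URI are placeholders, and option names should be checked against the Hudi release actually in use.

```java
// Sketch only: Flink job writing upserts to a Hudi MOR table with Hive sync.
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class HudiMorHiveSyncSketch {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(150_000L); // Hudi's Flink writer commits on checkpoint completion
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        tEnv.executeSql(
            "CREATE TABLE hudi_sink (\n"
          + "  id STRING,\n"
          + "  name STRING,\n"
          + "  ts TIMESTAMP(3),\n"
          + "  PRIMARY KEY (id) NOT ENFORCED\n"
          + ") WITH (\n"
          + "  'connector' = 'hudi',\n"
          + "  'path' = 'hdfs:///warehouse/hudi_sink',\n"            // placeholder path
          + "  'table.type' = 'MERGE_ON_READ',\n"
          + "  'write.operation' = 'upsert',\n"
          + "  'hive_sync.enabled' = 'true',\n"
          + "  'hive_sync.mode' = 'hms',\n"
          + "  'hive_sync.metastore.uris' = 'thrift://<metastore-host>:9083',\n" // placeholder URI
          + "  'hive_sync.db' = 'default',\n"
          + "  'hive_sync.table' = 'hudi_sink'\n"
          + ")");

        // The Kafka source is omitted; in the reported setup the data comes from
        // flink-cdc changelogs in Kafka, e.g.:
        // tEnv.executeSql("INSERT INTO hudi_sink SELECT id, name, ts FROM kafka_cdc_source");
    }
}
```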

Expected behavior

A clear and concise description of what you expected to happen.

Environment Description

  • Hudi version : 1.0

  • Flink version :1.15.2

  • Hive version :3.1.3

  • Hadoop version :

  • Storage (HDFS/S3/GCS..) :

  • Running on Docker? (yes/no) :

Additional context

Add any other context about the problem here.

Stacktrace

Add the stacktrace of the error.

Toroidals commented on Feb 04 '24

Are you using append mode or upsert mode?

danny0405 commented on Feb 06 '24

> Are you using append mode or upsert mode?

We use upsert mode. The data is incremental change data written to Kafka by flink-cdc; Flink then reads from Kafka and writes into a Hudi MOR table. Looking at the checkpoints, we found some that failed, which causes missing data in the Hive external table synced from Hudi. Specifically, the checkpoint failures prevent the log files from rolling over into parquet files. Is there a good way to handle this? Currently the checkpoint interval is 150 seconds, the timeout is 450 seconds, and the maximum number of concurrent checkpoints is 3 (these settings are sketched below).
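For reference, the checkpoint settings described above look roughly like the following against the Flink 1.15 API; the tolerable-failure setting is illustrative and not from the report. As far as I understand Hudi's Flink writer, a delta commit only completes when its checkpoint succeeds, and compaction of log files into parquet is only scheduled after enough successful delta commits, which is why repeated checkpoint failures leave data stuck in log files.

```java
// Sketch of the reported checkpoint settings: 150 s interval, 450 s timeout,
// at most 3 concurrent checkpoints.
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointSettingsSketch {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(150_000L, CheckpointingMode.EXACTLY_ONCE); // 150 s interval

        CheckpointConfig cfg = env.getCheckpointConfig();
        cfg.setCheckpointTimeout(450_000L);  // 450 s timeout
        cfg.setMaxConcurrentCheckpoints(3);  // up to 3 checkpoints in flight at once
        // Illustrative only: tolerate a few failed checkpoints before failing the job.
        cfg.setTolerableCheckpointFailureNumber(3);
    }
}
```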

Toroidals commented on Feb 07 '24

(screenshot) Exception found in the logs

Toroidals commented on Feb 07 '24

cc @danny0405

ad1happy2go commented on Feb 08 '24