[SUPPORT] Does StreamWriteFunction support exactly-once in Flink?

Open seekforshell opened this issue 1 year ago • 4 comments

Describe the problem you faced

Flink 1.14.3 + Hudi 0.12.1. When I use org.apache.hudi.sink.StreamWriteFunction in a Flink streaming job with jobmanager.execution.failover-strategy set to region, can data be lost? As far as I can tell, this function has no state to restore.

Environment Description

  • Hudi version : 0.12.1
  • Hadoop version : 3.1.1
  • Storage (HDFS/S3/GCS..) : HDFS
  • Running on Docker? (yes/no) : no

seekforshell avatar Apr 12 '24 09:04 seekforshell

Each checkpoint triggers a commit to the Hudi table.

danny0405 avatar Apr 13 '24 00:04 danny0405

For example, take a Flink streaming job like kafka_source -> window -> bucket_write. When the bucket_write operator fails, its buffered data is lost. The checkpoint fails the first time, but after bucket_write is restored with an empty buffer, the next checkpoint succeeds.

seekforshell avatar Apr 15 '24 08:04 seekforshell

The write task keeps the write statuses in its state, and they are resubmitted to the driver to be committed to Hudi.

danny0405 avatar Apr 15 '24 09:04 danny0405
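The checkpoint-and-commit behaviour discussed above can be illustrated with a toy simulation in plain Python. It is only a sketch and does not use any Flink or Hudi API: ToyPipeline and all of its methods are hypothetical names. It assumes that the source is replayable and rewinds to the offset saved by the last successful checkpoint, and it collapses the write-status handoff to the driver into a single commit step.

```python
class ToyPipeline:
    """Toy model of checkpoint-coupled commits; not Flink or Hudi code."""

    def __init__(self, records):
        self.records = records          # replayable source, like a Kafka topic
        self.offset = 0                 # next record the source will emit
        self.buffer = []                # writer's in-memory buffer (not in state)
        self.checkpointed_offset = 0    # source offset saved by the last checkpoint
        self.committed = []             # data visible in the table

    def process(self, n):
        """Read up to n records from the source into the writer buffer."""
        batch = self.records[self.offset:self.offset + n]
        self.buffer.extend(batch)
        self.offset += len(batch)

    def checkpoint(self):
        """Flush the buffer and commit it together with the source offset."""
        self.committed.extend(self.buffer)
        self.buffer = []
        self.checkpointed_offset = self.offset

    def fail_and_restore(self):
        """Writer crashes: buffer is lost, source rewinds to the last checkpoint."""
        self.buffer = []
        self.offset = self.checkpointed_offset


pipeline = ToyPipeline(list(range(10)))
pipeline.process(4)
pipeline.checkpoint()        # records 0..3 are committed
pipeline.process(3)          # records 4..6 are only buffered
pipeline.fail_and_restore()  # buffer is lost, source rewinds to offset 4
pipeline.process(6)          # records 4..9 are read again
pipeline.checkpoint()
print(pipeline.committed == list(range(10)))
```

In this model the lost buffer does not cause data loss, because nothing past the last checkpointed offset was ever committed and the source replays those records after the restore. Whether a real job behaves this way depends on the source being replayable and being restarted together with the failed writer.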

@seekforshell Do you need any other help here? Feel free to close the issue if all your doubts are resolved. Thanks.

ad1happy2go avatar May 08 '24 11:05 ad1happy2go