Danny Chan


Thanks for the feedback; feel free to reopen it if you still think it is a problem.

Yeah, I think it may make sense to add this fix: just skip the missing files while recovering from the state, by doing an existence check. And before dispatching...
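As a rough illustration, a minimal sketch of that existence check (the class, helper, and variable names here are hypothetical, not Hudi's actual code):

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class RecoveryFileFilter {
  private RecoveryFileFilter() {
  }

  // Drops the files recovered from state whose backing data files no
  // longer exist (e.g. already removed by the cleaner), so that recovery
  // does not fail with FileNotFoundException when the splits are dispatched.
  public static List<Path> filterExistingFiles(FileSystem fs, List<Path> recoveredFiles)
      throws IOException {
    List<Path> existing = new ArrayList<>();
    for (Path file : recoveredFiles) {
      if (fs.exists(file)) { // the existence check before dispatching
        existing.add(file);
      }
    }
    return existing;
  }
}
```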

Yes, of course, unless we re-generate the input splits based on the latest snapshot.

A solution to mitigate the issue is to increase the retained commits for historical data, so that the reader has enough buffer time for consumption. Another solution is we add...
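For reference, a minimal sketch of the first option, assuming the standard `hoodie.cleaner.commits.retained` config key (the value 30 is only an example, not a recommendation):

```java
import java.util.Properties;

public class RetainMoreCommits {
  public static void main(String[] args) {
    // Retaining more commits keeps older file slices on storage longer,
    // giving a slow reader a larger buffer before the cleaner removes
    // the files it is still consuming.
    Properties hudiProps = new Properties();
    // The default is 10 retained commits; 30 here is just an example.
    hudiProps.setProperty("hoodie.cleaner.commits.retained", "30");
    System.out.println(hudiProps);
  }
}
```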

There is a known incompatibility between Spark 3.3.2 and Hudi 0.13.0: https://github.com/apache/hudi/pull/8082. Can you try the patch to see if it resolves your problem?

> during inline clustering, update the hudi schema retrieval method to retrieve it from the hudi table instead of obtaining the hoodie.avro.schema configuration.

Can you elaborate a little on why?

The checkpoint would trigger a commit to the Hudi table.
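A minimal sketch of that flow using Flink's `CheckpointListener` interface (the class and helper names are hypothetical, not Hudi's actual operator code):

```java
import org.apache.flink.api.common.state.CheckpointListener;

// Hypothetical operator: once Flink confirms a checkpoint is complete,
// that notification is used as the signal to commit the pending writes
// to the Hudi table.
public class CommitOnCheckpoint implements CheckpointListener {
  @Override
  public void notifyCheckpointComplete(long checkpointId) {
    // The checkpoint is durable at this point, so it is safe to
    // finalize the corresponding instant on the Hudi timeline.
    commitToHudiTable(checkpointId);
  }

  private void commitToHudiTable(long checkpointId) {
    // commit logic elided in this sketch
  }
}
```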

The write task holds the write statuses in its state, which are resubmitted to the driver for committing to Hudi.
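As a rough sketch of how those statuses could be kept in Flink managed state for re-submission after a failover (field and class names are hypothetical, and the statuses are modeled as plain strings here):

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;

public class WriteStatusHolder implements CheckpointedFunction {
  private transient ListState<String> writeStatusState;
  private final List<String> pendingWriteStatuses = new ArrayList<>();

  @Override
  public void snapshotState(FunctionSnapshotContext context) throws Exception {
    // Persist the statuses produced since the last checkpoint so they
    // survive a restart.
    writeStatusState.clear();
    writeStatusState.addAll(pendingWriteStatuses);
  }

  @Override
  public void initializeState(FunctionInitializationContext context) throws Exception {
    writeStatusState = context.getOperatorStateStore()
        .getListState(new ListStateDescriptor<>("write-statuses", String.class));
    // On recovery, reload the statuses so they can be resubmitted to
    // the driver for committing to Hudi.
    for (String status : writeStatusState.get()) {
      pendingWriteStatuses.add(status);
    }
  }
}
```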