Danny Chan
Thanks for the feedback, feel free to reopen it if you still think it is a problem.
Can you check the CI failures?
Yeah, I think it may make sense to add this fix: just skip the missing files while recovering from the state by doing an existence check. And before dispatching...
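A minimal sketch of that existence check, assuming a generic split type and a Hadoop `FileSystem` handle (the `pathOf` accessor and the class name are illustrative, not actual Hudi APIs):

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper: drop recovered splits whose underlying files no longer
// exist (e.g. already cleaned), so re-dispatching restored state does not fail.
final class SplitRecoveryFilter {
  static <S> List<S> dropMissing(List<S> recovered, FileSystem fs,
                                 Function<S, Path> pathOf) throws IOException {
    List<S> valid = new ArrayList<>();
    for (S split : recovered) {
      Path path = pathOf.apply(split);
      if (fs.exists(path)) {   // existence check before dispatching the split
        valid.add(split);
      }
    }
    return valid;
  }
}
```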
Yes, of course, unless we re-generate the input splits based on the latest snapshot.
A solution to mitigate the issue is to increase the retained commits for historical data, so that the reader has enough buffer time for consumption. Another solution is to add...
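For the first option, a sketch of the relevant cleaner/archiver settings (the keys come from the Hudi configuration docs; the values here are illustrative and should be tuned so the archiver keeps more commits than the cleaner retains):

```java
import java.util.Properties;

// Keep more commits on the timeline so streaming/incremental readers have a
// larger buffer before file versions are cleaned or commits are archived.
Properties props = new Properties();
props.setProperty("hoodie.cleaner.commits.retained", "30");
props.setProperty("hoodie.keep.min.commits", "40");
props.setProperty("hoodie.keep.max.commits", "50");
```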
There is a known incompatibility between Spark 3.3.2 and Hudi 0.13.0: https://github.com/apache/hudi/pull/8082. Can you try the patch to see if it resolves your problem?
@ad1happy2go Would you mind reproducing with the command given by @pushpavanthar?
> during inline clustering, update the hudi schema retrieval method to retrieve it from the hudi table instead of obtaining the hoodie.avro.schema configuration.

Can you elaborate a little on why?
The checkpoint would trigger a commit to the Hudi table.
The write task holds the write statuses in its state, and they would be resubmitted to the driver for committing to Hudi.
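A minimal sketch of that pattern with plain Flink operator state (not Hudi's actual writer; `String` stands in for the real write-status type):

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

// Hypothetical write task: buffer per-checkpoint write results in operator
// state; on restore, resubmit them so the committer can finish the commit.
public class BufferingWriteFunction extends RichSinkFunction<String>
    implements CheckpointedFunction {

  private transient ListState<String> writeResultState;
  private final List<String> pendingWriteResults = new ArrayList<>();

  @Override
  public void invoke(String record, Context context) {
    // Write the record and remember its write result until the next commit.
    pendingWriteResults.add("result-of-" + record);
  }

  @Override
  public void snapshotState(FunctionSnapshotContext ctx) throws Exception {
    // Checkpoint completion is what eventually triggers the table commit;
    // persist the buffered results so they survive a failover.
    writeResultState.clear();
    writeResultState.addAll(pendingWriteResults);
  }

  @Override
  public void initializeState(FunctionInitializationContext ctx) throws Exception {
    writeResultState = ctx.getOperatorStateStore()
        .getListState(new ListStateDescriptor<>("write-results", String.class));
    if (ctx.isRestored()) {
      // Resubmit the restored write results for committing after recovery.
      for (String result : writeResultState.get()) {
        pendingWriteResults.add(result);
      }
    }
  }
}
```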