Sivabalan Narayanan
Unless you configure lock providers, Hudi can't guarantee this. I would suggest adding locking for both writers.
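For reference, here is a minimal sketch of the write options that, as far as I recall from the concurrency control docs, turn on optimistic concurrency control with a ZooKeeper-based lock provider. The helper name `multi_writer_options` and all of the values (host, port, lock key, base path) are placeholders I made up for illustration; please verify the config keys against the Hudi version you are running.

```python
# Sketch only: builds the option map both writers would need to share.
# Host, port, lock key and base path below are hypothetical placeholders.
def multi_writer_options(zk_url, zk_port, lock_key, base_path):
    return {
        # Enable optimistic concurrency control for multiple writers.
        "hoodie.write.concurrency.mode": "optimistic_concurrency_control",
        # Failed writes are cleaned lazily so one writer does not roll back
        # another writer's in-flight commit.
        "hoodie.cleaner.policy.failed.writes": "LAZY",
        # Lock provider; ZooKeeper is one of the available implementations.
        "hoodie.write.lock.provider":
            "org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider",
        "hoodie.write.lock.zookeeper.url": zk_url,
        "hoodie.write.lock.zookeeper.port": str(zk_port),
        "hoodie.write.lock.zookeeper.lock_key": lock_key,
        "hoodie.write.lock.zookeeper.base_path": base_path,
    }


lock_opts = multi_writer_options("zk-host", 2181, "my_table", "/hudi/locks")

# Both jobs would then pass the same options on write, for example:
#   df.write.format("hudi").options(**lock_opts)...
```

Both writers need to point at the same lock key and base path, otherwise they are not actually contending for the same lock.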
Oh, I thought both jobs were running concurrently. Is that not the case? Can you shed some light on the exact steps? Is it: step 1: start job1 in EMR cluster1, which consumes from...
@koochiswathiTR : can you check my response above and update, please?
Here is what is happening. If there are two concurrent writers writing to non-overlapping data files, Hudi will let both writes succeed. But if both are modifying the same data...
You can read about multi-writer guarantees here: https://hudi.apache.org/docs/concurrency_control#multi-writer-guarantees
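To make the file-level rule concrete, here is a toy model of it. This is not Hudi's actual conflict resolution code, just a sketch under my reading of the linked docs: writers are checked in commit order, and a writer fails if it touched any data file that an earlier concurrent writer already committed to. The function name `resolve_commits` and the file IDs are made up.

```python
# Toy model of file-level optimistic concurrency control (not Hudi code).
def resolve_commits(writes):
    """writes: list of (writer_name, touched_file_ids) in commit order."""
    committed = []   # file sets of writers that already committed
    outcome = {}
    for name, files in writes:
        touched = set(files)
        # Conflict if this writer overlaps with any earlier committed writer.
        if any(touched & earlier for earlier in committed):
            outcome[name] = "failed"
        else:
            committed.append(touched)
            outcome[name] = "committed"
    return outcome


# Non-overlapping data files: both writers succeed.
disjoint = resolve_commits([("job1", ["f1", "f2"]), ("job2", ["f3"])])

# Overlapping data files: the writer that commits second fails.
overlapping = resolve_commits([("job1", ["f1", "f2"]), ("job2", ["f2", "f3"])])
```

Note that nothing in this model waits: the second writer does its work and only finds out about the conflict when it tries to commit.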
Nope, that's not how it works as of today. The 2nd writer doesn't wait for the 1st writer to complete. That's not OCC at all, in my understanding. What you are suggesting...
I have put up a patch to auto-retry Spark datasource writes in case of conflicts: https://github.com/apache/hudi/pull/6854. Hope that helps your case.
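Until something like that is available in your version, a retry can also be done on the application side. Below is a rough sketch of the idea, not what the patch does internally. `WriteConflictError` and `write_with_retry` are hypothetical names; in a real job you would have to map whatever conflict exception your writer surfaces onto this.

```python
import time


class WriteConflictError(Exception):
    """Hypothetical stand-in for the conflict error raised by a failed write."""


def write_with_retry(write_fn, max_attempts=3, backoff_seconds=0.0):
    """Call write_fn, retrying only when it reports a write conflict."""
    for attempt in range(1, max_attempts + 1):
        try:
            return write_fn()
        except WriteConflictError:
            if attempt == max_attempts:
                raise  # give up and surface the conflict to the caller
            # Simple linear backoff before the next attempt.
            time.sleep(backoff_seconds * attempt)


# Usage sketch: a fake write that conflicts twice, then succeeds.
attempts = []

def flaky_write():
    attempts.append(1)
    if len(attempts) < 3:
        raise WriteConflictError("conflicting write detected")
    return "committed"

result = write_with_retry(flaky_write, max_attempts=3)
```

Each retry should redo the whole write against the latest table state, since the conflicting commit has changed the files in question.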
Cancelling all Azure CI runs for now to investigate CI flakiness. Will retrigger the build once we are in a stable state. Sorry about the inconvenience.
@xushiyan : can you assist the author and help take this home.
Have cancelled the CI run for now while we investigate CI flakiness. Will trigger the Azure CI run once we fix the flakiness.