Sivabalan Narayanan comments

Results 439 comments of



Observing data duplication with Single Writer

Unless you configure lock providers, Hudi can't guarantee this. I would suggest adding locking for both writers.
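For illustration, here is a minimal sketch of what "add locking for both writers" could look like as write options. The option keys are the ones listed on Hudi's concurrency control documentation page for the ZooKeeper-based lock provider; the ZooKeeper host, port, lock key, and base path values are placeholders, and `with_locking` is a hypothetical helper, not part of Hudi.

```python
# Multi-writer settings as documented on Hudi's concurrency control page.
# The ZooKeeper host, port, lock key, and base path below are placeholders.
HUDI_LOCK_OPTIONS = {
    "hoodie.write.concurrency.mode": "optimistic_concurrency_control",
    "hoodie.cleaner.policy.failed.writes": "LAZY",
    "hoodie.write.lock.provider":
        "org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider",
    "hoodie.write.lock.zookeeper.url": "zk-host.example.internal",
    "hoodie.write.lock.zookeeper.port": "2181",
    "hoodie.write.lock.zookeeper.lock_key": "my_table",
    "hoodie.write.lock.zookeeper.base_path": "/hudi/locks",
}


def with_locking(writer_options):
    """Return a copy of a job's Hudi write options with the lock settings added.

    Hypothetical helper: it only merges dictionaries and never mutates its input.
    """
    merged = dict(writer_options)
    merged.update(HUDI_LOCK_OPTIONS)
    return merged


# Usage with a Spark DataFrame (needs pyspark and the Hudi bundle on the classpath):
# df.write.format("hudi").options(**with_locking(job_options)).mode("append").save(base_path)
```

Both jobs would need to point at the same lock provider and lock key for the lock to be meaningful.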

Observing data duplication with Single Writer

Oh, I thought both jobs were running concurrently. Is that not the case? Can you shed some light on the exact steps? Is it: step 1: start job1 in EMR cluster1, which consumes from...

Observing data duplication with Single Writer

@koochiswathiTR: can you check my response above and update, please?

Observing data duplication with Single Writer

Here is what is happening. If there are two concurrent writers writing to non-overlapping data files, Hudi will let both writes succeed. But if both are modifying the same data...
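The non-overlapping case above can be pictured with a toy model. This is only a conceptual sketch of file-level conflict detection, not Hudi's actual implementation; the function names and file IDs are made up for illustration.

```python
def conflicting_files(files_writer_a, files_writer_b):
    """Return the data files that both writers touched (toy model)."""
    return set(files_writer_a) & set(files_writer_b)


def both_can_commit(files_writer_a, files_writer_b):
    """In this toy model, both writes succeed only if no data file overlaps."""
    return not conflicting_files(files_writer_a, files_writer_b)


# Writers touching disjoint files: no conflict in this model.
print(both_can_commit({"file-1", "file-2"}, {"file-3"}))
# Writers touching the same file: the model reports a conflict.
print(conflicting_files({"file-1", "file-2"}, {"file-2", "file-3"}))
```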

Observing data duplication with Single Writer

You can read about multi-writer guarantees here: https://hudi.apache.org/docs/concurrency_control#multi-writer-guarantees

Observing data duplication with Single Writer

Nope, that's not how it works as of today. The 2nd writer doesn't wait for the 1st writer to complete. That's not OCC at all, in my understanding. What you are suggesting...

Observing data duplication with Single Writer

I have put up a patch (https://github.com/apache/hudi/pull/6854) to auto-retry Spark data source writes in case of conflicts. Hope that helps your case.
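Until such a patch is available in a release, a job could retry on its own. The following is a generic sketch of retry-on-conflict at the application level, not the code from the pull request; `WriteConflictError` and `write_with_retry` are hypothetical names, and a real job would catch whatever exception its write actually raises.

```python
import time


class WriteConflictError(Exception):
    """Hypothetical stand-in for the exception a conflicting write raises."""


def write_with_retry(write_fn, max_attempts=3, backoff_seconds=0.0):
    """Call write_fn, retrying when it fails with a write conflict.

    Re-raises the conflict once max_attempts is exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return write_fn()
        except WriteConflictError:
            if attempt == max_attempts:
                raise
            # Linear backoff before the next attempt.
            time.sleep(backoff_seconds * attempt)


# Usage: wrap the actual write call in a zero-argument function, e.g.
# write_with_retry(lambda: run_hudi_write(df), max_attempts=3, backoff_seconds=5)
# where run_hudi_write is the job's own (hypothetical) write routine.
```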

[HUDI-3478] Implement CDC Read in Spark

Cancelling all Azure CI runs for now to investigate CI flakiness. Will retrigger the build once we are in a stable state. Sorry about the inconvenience.

[HUDI-2057] CTAS Generate An External Table When Create Managed Table

@xushiyan: can you assist the author and help take this home?

[MINOR] Fixes to make unit tests work on m1

Have cancelled the CI run for now while investigating CI flakiness. Will trigger an Azure CI run once we fix the flakiness.