
[SUPPORT] Archive operation always release lock on the timeline when try lock failed

Open Ytimetravel opened this issue 1 year ago • 10 comments

Dear community, I have discovered an issue when using Hudi. If multiple archive tasks run on a COW table with "hoodie.archive.automatic=false", data problems may occur. With hoodie.archive.automatic=true, the issue does not occur. I then found that if the archive operation fails to acquire the lock, it still always releases the lock (if one exists). [image] [image] [image] I suspect that this lock release may have affected other normal operations. Perhaps the problem could be avoided by doing it this way? [image] Looking forward to your valuable suggestions.

Hudi version: 0.14.0

Ytimetravel avatar Apr 26 '24 11:04 Ytimetravel

Archiving itself holds a transaction lock, so we need to release it in any case. What is the failure case you have encountered?

danny0405 avatar Apr 27 '24 00:04 danny0405

@danny0405 Sorry, I don't remember the details of the problem (I will confirm with my colleagues and provide the results later). But based on the logic here, wouldn't it be better to release the lock only when it was actually acquired? Otherwise, is there a chance of mistakenly releasing the lock of other operations and causing problems?
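To illustrate the concern, here is a minimal Java sketch. It is not Hudi's code: `NaiveLock`, `archiveAlwaysRelease`, and `archiveGuardedRelease` are hypothetical names, and the lock deliberately has no owner tracking so that a stray release becomes visible.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class GuardedArchiveSketch {

    /** Hypothetical lock with no owner tracking: unlock() frees it no matter who holds it. */
    public static final class NaiveLock {
        private final AtomicBoolean held = new AtomicBoolean(false);

        public boolean tryLock() {
            return held.compareAndSet(false, true);
        }

        public void unlock() {
            held.set(false);
        }

        public boolean isHeld() {
            return held.get();
        }
    }

    /** Problematic shape: the release runs in finally even when tryLock() failed. */
    public static boolean archiveAlwaysRelease(NaiveLock lock, Runnable archiveStep) {
        boolean acquired = false;
        try {
            acquired = lock.tryLock();
            if (acquired) {
                archiveStep.run();
            }
            return acquired;
        } finally {
            lock.unlock(); // also runs when another operation holds the lock
        }
    }

    /** Guarded shape: the release is reached only after a successful acquire. */
    public static boolean archiveGuardedRelease(NaiveLock lock, Runnable archiveStep) {
        if (!lock.tryLock()) {
            return false; // nothing was acquired, so nothing is released
        }
        try {
            archiveStep.run();
            return true;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        NaiveLock lock = new NaiveLock();

        lock.tryLock(); // simulate another writer holding the lock
        archiveAlwaysRelease(lock, () -> { });
        System.out.println("always-release, lock still held: " + lock.isHeld());

        lock.tryLock(); // the other writer takes the lock again
        archiveGuardedRelease(lock, () -> { });
        System.out.println("guarded-release, lock still held: " + lock.isHeld());
    }
}
```

In this toy model the first call leaves the other writer's lock released (prints `false`), while the guarded version leaves it untouched (prints `true`).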

Ytimetravel avatar Apr 28 '24 02:04 Ytimetravel

otherwise is there a chance of mistakenly releasing the lock of other operations and causing problems?

That's reasonable; we should ensure the lock has been acquired in the first place.

danny0405 avatar Apr 28 '24 03:04 danny0405

@danny0405 @Ytimetravel The behavior of TransactionManager#endTransaction itself is correct: it checks whether the current lock is held by itself before it unlocks. However, there is a bug in HoodieTimelineArchiver, because archiving itself is not a transaction and does not correspond to any instant in the timeline. When an exception occurs, it might mistakenly delete locks held by others. [image] @Ytimetravel Would you like to fix this issue?
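One way to picture how an owner check can still let this through is the simplified model below. It is an assumption-laden sketch, not Hudi's implementation: `SharedLock`, `TxnManager`, and `archiveWithoutInstant` are hypothetical, and the model assumes the owner is tracked as an optional instant time, with archiving passing an empty owner because it has no instant.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicBoolean;

public class OwnerCheckSketch {

    /** Hypothetical shared lock (think of a lock file) that any process can delete. */
    public static final class SharedLock {
        private final AtomicBoolean held = new AtomicBoolean(false);

        public boolean tryLock() {
            return held.compareAndSet(false, true);
        }

        public void unlock() {
            held.set(false);
        }

        public boolean isHeld() {
            return held.get();
        }
    }

    /** Hypothetical per-process transaction manager that records the owning instant. */
    public static final class TxnManager {
        private final SharedLock lock;
        private Optional<String> currentOwner = Optional.empty();

        public TxnManager(SharedLock lock) {
            this.lock = lock;
        }

        public void beginTransaction(Optional<String> owner) {
            if (!lock.tryLock()) {
                throw new IllegalStateException("Failed to acquire lock");
            }
            currentOwner = owner;
        }

        /** Unlocks only when the caller matches the recorded owner. */
        public void endTransaction(Optional<String> caller) {
            if (currentOwner.equals(caller)) {
                currentOwner = Optional.empty();
                lock.unlock();
            }
        }
    }

    /** Archiving has no instant, so it begins and ends with an empty owner. */
    public static boolean archiveWithoutInstant(TxnManager txn) {
        try {
            txn.beginTransaction(Optional.empty());
            return true;
        } catch (IllegalStateException e) {
            return false;
        } finally {
            // After a failed begin the recorded owner is still empty, so the
            // empty-vs-empty check passes and the shared lock is released.
            txn.endTransaction(Optional.empty());
        }
    }

    public static void main(String[] args) {
        SharedLock lock = new SharedLock();
        TxnManager writer = new TxnManager(lock);
        TxnManager archiver = new TxnManager(lock);

        writer.beginTransaction(Optional.of("20240429010000"));
        boolean archived = archiveWithoutInstant(archiver);

        System.out.println("archived: " + archived);
        System.out.println("writer's lock still held: " + lock.isHeld());
    }
}
```

Under these assumptions both lines print `false`: the archiver never got the lock, yet the writer's lock is gone, which matches the symptom described above.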

beyond1920 avatar Apr 29 '24 01:04 beyond1920

@beyond1920 Yes, I am very willing to fix this issue. [image] This fix has already been tested and verified.

Ytimetravel avatar Apr 29 '24 03:04 Ytimetravel

What is the usual reason that the transaction start for archival fails?

danny0405 avatar Apr 29 '24 04:04 danny0405

@danny0405 Failed to acquire lock.

Ytimetravel avatar Apr 29 '24 06:04 Ytimetravel

[image]

Ytimetravel avatar Apr 29 '24 06:04 Ytimetravel

Okay, it would be great if you could submit a fix for it.

danny0405 avatar Apr 29 '24 06:04 danny0405