[HUDI-7529] fix multiple tasks get the lock at the same time when use…

Open KnightChess opened this issue 1 year ago • 4 comments

config:

  • OCC (optimistic concurrency control) is enabled
  • uses FileSystemBasedLock
  • MDT (metadata table) is enabled for writes by default

There are three jobs, jobA, jobB and jobC, all running at the same time.

jobA acquires the lock successfully. jobB keeps trying to acquire the lock, and jobC is also trying to acquire it.

jobB fails because it cannot get the lock, but it deletes the lock file when it closes its write client. jobC can then acquire the lock while jobA still believes it holds it, which causes a concurrency problem.

In our case, jobC rolls back jobA's MDT commit, which had already been committed successfully. As a result, the data table timeline has the replacecommit instant but the MDT does not have this update. In our case this causes the partition path to be deleted, and the latest file split cannot be retained.

Change Logs

  • check the lock's create_time held in memory before deleting the lock file
  • only the lock owner can delete the lock, unless the lock has expired
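
The two rules above can be sketched as follows. This is a minimal, hypothetical illustration in plain Java using `java.nio.file`, not the code in this PR and not Hudi's lock provider. The class name `OwnerCheckedFileLock`, the `expireMillis` parameter, and storing the creation time as the lock file's content are all assumptions made for the example. The current time is passed in explicitly so the behavior is deterministic.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch of an owner-checked file lock; names are illustrative only. */
final class OwnerCheckedFileLock {
    private final Path lockFile;
    private final long expireMillis;   // assumption: <= 0 means the lock never expires
    private Long ownedCreateTime;      // in-memory stamp; null if this instance holds no lock

    OwnerCheckedFileLock(Path lockFile, long expireMillis) {
        this.lockFile = lockFile;
        this.expireMillis = expireMillis;
    }

    /** Tries to create the lock file; remembers the creation time in memory on success. */
    boolean tryLock(long nowMillis) throws IOException {
        try {
            // CREATE_NEW fails if the file already exists, so only one caller can win.
            Files.write(lockFile,
                    Long.toString(nowMillis).getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
            ownedCreateTime = nowMillis;
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;
        }
    }

    /** Deletes the lock file only if this instance created it or the lock has expired. */
    boolean unlock(long nowMillis) throws IOException {
        Long onDisk = readCreateTime();
        if (onDisk == null) {
            ownedCreateTime = null;    // nothing usable on disk, nothing to delete
            return false;
        }
        boolean owner = ownedCreateTime != null && ownedCreateTime.equals(onDisk);
        boolean expired = expireMillis > 0 && nowMillis - onDisk >= expireMillis;
        if (owner || expired) {
            Files.deleteIfExists(lockFile);
            ownedCreateTime = null;
            return true;
        }
        return false;                  // someone else's live lock: leave it alone
    }

    /** Reads the creation time stored in the lock file; null if missing or unparsable. */
    private Long readCreateTime() throws IOException {
        try {
            String text = new String(Files.readAllBytes(lockFile), StandardCharsets.UTF_8);
            return Long.parseLong(text.trim());
        } catch (NoSuchFileException | NumberFormatException e) {
            return null;
        }
    }
}
```

In the scenario from the description, jobB never acquired the lock, so its in-memory stamp is empty and its `unlock` call leaves jobA's lock file in place. jobC's `tryLock` therefore keeps failing until jobA releases the lock or the lock expires. Comparing timestamps alone could confuse two holders that created the file in the same millisecond; a unique owner ID would be a more robust choice in a real implementation.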

Impact

  • the lock file may remain indefinitely if the process exits normally and no expiration time is set

Risk level (write none, low medium or high below)

medium

Documentation Update

None

Contributor's checklist

  • [ ] Read through contributor's guide
  • [ ] Change Logs and Impact were stated clearly
  • [ ] Adequate tests were added if applicable
  • [ ] CI passed

KnightChess avatar Dec 26 '23 12:12 KnightChess

We are planning to make the MDT non-blocking, hope that helps in this scenario.

danny0405 avatar Dec 27 '23 04:12 danny0405

CI report:

  • 83b7e66703eb4da5ac80a77ca156ad1c34ef60ad Azure: SUCCESS

Bot commands

@hudi-bot supports the following commands:

  • @hudi-bot run azure: re-run the last Azure build

hudi-bot avatar Dec 27 '23 04:12 hudi-bot

> We are planning to make the MDT non-blocking, hope that helps in this scenario.

Do you mean RFC-66? Let me study it. But I think this bug can cause other unpredictable exceptions, not only in the MDT metadata, because multiple jobs get the lock at the same time.

KnightChess avatar Dec 27 '23 11:12 KnightChess

We have re-designed the lock acquisition since 1.0.

danny0405 avatar Dec 28 '23 04:12 danny0405

@danny0405 hi, do you have any other suggestions for changes to this PR? Or does this PR not need to land?

KnightChess avatar Jan 11 '24 13:01 KnightChess

close it

KnightChess avatar Mar 06 '24 02:03 KnightChess