hudi
[HUDI-7529] Fix multiple tasks getting the lock at the same time when use…
Config:
- OCC enabled
- FileSystemBasedLock used as the lock provider
- MDT (metadata table) enabled on write, which is the default
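For reference, here is a minimal sketch of writer properties matching this setup, using standard Hudi config keys. The expiration key and its value are assumptions to verify against your Hudi version, and the class name `WriterConfigSketch` is hypothetical.

```java
import java.util.Properties;

// Hypothetical helper: writer properties approximating the setup above
// (OCC + FileSystemBasedLock + metadata table).
class WriterConfigSketch {
    static Properties occWithFileSystemLock() {
        Properties props = new Properties();
        // "OCC enabled"
        props.setProperty("hoodie.write.concurrency.mode", "optimistic_concurrency_control");
        // "FileSystemBasedLock used as the lock provider"
        props.setProperty("hoodie.write.lock.provider",
                "org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider");
        // "MDT enabled on write" (already the default, set explicitly for clarity)
        props.setProperty("hoodie.metadata.enable", "true");
        // Assumed key: lock expiration in minutes for file-system based locks;
        // leaving it unset means the lock never expires.
        props.setProperty("hoodie.write.lock.filesystem.expire", "10");
        return props;
    }

    public static void main(String[] args) {
        occWithFileSystemLock().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```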
There are three jobs, jobA, jobB and jobC, running at the same time.
jobA acquires the lock successfully, jobB keeps trying to acquire the lock, and jobC also tries to acquire it.
jobB fails because it cannot acquire the lock, but it still deletes the lock file when it closes its write client. jobC can then acquire the lock while jobA still holds it, which causes a concurrency problem.
In our case, jobC rolled back jobA's MDT commit, which had already been committed successfully. As a result, the data table timeline contains the replacecommit instant but the MDT is missing this update, so the partition path was deleted and the latest file slice could not be retained.
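The sequence above can be reduced to a simplified, single-threaded model. This is not Hudi's code: the lock file is modelled as an `AtomicBoolean`, `tryLock` as an atomic create-if-absent, and `closeUnconditionally` as a close() that deletes the lock file regardless of ownership, which is the behaviour described for jobB. All names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Simplified model of the race (NOT Hudi's actual implementation).
class BuggyLockRace {
    // true means the lock file currently exists
    static final AtomicBoolean lockFileExists = new AtomicBoolean(false);

    // Acquire succeeds only if the lock file did not exist yet.
    static boolean tryLock() {
        return lockFileExists.compareAndSet(false, true);
    }

    // Buggy close(): deletes the lock file whether or not this job owns it.
    static void closeUnconditionally() {
        lockFileExists.set(false);
    }

    // Returns true if jobA and jobC both end up believing they hold the lock.
    static boolean reproduce() {
        lockFileExists.set(false);
        boolean jobAHolds = tryLock();   // jobA acquires the lock
        boolean jobBHolds = tryLock();   // jobB fails to acquire it...
        if (!jobBHolds) {
            closeUnconditionally();      // ...but deletes the lock file on close
        }
        boolean jobCHolds = tryLock();   // jobC now acquires the lock as well
        return jobAHolds && jobCHolds;
    }

    public static void main(String[] args) {
        System.out.println("two holders at once: " + reproduce());
    }
}
```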
Change Logs
- Check the lock's create_time held in memory before deleting the lock file
- Only the lock owner, or any writer once the lock has expired, can delete the lock file
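A minimal sketch of these two rules, under the same kind of simplified model: the lock file stores its create_time, each writer remembers in memory the create_time it wrote, and close() deletes the file only on a match or after expiry. `OwnerCheckedLock` and its methods are hypothetical and not the PR's actual diff; an `expireMillis` of 0 models the "no expiration configured" case from the Impact section, where nobody but the owner ever deletes the lock file.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of the fix (not Hudi's actual implementation).
// The lock "file" holds its create_time; null means the file does not exist.
class OwnerCheckedLock {
    private final AtomicReference<Long> lockFile;
    private final long expireMillis;  // <= 0 means the lock never expires
    private Long myCreateTime;        // create_time remembered in memory by this writer

    OwnerCheckedLock(AtomicReference<Long> lockFile, long expireMillis) {
        this.lockFile = lockFile;
        this.expireMillis = expireMillis;
    }

    // Atomically create the lock file if absent and remember its create_time.
    boolean tryLock(long nowMillis) {
        if (lockFile.compareAndSet(null, nowMillis)) {
            myCreateTime = nowMillis;
            return true;
        }
        return false;
    }

    private boolean isExpired(Long createTime, long nowMillis) {
        return expireMillis > 0 && createTime != null && nowMillis - createTime > expireMillis;
    }

    // Delete the lock file only if this writer owns it or the lock has expired.
    void close(long nowMillis) {
        Long current = lockFile.get();
        if (current != null) {
            boolean isOwner = current.equals(myCreateTime);
            if (isOwner || isExpired(current, nowMillis)) {
                // Same reference as read above, so this only removes that exact lock.
                lockFile.compareAndSet(current, null);
            }
        }
        myCreateTime = null;
    }

    public static void main(String[] args) {
        AtomicReference<Long> file = new AtomicReference<>(null);
        OwnerCheckedLock jobA = new OwnerCheckedLock(file, 0);
        OwnerCheckedLock jobB = new OwnerCheckedLock(file, 0);
        OwnerCheckedLock jobC = new OwnerCheckedLock(file, 0);
        jobA.tryLock(1_000);  // jobA acquires the lock
        jobB.tryLock(1_001);  // jobB fails...
        jobB.close(1_002);    // ...and its close() leaves jobA's lock in place
        System.out.println("jobC acquired: " + jobC.tryLock(1_003)); // false
    }
}
```

Comparing by create_time rather than by file existence is what lets a non-owner's close() become a no-op, while the expiry check keeps a crashed owner from blocking everyone forever when an expiration is configured.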
Impact
- The lock file may be left behind indefinitely if the process exits normally and no expiration time is configured.
Risk level (write none, low, medium or high below)
medium
Documentation Update
None
Contributor's checklist
- [ ] Read through contributor's guide
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
We are planning to make the MDT non-blocking; hopefully that helps in this scenario.
CI report:
- 83b7e66703eb4da5ac80a77ca156ad1c34ef60ad Azure: SUCCESS
We are planning to make the MDT non-blocking; hopefully that helps in this scenario.
You mean RFC-66? Let me study it. But I think this issue can cause other unpredictable exceptions beyond the MDT metadata, because multiple jobs can hold the lock at the same time.
We have re-designed the lock acquisition since 1.0.
@danny0405 Hi, do you have any other suggestions for changes to this PR? Or does this PR not need to land?
Closing it.