lockfile
bug: deadlock caused by pid reusing
See comments.
I believe this is handled by dead lock owner detection tested at https://github.com/nightlyone/lockfile/blob/bf01bef5587fecae1a6519d302669c701fa41efd/lockfile_test.go#L190 and later in that test.
Could you please provide a minimal test case to reproduce the observed behaviour?
Did you try using it with a filesystem shared between multiple hosts, or with different PID namespaces on Linux?
I didn't use multiple hosts or PID namespaces.
When this happened, there were many PIDs in use on my machine.
So I guess this was caused by PID reuse:
- program `A` takes the lock and records its PID
- program `A` crashes without releasing the lock
- another, unrelated program `B` reuses this PID
- program `C` tries to take the lock, but the PID is in use by `B`, so it fails

`B` should not block `C`, but the OS may assign the PID to it; this is unpredictable.
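To make the failure mode concrete, here is a minimal sketch of a liveness check that looks only at the PID stored in the lock file. It is an illustration for Unix-like systems, not the library's actual code; `ownerLooksAlive` is a hypothetical helper. Signal 0 only asks the kernel whether some process with that PID exists, so an unrelated process that inherited the PID is indistinguishable from the original owner.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
	"syscall"
)

// ownerLooksAlive parses the PID stored in a lock file and probes it with
// signal 0. Hypothetical helper: it cannot tell the original lock owner
// apart from an unrelated process that was later given the same PID.
func ownerLooksAlive(pidFileContent string) (bool, error) {
	pid, err := strconv.Atoi(strings.TrimSpace(pidFileContent))
	if err != nil {
		return false, fmt.Errorf("parse pid: %w", err)
	}
	if pid <= 0 {
		return false, fmt.Errorf("invalid pid %d", pid)
	}
	err = syscall.Kill(pid, 0)
	switch {
	case err == nil:
		return true, nil // a process with this PID exists
	case errors.Is(err, syscall.EPERM):
		return true, nil // exists, but belongs to another user
	case errors.Is(err, syscall.ESRCH):
		return false, nil // no such process: the lock is stale
	default:
		return false, err
	}
}

func main() {
	// Pretend a crashed program left our own PID behind in the lock file.
	content := strconv.Itoa(syscall.Getpid()) + "\n"
	alive, err := ownerLooksAlive(content)
	fmt.Println("owner looks alive:", alive, "err:", err)
}
```

In the scenario above, `C` would see "alive" for `B` and refuse to take the lock, even though `B` never held it.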
Because Linux does not normally let you choose the PID when creating a process, this is hard to reproduce in a simple test. But it does happen when the PID space is under pressure.
The Linux syscall `flock` solved this for me, so I created another file-lock library based on `flock`: ucloud/go-lockfile.
But this is not a cross-platform solution.
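For readers unfamiliar with `flock`, a minimal sketch of the idea follows. It is not the code of ucloud/go-lockfile; `tryLock`, `errLocked`, and `demoConflict` are hypothetical names, and the example assumes Linux or another Unix where Go's `syscall.Flock` is available. The kernel drops the lock when the holder's file descriptor is closed, including on a crash, so no PID needs to be stored or checked.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// errLocked reports that another open file already holds the lock.
var errLocked = errors.New("lock is already held")

// tryLock takes a non-blocking exclusive flock on path.
// The lock lives as long as the returned file stays open.
func tryLock(path string) (*os.File, error) {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_RDWR, 0o644)
	if err != nil {
		return nil, err
	}
	err = syscall.Flock(int(f.Fd()), syscall.LOCK_EX|syscall.LOCK_NB)
	if err != nil {
		f.Close()
		if errors.Is(err, syscall.EWOULDBLOCK) {
			return nil, errLocked
		}
		return nil, err
	}
	return f, nil
}

// demoConflict locks a temporary file, then reports whether a second
// attempt through a separate open of the same file was refused.
func demoConflict() (bool, error) {
	dir, err := os.MkdirTemp("", "flockdemo")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "demo.lock")
	held, err := tryLock(path)
	if err != nil {
		return false, err
	}
	defer held.Close() // closing the file releases the lock

	second, err := tryLock(path)
	if errors.Is(err, errLocked) {
		return true, nil
	}
	if err != nil {
		return false, err
	}
	second.Close()
	return false, nil
}

func main() {
	refused, err := demoConflict()
	fmt.Println("second attempt refused:", refused, "err:", err)
}
```

The trade-off is the one noted above: this relies on a Unix-specific syscall, and `flock` behaviour on network filesystems varies.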
One approach would be to also store the process start time and compare this with stored data in the PID file. But that would be backward incompatible.
Another approach would be to compare the process start time with the mtime (modification time) of the lock file and assume a stale file if that file has a much older mtime. That approach would be backward compatible.
Both approaches require reading the start time of the process that currently has the same PID. I am not sure how feasible and portable this would be.
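As a rough feasibility check for Linux only, the start time can be derived from field 22 (`starttime`) of `/proc/<pid>/stat`, which counts clock ticks since boot, plus the boot time from the `btime` line of `/proc/stat`. The sketch below shows the mtime comparison under those assumptions. All function names are hypothetical, the tick rate of 100 per second is an assumption (the real value comes from `sysconf(_SC_CLK_TCK)`), and the slack value is arbitrary.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// startTicks extracts field 22 (starttime, clock ticks since boot) from
// the content of /proc/<pid>/stat.
func startTicks(stat string) (uint64, error) {
	// Field 2 (comm) is parenthesised and may contain spaces or ')',
	// so parse only what follows the last ')'.
	i := strings.LastIndexByte(stat, ')')
	if i < 0 {
		return 0, fmt.Errorf("malformed stat line")
	}
	fields := strings.Fields(stat[i+1:])
	// fields[0] is field 3 (state), so field 22 is fields[19].
	if len(fields) < 20 {
		return 0, fmt.Errorf("stat line has too few fields")
	}
	return strconv.ParseUint(fields[19], 10, 64)
}

// ownerStartUnix converts ticks since boot into Unix seconds.
// bootUnix is the btime value from /proc/stat.
func ownerStartUnix(bootUnix int64, ticks, ticksPerSec uint64) int64 {
	return bootUnix + int64(ticks/ticksPerSec)
}

// lockIsStale reports whether the process now using the PID started
// noticeably after the lock file was last written, which suggests the
// PID was reused by a process that never took the lock.
func lockIsStale(startUnix, lockMtimeUnix, slackSec int64) bool {
	return startUnix > lockMtimeUnix+slackSec
}

func main() {
	// Made-up sample data, not taken from a real system.
	stat := "1234 (my prog) S 1 1234 1234 0 -1 4194304 100 0 0 0 5 3 0 0 20 0 1 0 987654 1000000 200"
	ticks, err := startTicks(stat)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	start := ownerStartUnix(1700000000, ticks, 100) // 100 ticks/s is assumed
	fmt.Println("stale:", lockIsStale(start, 1700000100, 2))
}
```

Other platforms would need their own way to obtain the start time, which is the portability concern raised above.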