bug: deadlock caused by PID reuse

Open fioncat opened this issue 2 years ago • 4 comments

See comments.

fioncat avatar Oct 17 '22 02:10 fioncat

I believe this is handled by dead lock owner detection tested at https://github.com/nightlyone/lockfile/blob/bf01bef5587fecae1a6519d302669c701fa41efd/lockfile_test.go#L190 and later in that test.

Could you please provide a minimal test case to reproduce the observed behaviour?

Did you try using it with a filesystem shared between multiple hosts, or with different PID namespaces on Linux?

nightlyone avatar Oct 17 '22 06:10 nightlyone

I didn't use multiple hosts or PID namespaces.

When this happened, there were many PIDs in use on my machine.

So I guess this was caused by pid-reuse:

  • program A takes the lock, which records A's PID in the lock file
  • program A crashes, so the lock is never released
  • an unrelated program B is started and gets the same PID
  • program C tries to take the lock, but fails because the recorded PID is now in use by B

B should not block C, but the OS may assign A's old PID to B, and this is unpredictable.

Because Linux does not let you choose the PID when creating a process, this is hard to reproduce on demand. But it does happen when the PID space is under pressure.

fioncat avatar Oct 17 '22 08:10 fioncat

The Linux syscall flock solved this for me, so I created another file lock library based on flock: ucloud/go-lockfile.

But this is not a cross-platform solution.

fioncat avatar Oct 18 '22 08:10 fioncat

One approach would be to also store the process start time and compare this with stored data in the PID file. But that would be backward incompatible.

Another approach would be to compare the process start time with the mtime (modification time) of the lock file and assume the file is stale if its mtime is much older than the process start time. That approach would be backward compatible.

Both approaches require reading the start time of the process that currently has the same PID. I am not sure how feasible and portable this would be.

nightlyone avatar Oct 19 '22 18:10 nightlyone