
improve stale lock handling

Open • ThomasWaldmann opened this issue 4 years ago • 1 comment

Taken from #955, where this was suggested:

by @jdchristensen: However, if some kind of automated lock removal was desired, one idea would be for borg to update the timestamp on the lock every 5 minutes. Then a future borg process could break a lock if it found it to be older than, say, 10 minutes.
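
A minimal sketch of this heartbeat idea, in Python since borg is written in Python. It assumes the lock file's modification time serves as "the timestamp on the lock" and that all machines sharing the repository have roughly synchronized clocks. The names refresh_lock, may_break_lock, start_heartbeat, REFRESH_INTERVAL and STALE_AFTER are hypothetical and not part of borg.

```python
import os
import threading
import time
from typing import Optional

# Hypothetical values taken from the suggestion: refresh every 5 minutes,
# consider the lock stale once it has not been refreshed for 10 minutes.
REFRESH_INTERVAL = 5 * 60
STALE_AFTER = 10 * 60


def refresh_lock(lock_path: str) -> None:
    """Set the lock's mtime to the current time (called by the lock holder)."""
    os.utime(lock_path, None)


def may_break_lock(lock_path: str, now: Optional[float] = None) -> bool:
    """A waiting process may break the lock only if the holder stopped refreshing it."""
    if now is None:
        now = time.time()
    return now - os.path.getmtime(lock_path) > STALE_AFTER


def start_heartbeat(lock_path: str, interval: float = REFRESH_INTERVAL) -> threading.Event:
    """Refresh the lock every `interval` seconds until the returned event is set."""
    stop = threading.Event()

    def beat() -> None:
        # Event.wait returns False on timeout and True once stop.set() was called.
        while not stop.wait(interval):
            refresh_lock(lock_path)

    threading.Thread(target=beat, daemon=True).start()
    return stop
```

The holder would call start_heartbeat right after acquiring the lock and set the returned event before releasing it. Because the heartbeat runs in a daemon thread, a crashed process simply stops refreshing, and the lock then ages past STALE_AFTER.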

by @poelzi: The correct way to handle such locks is to store a pid and a machine id inside the lock, together with a timestamp counter that gets updated every minute. If there is no such process on the same machine, the lock is stale: remove it. If the machine id belongs to a foreign machine and the timestamp has not been updated for some minutes, remove the lock. If the lock is still being updated, wait until it is cleared.
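
The decision rule from this suggestion can be sketched as follows. The LockInfo record, the decide function and its max_age default are hypothetical. pid_alive uses os.kill with signal 0, which only checks whether the process exists and is meant for POSIX systems; the liveness check is passed in as a parameter so the logic can be exercised without real processes.

```python
import os
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LockInfo:
    # Hypothetical lock contents: who holds the lock and when it was last updated.
    machine_id: str
    pid: int
    timestamp: float


def pid_alive(pid: int) -> bool:
    """Check whether a process with this pid exists on *this* machine (POSIX)."""
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check only, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def decide(lock: LockInfo, my_machine_id: str, max_age: float = 5 * 60,
           now: Optional[float] = None,
           alive: Callable[[int], bool] = pid_alive) -> str:
    """Return 'remove' for a stale lock, 'wait' if the holder looks alive."""
    if now is None:
        now = time.time()
    if lock.machine_id == my_machine_id:
        # Same machine: the pid check is authoritative.
        return "wait" if alive(lock.pid) else "remove"
    # Foreign machine: its processes cannot be inspected, so rely on the timestamp.
    return "remove" if now - lock.timestamp > max_age else "wait"
```

For foreign machines this only works if the lock holder keeps refreshing the timestamp, which is the same heartbeat requirement as in @jdchristensen's suggestion.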

ThomasWaldmann, Jan 26 '21 16:01

We had a case discussed on IRC where breaking the lock failed because the hostid of the system was not stable across reboots. The lock.exclusive directory contains files whose names include the hostname (which was stable) and the hostid (which in this case changed across reboots).

Users who have this problem will see that the output of python3 -c "import uuid; print(uuid.getnode())" does not match the number in the exclusive lock's file name. If that is the case, a workaround is to set BORG_HOST_ID manually to a string that is unique among all machines using borg.
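
To check this for a given repository, a small script can compare the lock file names with uuid.getnode(). It assumes lock file names of the form <hostname>@<hostid>.<pid>-<thread id in hex>; verify this against the actual names in lock.exclusive before relying on it. parse_lock_name and mismatched_owners are hypothetical helpers, not part of borg.

```python
import os
import sys
import uuid
from typing import List, NamedTuple


class LockOwner(NamedTuple):
    hostname: str
    hostid: str
    pid: int
    thread: int


def parse_lock_name(name: str) -> LockOwner:
    """Split an (assumed) '<hostname>@<hostid>.<pid>-<thread hex>' file name."""
    # Split from the right: the hostname itself may contain dots (FQDN).
    host_part, _, proc_part = name.rpartition(".")
    hostname, _, hostid = host_part.rpartition("@")
    pid, _, thread = proc_part.partition("-")
    return LockOwner(hostname, hostid, int(pid), int(thread, 16))


def mismatched_owners(lock_dir: str, hostname: str, node: int) -> List[LockOwner]:
    """Lock entries from `hostname` whose hostid differs from this machine's node id."""
    owners = [parse_lock_name(n) for n in os.listdir(lock_dir)]
    return [o for o in owners if o.hostname == hostname and o.hostid != str(node)]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 check_lock.py /path/to/repo/lock.exclusive <hostname as in the lock file>
    node = uuid.getnode()
    for owner in mismatched_owners(sys.argv[1], sys.argv[2], node):
        print(f"lock owner {owner.hostname}@{owner.hostid} (pid {owner.pid}) "
              f"does not match uuid.getnode() = {node}")
```

Any output means the lock was written under a different hostid than the one this machine currently reports, which is exactly the situation where BORG_HOST_ID helps.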

Maybe we can make this problem easier to discover by printing a warning when process_alive is called for a host that has a matching hostname but a different hostid. At least for this case there is a workaround the warning could point to.
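
A sketch of such a warning, written as a hypothetical wrapper rather than borg's actual process_alive. It assumes the caller has already split the lock owner into hostname, hostid and pid, and it treats any host it cannot recognize as alive, since that host's processes cannot be inspected.

```python
import logging
from typing import Callable

logger = logging.getLogger("lock-sketch")


def process_alive_with_hint(lock_hostname: str, lock_hostid: str, pid: int,
                            my_hostname: str, my_hostid: str,
                            local_alive: Callable[[int], bool]) -> bool:
    """Hypothetical liveness check that warns about a changed hostid."""
    if lock_hostname == my_hostname and lock_hostid == my_hostid:
        # The lock is ours: the local pid check is reliable.
        return local_alive(pid)
    if lock_hostname == my_hostname:
        # Same hostname, different hostid: most likely an unstable hostid.
        logger.warning(
            "lock was created on host %r with hostid %s, but this host now has "
            "hostid %s; if the hostid changes across reboots, set BORG_HOST_ID "
            "to a unique, stable string", lock_hostname, lock_hostid, my_hostid)
    # Foreign or unrecognized host: conservatively assume the holder is alive.
    return True
```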

In this case we could not find out why the hostid was not stable across reboots, but the standard library code might tell us when the hostid is just random. That could be useful so that borg can ignore it in the first place.
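
The Python documentation for uuid.getnode() states that if all attempts to obtain the hardware address fail, a random 48-bit number is returned with the multicast bit (the least significant bit of the first octet) set. A sketch of using that as a heuristic follows; node_looks_random and stable_host_id are hypothetical names, the check cannot prove that a value is a real hardware address, and falling back to the bare hostname is only one possible policy.

```python
import uuid

# In the 48-bit node id, the first octet is bits 40..47, so its least
# significant bit (the multicast bit) is bit 40.
MULTICAST_BIT = 1 << 40


def node_looks_random(node: int) -> bool:
    """True if the multicast bit is set, i.e. the node id is probably not a NIC address."""
    return bool(node & MULTICAST_BIT)


def stable_host_id(hostname: str, node: int) -> str:
    """Hypothetical policy: ignore the node id when it looks randomly generated."""
    if node_looks_random(node):
        return hostname
    return f"{hostname}@{node}"


print(stable_host_id("myhost", uuid.getnode()))
```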

textshell, Aug 08 '21 15:08