coreutils
coreutils copied to clipboard
tail: fix race condition
fixes #3765
There exists a race condition (RC) that can occur if changes to a path
happen after the initial print loop in uu_tail()
, but before the
path is added to the notify-Watcher thread in follow()
.
@niyaznigmatullin found this RC and identified it as the reason why "gnu/tests/tail-2/F-headers.sh" failed sometimes in the CI and he also came up with a fix for it.
The proposed fix manually handles possible events that could have been missed before.
However, this does not reduce the possibility for the RC to occur, and it uses a lot of duplicate code instead of utilizing the functionality of the Watcher thread together with the handle_event()
function.
To minimize the window where the RC can occur, this PR moves starting the
Watcher thread and adding paths to it from follow()
to the initial
print loop in uu_tail()
.
Additionally, to make sure the RC cannot happen in "gnu/tests/tail-2/F-headers.sh", the error message that is used as a trigger in this test, is delayed until the path is added to the Watcher thread.
The workarounds for "tests/tail-2/F-headers.sh" in build-gnu.sh should no longer be needed.
@niyaznigmatullin how do you feel about this? thanks
I like the change, but can we move the watcher addition before tail actually writes anything?
I think I found another race condition on detecting the orphans. Because it's detected after follow is run, if a file didn't exist and it's parent directory didn't exist, we print that we cannot access it. And then if it's created, we don't watch it, and it's not detecting that it's an orphan, because the directory could have been created by that time.
As I understand, the idea was, that we need to make sure for each file either it's being watched, or it's parent is watched or it's added to orphans. So the last is not true sometimes.
@niyaznigmatullin Thanks for the feedback. I think both of your points are valid. I'll implement them soon.
This should be good to go.
I tested all tests with timeouts, everything passed but descriptor-vs-rename.sh
. I've found the reason, it moves the file:
mv a b
and then immediately writes echo y >> b
, but when we handle event of renaming, we seek to the end of the new file, skipping the last written part, which never occurs in tail output.
I propose to use old opened reader, since it stores the file descriptor inside it, so no need to recreate one. Proposed change is here: https://github.com/niyaznigmatullin/coreutils/commit/fd145ee1e53e7fc2a6bce9a8c56563bf7bf9f8cb
I tested all tests with timeouts, everything passed
Nice, thanks for testing.
I propose to use old opened reader, since it stores the file descriptor inside it, so no need to recreate one. Proposed change is here: niyaznigmatullin@fd145ee
That looks like a good idea, please open a follow-up PR.
(I would keep the ?
for the File::open()
call and remove the None
case. Otherwise if there is an error, we will never know about it.)