coreutils icon indicating copy to clipboard operation
coreutils copied to clipboard

tail: fix race condition

Open jhscheer opened this issue 1 year ago • 6 comments

fixes #3765

There exists a race condition (RC) that can occur if changes to a path happen after the initial print loop in uu_tail(), but before the path is added to the notify-Watcher thread in follow().

@niyaznigmatullin found this RC and identified it as the reason why "gnu/tests/tail-2/F-headers.sh" failed sometimes in the CI and he also came up with a fix for it.

The proposed fix manually handles possible events that could have been missed before. However, this does not reduce the possibility for the RC to occur, and it uses a lot of duplicate code instead of utilizing the functionality of the Watcher thread together with the handle_event() function.

To minimize the window where the RC can occur, this PR moves starting the Watcher thread and adding paths to it from follow() to the initial print loop in uu_tail().

Additionally, to make sure the RC cannot happen in "gnu/tests/tail-2/F-headers.sh", the error message that is used as a trigger in this test, is delayed until the path is added to the Watcher thread.

The workarounds for "tests/tail-2/F-headers.sh" in build-gnu.sh should no longer be needed.

jhscheer avatar Aug 08 '22 22:08 jhscheer

@niyaznigmatullin how do you feel about this? thanks

sylvestre avatar Aug 09 '22 08:08 sylvestre

I like the change, but can we move the watcher addition before tail actually writes anything?

niyaznigmatullin avatar Aug 09 '22 18:08 niyaznigmatullin

I think I found another race condition on detecting the orphans. Because it's detected after follow is run, if a file didn't exist and it's parent directory didn't exist, we print that we cannot access it. And then if it's created, we don't watch it, and it's not detecting that it's an orphan, because the directory could have been created by that time.

As I understand, the idea was, that we need to make sure for each file either it's being watched, or it's parent is watched or it's added to orphans. So the last is not true sometimes.

niyaznigmatullin avatar Aug 09 '22 19:08 niyaznigmatullin

@niyaznigmatullin Thanks for the feedback. I think both of your points are valid. I'll implement them soon.

jhscheer avatar Aug 09 '22 19:08 jhscheer

This should be good to go.

jhscheer avatar Aug 11 '22 18:08 jhscheer

I tested all tests with timeouts, everything passed but descriptor-vs-rename.sh. I've found the reason, it moves the file: mv a b and then immediately writes echo y >> b, but when we handle event of renaming, we seek to the end of the new file, skipping the last written part, which never occurs in tail output.

I propose to use old opened reader, since it stores the file descriptor inside it, so no need to recreate one. Proposed change is here: https://github.com/niyaznigmatullin/coreutils/commit/fd145ee1e53e7fc2a6bce9a8c56563bf7bf9f8cb

niyaznigmatullin avatar Aug 11 '22 20:08 niyaznigmatullin

I tested all tests with timeouts, everything passed

Nice, thanks for testing.

I propose to use old opened reader, since it stores the file descriptor inside it, so no need to recreate one. Proposed change is here: niyaznigmatullin@fd145ee

That looks like a good idea, please open a follow-up PR. (I would keep the ? for the File::open() call and remove the None case. Otherwise if there is an error, we will never know about it.)

jhscheer avatar Aug 11 '22 21:08 jhscheer