notify fsevent hangs on Mac during shutdown

Open matklad opened this issue 5 years ago • 4 comments

System details

OS/Platform name and version: macOS 10.13.6 on a MacBook Pro (15-inch, 2016)
Rust version (if building from source): rustc --version: rustc 1.38.0-nightly (07e0c3651 2019-07-16)
Notify version (or commit hash if building from git): 4.0.12

Hi! Users report that rust-analyzer sometimes hangs during shutdown. The stack trace points to this code:

https://github.com/passcod/notify/blob/2b1f1d4d1acc8b9738ffbe41bfe6043ba37f9431/src/fsevent.rs#L109-L111

Downstream issue (with captured stack trace): https://github.com/rust-analyzer/rust-analyzer/issues/1541

cc @killercup

Jul 22 '19 12:07 matklad

OS/Platform name and version: macOS 10.13.6 on a MacBook Pro (15-inch, 2016)
Rust version: rustc 1.38.0-nightly (07e0c3651 2019-07-16)

Jul 22 '19 12:07 killercup

I'm thinking this is likely because of #118. The original race was that the fsevent loop wasn't yet running, so ending it wasn't doing anything. The workaround was therefore to wait until it tells us it's waiting. Here, the loop is running (probably, given how this happens), but at the point of dropping, the fsevent loop isn't waiting for an event, so we yield until it does, but it's shutting down, so we're never going to get there.
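To make the failure mode concrete, here is a minimal std-only model of it. It does not use the notify or Core Foundation APIs: the `is_waiting` flag and the `wait_until_waiting` helper are hypothetical stand-ins for "the fsevent loop reports that it is waiting". The deadline exists only so the example terminates; without it, the second case would spin forever, which is the hang described above.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical model of "yield until the loop says it is waiting".
/// Returns true if the flag was set before the deadline, false otherwise.
fn wait_until_waiting(is_waiting: &AtomicBool, timeout: Duration) -> bool {
    let deadline = Instant::now() + timeout;
    while !is_waiting.load(Ordering::SeqCst) {
        if Instant::now() >= deadline {
            // An unbounded version of this loop would never leave here.
            return false;
        }
        thread::yield_now();
    }
    true
}

fn main() {
    // Case 1: the loop thread reaches its waiting state, so the wait ends.
    let flag = Arc::new(AtomicBool::new(false));
    let loop_flag = Arc::clone(&flag);
    thread::spawn(move || loop_flag.store(true, Ordering::SeqCst))
        .join()
        .unwrap();
    assert!(wait_until_waiting(&flag, Duration::from_millis(200)));

    // Case 2: the loop thread shuts down without ever waiting.
    let flag = Arc::new(AtomicBool::new(false));
    thread::spawn(|| { /* exits immediately, never sets the flag */ })
        .join()
        .unwrap();
    assert!(!wait_until_waiting(&flag, Duration::from_millis(50)));
}
```

The timeout here is only a device for the demonstration, not a suggested fix.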

#118 had a better solution: to use loop observers. Unfortunately I'm not a mac developer and don't really know how best to use those for this purpose. My guess is:

At startup:

1. Create a loop observer firing on runloop exit that, idk, sets a "i'm dead" flag or sends to a channel or something.
2. Save that in our struct.
3. Start the thread:
   1. Check that the observer isn't invalidated. If it is, return. (If the observer has already been invalidated, we're stopping before we're starting, which was the cause of the initial deadlock.)
   2. Add the observer to the loop.
   3. Check again.
   4. Run the loop.

To stop the loop:

1. Check if the observer is valid. If it's not, return. (An invalid observer would mean it's fired already, that is, that the loop has already exited.)
2. Check if the runloop contains the observer. If it doesn't, return. (If the observer isn't present, we're exiting either before the loop has been initialised, before the thread has run, or while it's exiting but before the observer has been invalidated. Or, because this is multi-threaded and anything is possible, the observer was invalidated in between steps 1 and 2.)
3. Invalidate the observer. This is to prevent the loop from starting if we're stopping before starting.
4. Check the flag/channel that the observer sets/sends. If the observer has run, return.
5. Tell the runloop to stop.
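The general shape of the start and stop steps above can be sketched with the standard library alone. This is a simplified model under stated assumptions, not an implementation against CFRunLoop: `LoopHandle`, `ExitObserver`, and `run_loop` are hypothetical names, an `AtomicBool` stands in for observer invalidation, a drop guard stands in for the exit observer, and an `mpsc` channel stands in for telling the runloop to stop. The channel buffers the stop request, which the real runloop may not do, so the sketch shows the control flow rather than reproducing the original races.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Arc;
use std::thread::{self, JoinHandle};

/// Stand-in for an observer firing on loop exit: sets the flag when dropped.
struct ExitObserver(Arc<AtomicBool>);

impl Drop for ExitObserver {
    fn drop(&mut self) {
        self.0.store(true, Ordering::SeqCst);
    }
}

/// Hypothetical handle kept by the watcher struct.
struct LoopHandle {
    invalidated: Arc<AtomicBool>, // models "observer invalidated"
    exited: Arc<AtomicBool>,      // models the flag the exit observer sets
    stop_tx: Sender<()>,          // models "tell the runloop to stop"
    thread: Option<JoinHandle<()>>,
}

fn run_loop(invalidated: Arc<AtomicBool>, exited: Arc<AtomicBool>, stop_rx: Receiver<()>) {
    // Every return path below drops this guard and so records the exit.
    let _observer = ExitObserver(exited);
    // Check before starting: a stop may have been requested already.
    if invalidated.load(Ordering::SeqCst) {
        return;
    }
    // (The real code would add the observer to the loop here.)
    // Check again before running.
    if invalidated.load(Ordering::SeqCst) {
        return;
    }
    // "Run the loop": block until a stop request arrives or the sender is gone.
    let _ = stop_rx.recv();
}

impl LoopHandle {
    fn start() -> LoopHandle {
        let invalidated = Arc::new(AtomicBool::new(false));
        let exited = Arc::new(AtomicBool::new(false));
        let (stop_tx, stop_rx) = channel();
        let (inv, ex) = (Arc::clone(&invalidated), Arc::clone(&exited));
        let thread = thread::spawn(move || run_loop(inv, ex, stop_rx));
        LoopHandle { invalidated, exited, stop_tx, thread: Some(thread) }
    }

    /// Requests a stop without ever waiting on the loop's state.
    fn stop(&self) {
        // Invalidate first so a loop that has not started will not start.
        self.invalidated.store(true, Ordering::SeqCst);
        // If the exit flag is already set, there is nothing left to stop.
        if self.exited.load(Ordering::SeqCst) {
            return;
        }
        // An error only means the receiver is gone, i.e. the loop has exited.
        let _ = self.stop_tx.send(());
    }

    /// Stops the loop, joins the thread, and reports whether the exit flag was set.
    fn stop_and_join(mut self) -> bool {
        self.stop();
        if let Some(t) = self.thread.take() {
            t.join().expect("loop thread panicked");
        }
        self.exited.load(Ordering::SeqCst)
    }
}

fn main() {
    let handle = LoopHandle::start();
    assert!(handle.stop_and_join());
}
```

In this model `stop` contains no loop at all: it sets a flag, reads a flag, and sends at most one message, so it returns regardless of which state the loop thread is in.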

This has no infinite wait loops, so no hangs and no 100% CPU usage. I think it covers ~~all~~ most cases. To be extra careful, there may be cases for:

starting the runloop for a few milliseconds only, rechecking the observers and flags, then running indefinitely.
adding another observer on enter that seeds the is_running flag so we have a better indication of whether the loop is running.
storing the thread handle and killing it after a timeout after we tell the runloop to stop.

This isn't especially hard to implement, so I can do that fairly soon. However, beforehand I'd want some kind of review of the above (or someone to say this is ridiculous and "*this* is how to do that") by a Rust developer familiar with CFRunLoop and mac programming (or a mac developer familiar with Rust, whichever). If you know someone... ;)

Pinging @cmyr in case they can take a look

Jul 22 '19 23:07 passcod

#210 is merged and fixes several deadlocks that also happen on Linux, so my take would be to release a new version and see whether that resolves this.

Oct 16 '19 14:10 0xpr03

4.0.14 is released. I'd appreciate feedback on whether it fixes this problem (or others), as I can't test it on macOS.

Oct 17 '19 14:10 0xpr03
