notify Watching and Unwatching files can deadlock when consuming events with an unbounded channel

System details

OS/Platform name and version: Linux and Mac
Rust version (if building from source): rustc --version: rustc 1.65.0 (897e37553 2022-11-02)
Notify version (or commit hash if building from git): 5.0.0

What you did (as detailed as you can)

Notify can deadlock when using bounded channels to send events outside of the event handler, and the outside caller then tries to call watch() or unwatch() in response to the event. This is similar to #410, although that's about calling watch() and unwatch() from inside the handler.

The reason this happens is that most (or all, I didn't check every implementation) watchers are implemented in an actor model, where a separate thread interacts with the underlying file system watcher, and the watch() and unwatch() methods push an event into a channel for the actor to actually manipulate the environment.

The challenge here is that the event handler runs on the same thread as the the actor, so if the bounded channel the event handler fills up and blocks, it will stop the actor from processing watch() and unwatch() commands.

I've written up a simple test case that demonstrates this bug:

Cargo.toml:

[package]
name = "foo"
version = "0.1.0"
authors = ["Erick Tryzelaar <[email protected]>"]
edition = "2018"

[dependencies]
notify = "5.0.0"
tempfile = "3.3.0"

src/lib.rs:

use {
    notify::Watcher as _,
    std::{path::Path, sync::mpsc, time::Duration},
};

#[test]
fn unwatching_while_sending_events_on_full_bounded_channels_will_hang() {
    let count = 5;
    let (event_tx, event_rx) = mpsc::sync_channel(count);

    // Fill the channel so it'll block.
    for _ in 0..count {
        event_tx.send(()).unwrap();
    }

    let mut watcher = notify::recommended_watcher(move |_| {
        event_tx.send(()).unwrap();
    })
    .unwrap();

    let file = tempfile::NamedTempFile::new().unwrap();

    watcher
        .watch(file.path(), notify::RecursiveMode::NonRecursive)
        .unwrap();

    // Wait for all the file registration to be setup.
    std::thread::sleep(std::time::Duration::from_secs(1));

    // Touch the file to create an event.
    std::fs::write(file.path(), b"hello world").unwrap();

    // This will hang.
    watcher.unwatch(file.path()).unwrap();
}

#[test]
fn unwatching_while_sending_events_on_unbounded_channel_works() {
    let (event_tx, event_rx) = mpsc::channel();

    let mut watcher = notify::recommended_watcher(move |_| {
        event_tx.send(()).unwrap();
    })
    .unwrap();

    let file = tempfile::NamedTempFile::new().unwrap();

    watcher
        .watch(file.path(), notify::RecursiveMode::NonRecursive)
        .unwrap();

    // Wait for all the file registration to be setup.
    std::thread::sleep(std::time::Duration::from_secs(1));

    // Touch the file to create an event.
    std::fs::write(file.path(), b"hello world").unwrap();

    // This should not hang.
    watcher.unwatch(file.path()).unwrap();
}

When run, the unbounded channel will succeed, the bounded channel will hang.

The two ways I can think of to fix this are:

watch and unwatch could not wait for the operation to complete. However, this would change the semantics of these methods.
The event handler could run on a separate thread, and the watcher uses an unbounded thread to push messages from the "watcher thread to the event handler thread". This would end up using more resources, but might also fix #410. Or it could use a bounded channel, but be written such that it'll drop events if the "watcher->event" channel fills up. I believe this is what the OS file watchers do if its internal buffers fill up.

What you expected

Notify not to deadlock

What happened

Notify deadlocked.

Jan 14 '23 00:01 erickt

Thanks for that in-depth writeup, now that I think about it, it does make a ton of sense that we can (and will) deadlock here.

I'm not sure if simply dropping events is a good solution, it's easy, for sure, but it'll also lead to dropping possibly important events, starting with user watch commands and leading to some internal events used by some of the OS backends.

I'm not sure I'll have time to get to this until end of feb.

Jan 14 '23 00:01 0xpr03

I think you could implement the two thread solution with a bounded channel with a two level state machine. we might need this since there isn’t really a way to keep the event handler from blocking (you sort of could with async, but that’d probably require a pretty major rewrite).

The outer one reads from either the fs event stream or the watch/unwatch stream. If an fs event, use try_send to pass it to the event stream thread. If successful, go back to the beginning.

If the channel is full, go into another state machine that buffers the unsent message, and then tries to read from the watch/unwatch stream, or time out after some time. Either way try to send the event to the event thread. If so, go back to the first state machine, otherwise stay in the second state machine.

there might be another way to it without the second thread, but this would keep the memory usage from ballooning out of control.

Jan 15 '23 00:01 erickt