notify icon indicating copy to clipboard operation
notify copied to clipboard

Easy walk-dir like API

Open matklad opened this issue 5 years ago • 21 comments

Hi!

We've been using notify in rust-analyzer, and one thing I've noticed that a task like "watch src/**.rs glob" requires quite a bit of manual implementation. Specifically, I think two bits are complex:

  • setting up recursive watching with exclusion
  • handling of rescan event which requires repeating the recursive logic again

I feel like there's some higher-level API missing here... Ideally I'd love to use something like walkdir:

for event in WatchDir::new("foo").filter_entry(|e| !is_hidden(e)) {
    match event {
        Event::Create => (),
        Event::Change => (),
        Event::Delete => (),
     }
}

matklad avatar Jan 26 '19 09:01 matklad

Oh, that's a great idea! Certainly something I'll use for vNext, but would also be great if it was implemented for the current branch... not sure how feasible. Thanks for the lovely idea!

passcod avatar Jan 26 '19 21:01 passcod

In the mean time, hotwatch looks pretty cool: https://crates.io/crates/hotwatch (good job @francesca64):

use hotwatch::{Hotwatch, Event};

let mut hotwatch = Hotwatch::new().expect("Hotwatch failed to initialize.");
hotwatch.watch("war.png", |event: Event| {
    if let Event::Write(path) = event {
        println!("War has changed.");
    }
}).expect("Failed to watch file!");

I liked it so much I put it in the readme.

passcod avatar Jan 29 '19 00:01 passcod

Hotwatch API looks nice (especially due to not exposing mpsc channel), but it doesn't handle the most tricky bit: filtering the watched directories. That's a pretty important bit for rust-analyzer: if we just watch the whole directory with Cargo project, we get flooded (up to 100% CPU) with events for the target directory when Cargo starts building the project.

Also cc @vemoo, you might be interested in helping out with this :)

matklad avatar Feb 01 '19 08:02 matklad

Another thought: for this API, it would be important to specify the consistency semantics of the events. Obviously, if I receive Event::Create, I can't just blindly read the file: it might have been deleted in the next moment! Similarly, I'd rather not have a Rescan event in the API and instead delegate rescaning to the library.

I think what we want here is quiescent consistency: if there are no changes for a period of time, the last event for each path should correspond to the actual state. In other words, in the sum of the events between two quiescent periods must be equal to the difference between two walkdir traversals.

matklad avatar Feb 01 '19 08:02 matklad

Agreed on the filtering.

Rescan is a historical mistake, really. It means two things on two platforms, and has no equivalent on any others. In inotify it's really a fatal error event, and in fsevents it's an indication that something was missed, but fsevents misses things all the time by design (there's a huge discussion elsewhere in issues about fsevents with the tl;dr and conclusion that fsevents should not be used for this library, see #144, #147; alas, time).

Probably what that library should do to achieve such consistency is to attempt to attr files right after the quiet down and then returns Attrs (or whatever the struct is called) alongside the path and Op. If the file is non existent it could emit/modify the event to actually reflect reality, and exposing the Attrs would save the caller from doing a subsequent (and possibly outdated by then) lookup.

~~I don't know that guaranteeing "the sum of the events between two quiescent periods must be equal to the difference between two walkdir traversals" is practical, because I have low confidence in the actual reliability of these platforms APIs. Appending this guarantee with "in an ideal world" is probably good?~~ No, I am being overly cynical here. Relying only on native events is on the order of 99.9% accurate in 90% of cases. The 10% include things like unsupported filesystems, monitoring huge trees, networked shares, permission issues, and ghosts. As maintainer, I tend to see those 10% a lot more.

passcod avatar Feb 01 '19 09:02 passcod

Appending this guarantee with "in an ideal world" is probably good?

Yeah, that's what I've meant. Specifically, this clause about "consistency" was about a sublte bug we had in rust-analyzer.

Initially, we did recursive walkdiring on the one thread, and watching on another. We also are interested in the actual "contents" of the files, so both walkdiring thread and watching thread were reading files from disk and sending the results to the single channel, which was the "API".

The problem with this setup is that one thread can read file at time t1, another thread can read the file at t2 > t1, but the events in the channel could be in a swapped order. We fixed this by making sure we always read file contents from the first thread, and made the watching thread to only send requests for reading the files to the first one: that way we guarantee that file contents only progresses forward in time.

That is, besides infinite ways in which native APIs are broken, there's a fair amount of places where we might accidentally break logical "happens before" relation, and we need to think about that :)

matklad avatar Feb 01 '19 09:02 matklad

Yes, I can try to implement this.

For v4 at least for inotify shouldn't be too hard since it's already using WalkDir internally. For windows and mac I'll have to take a closer look at the code. It seems they nativelly suport recursive watches, but maybe when a filter is provided instead of creating a recursive watcher we should handle the recursion ourselves.

From taking a quick look at the next v5 code the issue I mention before is more clear thanks to the Capability but the solution could be the same. What I think is missing is a way of adding watches after the Bakend is created, that way we can implement the filtering on top of the backend.

vemoo avatar Feb 01 '19 21:02 vemoo

In v4 you get a different implementation per platform that exposes the same interface, in v5 you'd never actually interact with Backends yourself. vNext has very strong separation of abstraction between what Backend implementations do and what the consumers' concerns are, so "vNext Backends" are immutable (makes them super easy to write), and "vNext Notify" itself is a layer that manages adding and removing watches (aka creating and deleting backends) and processing events.

In the current design of vNext what you'd do is implement a Processor that does the recursion and another that does the filtering, and add them in when .recurse() or .filter(|f| !is_hidden(f)) are called on this hypothetical WatchDir builder. So I like this API design very much because it's exactly what vNext is made to be! (For more, see this draft of a presentation/announce.)

(The current state is that Processors are described and there's a trait, but they're not hooked into the Notify mechanic yet. I keep getting bogged down by async code. I anticipate refining/adjusting how the work/look once or while I get them working.)

passcod avatar Feb 01 '19 22:02 passcod

I've started working on the feature for v4, and I have a few questions.

  • I think ideally the filtering should in RecursiveMode something like:

    pub enum RecursiveMode {
        Recursive,
        NonRecursive,
        Filtered(...)
    }
    

    But that would be a breaking change, so I guess the best thing is to create a new method to create a watcher, new_with_filter maybe?

  • Should we filter files also? It would be a nicer api, but that would mean possibly turning rename events into create or delete if one of the names is filtered, that wouldn't be too bad for debounced events but I don't know how it should be handled for raw events. I was thinking of initially implementing directory filtering only. Also to make the configuration more complete I was thinking of exposing a Fn(&Path)->WalkDir

  • In the current inotify implementation for recursive watching, when a directory is created it's walked recursively and the found directories are also watched. But shouldn't we also emit create events for all the found files and directories?

Regarding the v5, I think the backend trait is missing a way to configure the type of watch when creating it, specifically I'm thinking of creating a non recursive folder watch on windows for example. Also if I'm following the code correctly, for recursive watches with backends that don't support it nativelly, it will end up allocating Vec<PathBuf> each time a directory is created.

vemoo avatar Feb 08 '19 21:02 vemoo

vNext is not a viable development target at this time, so don't spend too much time on it. You bring a good point on the non-recursive-on-windows thing. More generally, the current trait offers no facility for passing options to backends, which has other usecases (e.g. event filtering in the kernel). I do expect that to be resolved by the time it stabilises, but for now the added complexity was not worth it. Similarly the core likely allocates way too much at the moment — in that specific case PathBuf is easier for prototyping, but I have thoughts around a descriptor enum to support using references, inodes, or handles directly, for example. There's many other such details and questions left... I'd like to get the general shape of the whole thing going before refining; or at least that's the idea.

I'll have a think about the rest.

I'm not opposed to another method but would it want a raw_with_filter() as well or would the filtering require the debounce?

Also wondering whether adding a variant to that enum would really be a breaking change... and if it is, whether it would be better to just do that, instead of a perhaps worse solution.

On Sat, 9 Feb 2019, 10:31 vemoo <[email protected] wrote:

I've started working on the feature for v4, and I have a few questions.

I think ideally the filtering should in RecursiveMode something like:

pub enum RecursiveMode { Recursive, NonRecursive, Filtered(...) }

But that would be a breaking change, so I guess the best thing is to create a new method to create a watcher, new_with_filter maybe?

Should we filter files also? It would be a nicer api, but that would mean possibly turning rename events into create or delete if one of the names is filtered, that wouldn't be too bad for debounced events but I don't know how it should be handled for raw events. I was thinking of initially implementing directory filtering only. Also to make the configuration more complete I was thinking of exposing a Fn(&Path)->WalkDir

In the current inotify implementation for recursive watching, when a directory is created it's walked recursively and the found directories are also watched. But shouldn't we also emit create events for all the found files and directories?

Regarding the v5, I think the backend trait is missing a way to configure the type of watch when creating it, specifically I'm thinking of creating a non recursive folder watch on windows for example. Also if I'm following the code correctly, for recursive watches with backends that don't support it nativelly, it will end up allocating Vec<PathBuf> each time a directory is created.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/passcod/notify/issues/175#issuecomment-461953748, or mute the thread https://github.com/notifications/unsubscribe-auth/AAJgi3NajM-GDCkvhjj4L0hW4JwFoRKnks5vLey-gaJpZM4aUE6n .

passcod avatar Feb 09 '19 00:02 passcod

Can you explain what you mean by

Also to make the configuration more complete I was thinking of exposing a Fn(&Path)->WalkDir

passcod avatar Feb 09 '19 05:02 passcod

  • What I mean by "wondering whether adding a variant to that enum would really be a breaking change" is that RecursiveMode is never returned from the API, so there's no reason to match on it. I think the exception to that would be a library that re-exports RecursiveMode to their users and then pattern-matches on it rather than passing it straight through to Notify.

    I'm weakly leaning more towards bumping the major anyway. I think a large new feature like this is worth it, especially if it means doing it right. Upgrading would be trivial, even for libraries as described above. Adoption would be slower, but the nicer API would be a good incentive.

  • Directory filtering at least as a first approach sounds good, as that would be most of the value.

  • shouldn't we also emit create events for all the found files and directories?

    I think that would make sense, but I wonder what others do:

    • What do our other backends do?
    • and what do other notify libraries in other languages do? (or in Rust, if applicable)

    I think this could be handled separately, though, so it doesn't block this issue.

passcod avatar Feb 09 '19 06:02 passcod

Can you explain what you mean by

Also to make the configuration more complete I was thinking of exposing a Fn(&Path)->WalkDir

So that the user could set the available options in WalkDir. But now that I think about it the only one that makes sense is follow_links, since min_depth and max_depth would only be usefull relative to the watch root, not each folder on which we well be using the WalkDir. So I'm thinking the filter could be:

struct RecursionFilter<P>
where
    P: Fn(&Path) -> bool,
{
    filter_dir: P,
    follow_links: bool,
}

vemoo avatar Feb 09 '19 16:02 vemoo

Right, sounds good.

passcod avatar Feb 09 '19 22:02 passcod

I think I found a way to do it here. I've created an abstraction for the watcher implementation: WatcherInternal and that way I can implement the recursive filtering and keep track of the which nested watches belong to which actual watch in RecursionAdapter.

It's mosly an idea, I haven't tried to implement it yet for any backends. I'll try inotify first, by implementing WatcherInternal for inotify::EventLoop, and see if it works.

vemoo avatar Feb 11 '19 21:02 vemoo

Random thought: in rust-analyzer, we've noticed that on mac we can get Crate event when the write is expected. That is, distinguishing between the kind of event reliably seems tricky. So perhaps instead of

        Event::Create => (),
        Event::Change => (),
        Event::Delete => (),

We should have

        Event::MaybeChanged => (),

?

matklad avatar Feb 13 '19 21:02 matklad

I have it mostly working for inotify (https://github.com/vemoo/notify/tree/watch-filter). There's 1 test (watch_recursive_move_out) that fails always and some other that fails ocasionally. I'll investigate.

I have yet to add tests for the filtering, but I think the hardest part is done. Adding it to the other watches should be easier since they don't have a recursive implementation that has to be replaced.

vemoo avatar Feb 17 '19 20:02 vemoo

This looks good, I like this.

On Mon, 18 Feb 2019 at 09:14, vemoo [email protected] wrote:

I have it mostly working for inotify ( https://github.com/vemoo/notify/tree/watch-filter). There's 1 test ( watch_recursive_move_out) that fails always and some other that fails ocasionally. I'll investigate.

I have yet to add tests for the filtering, but I think the hardest part is done. Adding it to the other watches should be easier since they don't have a recursive implementation that has to be replaced.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/passcod/notify/issues/175#issuecomment-464503283, or mute the thread https://github.com/notifications/unsubscribe-auth/AAJgi8Udamae3R-UWcZe1OM0Meheebd5ks5vObgggaJpZM4aUE6n .

passcod avatar Feb 18 '19 01:02 passcod

I'm slowly making progress.

The test that was failing was a timing issue, I fixed it here: https://github.com/vemoo/notify/commit/51d80bc1ed7f1a7ec88f8b50b009f832fa4f938d

I ended up implementing filtering for files also, because it wasn't that much more work. This is the filter struct: RecursionFilter and this is the item to filter on: FilterItem

To implement the filtering I needed to know if a rename event was the "from" or the "to" part, to try to add or remove watches. To keep it simple I check if the path exists and if it does I treat it as a rename "to", otherwise as a rename "from". But inotify and windows distinguish between the two events, so I though that it would be usefull to separate the RENAME op in two.

I also found something in Debounce that might be a bug. Given this raw events (filtered):

[
    (
        "/tmp/temp_dir.seaDBoBs6epi/dir1",
        CREATE,
        None
    ),
    (
        "/tmp/temp_dir.seaDBoBs6epi/dir1/file1",
        CREATE,
        None
    ),
    (
        "/tmp/temp_dir.seaDBoBs6epi/dir1/file1",
        CLOSE_WRITE,
        None
    ),
    (
        "/tmp/temp_dir.seaDBoBs6epi/dir1/subdir1",
        CREATE,
        None
    ),
    (
        "/tmp/temp_dir.seaDBoBs6epi/dir1/subdir1/file1",
        CREATE,
        None
    ),
    (
        "/tmp/temp_dir.seaDBoBs6epi/dir1/subdir1/file1",
        CLOSE_WRITE,
        None
    ),
    (
        "/tmp/temp_dir.seaDBoBs6epi/dir1/subdir1/file2",
        CREATE,
        None
    ),
    (
        "/tmp/temp_dir.seaDBoBs6epi/dir1/subdir1/file2",
        CLOSE_WRITE,
        None
    ),
    (
        "/tmp/temp_dir.seaDBoBs6epi/dir1/subdir2",
        CREATE,
        None
    ),
    (
        "/tmp/temp_dir.seaDBoBs6epi/dir1/subdir2/subdir1",
        CREATE,
        None
    ),
    (
        "/tmp/temp_dir.seaDBoBs6epi/dir1/subdir2/subdir1/file1",
        CREATE,
        None
    ),
    (
        "/tmp/temp_dir.seaDBoBs6epi/dir1/subdir2/subdir1/file1",
        CLOSE_WRITE,
        None
    ),
    (
        "/tmp/temp_dir.seaDBoBs6epi/dir1/non_ignored_dir",
        RENAME,
        Some(
            2384
        )
    ),
    (
        "/tmp/temp_dir.seaDBoBs6epi/dir1/subdir2",
        RENAME,
        Some(
            2386
        )
    )
]

the debounced events are:

[
    NoticeRemove(
        "/tmp/temp_dir.6u6AFtuQQITu/dir1/non_ignored_dir"
    ),
    Create(
        "/tmp/temp_dir.6u6AFtuQQITu/dir1"
    ),
    Create(
        "/tmp/temp_dir.6u6AFtuQQITu/dir1/file1"
    ),
    Create(
        "/tmp/temp_dir.6u6AFtuQQITu/dir1/subdir1"
    ),
    Create(
        "/tmp/temp_dir.6u6AFtuQQITu/dir1/subdir1/file1"
    ),
    Create(
        "/tmp/temp_dir.6u6AFtuQQITu/dir1/subdir1/file2"
    ),
    Create(
        "/tmp/temp_dir.6u6AFtuQQITu/dir1/subdir2/subdir1"
    ),
    Create(
        "/tmp/temp_dir.6u6AFtuQQITu/dir1/subdir2/subdir1/file1"
    ),
    Create(
        "/tmp/temp_dir.6u6AFtuQQITu/dir1/subdir2"
    )
]

(this is the out put of running the tests https://github.com/vemoo/notify/blob/b1890b91e07f8b7292b8de5bc7cb637a07141868/tests/recursion.rs#L68 and https://github.com/vemoo/notify/blob/b1890b91e07f8b7292b8de5bc7cb637a07141868/tests/recursion.rs#L84 on linux) I think having separate "from" and "to" events will help fix this.

I've also started working on integrating it in windows, but it's a bit harder than I thought. For mac I won't be able to try it, unless there's a way to do it from linux.

vemoo avatar Feb 24 '19 19:02 vemoo

I should have my mac back from the 23rd onwards, so I'll be able to help thenre.

On Mon, 25 Feb 2019, 08:09 vemoo, [email protected] wrote:

I'm slowly making progress.

The test that was failing was a timing issue, I fixed it here: vemoo@ 51d80bc https://github.com/vemoo/notify/commit/51d80bc1ed7f1a7ec88f8b50b009f832fa4f938d

I ended up implementing filtering for files also, because it wasn't that much more work. This is the filter struct: RecursionFilter https://github.com/vemoo/notify/blob/watch-filter/src/recursion.rs#L30-L35 and this is the item to filter on: FilterItem https://github.com/vemoo/notify/blob/78c2feb6d9ce440b4ec0da9b1bb7214f3b7734a5/src/recursion.rs#L13-L18

To implement the filtering I needed to know if a rename event was the "from" or the "to" part, to try to add or remove watches. To keep it simple I check if the path exists and if it does I treat it as a rename "to", otherwise as a rename "from". But inotify and windows distinguish between the two events, so I though that it would be usefull to separate the RENAME op in two.

I also found something in Debounce that might be a bug. Given this raw events (filtered):

[ ( "/tmp/temp_dir.seaDBoBs6epi/dir1", CREATE, None ), ( "/tmp/temp_dir.seaDBoBs6epi/dir1/file1", CREATE, None ), ( "/tmp/temp_dir.seaDBoBs6epi/dir1/file1", CLOSE_WRITE, None ), ( "/tmp/temp_dir.seaDBoBs6epi/dir1/subdir1", CREATE, None ), ( "/tmp/temp_dir.seaDBoBs6epi/dir1/subdir1/file1", CREATE, None ), ( "/tmp/temp_dir.seaDBoBs6epi/dir1/subdir1/file1", CLOSE_WRITE, None ), ( "/tmp/temp_dir.seaDBoBs6epi/dir1/subdir1/file2", CREATE, None ), ( "/tmp/temp_dir.seaDBoBs6epi/dir1/subdir1/file2", CLOSE_WRITE, None ), ( "/tmp/temp_dir.seaDBoBs6epi/dir1/subdir2", CREATE, None ), ( "/tmp/temp_dir.seaDBoBs6epi/dir1/subdir2/subdir1", CREATE, None ), ( "/tmp/temp_dir.seaDBoBs6epi/dir1/subdir2/subdir1/file1", CREATE, None ), ( "/tmp/temp_dir.seaDBoBs6epi/dir1/subdir2/subdir1/file1", CLOSE_WRITE, None ), ( "/tmp/temp_dir.seaDBoBs6epi/dir1/non_ignored_dir", RENAME, Some( 2384 ) ), ( "/tmp/temp_dir.seaDBoBs6epi/dir1/subdir2", RENAME, Some( 2386 ) ) ]

the debounced events are:

[ NoticeRemove( "/tmp/temp_dir.6u6AFtuQQITu/dir1/non_ignored_dir" ), Create( "/tmp/temp_dir.6u6AFtuQQITu/dir1" ), Create( "/tmp/temp_dir.6u6AFtuQQITu/dir1/file1" ), Create( "/tmp/temp_dir.6u6AFtuQQITu/dir1/subdir1" ), Create( "/tmp/temp_dir.6u6AFtuQQITu/dir1/subdir1/file1" ), Create( "/tmp/temp_dir.6u6AFtuQQITu/dir1/subdir1/file2" ), Create( "/tmp/temp_dir.6u6AFtuQQITu/dir1/subdir2/subdir1" ), Create( "/tmp/temp_dir.6u6AFtuQQITu/dir1/subdir2/subdir1/file1" ), Create( "/tmp/temp_dir.6u6AFtuQQITu/dir1/subdir2" ) ]

(this is the out put of running the tests https://github.com/vemoo/notify/blob/b1890b91e07f8b7292b8de5bc7cb637a07141868/tests/recursion.rs#L68 and https://github.com/vemoo/notify/blob/b1890b91e07f8b7292b8de5bc7cb637a07141868/tests/recursion.rs#L84 on linux) I think having separate "from" and "to" events will help fix this.

I've also started working on integrating it in windows, but it's a bit harder than I thought. For mac I won't be able to try it, unless there's a way to do it from linux.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/passcod/notify/issues/175#issuecomment-466806460, or mute the thread https://github.com/notifications/unsubscribe-auth/AAJgixE11O0NTrdT2RmxyXgEsCbTgj-fks5vQuNogaJpZM4aUE6n .

passcod avatar Mar 11 '19 05:03 passcod

It seem this approach doesn't work for windows. Because windows doesn't allow to delete the watched directory neither renaming the parent directory, and in this approach we create a non recursive watch for each directory. There are some tests that I didn't see until after I hit this issue: https://github.com/vemoo/notify/blob/dc2e46b6bd4c3c9d3f9edaa6bcfc705b11c24137/tests/watcher.rs#L1117 https://github.com/vemoo/notify/blob/dc2e46b6bd4c3c9d3f9edaa6bcfc705b11c24137/tests/watcher.rs#L1461 that are ignored for windows for these reason.

vemoo avatar Mar 13 '19 19:03 vemoo