mdBook Use PollWatcher to always get filesystem updates

See https://github.com/rust-lang/mdBook/issues/2102#issuecomment-1810982299 for explanation on this

Closes #2102, #2035, #383 and #1441

Nov 18 '23 10:11 KFearsoff

I vaguely recall trying the PollWatcher and having issues with it, but I can't remember what they were. I think it was mostly performance problems.

One issue is that with this change, mdBook always prints an error when it starts (assuming you don't have a theme directory):

2023-11-24 12:58:56 [WARN] (mdbook::cmd::watch): error while watching for changes: No such file or directory (os error 2) about ["/Users/eric/Temp/z30/theme"]

IIUC, the way this works is that every second it scans all the files, and checks if any mtime changes are there. Is that correct? I'm not sure I understand why this would need a debouncer if it scans every second, and the debouncer is removing duplicates once a second. How do the two of those interact?

For larger books, what is the expected performance concerns? What if the source is over a network drive?

cc @0xpr03, do you perhaps have some guidance on our usage of notify and using a PollWatcher?

Nov 24 '23 21:11 ehuss

One issue is that with this change, mdBook always prints an error when it starts (assuming you don't have a theme directory):

Makes sense. Shouldn't be hard to fix.

IIUC, the way this works is that every second it scans all the files, and checks if any mtime changes are there. Is that correct?

Yes! That's completely right.

I'm not sure I understand why this would need a debouncer if it scans every second, and the debouncer is removing duplicates once a second. How do the two of those interact?

Truth be told, I can't tell you this, because I simply don't know. Upstream provides an example of using debouncer with PollWatcher. I'm not sure if that makes sense, but my handwavy assumption is that debouncer may provide us with some safeguards against really weird edge cases like filesystem being extremely unresponsive for a second and then submitting two events simultaneously.

For larger books, what is the expected performance concerns?

Hard to say without benchmarking it. Alacritty seems to use 1 second, cargo-watch uses 5 seconds. Unfortunately, there doesn't seem to be much information about the performance hit.

What if the source is over a network drive?

Actually, it seems like inotify doesn't support NFS anyway: https://github.com/notify-rs/notify/issues/64#issuecomment-216372446

NFS isn't the only network filesystem, of course, but I would cautiously assume it to be the most popular one that can be used with mdbook.

Regarding large books and network drives in particular: I think it's a good idea to evaluate Kubernetes services for how they do live reloading. Overall, I've seen two approaches:

Don't do any kind of filesystem watching at all, and just reload on SIGHUP or HTTP POST request on /-/reload endpoint
Give up on live reloading completely and just re-create the container (often the case for static content, which is what we have)

Overall, I think supporting live reload on demand is a good idea. I'm not sure how high of a priority it has, considering that we haven't mapped out the user story for live reloading (as in, we don't know who actually needs this feature, what workarounds they use right now, and we don't even have a container image which is a prerequisite to talking with people who care about live reload). On a completely unrelated note: even if we don't support SIGHUP right now, there's a good reason to consider it, because we don't support SIGTERM either, which means containers end up getting SIGKILL'ed anyway because we don't have a graceful shutdown.

Nov 24 '23 22:11 KFearsoff

There is no reason to use the debouncer with pollwatcher, unless you have timings that could make this work. Example would be to poll every 1 second and debounce every 2. So you can detect re-created files (depends on the editor of your choice), instead of just rendering nothing because in that second it was still gone.

It may also be useful in case you swap out debouncer backends depending on what your underlying filesystem is.

For bigger books it could be very resource intense. Though I do expect the OS to already have all of the relevant files cached from previous re-renders. On the flip side inotify can use up too many file handles in big workspaces, as it requires one file handle per watched node (folder,file). Which is not the case on windows, so those users won't see that problem.

As you said: It really depends on what your users actually expect from the live-reload. For instant and low-resource, low-idle live reloading, the pollwatcher might be a bad idea. But the moment someone comes around with networked files on linux (WSL -> Windows access), you've lost.

This ultimately just shows the need to get some kind of self-driven selection system into the full-debouncer, which somehow detects the underlying filesystem. Eventually, when I have time for that..

The easiest choice would be to allow both: Some flag so users can switch between native backends and the pollwatcher. Whether that still requires the debouncer I can't say. I definitely would use it for the native backends, too much chatter otherwise.

I hope I didn't confuse you even more.

Nov 25 '23 00:11 0xpr03

That's a great write-up!

I looked into what the world has to offer a little bit more, apparently tsx also has this issue, and it currently doesn't expose the command-line flag: https://github.com/privatenumber/tsx/issues/266

From my experience, there are two kinds of live-reloading: on-demand (for services e.g. in the cloud) and watching the filesystem directly (for dev environments). Generally, those are exposed by different commands: for on-demand, you have build, release and stuff, and for watching you use dev, serve. mdbook is a little bit weird in that it doesn't emphasize/provide a correct way to use the software in production, so people end up using mdbook serve for Docker containers (which doesn't really work without PollWatcher) and for local development.

There are a lot of things that can be done here, and I guess it ultimately comes down to user experience. I think I'll go with providing a flag for now.

Nov 27 '23 12:11 KFearsoff

I tried implementing a flag today, and failed miserably. I haven't figured out a way do provide new_debouncer_opt with a notifier that can be of either Watcher type, I'll have a look at it later too, but so far it seems impossible with current Rust features. I may be very wrong though!

Dec 04 '23 23:12 KFearsoff

Note: #2030 is ready to merge, and its live-reload functionality depends on this PR

Dec 19 '23 21:12 KFearsoff

Since I've been having a lot of problems lately with the watcher, I have opened #2325 which includes an option to set the backend, and uses a simple scanner which can handle pruning via gitignore and handles watching more things (like additional-css/js, adding/removing theme, etc.).

@KFearsoff Do you think you could check out that PR and try it out and see if that helps for any scenarios you were running into?

Feb 24 '24 21:02 ehuss

I'm going to close in favor of #2325. Thanks @KFearsoff!

May 13 '24 20:05 ehuss