
inotify watcher tuning - per-directory+filter instead of per-file?

Open dreamlibrarian opened this issue 4 years ago • 4 comments

Describe the bug

File-globbed watches produce one watch per file under inotify. Under vscode, in large enough monorepos or with multiple repositories open, this can run up against hard OS limits that cannot be increased.

Versions (please complete the following information):

  • Chokidar version 3.5.1 (per the current https://github.com/microsoft/vscode/blob/6922f6150d8e1886e7d597ae6998d02cfbe703e0/remote/package.json#L7)
  • Node version 12.18.3
  • OS version: Ubuntu 20.04

To Reproduce:

The current repro is through vscode; if you want me to generate a direct node invocation, let me know.

Clone a repository containing many files; a large monorepo project containing several hundred thousand files is our internal use case.

Ensure CHOKIDAR_USEPOLLING is not set in the environment, and that the vscode settings usePolling and files.watcherExclude are at default values.

Open the repository folder in VSCode. VSCode will set up inotify watches.

To view the resulting inotify watch count, run the inotify-consumers script thoughtfully provided by fatso83: https://github.com/fatso83/dotfiles/blob/master/utils/scripts/inotify-consumers
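Without the script at hand, roughly the same per-process count can be taken manually. On Linux, each inotify file descriptor's `/proc/<pid>/fdinfo/<fd>` entry lists one line starting with `inotify wd:` per active watch. The sketch below assumes that format and sums those lines for a single process; `countInotifyWatches` and `watchesForPid` are hypothetical helpers that approximate what the script reports, not a port of it.

```typescript
import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

// Count the watch lines in the text of one /proc/<pid>/fdinfo/<fd> file.
// Each active inotify watch is listed as a line beginning with "inotify wd:".
function countInotifyWatches(fdinfoText: string): number {
  return fdinfoText
    .split("\n")
    .filter((line) => line.startsWith("inotify wd:")).length;
}

// Sum inotify watches across all file descriptors of one process (Linux only).
// Returns 0 when the process does not exist or its fdinfo is not readable.
function watchesForPid(pid: number): number {
  const dir = `/proc/${pid}/fdinfo`;
  let fds: string[];
  try {
    fds = readdirSync(dir);
  } catch {
    return 0; // no such process, not Linux, or permission denied
  }
  let total = 0;
  for (const fd of fds) {
    try {
      total += countInotifyWatches(readFileSync(join(dir, fd), "utf8"));
    } catch {
      // the fd may have been closed between readdir and read; skip it
    }
  }
  return total;
}
```

For example, `watchesForPid(process.pid)` reports the watches held by the current Node process; processes owned by other users are generally unreadable without root and count as 0 here.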

Expected behavior

An inotify watch can be placed on a directory, and it then produces events for all files directly in that directory (but not in its subdirectories).

In an ideal world, we could save ourselves hundreds of thousands of watches on large repositories by teaching chokidar to watch directories and to use the specified globs as filters that decide which events are passed to the client. This would result in a higher volume of events but a lower watch count, which would help avoid system limits on watches.
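To make the trade-off concrete, here is a small sketch that compares the two strategies for a list of matched file paths: the per-file approach needs one watch per file, while the per-directory approach needs one watch per directory on the way to those files. The helper names are hypothetical, and the functions only count watches; nothing is actually watched.

```typescript
import { posix } from "node:path";

// Per-file strategy: one watch for every distinct matched file.
function perFileWatchCount(files: string[]): number {
  return new Set(files).size;
}

// Per-directory strategy: one watch for every directory that contains a matched
// file, plus each ancestor up to `root` so that new subdirectories can be noticed.
function perDirectoryWatchCount(files: string[], root: string): number {
  const dirs = new Set<string>();
  for (const file of files) {
    let dir = posix.dirname(file);
    // Stop early at a directory already counted: its ancestors are counted too.
    while (!dirs.has(dir)) {
      dirs.add(dir);
      if (dir === root || dir === posix.dirname(dir)) break;
      dir = posix.dirname(dir);
    }
  }
  return dirs.size;
}
```

With 1,000 files in `/repo/src` plus one in `/repo/src/util`, the per-file count is 1,001 while the per-directory count rooted at `/repo` is 3. A real walker that watches every directory, including ones without matching files, would need somewhat more than this lower bound.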

Additional context

BLUF: I'm wondering whether it would be practical to wrap fs.watch for globs: watch directories instead of files, then glob-filter the events afterward, in order to minimize the watch count.

Our specific problem environment runs multiple vscode environments as the same UID in the same user namespace, which is admittedly something of a degenerate case. We’re working on resolving the issue from multiple angles, including asking about resource-utilization tuning here.

To make sure there weren’t other avenues to pursue, I dug up the actual hard maximum for watches and verified that it’s a per-UID setting: https://github.com/torvalds/linux/blob/7d6beb71da3cc033649d641e1e608713b8220290/fs/notify/inotify/inotify_user.c#L819.
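The currently configured per-user limit is exposed as the `fs.inotify.max_user_watches` sysctl, readable at `/proc/sys/fs/inotify/max_user_watches`. A minimal sketch for checking it from Node, assuming Linux (it returns `null` elsewhere); `parseLimit` and `readMaxUserWatches` are hypothetical helper names:

```typescript
import { readFileSync } from "node:fs";

// Parse the single integer the sysctl file contains; null if it is not a number.
function parseLimit(text: string): number | null {
  const n = Number.parseInt(text.trim(), 10);
  return Number.isFinite(n) ? n : null;
}

// Read the per-user inotify watch limit. Returns null when the file is missing
// (non-Linux systems) or unreadable.
function readMaxUserWatches(
  path = "/proc/sys/fs/inotify/max_user_watches",
): number | null {
  try {
    return parseLimit(readFileSync(path, "utf8"));
  } catch {
    return null;
  }
}
```

Comparing this value with the per-process totals from inotify-consumers gives a sense of how close a single UID is to the ceiling when several editors share it.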

VSCode file watching uses chokidar, passing it a complex glob of paths and suffixes so that the resulting events are filtered down to source files.

On Mac, under fseventsd, this is highly efficient and is handled cleanly.

On Linux, unfortunately, this results in a single inotify watch per file. You've noted the issue in the README; with multitenancy and monorepos, we're hitting the per-user absolute maximum of ~1M watches.

I think there's a model for improved performance here, but I may be wrong.

Inotify supports watching directories instead of files, which would be a major resource-utilization improvement; we'd need to filter the events coming from these watches afterwards to restrict them to the globs.
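The filtering half of that idea can be sketched independently of inotify. The hypothetical `globToRegExp` below handles only `**`, `*`, and `?`; braces, character classes, and negation are deliberately left out, so a real implementation would presumably rely on a full glob library instead. The hypothetical `makeFilter` then accepts a path, relative to the watched root, if it matches an include glob and no exclude glob.

```typescript
// Convert a simple glob to a RegExp. Supports "**/" (zero or more whole
// directories), a trailing "**" (anything), "*" (anything except "/"), and
// "?" (one character except "/"). Other glob syntax is treated literally.
function globToRegExp(glob: string): RegExp {
  let re = "";
  for (let i = 0; i < glob.length; i++) {
    const c = glob[i];
    if (c === "*") {
      if (glob[i + 1] === "*") {
        if (glob[i + 2] === "/") {
          re += "(?:.*/)?"; // "**/" may match no directories at all
          i += 2;
        } else {
          re += ".*"; // bare "**" matches across separators
          i += 1;
        }
      } else {
        re += "[^/]*"; // "*" stays within one path segment
      }
    } else if (c === "?") {
      re += "[^/]";
    } else {
      re += c.replace(/[.+^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
    }
  }
  return new RegExp(`^${re}$`);
}

// Accept a relative path if it matches any include glob and no exclude glob.
function makeFilter(include: string[], exclude: string[] = []) {
  const inc = include.map(globToRegExp);
  const exc = exclude.map(globToRegExp);
  return (relPath: string): boolean =>
    inc.some((r) => r.test(relPath)) && !exc.some((r) => r.test(relPath));
}
```

For example, `makeFilter(["**/*.ts"], ["**/node_modules/**"])` accepts `src/index.ts` but rejects `src/index.js` and `node_modules/pkg/index.ts`. Because directory watches report every child, this predicate runs once per event rather than costing a kernel watch per file.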

What I don't know is if there's a practical downside that you've already discovered. Has this been tested before?

dreamlibrarian, Mar 10 '21 20:03

What are you suggesting here? We've had this issue since like 2012. Other file watchers have the same issue.

paulmillr, Mar 11 '21 04:03

I mean like, how exactly do you propose to utilize inotify? We'd need a low-level native library for that.

paulmillr, Mar 11 '21 06:03

Thanks for the quick response and apologies for losing fidelity in the initial description.

fs.watch uses inotify as its underlying implementation for non-polling watches on Linux; the problem is that the current implementation is somewhat naive and, in the face of globs, creates one watch per file.

To mitigate this, we'd need to wrap the fs.watch logic to create one watch per directory under the specified path (I'm guessing we'd need to do our own walk of the tree) and then filter events against the actual exclude globs in use.
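A rough sketch of such a wrapper, assuming Node's `fs.watch` on a directory (backed by inotify on Linux) and a caller-supplied `accept` predicate standing in for the glob filter. `DirectoryWatcher` and `demoWatchCount` are hypothetical names, not chokidar API, and the sketch omits much of what a production watcher handles, such as event coalescing, editors' atomic writes, and a polling fallback.

```typescript
import { mkdirSync, mkdtempSync, readdirSync, rmSync, statSync, watch, writeFileSync } from "node:fs";
import type { Dirent, FSWatcher } from "node:fs";
import { tmpdir } from "node:os";
import { join, relative } from "node:path";

// Hypothetical wrapper: one fs.watch() per directory instead of one per matched
// file; the `accept` predicate decides which events reach the caller.
class DirectoryWatcher {
  private readonly watchers = new Map<string, FSWatcher>();
  private readonly root: string;
  private readonly accept: (relPath: string) => boolean;
  private readonly onEvent: (event: string, relPath: string) => void;

  constructor(
    root: string,
    accept: (relPath: string) => boolean,
    onEvent: (event: string, relPath: string) => void,
  ) {
    this.root = root;
    this.accept = accept;
    this.onEvent = onEvent;
    this.walk(root);
  }

  get watchCount(): number {
    return this.watchers.size;
  }

  // Our own walk of the tree, placing exactly one watch on each directory.
  private walk(dir: string): void {
    if (this.watchers.has(dir)) return;
    let entries: Dirent[] = [];
    try {
      entries = readdirSync(dir, { withFileTypes: true });
    } catch {
      return; // directory vanished or is unreadable
    }
    const w = watch(dir, (event, name) => this.handle(dir, event, name));
    // Drop the watch if it fails (for example, the directory was deleted).
    w.on("error", () => {
      w.close();
      this.watchers.delete(dir);
    });
    this.watchers.set(dir, w);
    // Dirent.isDirectory() does not follow symlinks, which avoids link cycles.
    for (const entry of entries) {
      if (entry.isDirectory()) this.walk(join(dir, entry.name));
    }
  }

  // A directory watch reports every child; filter down to what the caller wants.
  private handle(dir: string, event: string, name: string | null): void {
    if (name === null) return; // the filename argument is not always provided
    const full = join(dir, name);
    // A "rename" may mean a new subdirectory appeared; start watching it.
    if (event === "rename") {
      try {
        if (statSync(full).isDirectory()) this.walk(full);
      } catch {
        // the entry was removed rather than created
      }
    }
    const rel = relative(this.root, full);
    if (this.accept(rel)) this.onEvent(event, rel);
  }

  close(): void {
    for (const w of this.watchers.values()) w.close();
    this.watchers.clear();
  }
}

// Build a throwaway tree (4 directories, 50 files) and report the watch count.
function demoWatchCount(): number {
  const base = mkdtempSync(join(tmpdir(), "dirwatch-"));
  try {
    mkdirSync(join(base, "src", "util"), { recursive: true });
    mkdirSync(join(base, "docs"));
    for (let i = 0; i < 50; i++) writeFileSync(join(base, "src", `f${i}.ts`), "");
    const dw = new DirectoryWatcher(base, (p) => p.endsWith(".ts"), () => {});
    const count = dw.watchCount; // base, src, src/util, docs
    dw.close();
    return count;
  } finally {
    rmSync(base, { recursive: true, force: true });
  }
}
```

`demoWatchCount()` returns 4: one watch per directory, independent of how many files match. Picking up new subdirectories from `rename` events is racy, since files created before the new watch is placed are missed; that gap is the kind of practical downside such a design would need to be tested against.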

I suspect it would require writing a wrapper around the direct fs.watch and (polling) fs.watchFile calls:
https://github.com/paulmillr/chokidar/blob/1322035c05939fa2c3c76aa39c3bb831b376d87d/lib/nodefs-handler.js#L119
https://github.com/paulmillr/chokidar/blob/1322035c05939fa2c3c76aa39c3bb831b376d87d/lib/nodefs-handler.js#L265

dreamlibrarian, Mar 11 '21 20:03

Sounds interesting. Any optimizations that keep the overall functionality in existing projects are very welcome, of course!

paulmillr, Mar 12 '21 05:03