watchdog icon indicating copy to clipboard operation
watchdog copied to clipboard

Calling inotify_init() on each watch

Open Naatan opened this issue 11 years ago • 7 comments

I was getting the error OSError: inotify instance limit reached with only about 50 watchers open, which seemed weird. So I started digging and found that inotify_init() is called for pretty much each watcher: https://github.com/gorakhargosh/watchdog/blob/master/src/watchdog/observers/inotify_c.py#L163

I verified this by adding a print statement and seeing it print out my message for each watch that was being added. I then tried the following workaround:

--- a/watchdog/src/watchdog/observers/inotify_c.py
+++ b/watchdog/src/watchdog/observers/inotify_c.py
@@ -158,12 +158,19 @@ class Inotify(object):
         ``True`` if subdirectories should be monitored; ``False`` otherwise.
     """

+    _inotify_fd = None
+
     def __init__(self, path, recursive=False, event_mask=WATCHDOG_ALL_EVENTS):
         # The file descriptor associated with the inotify instance.
-        inotify_fd = inotify_init()
-        if inotify_fd == -1:
-            Inotify._raise_error()
-        self._inotify_fd = inotify_fd
+        if not Inotify._inotify_fd:
+            inotify_fd = inotify_init()
+
+            if inotify_fd == -1:
+                Inotify._raise_error()
+
+            Inotify._inotify_fd = inotify_fd
+
+        self._inotify_fd = Inotify._inotify_fd
         self._lock = threading.Lock()

         # Stores the watch descriptor for a given path.

This seemed to work, in that watchers are added without also increasing the number of inotify instances beyond the one that is required. However this breaks the actual events as well as removing the watchers, giving errors such as

Exception in thread Thread-21:
Traceback (most recent call last):
  File "**/dist/python/lib/python2.7/threading.py", line 808, in __bootstrap_inner
    self.run()
  File "**/dist/bin/python/komodo/watchdog/observers/inotify_buffer.py", line 57, in run
    inotify_events = self._inotify.read_events()
  File "**/dist/bin/python/komodo/watchdog/observers/inotify_c.py", line 302, in read_events
    wd_path = self._path_for_wd[wd]
KeyError: 4

I'm hesitant to dig much further at this time because being as I am not familiar with this codebase I do not fully understand the implications. It seems obvious to me though that adding an instance for each watch is not what you should be doing though, hopefully this can get fixed relatively easily but I'll have to defer to people who know this codebase thoroughly.

Naatan avatar Sep 26 '14 22:09 Naatan

I'm not the original author of that, but I believe the Inotify class was done like this in an attempt to create an object oriented wrapper around the inotify api. It encapsulate the whole api, that's why an inotify file descriptor is created for every instance. Apparently not good idea if you plan to create many of them. However, for me the inotify instance limit is set to 128. I don't know why you only got 50. Maybe you have other applications creating inotify instances..

The error you see in read_events is because if you simply share the file descriptor, every instance of Inotify will read from the same file descriptor (see inotify_c.py#L284) and receive events that does not belong to it.

To use only one file descriptor you will need to make the event reading static too, then distribute the events to the watch where it belongs somehow.

tamland avatar Sep 27 '14 11:09 tamland

Note the "whole Inotify api" consists of about 3 functions, so not sure how valid this object oriented approach is. Regardless this doesn't address the problem; watchdog is adding instances for each watcher; this is a bug no matter what design principle you apply to it.

If there is currently no active developer thoroughly familiar with this code I'd be happy to play further with it to try and come up with a proper fix, but I am mainly worried that this would require a ton of rewriting as it was obviously never designed to work with a single inotify instance (even though it should). Seems to me that this was implemented without a thorough understanding of how inotify works. Now I don't by any stretch claim to have a thorough understanding of inotify (on the contrary), but it seems obvious to me at this point that you should not be adding an inotify instance for each watcher.

Naatan avatar Sep 27 '14 16:09 Naatan

Besides the unfortunate low default limit I don't think there is anything that says you must only create one inotify instance. There's mismatch of APIs. watchdog is designed around watching separate trees, while inotify aground single directories or files. Feel free to try to suggest a solution. Alternatively if you don't mind managing inotify watches yourself, implement an EventEmitter that don't create multiple inotify instances, but you'll lose the ability to have different event handles for different trees.

tamland avatar Oct 29 '14 12:10 tamland

Despite the age of this issue, it is still alive and well. I was bitten by it today. :)

In my use case, another part of the code decides which directories are worth watching. I just need a way to regularly add or remove directories (possibly hundreds) from the ones being watched, and the recursive option is not even desirable. (This is different from #212.)

Looking at the inotify code in watchdog, it does indeed need a good refactoring to fix this issue. I know it's a long shot, but I'll try to suggest a very tentative plan to solve the issue:

  1. #1033 makes lots of changes, practically in all files, so it needs to be merged before a pull request can be made.
  2. A class INotifyFD can be added to hold the result of the inotify_init(). An instance of INotifyFD can be created in the InotifyObserver class, to make sure there are few of them by default. The INotifyFD also reads events from the low-level buffer and has basic support for adding and removing watched paths. (It adds almost no functionality above the low-level code.)
  3. Because the IN_MOVED_FROM and IN_MOVED_TO events need to be paired, as currently done by the InotifyBuffer. It seems logical to implement the pairing in the new INotifyFD class, so that it happens before the events get distributed to the different emitters.
  4. Each new InotifyEmitter instance created by InotifyObserver receives the same INotifyFD instance and registers a callback to receive records from the low-level event buffer. (This can be done by passing a factory function in the emitter_class argument of the BaseObserver constructor.)
  5. To support recursive watching, there should be an additional layer between INotifyFD and InotifyEmitter. For example, each InotifyEmitter can create an InotifyFilter helper to take care of filtering out the relevant paths from the INotifyFD events, and it can optionally transparently add/remove watches for subdirectories. (InotifyFilter can replace most of the logic that is currently implemented in the INotify class.)

@gorakhargosh @BoboTiG @tamland Would this plan make sense? There are obviously still a lot of details to be filled in.

tovrstra avatar Jun 13 '24 20:06 tovrstra

I'm currently trying to implement @tovrstra's plan and stumbled upon one remaining problem: When there are multiple watches that watch the same file/folder, but have different masks, only one mask will be used.

According to the man page for inotify_add_watch:

If the filesystem object was already being watched (perhaps via a different link to the same object), then the descriptor for the existing watch is returned.

This is not a problem with the current implementation as each watch gets its own inotify instance.

I can se 3 possible solutions:

  • Do not support differing masks for the same folder/file. this case and generate a warning whenever this occurs. This will probably lead to some broken applications.
  • Do support differing masks for the same folder/file by using multiple INotifyFD instances. This will complicate the code quite a bit.
  • Do support differing masks for the same folder/file by filtering events in python. This will also complicate the code quite a bit and might have a performance impact(?).

@gorakhargosh @BoboTiG @tamland @tovrstra What do you think?

JoachimCoenen avatar Feb 15 '25 16:02 JoachimCoenen

@JoachimCoenen That's not a small problem to tackle, as you probably know all too well. I hope @BoboTiG can comment so you can get some feedback on your plans before you do all the coding. It would be a shame to get into discussions after spending a lot of time on it. That's why I didn't start the effort in early 2024. Later, I never had a chance to pick it up again after the "help wanted" tag was added, sorry.

From an application development perspective, your first option, i.e. having the same mask for a single observer, would be acceptable to me. You can still create a second observer with a different mask if different parts of a program have different requirements.

When I analyzed the inotify part of watchdog, I got the impression that there was a lot of redundancy, as if it was borrowed from another library and then not all the redundant logic from that other implementation was removed. Since you're heading into a major refactoring, you may want to trim things down a bit.

I'm happy to help review and test any pull requests, just ping me.

tovrstra avatar Feb 16 '25 08:02 tovrstra

By the way, I switched to asyncinotify for my project to avoid this issue with watchdog. asyncinotify has a very clean implementation, but is by construction not cross-platform. It may be a useful source of inspiration.

tovrstra avatar Feb 16 '25 08:02 tovrstra