Calling inotify_init() on each watch
I was getting the error OSError: inotify instance limit reached with only about 50 watchers open, which seemed weird. So I started digging and found that inotify_init() is called for pretty much each watcher: https://github.com/gorakhargosh/watchdog/blob/master/src/watchdog/observers/inotify_c.py#L163
I verified this by adding a print statement and seeing it print out my message for each watch that was being added. I then tried the following workaround:
--- a/watchdog/src/watchdog/observers/inotify_c.py
+++ b/watchdog/src/watchdog/observers/inotify_c.py
@@ -158,12 +158,19 @@ class Inotify(object):
``True`` if subdirectories should be monitored; ``False`` otherwise.
"""
+ _inotify_fd = None
+
def __init__(self, path, recursive=False, event_mask=WATCHDOG_ALL_EVENTS):
# The file descriptor associated with the inotify instance.
- inotify_fd = inotify_init()
- if inotify_fd == -1:
- Inotify._raise_error()
- self._inotify_fd = inotify_fd
+ if not Inotify._inotify_fd:
+ inotify_fd = inotify_init()
+
+ if inotify_fd == -1:
+ Inotify._raise_error()
+
+ Inotify._inotify_fd = inotify_fd
+
+ self._inotify_fd = Inotify._inotify_fd
self._lock = threading.Lock()
# Stores the watch descriptor for a given path.
This seemed to work, in that watchers are added without also increasing the number of inotify instances beyond the one that is required. However this breaks the actual events as well as removing the watchers, giving errors such as
Exception in thread Thread-21:
Traceback (most recent call last):
File "**/dist/python/lib/python2.7/threading.py", line 808, in __bootstrap_inner
self.run()
File "**/dist/bin/python/komodo/watchdog/observers/inotify_buffer.py", line 57, in run
inotify_events = self._inotify.read_events()
File "**/dist/bin/python/komodo/watchdog/observers/inotify_c.py", line 302, in read_events
wd_path = self._path_for_wd[wd]
KeyError: 4
I'm hesitant to dig much further at this time because being as I am not familiar with this codebase I do not fully understand the implications. It seems obvious to me though that adding an instance for each watch is not what you should be doing though, hopefully this can get fixed relatively easily but I'll have to defer to people who know this codebase thoroughly.
I'm not the original author of that, but I believe the Inotify class was done like this in an attempt to create an object oriented wrapper around the inotify api. It encapsulate the whole api, that's why an inotify file descriptor is created for every instance. Apparently not good idea if you plan to create many of them. However, for me the inotify instance limit is set to 128. I don't know why you only got 50. Maybe you have other applications creating inotify instances..
The error you see in read_events is because if you simply share the file descriptor, every instance of Inotify will read from the same file descriptor (see inotify_c.py#L284) and receive events that does not belong to it.
To use only one file descriptor you will need to make the event reading static too, then distribute the events to the watch where it belongs somehow.
Note the "whole Inotify api" consists of about 3 functions, so not sure how valid this object oriented approach is. Regardless this doesn't address the problem; watchdog is adding instances for each watcher; this is a bug no matter what design principle you apply to it.
If there is currently no active developer thoroughly familiar with this code I'd be happy to play further with it to try and come up with a proper fix, but I am mainly worried that this would require a ton of rewriting as it was obviously never designed to work with a single inotify instance (even though it should). Seems to me that this was implemented without a thorough understanding of how inotify works. Now I don't by any stretch claim to have a thorough understanding of inotify (on the contrary), but it seems obvious to me at this point that you should not be adding an inotify instance for each watcher.
Besides the unfortunate low default limit I don't think there is anything that says you must only create one inotify instance. There's mismatch of APIs. watchdog is designed around watching separate trees, while inotify aground single directories or files. Feel free to try to suggest a solution. Alternatively if you don't mind managing inotify watches yourself, implement an EventEmitter that don't create multiple inotify instances, but you'll lose the ability to have different event handles for different trees.
Despite the age of this issue, it is still alive and well. I was bitten by it today. :)
In my use case, another part of the code decides which directories are worth watching. I just need a way to regularly add or remove directories (possibly hundreds) from the ones being watched, and the recursive option is not even desirable. (This is different from #212.)
Looking at the inotify code in watchdog, it does indeed need a good refactoring to fix this issue. I know it's a long shot, but I'll try to suggest a very tentative plan to solve the issue:
- #1033 makes lots of changes, practically in all files, so it needs to be merged before a pull request can be made.
- A class
INotifyFDcan be added to hold the result of theinotify_init(). An instance ofINotifyFDcan be created in theInotifyObserverclass, to make sure there are few of them by default. TheINotifyFDalso reads events from the low-level buffer and has basic support for adding and removing watched paths. (It adds almost no functionality above the low-level code.) - Because the
IN_MOVED_FROMandIN_MOVED_TOevents need to be paired, as currently done by theInotifyBuffer. It seems logical to implement the pairing in the newINotifyFDclass, so that it happens before the events get distributed to the different emitters. - Each new
InotifyEmitterinstance created byInotifyObserverreceives the sameINotifyFDinstance and registers a callback to receive records from the low-level event buffer. (This can be done by passing a factory function in theemitter_classargument of theBaseObserverconstructor.) - To support recursive watching, there should be an additional layer between
INotifyFDandInotifyEmitter. For example, eachInotifyEmittercan create anInotifyFilterhelper to take care of filtering out the relevant paths from theINotifyFDevents, and it can optionally transparently add/remove watches for subdirectories. (InotifyFiltercan replace most of the logic that is currently implemented in theINotifyclass.)
@gorakhargosh @BoboTiG @tamland Would this plan make sense? There are obviously still a lot of details to be filled in.
I'm currently trying to implement @tovrstra's plan and stumbled upon one remaining problem: When there are multiple watches that watch the same file/folder, but have different masks, only one mask will be used.
According to the man page for inotify_add_watch:
If the filesystem object was already being watched (perhaps via a different link to the same object), then the descriptor for the existing watch is returned.
This is not a problem with the current implementation as each watch gets its own inotify instance.
I can se 3 possible solutions:
- Do not support differing masks for the same folder/file. this case and generate a warning whenever this occurs. This will probably lead to some broken applications.
- Do support differing masks for the same folder/file by using multiple
INotifyFDinstances. This will complicate the code quite a bit. - Do support differing masks for the same folder/file by filtering events in python. This will also complicate the code quite a bit and might have a performance impact(?).
@gorakhargosh @BoboTiG @tamland @tovrstra What do you think?
@JoachimCoenen That's not a small problem to tackle, as you probably know all too well. I hope @BoboTiG can comment so you can get some feedback on your plans before you do all the coding. It would be a shame to get into discussions after spending a lot of time on it. That's why I didn't start the effort in early 2024. Later, I never had a chance to pick it up again after the "help wanted" tag was added, sorry.
From an application development perspective, your first option, i.e. having the same mask for a single observer, would be acceptable to me. You can still create a second observer with a different mask if different parts of a program have different requirements.
When I analyzed the inotify part of watchdog, I got the impression that there was a lot of redundancy, as if it was borrowed from another library and then not all the redundant logic from that other implementation was removed. Since you're heading into a major refactoring, you may want to trim things down a bit.
I'm happy to help review and test any pull requests, just ping me.
By the way, I switched to asyncinotify for my project to avoid this issue with watchdog. asyncinotify has a very clean implementation, but is by construction not cross-platform. It may be a useful source of inspiration.