watchdog
watchdog copied to clipboard
Make a distinction between content and metadata changes
Maybe I am doing something wrong... but I have an use case, in which I only care about file content ... I don't want to trigger the handler when file attributes or file security changes.
I know that in other cases, those informations are very important, but since both Inotify and readdirectorychanges provides the options to filter events from the source, maybe it would make sense to add some configuration in watchdog.
In the same time, directory poller, only raise changes for file content and not metadata changes.
Maybe this API would help and make sense
# everything... just like before
observer.subscribe()
#only content
observer.subscribeContent()
#only metadata
observer.subscribeMetadata()
when calling observer.subscribeMetadata() on directory poller, we can either raise an error that this is not supported, or create a special directory snapshot which polls for security data.
What do you think?
I think it's better to split the 'modified' event into two events. Has to be investigated if this can be implemented on all platforms first.
Filtering at a lower level should improve performance as the OS will send less events and watchdog will have to filter/ignore fewer events.
Maybe there is a way to group events and then pass the groups to subscribe
def subscribe(events=ALL):
pass
observer.subscribe(CONTENT | METADATA)
Why do you treat both contents and attribute change the same way?
https://github.com/gorakhargosh/watchdog/blob/5efb90eaaeed9a782fd13393e050517aa12fb551/src/watchdog/observers/inotify.py#L157-L162
Should I submit a pull request and fix this?
@SamSchott
@GreatBahram yes go ahead. Introduce new events for attributes changes. I guess such change must be done on all platforms.
Thanks, I'll fix it then.
I like the idea! Though it may be difficult to implement for the Windows and Polling backends:
Polling observer: we cannot rely on the mtime to detect file modifications (even though we currently do...) since it can be changed by the user. However, the ctime is incremented whenever the content or metadata changes, making it impossible to distinguish between the two without reverting to something like content hashes.
Windows: According to the docs here, events that are written to the buffer by ReadDirectoryChangesW
don't seem to distinguish between metadata and content modification.
If this cannot be implemented on all backends, I am wondering if it still makes sense to implement for inotify and fsevents. As it stands, there already are many (undocumented) differences between backends and I would rather strive toward uniform behaviour, even if it means that the API is limited by the least versatile backend. But of course this is not my decision to make. @BoboTiG, what do you think?
I am 100% OK with you @SamSchott. I was hoping we could have universal new events. If it is not possible, I would rather not make APIs more and more divergent.
After consideration, WDYT of simply adding FileModifiedEvent.attributes_only
and DirModifiedEvent.attributes_only
attrs? It would not change the default behavior and still allowing one to use them.
I am thinking of that because it is useful to be able to filter on such changes.
It makes me think of the IN_CLOSE_WRITE
and its related FileClosedEvent
event: it is only for Linux, and at the same time it is such a feature that it is interesting to have it, even if other backends will never fire such event. If the change is worth, I would say let's do this.
I think it's better to expose these features, close_write
and attrib
events even if it is only available on Linux, and let users themselves decide to use these functionalities or not.
For instance, it is costly not to be able to differentiate between attribute change and file modification, and I have to upload a big file instead of just updating some metadata. Using a simple attribute such as attributes_only
is a great idea as it only returns True only on Linux.
What's the final decision here?
@BoboTiG makes the calls, I am merely an opinionated contributor :)
I do agree that it would be a great feature to have and I'm not saying that's impossible on Windows (I have very little experience there). I just have a slight preference towards a uniform API, or at last a clear documentation of the differences between backends. But documentation is another issue by itself...
@GreatBahram go with attributes_only
attributes. Let's hope you can figure out how to do it for all backends :)
Hello @BoboTiG,
I've read the source code and made some changes to support the attrib event. And I realized with the current implementation; it is possible to have the attrib event on all platforms except for windows which I'm still struggling with...
And I was able to test it on Linux (notify) and FreeBSD (kqueue), but I couldn't find any way to run mac os on a virtual machine, and I'm wondering is there any simple way to do this? If you have any ideas, please let me know. There were two fsevents modules, and I searched through Github issues and spotted some issues about making fsevents2 the default observer on mac os. So I only add the changes to the fsevents2
module while it is possible to do the same on fsevents too.
- One design question here:
After playing with the changes a little bit, I thought it is better to have two new events and a simple
on_attrib
method to support this functionality instead of what we discussed here because I believe it prevents developers from if-else like coding. We could implement it this way:
class FileAttribEvent(FileSystemEvent):
"""
File system event representing file metadata modification on the file system.
"""
event_type = EVENT_TYPE_ATTRIB
class DirAttribEvent(FileSystemEvent):
"""
File system event representing directory metadata modification on the file system.
"""
event_type = EVENT_TYPE_ATTRIB
is_directory = True
class FileSystemHandler:
...
def on_attrib(self, event):
"""Called when a file or directory metadata is modified.
:param event:
Event representing file/directory metadata modification.
:type event:
:class:`FileAttribEvent` or :class:`DirAttribEvent`
"""
class InotifyEmitter:
....
elif event.is_attrib:
cls = DirAttribEvent if event.is_directory else FileAttribEvent
self.queue_event(cls(src_path))
class KqueueEmitter:
....
elif is_attrib_modified(kev):
if descriptor.is_directory:
yield DirAttribEvent(src_path)
else:
yield FileAttribEvent(src_path)
clsss FSEventsEmitter:
....
elif event.is_inode_meta_mod or event.is_xattr_mod:
cls = DirAttribEvent if event.is_directory else FileAttribEvent
self.queue_event(cls(event.path))
And I've also updated the LoggingEventHandler
to support to_attrib method:
def on_attrib(self, event):
super().on_attrib(event)
what = 'directory' if event.is_directory else 'file'
self.logger.info("Attrib %s: %s", what, event.src_path)
I know it might seem like breaking the uniformity across the project, but I think the community will accept this feature because it is not a big deal and make sense to have a separate function for this event. At the same time, there is no burden on windows developers because even if we cannot support this feature, using ReadDirectoryChangesW
, there is gonna be no breakage on their source code.
Please let me know if you think this is ok or should I stick with the attributes_only
attribute.
And Do you have any idea how I should test the fsevents module while I don't have access to a mac os computer? I couldn't even find a proper virtual machine image for it...
Thanks
I've checked the windows library itself, and also this go library which is a cross-platform, and it seems to me it is not possible to get attrib event inside the windows platform. If you have any points or comment on this, please let me know.
Finally, If you agree on the implementation, I want to add some test cases to cover these new notifications.
Nice work! Let's move forward with your idea :)
It will be a breaking patch, so we will just bump the major version number.
Couple of notes:
- Add changes to both
fsevent
ansfsevents2
. - You can run macOS tests by just pushing changes to your PR: tests will be run on all Oses.
- Add tests and fix current failures.
- Let me know on the PR when you are ready for a complete review.
And a big thank you for your time :)
For Windows, it might not be possible to differentiate attribute and content changes, but it IS possible to filter which events you want. I'm getting "change" events for a directory of images, that I'm pretty sure are actually just the "Last Accessed" date being updated on read.
https://docs.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-readdirectorychangesw
Throwing together a quick C# test shows that with the default "everything" filter, I get the same kind of behaviour as watchdog, but with LastAccess removed from the filter, I get what I really wanted.
So one could theoretically start two watches, one that only picks up content modified events and another that only picks up metadata changes. It will be a bit more resource heavy but we would maintain a uniform API across backends.
That's what I wasn't sure about - I assume it's a bit less efficient to do it that way. Although the differentiation I'm looking in my application (basically a grunt watch
type thing for image processing) for is not quite metadata vs content - it's that LastAccess change specifically that trips me up. Maybe an exclude option? (but then making that uniform becomes the issue)
Providing an exclude option across all backends should be easy, you can either filter events as they come in or directly subscribe to only some types of events on pretty much all backends. However, it would go somewhat against the current API design where all events are always emitted and the event handler then decides how to process different types of events.
When I say "metadata" I mean anything that does not affect the location or the binary content of the file. For example, looking at the list of provided notifications in Windows, that would be CHANGE_ATTRIBUTES
, CHANGE_LAST_ACCESS
, CHANGE_SECURITY
. In macOS, this would mean is_inode_meta_mod
and is_xattr_mod
. Would that cover your use case?
Is that actually the possible events in Windows? It seems to just be Modified, regardless of what changed. (looking at watchdog.observers.winapi.WinAPINativeEvent
). Unless there is something deeper in there that's aggregating multiple event types.
My current (working) monkey-patch for this is:
watchdog.observers.winapi.WATCHDOG_FILE_NOTIFY_FLAGS = reduce(
lambda x, y: x | y, [
FILE_NOTIFY_CHANGE_FILE_NAME,
FILE_NOTIFY_CHANGE_DIR_NAME,
FILE_NOTIFY_CHANGE_ATTRIBUTES,
FILE_NOTIFY_CHANGE_SIZE,
FILE_NOTIFY_CHANGE_LAST_WRITE,
FILE_NOTIFY_CHANGE_SECURITY,
# FILE_NOTIFY_CHANGE_LAST_ACCESS,
FILE_NOTIFY_CHANGE_CREATION,
])
event_handler = LoggingEventHandler()
observer = Observer()
observer.schedule(event_handler, path, recursive=True)
observer.start()
which gets me to where I want to be, although obviously not very elegantly.
Indeed, the emitted event type would always be "modified". I was rather proposing a change to the Windows backend to start two watches:
- First watch sets all
NOTIFY_FLAGS
except forCHANGE_ATTRIBUTES
,CHANGE_LAST_ACCESS
,CHANGE_SECURITY
. Reading carefully through the docs, it appears thatCHANGE_CREATION
should also be excluded because it signifies only a change in the creation time stamp. The actual file creation would be recorded when setting the flagCHANGE_FILE_NAME
. - Second watch sets only the flags which are not set for the first watch.
That way, we can ensure that any "modified" events from the first watch are content changes and all "modified" events from the second watch are attribute / metadata changes.
Sorry - misunderstood which layer you were talking about. Yes, that would work. Although even then I think the option to filter LAST_ACCESS (or others) from the second is useful, in lieu of a way to know what event it was. My use case is a file server - read accesses are completely normal and uninteresting, and not being able to tell them apart from permissions changes is unhelpful.
That comes back to API design then. We could either:
- Tell the
Observer
which types of events to record in the first place, possibly with a granularity that is finer than the actual subclasses ofFileSystemEvent
. It sounds like this is what you are proposing. - Record all events and provide different subclasses
FileMtimeModifiedEvent
,FilePermissionModifiedEvent
,FileOpenEvent
, etc, down to the smallest granularity that is useful and supported by the backend.
The watchdog API currently tries to implement the second approach and this issue makes a strong case that the provided granularity is insufficient for some use cases.
There can of course still be a discussion about API design, for example providing arguments to the Observer to suppress certain types of events before they are passed on to the Handler. This could marginally improve performance in some cases or reduce the number of required watches in Windows. However, IMO, we should still provide different subclasses of FileSystemEvent
for every type of information that is useful to separate (such as FileOpenEvent
).
Is there a way in Watchdog 2.1.2 to identify xattr only modify events and filter them out ?
Currently I am stuck with v0.10.3 which doesn't generate modify event for metadata changes
for the wandering googler, if your platform has stat, then its trivial to get around this issue by checking if the mtime has changed. such a solution would persist even after #800 has merged.