jupyter_server
jupyter_server copied to clipboard
Add file system change notifier
I verified that watchfiles is available on conda-forge and has an MIT license.
Codecov Report
Merging #783 (029de4f) into main (02e5f3a) will decrease coverage by
0.37%. The diff coverage is87.50%.
:exclamation: Current head 029de4f differs from pull request most recent head 7bd9f99. Consider uploading reports for the commit 7bd9f99 to get more accurate results
@@ Coverage Diff @@
## main #783 +/- ##
==========================================
- Coverage 70.29% 69.91% -0.38%
==========================================
Files 62 62
Lines 7554 7383 -171
Branches 1248 1225 -23
==========================================
- Hits 5310 5162 -148
+ Misses 1861 1848 -13
+ Partials 383 373 -10
| Impacted Files | Coverage Δ | |
|---|---|---|
| jupyter_server/services/contents/manager.py | 82.57% <75.00%> (-0.15%) |
:arrow_down: |
| jupyter_server/services/contents/filemanager.py | 73.17% <100.00%> (+0.38%) |
:arrow_up: |
| jupyter_server/terminal/api_handlers.py | 0.00% <0.00%> (-100.00%) |
:arrow_down: |
| jupyter_server/terminal/terminalmanager.py | 0.00% <0.00%> (-88.47%) |
:arrow_down: |
| jupyter_server/terminal/__init__.py | 0.00% <0.00%> (-79.17%) |
:arrow_down: |
| jupyter_server/terminal/handlers.py | 0.00% <0.00%> (-75.00%) |
:arrow_down: |
| jupyter_server/services/kernels/handlers.py | 57.44% <0.00%> (-1.07%) |
:arrow_down: |
| jupyter_server/utils.py | 61.38% <0.00%> (-1.00%) |
:arrow_down: |
| jupyter_server/extension/manager.py | 92.75% <0.00%> (-0.85%) |
:arrow_down: |
| jupyter_server/pytest_plugin.py | 83.80% <0.00%> (-0.37%) |
:arrow_down: |
| ... and 7 more |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact),ø = not affected,? = missing dataPowered by Codecov. Last update 02e5f3a...7bd9f99. Read the comment docs.
I verified that watchfiles is available on conda-forge
I packaged it yesterday :smile:
I was about to ask how it compares with watchgod - but actually that is the new name :smile:
Indeed, but I think now almost everything is done in Rust using this library.
I was about to ask how it compares with watchgod - but actually that is the new name
We use watchdog in EG and Elyra for cache consistency implementations. I'm guessing watchgod (watchfiles) came afterward (based on the play on the name), and I'm curious if there's something that should be avoided about watchdog? Since both EG and Elyra depend on jupyter_server, I'd rather not require our users to install two packages that essentially do the same thing. I'm fine converting the watchdog instances, but would like to better understand why watchfiles is advantageous.
I verified that watchfiles is available on conda-forge and has an MIT license.
Hmm. I see watchdog uses an Apache license. Is this difference a licensing thing? If so, are there also functionality differences that warrant the use of watchfiles over watchdog?
Apache is fine, I was mainly checking for not GPL
@kevin-bates I had a previous experience with watchdog, my conclusion was that it is much better to work with watchgod (and thus watchfiles) in an async context. Also, watchfiles has some nice features such as grouping events, I'm not sure if watchdog has the equivalent.
The question here is: if we are importing watchfiles (in the Python meaning), does it mean we are also importing its API as a specification that other contents managers should be compliant with?
In JupyterLab I want to check if the contents manager supports watching file changes. For that I need to know if the underlying file system is the local file system:
if isinstance(self.contents_manager, (FileManagerMixin, AsyncFileManagerMixin)):
But this is not ideal since custom contents managers could target the local file system and not derive from FileManagerMixin or AsyncFileManagerMixin.
I think the ContentsManager class should have a watchfiles property returning None by default, that implementations should override if they can offer this service. For instance, our file managers would have:
@property
def watchfiles(self):
import watchfiles
return watchfiles.__all__
# ('watch', 'awatch', 'run_process', 'arun_process', 'Change', 'BaseFilter', 'DefaultFilter', 'PythonFilter', 'VERSION')
Then, the user could inspect the objects returned by this property to know what is supported. This makes the file system change notifyer a "loose specification" in the contents manager, meaning that it tries to follow watchfiles' API when possible. An alternative solution would be to fully specify the API in the contents manager, maybe by copying a subset of watchfiles' API, but it seems less flexible. What do you think?
In this last commit I added the watchfiles attribute in the contents manager class. In ContentsManager it is implemented as an instance of an empty class, and in FileContentsManager and AsyncFileContentsManager it is the watchfiles module.
In this last commit I added the watchfiles attribute in the contents manager class
I am wondering what does that mean in terms of jupyter-server semantic versioning. I guess it means jupyter-server will have to follow watchfiles if it adopts its API? I suppose we don't want to be in a situation where watchfile don't follow semver and breaks on a patch release?
If the watchfiles API is not too complicated, could the ContentManager just implement the methods it provides and privately make use of watchfiles?
Yes that's what I meant with "loose specification". On the other hand, even if we implement the methods in ContentsManager and under the hood delegate to watchfiles, we still have no control over changes e.g. in the return values, etc. Or we would have to exactly pin the package, and manually sync with new releases. But maybe it is safer?
Thanks for working on this @davidbrochart. I agree with @martinRenou and believe we should introduce some form of abstraction that allows folks the ability to implement their own "watchfiles" integration (in their own ContentsManager). We should also not preclude custom contents managers from wanting to watch remote files either - so filesystem locality should not be relevant - and boils down to does the configured ContentsManager support file-change notification or not?
@kevin-bates @martinRenou @trungleduc I added a BaseWatchfiles class, that is currently populated with the following attributes (links to the corresponding watchfiles API):
I believe this is the minimum necessary for now (at least for my use-case :smile: ). We can add more in the future if needed. I also pinned watchfiles==0.13, so that this is a stable API in our contents manager implementations.
Isn't this just exposing the watchfiles API? I think we need to ask what we need as a notification item and expose that. This is requiring that all implementations implement watchfiles.
What do we need in addition to the file and its corresponding action: added, modified, or deleted?
Are we going to allow subsets of src_root to be monitored or is it all or nothing under src_root? Perhaps we could have a configurable list that specifies the directories to be monitored? And can the kinds of files be specified - *.ipynb, *.py, etc. relative to each monitored directory? If so, we could include the filtering expression in the directory:
c.ContentsManager.watched_directories = ["foo/bar/{*.ipynb, *.py}", "baz/{*.yaml}"]
or, if only src_root, just let the configurable be the filter...
c.ContentsManager.watched_files = ["*.ipynb", "*.py", "*.yaml"]
and non-None values indicate that "watch files" is enabled.
Isn't this just exposing the
watchfilesAPI? I think we need to ask what we need as a notification item and expose that. This is requiring that all implementations implementwatchfiles.
I think we should adhere to watchfiles' API, but that doesn't mean that all implementations must implement it. Our file contents managers implement all of it, and directly delegates to watchfiles. But other implementations can implement a subset of it. But what they do implement, should conform to the API.
What do we need in addition to the file and its corresponding action: added, modified, or deleted?
Are we going to allow subsets of
src_rootto be monitored or is it all or nothing undersrc_root? Perhaps we could have a configurable list that specifies the directories to be monitored? And can the kinds of files be specified -*.ipynb,*.py, etc. relative to each monitored directory? If so, we could include the filtering expression in the directory:c.ContentsManager.watched_directories = ["foo/bar/{*.ipynb, *.py}", "baz/{*.yaml}"]or, if only
src_root, just let the configurable be the filter...c.ContentsManager.watched_files = ["*.ipynb", "*.py", "*.yaml"]and non-None values indicate that "watch files" is enabled.
I think all your suggestions can be implemented with the watch_filter argument of watch().
That is why I think we should adhere to watchfiles' API, there is no need to reinvent the wheel. It is a very well designed library and I doubt we would come up with a better API. And anyway, if we want to use it under the hood at least for our file contents managers, having another layer on top of its original API will probably make things more complicated, don't you think?
The idea would be to constrain what is exposed via the API, otherwise, folks that don't use watchfiles within their implementation will need to figure out how to implement aspects of watchfiles that don't exist in other offerings because our users will have grown accustomed to and, worse, expect the watchfiles functionality if its exposed in the raw. This is why I asked what we minimally need and only expose that as the "API". We can use whatever functionality we want internally.
I get your point, on the other hand it would be nice to offer various level of functionalities depending on the file system. On a local file system, I would expect to be able to use all of watchfiles' functionalities, and on a less capable file system, less functionalities. Maybe we're only talking about being able to watch file changes vs not being able to (and actually I need that in JupyterLab), maybe it's more subtle, like supporting the debounce parameter of watch() (which affects how changes are grouped).
That raises the question of letting the user discover what functionalities are supported. If no functionality at all, calling watch() currently raises an exception, so that's one way. For more subtle functionalities, I was thinking that the user could inspect the signature of watch(), and see e.g. if debounce is present, but I agree that it's not ideal. Maybe we could have a method that would return the list of supported arguments?
Or maybe, as you suggested, we should support a fixed set of functionalities, that would be the minimum that could be supported by all file systems. Thanks for you feedback Kevin!
We spent some time triaging PRs in the Server Meeting today. What's the current status of this PR?
It seems to me that it might be better to offer this as an alternative implementation of the FileContentsManager that could be provided as a separate package/extension. Imposing a "file-watching" cost on all users can be quite expensive, so making it an opt-in feature might be better.
TO;DR (too old; don't remember :smile:)