jupyter_server icon indicating copy to clipboard operation
jupyter_server copied to clipboard

Add file system change notifier

Open davidbrochart opened this issue 3 years ago • 19 comments

I think we should adhere to watchfiles's API.

Closes #778.

davidbrochart avatar Apr 08 '22 08:04 davidbrochart

I verified that watchfiles is available on conda-forge and has an MIT license.

blink1073 avatar Apr 08 '22 08:04 blink1073

Codecov Report

Merging #783 (029de4f) into main (02e5f3a) will decrease coverage by 0.37%. The diff coverage is 87.50%.

:exclamation: Current head 029de4f differs from pull request most recent head 7bd9f99. Consider uploading reports for the commit 7bd9f99 to get more accurate results

@@            Coverage Diff             @@
##             main     #783      +/-   ##
==========================================
- Coverage   70.29%   69.91%   -0.38%     
==========================================
  Files          62       62              
  Lines        7554     7383     -171     
  Branches     1248     1225      -23     
==========================================
- Hits         5310     5162     -148     
+ Misses       1861     1848      -13     
+ Partials      383      373      -10     
Impacted Files Coverage Δ
jupyter_server/services/contents/manager.py 82.57% <75.00%> (-0.15%) :arrow_down:
jupyter_server/services/contents/filemanager.py 73.17% <100.00%> (+0.38%) :arrow_up:
jupyter_server/terminal/api_handlers.py 0.00% <0.00%> (-100.00%) :arrow_down:
jupyter_server/terminal/terminalmanager.py 0.00% <0.00%> (-88.47%) :arrow_down:
jupyter_server/terminal/__init__.py 0.00% <0.00%> (-79.17%) :arrow_down:
jupyter_server/terminal/handlers.py 0.00% <0.00%> (-75.00%) :arrow_down:
jupyter_server/services/kernels/handlers.py 57.44% <0.00%> (-1.07%) :arrow_down:
jupyter_server/utils.py 61.38% <0.00%> (-1.00%) :arrow_down:
jupyter_server/extension/manager.py 92.75% <0.00%> (-0.85%) :arrow_down:
jupyter_server/pytest_plugin.py 83.80% <0.00%> (-0.37%) :arrow_down:
... and 7 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 02e5f3a...7bd9f99. Read the comment docs.

codecov-commenter avatar Apr 08 '22 08:04 codecov-commenter

I verified that watchfiles is available on conda-forge

I packaged it yesterday :smile:

davidbrochart avatar Apr 08 '22 08:04 davidbrochart

I was about to ask how it compares with watchgod - but actually that is the new name :smile:

fcollonval avatar Apr 08 '22 12:04 fcollonval

Indeed, but I think now almost everything is done in Rust using this library.

davidbrochart avatar Apr 08 '22 13:04 davidbrochart

I was about to ask how it compares with watchgod - but actually that is the new name

We use watchdog in EG and Elyra for cache consistency implementations. I'm guessing watchgod (watchfiles) came afterward (based on the play on the name), and I'm curious if there's something that should be avoided about watchdog? Since both EG and Elyra depend on jupyter_server, I'd rather not require our users to install two packages that essentially do the same thing. I'm fine converting the watchdog instances, but would like to better understand why watchfiles is advantageous.

I verified that watchfiles is available on conda-forge and has an MIT license.

Hmm. I see watchdog uses an Apache license. Is this difference a licensing thing? If so, are there also functionality differences that warrant the use of watchfiles over watchdog?

kevin-bates avatar Apr 08 '22 14:04 kevin-bates

Apache is fine, I was mainly checking for not GPL

blink1073 avatar Apr 08 '22 14:04 blink1073

@kevin-bates I had a previous experience with watchdog, my conclusion was that it is much better to work with watchgod (and thus watchfiles) in an async context. Also, watchfiles has some nice features such as grouping events, I'm not sure if watchdog has the equivalent.

davidbrochart avatar Apr 08 '22 14:04 davidbrochart

The question here is: if we are importing watchfiles (in the Python meaning), does it mean we are also importing its API as a specification that other contents managers should be compliant with?

davidbrochart avatar Apr 08 '22 15:04 davidbrochart

In JupyterLab I want to check if the contents manager supports watching file changes. For that I need to know if the underlying file system is the local file system:

if isinstance(self.contents_manager, (FileManagerMixin, AsyncFileManagerMixin)):

But this is not ideal since custom contents managers could target the local file system and not derive from FileManagerMixin or AsyncFileManagerMixin. I think the ContentsManager class should have a watchfiles property returning None by default, that implementations should override if they can offer this service. For instance, our file managers would have:

@property
def watchfiles(self):
    import watchfiles
    return watchfiles.__all__
    # ('watch', 'awatch', 'run_process', 'arun_process', 'Change', 'BaseFilter', 'DefaultFilter', 'PythonFilter', 'VERSION')

Then, the user could inspect the objects returned by this property to know what is supported. This makes the file system change notifyer a "loose specification" in the contents manager, meaning that it tries to follow watchfiles' API when possible. An alternative solution would be to fully specify the API in the contents manager, maybe by copying a subset of watchfiles' API, but it seems less flexible. What do you think?

davidbrochart avatar Apr 15 '22 09:04 davidbrochart

In this last commit I added the watchfiles attribute in the contents manager class. In ContentsManager it is implemented as an instance of an empty class, and in FileContentsManager and AsyncFileContentsManager it is the watchfiles module.

davidbrochart avatar Apr 15 '22 10:04 davidbrochart

In this last commit I added the watchfiles attribute in the contents manager class

I am wondering what does that mean in terms of jupyter-server semantic versioning. I guess it means jupyter-server will have to follow watchfiles if it adopts its API? I suppose we don't want to be in a situation where watchfile don't follow semver and breaks on a patch release?

If the watchfiles API is not too complicated, could the ContentManager just implement the methods it provides and privately make use of watchfiles?

martinRenou avatar Apr 15 '22 13:04 martinRenou

Yes that's what I meant with "loose specification". On the other hand, even if we implement the methods in ContentsManager and under the hood delegate to watchfiles, we still have no control over changes e.g. in the return values, etc. Or we would have to exactly pin the package, and manually sync with new releases. But maybe it is safer?

davidbrochart avatar Apr 15 '22 13:04 davidbrochart

Thanks for working on this @davidbrochart. I agree with @martinRenou and believe we should introduce some form of abstraction that allows folks the ability to implement their own "watchfiles" integration (in their own ContentsManager). We should also not preclude custom contents managers from wanting to watch remote files either - so filesystem locality should not be relevant - and boils down to does the configured ContentsManager support file-change notification or not?

kevin-bates avatar Apr 15 '22 14:04 kevin-bates

@kevin-bates @martinRenou @trungleduc I added a BaseWatchfiles class, that is currently populated with the following attributes (links to the corresponding watchfiles API):

I believe this is the minimum necessary for now (at least for my use-case :smile: ). We can add more in the future if needed. I also pinned watchfiles==0.13, so that this is a stable API in our contents manager implementations.

davidbrochart avatar Apr 15 '22 14:04 davidbrochart

Isn't this just exposing the watchfiles API? I think we need to ask what we need as a notification item and expose that. This is requiring that all implementations implement watchfiles.

What do we need in addition to the file and its corresponding action: added, modified, or deleted?

Are we going to allow subsets of src_root to be monitored or is it all or nothing under src_root? Perhaps we could have a configurable list that specifies the directories to be monitored? And can the kinds of files be specified - *.ipynb, *.py, etc. relative to each monitored directory? If so, we could include the filtering expression in the directory:

c.ContentsManager.watched_directories = ["foo/bar/{*.ipynb, *.py}", "baz/{*.yaml}"]

or, if only src_root, just let the configurable be the filter...

c.ContentsManager.watched_files = ["*.ipynb", "*.py", "*.yaml"]

and non-None values indicate that "watch files" is enabled.

kevin-bates avatar Apr 15 '22 17:04 kevin-bates

Isn't this just exposing the watchfiles API? I think we need to ask what we need as a notification item and expose that. This is requiring that all implementations implement watchfiles.

I think we should adhere to watchfiles' API, but that doesn't mean that all implementations must implement it. Our file contents managers implement all of it, and directly delegates to watchfiles. But other implementations can implement a subset of it. But what they do implement, should conform to the API.

What do we need in addition to the file and its corresponding action: added, modified, or deleted?

Are we going to allow subsets of src_root to be monitored or is it all or nothing under src_root? Perhaps we could have a configurable list that specifies the directories to be monitored? And can the kinds of files be specified - *.ipynb, *.py, etc. relative to each monitored directory? If so, we could include the filtering expression in the directory:

c.ContentsManager.watched_directories = ["foo/bar/{*.ipynb, *.py}", "baz/{*.yaml}"]

or, if only src_root, just let the configurable be the filter...

c.ContentsManager.watched_files = ["*.ipynb", "*.py", "*.yaml"]

and non-None values indicate that "watch files" is enabled.

I think all your suggestions can be implemented with the watch_filter argument of watch(). That is why I think we should adhere to watchfiles' API, there is no need to reinvent the wheel. It is a very well designed library and I doubt we would come up with a better API. And anyway, if we want to use it under the hood at least for our file contents managers, having another layer on top of its original API will probably make things more complicated, don't you think?

davidbrochart avatar Apr 15 '22 21:04 davidbrochart

The idea would be to constrain what is exposed via the API, otherwise, folks that don't use watchfiles within their implementation will need to figure out how to implement aspects of watchfiles that don't exist in other offerings because our users will have grown accustomed to and, worse, expect the watchfiles functionality if its exposed in the raw. This is why I asked what we minimally need and only expose that as the "API". We can use whatever functionality we want internally.

kevin-bates avatar Apr 15 '22 22:04 kevin-bates

I get your point, on the other hand it would be nice to offer various level of functionalities depending on the file system. On a local file system, I would expect to be able to use all of watchfiles' functionalities, and on a less capable file system, less functionalities. Maybe we're only talking about being able to watch file changes vs not being able to (and actually I need that in JupyterLab), maybe it's more subtle, like supporting the debounce parameter of watch() (which affects how changes are grouped). That raises the question of letting the user discover what functionalities are supported. If no functionality at all, calling watch() currently raises an exception, so that's one way. For more subtle functionalities, I was thinking that the user could inspect the signature of watch(), and see e.g. if debounce is present, but I agree that it's not ideal. Maybe we could have a method that would return the list of supported arguments? Or maybe, as you suggested, we should support a fixed set of functionalities, that would be the minimum that could be supported by all file systems. Thanks for you feedback Kevin!

davidbrochart avatar Apr 16 '22 09:04 davidbrochart

We spent some time triaging PRs in the Server Meeting today. What's the current status of this PR?

It seems to me that it might be better to offer this as an alternative implementation of the FileContentsManager that could be provided as a separate package/extension. Imposing a "file-watching" cost on all users can be quite expensive, so making it an opt-in feature might be better.

Zsailer avatar Feb 16 '23 16:02 Zsailer

TO;DR (too old; don't remember :smile:)

davidbrochart avatar Feb 16 '23 16:02 davidbrochart