jupyter_core

data/config path entry_points with minimal examples

Open bollwyvl opened this issue 4 years ago • 89 comments

Background

Jupyter relies on a hierarchy of directories (user-level, environment-level, system-level, etc.) to store configuration and data. These directories are used by a number of Jupyter programs, for example:

  • Most applications based on the traitlets Configurable application class store configuration in JSON files in the configuration directories. They also aggregate conf.d-style configuration from these directories to determine settings of options.
  • Jupyter Notebook extensions copy their JavaScript assets into a data directory on installation for the server to serve.
  • JupyterLab extensions copy their JavaScript assets into a data directory on installation for the server to serve.

Problem

Currently the environment level of this directory hierarchy is a fixed location based on sys.prefix. This means that packages need to copy their files into this directory at install time, which has several issues:

  • Copying files into a data directory uses the data_files feature of Python packages, which is deprecated in setuptools and is not supported in non-setuptools-based packagers like flit, poetry (see here), etc.
  • Data files are duplicated in the package bundle (once for copying into the data directory, once for being included in the actual package to install into site-packages). For some extensions, this is a huge cost (megabytes or tens of megabytes).
  • Development installs (pip install -e) do not update data files when the source files change, so when developing a package, whenever the data files change you either have to copy them over again or run a command to make the appropriate data directory a symbolic link (not available on some platforms) to the source files.

(Also, it seems that sometimes these data file directories are not deleted. For example, in JupyterLab we actually create files at runtime in the data directory, and I think they don't get deleted when JupyterLab is uninstalled)

Proposed solution

Python has another mechanism that is explicitly designed for plugin systems called entry points. An entry point is a piece of metadata in a package that points to an arbitrary import from the package. This PR changes jupyter_core to look for two specific entry points in any installed package, each pointing to a list of paths, to augment the environment-level Jupyter config directories (the jupyter_config_paths entry point) and data directories (the jupyter_data_paths entry point). The result is:

  • Any package can add new environment-level Jupyter config and data directories. In practice, this means that a package can contain data or configuration in a directory that is installed in its site-packages directory, and can use the entry point to point Jupyter to that internal directory. Since this directory is internal to the package:
    • the files are not duplicated in the package tarball
    • development (un)installs automatically work, since the directory points to an internal directory in the package
    • other python package managers can be used, like poetry using its include/exclude mechanism for files
  • non-Python programs can access this (and all other paths) by shelling out to jupyter --paths --json
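
A minimal sketch of the lookup described above, assuming each entry point in the jupyter_config_paths or jupyter_data_paths group resolves to a list of path strings (or, as an extra assumption made here, to a callable returning one). The function names select_group and collect_paths are hypothetical, and this is not the code in the PR:

```python
from importlib import metadata


def select_group(group):
    """Return the installed entry points registered under ``group``."""
    eps = metadata.entry_points()
    if hasattr(eps, "select"):  # Python 3.10+ returns a selectable object
        return list(eps.select(group=group))
    return list(eps.get(group, []))  # Python 3.8/3.9: mapping of group -> entry points


def collect_paths(entry_points):
    """Load each entry point and gather the paths it points to, in order."""
    paths = []
    for ep in entry_points:
        value = ep.load()  # imports the target module: the potentially slow step
        if callable(value):  # assumption: a callable may compute the paths
            value = value()
        for path in value:
            path = str(path)
            if path not in paths:  # keep the first occurrence, drop duplicates
                paths.append(path)
    return paths
```

With a package that declares such an entry point installed, collect_paths(select_group("jupyter_config_paths")) would return the extra config directories. Without one, it returns an empty list.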

Problems with the proposed solution

  • Entry points are based on importing a module to get a value, which potentially could be very expensive. We explore parsing the file first for literal values, and then importing as a last resort, which seems to alleviate this problem in the common case (setuptools does something similar for its attr handler for setup.cfg values).
  • neither entry_point group is cached
    • an interactive installation with e.g. pip install or conda install would be able to update the search path, provided the application isn't doing its own caching...
      • this is important to maintain the observed behavior of data_files
      • because the import system is invoked, users of this system may wish to create a separate python_packages entry for these static assets, to avoid bringing in otherwise-unused runtime dependencies, e.g. pandas
    • [ ] adding some debug logging around this will help pinpoint slow startup times
      • turns out there is no logging this deep in the stack. we could either:
        • [ ] add a log=None argument to the various calls
        • [ ] add a logger controlled by a JUPYTER_CORE_LOGLEVEL
  • if an entry_point is added (or its target is changed) in a package with an editable install, it must be reinstalled
    • however, if only the return value of an existing entry_point is changed, no re-install is required
  • existing tools that were relying on indexing jupyter_*paths()
    • this occurs in the test suite for jupyter_core itself: if one of the example packages is installed, the tests break
    • [ ] these will have to be updated to inspect relative positions, e.g. was the user dir loaded before or after the env paths when JUPYTER_PREFER_ENV is set
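
The "parse first, import as a last resort" idea from the first bullet can be sketched with the standard ast module. This is only an illustration under the assumption that the entry point targets a module-level name assigned to a plain literal. The helper name literal_attr is hypothetical:

```python
import ast


def literal_attr(source, name):
    """Return the literal value assigned to ``name`` at module top level.

    Raises LookupError when there is no such assignment or the value is not
    a plain literal; a caller would then fall back to a real import.
    """
    tree = ast.parse(source)
    for node in tree.body:
        if not isinstance(node, ast.Assign):
            continue
        targets = [t.id for t in node.targets if isinstance(t, ast.Name)]
        if name in targets:
            try:
                # Evaluates only literals; never runs code from the module.
                return ast.literal_eval(node.value)
            except (ValueError, TypeError) as err:
                raise LookupError(f"{name} is not a literal") from err
    raise LookupError(f"{name} is not assigned at top level")
```

Because the source is only parsed, a heavy top-level import such as pandas in the same file costs nothing on this code path.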

Alternative solutions

setuptools also provides a way for a package to have custom metadata files in the egg or dist_info directories. This avoids the problems of importing or parsing an arbitrary python file to get the few strings that we need. However, it appears that this arbitrary metadata is not well supported outside of setuptools. See below for some experiments around this approach.
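
As a rough illustration of the metadata-file alternative, the sketch below scans installed distributions for a custom file without importing any package. The file name jupyter_paths.json, its JSON layout, and the function name metadata_paths are all assumptions made for this example. They are not taken from the PR or from setuptools:

```python
import json
from importlib import metadata
from pathlib import Path

METADATA_FILE = "jupyter_paths.json"  # hypothetical file name and format


def metadata_paths(search_path=None):
    """Collect config/data paths declared in per-distribution metadata files.

    ``search_path`` is an optional list of directories to scan instead of
    sys.path. Declared paths are treated as relative to the install root.
    """
    kwargs = {} if search_path is None else {"path": list(search_path)}
    found = {"config": [], "data": []}
    for dist in metadata.distributions(**kwargs):
        text = dist.read_text(METADATA_FILE)  # None when the file is absent
        if text is None:
            continue
        declared = json.loads(text)
        root = Path(dist.locate_file(""))  # directory containing the package
        for kind in found:
            for rel in declared.get(kind, []):
                found[kind].append(str(root / rel))
    return found
```

Only metadata is read here, so the cost is a directory scan plus a small JSON parse per participating distribution.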

Example

See the setuptools example, specifically https://github.com/jupyter/jupyter_core/blob/38e3acd220153871ddd93d3e77a8b0af9e18c9db/examples/jupyter_path_entrypoint_setuptools/setup.cfg#L35-L39

  • this approach requires a boilerplate MANIFEST.in and a setup.py in order to be installed from source

and the flit example, specifically https://github.com/jupyter/jupyter_core/blob/38e3acd220153871ddd93d3e77a8b0af9e18c9db/examples/jupyter_path_entrypoint_flit/pyproject.toml#L11-L15 for examples of how to use these entry points.

  • pyproject.toml is the only boilerplate file needed, and generates a setup.py
  • flit can also generate binary reproducible whl files (for python >=3.7) given the same version of flit_core

Original issue description

Hey folks! Thanks for keeping this foundational technology working.

data_files are making me sad enough that I'm willing to bring this up again.

This is a low-downstream-impact way we could allow python packages to not require the ill-supported data_files technique.

To test:

pip install -e .
cd examples/entry_point_example
pip install -e .
jupyter --paths
# should see that development environment in place
pip uninstall entry_point_example
jupyter --paths
# it's gone

I don't know yet whether it really works all the way down to the n-th downstream, but it seems it should if they are relying on jupyter_*_dir and already handling multiple paths.

bollwyvl avatar Nov 21 '20 21:11 bollwyvl

I see this as a good alternative to using data_files without overhauling the config system. I am a bit worried that it's hard to debug when things go wrong (if 15 directories will be scanned). Could we maybe provide a richer debug facility to see a particular config key, and how each directory is changing it. Grepping in 15 directories will not be fun. Or do I see a problem that does not exist, and are the debug options sufficient?

maartenbreddels avatar Nov 21 '20 21:11 maartenbreddels

Grepping in 15 directories will not be fun

Yep, there will be a lot of directories beyond the Big Four. No doubt some combination of jupyter --paths, jq, and xargs would make grep plausible, but that's no fun!

A JupyterApp base flag like --show-config which every app would inherit is a whacking good idea, even outside of this little draft. It could probably use difflib to generate a decently-readable representation of the config before each file was loaded, and show the final config, perhaps something like:

$> jupyter foo --show-config

environment variables:
- JUPYTER_PREFER_ENV_PATH: not set
- ...

paths:
- /etc/jupyter/jupyter_config.json: not found
...
- ~/my-project/src/my_project/etc/jupyter_foo_config.d/my-project.json:

    + SomeHasTraits:
    +   foo: bar
...
- ~/my-project/src/my_project/.venv/etc/jupyter_config.d/someone-elses-project.json:

      SomeHasTraits:
    -   foo: bar
    +   foo: baz

...
- ./jupyter_foo_config.json: not found

final:

    SomeHasTraits:
      foo: baz

sprinkle in some pygments (if available) and it would be pretty usable.
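
A hedged sketch of the difflib idea: merge each config file in load order and report what each one changed. The function names merge and show_config are hypothetical, the input is a plain list of (path, dict) pairs rather than anything read by a real JupyterApp, and the merge rule (nested dicts merged, other values replaced) is an assumption:

```python
import difflib
import json


def merge(base, update):
    """Recursively merge ``update`` into a copy of ``base``."""
    merged = dict(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def show_config(layers):
    """Return (report lines, final config) for (path, config) pairs.

    ``layers`` is ordered from lowest to highest priority.
    """
    config, report = {}, []
    for path, loaded in layers:
        updated = merge(config, loaded)
        before = json.dumps(config, indent=2, sort_keys=True).splitlines()
        after = json.dumps(updated, indent=2, sort_keys=True).splitlines()
        # Drop the two "---"/"+++" file header lines of the unified diff.
        diff = list(difflib.unified_diff(before, after, lineterm="", n=1))[2:]
        report.append(f"- {path}:" + ("" if diff else " no change"))
        report.extend("    " + line for line in diff)
        config = updated
    return report, config
```

Printing the report lines followed by the final config gives output in the spirit of the mock-up above, minus the environment variables and the "not found" cases.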

bollwyvl avatar Nov 21 '20 22:11 bollwyvl

Indeed, exactly what I had in mind, that would help a lot

maartenbreddels avatar Nov 22 '20 06:11 maartenbreddels

Gah, looking at it: a lot of the complexity is duplicated between jupyter_server and notebook... while both would work with this PR, there's no simple way to add the above config inspection.

Perhaps the better short-term approach would be to invert it, with a separate package/command, e.g. one offering jupyter show-config notebook FooHasTraits.bar. I guess this would work by overloading/monkeypatching config_manager_class (gaaah) with an instrumented subclass, and calling initialize but not start.

Because of that complexity, this could probably not land here, unless the ConfigManager pattern was brought upstream, which sounds hard to coordinate.
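
To make the "instrumented subclass" idea concrete, here is a small sketch that records every lookup. StandInConfigManager is a stand-in written for this example and is not the real Jupyter config manager, whose constructor and get signature may differ. Only the recording pattern is the point:

```python
class StandInConfigManager:
    """Stand-in for a JSON config manager; NOT the real Jupyter class."""

    def __init__(self, sections_by_dir):
        # Mapping of {config_dir: {section_name: config dict}}.
        self.sections_by_dir = sections_by_dir

    def get(self, config_dir, section_name):
        return self.sections_by_dir.get(config_dir, {}).get(section_name, {})


class RecordingConfigManager(StandInConfigManager):
    """Instrumented subclass that logs each lookup and its result."""

    def __init__(self, sections_by_dir):
        super().__init__(sections_by_dir)
        self.events = []  # (op, section_name, path, value) tuples

    def get(self, config_dir, section_name):
        self.events.append(("get", section_name, config_dir, None))
        value = super().get(config_dir, section_name)
        self.events.append(("got", section_name, config_dir, value))
        return value
```

A command like the proposed show-config could swap such a subclass in, run initialization only, and then render the recorded events.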

bollwyvl avatar Nov 22 '20 14:11 bollwyvl

I have an unshareably bad version of this, but it kinda works with notebook, jupyter_server, jupyterlab and voila installed:

getting jupyter_server_config from /etc/jupyter
got {}
getting jupyter_server_config from /usr/local/etc/jupyter
got {}
getting jupyter_server_config from /home/weg/projects/jupyter_showconfig_/envs/default/etc/jupyter
Reading file /home/weg/projects/jupyter_showconfig_/envs/default/etc/jupyter/jupyter_server_config.d/jupyterlab.json
Reading file /home/weg/projects/jupyter_showconfig_/envs/default/etc/jupyter/jupyter_server_config.d/nbclassic.json
Reading file /home/weg/projects/jupyter_showconfig_/envs/default/etc/jupyter/jupyter_server_config.d/voila.json
got {'ServerApp': {'jpserver_extensions': {'jupyterlab': True, 'nbclassic': True, 'voila.server_extension': True}}}
getting jupyter_server_config from /home/weg/.jupyter
got {}
getting page_config from /etc/jupyter/labconfig
got {}
getting page_config from /usr/local/etc/jupyter/labconfig
got {}
getting page_config from /home/weg/projects/jupyter_showconfig_/envs/default/etc/jupyter/labconfig
got {}
getting page_config from /home/weg/.jupyter/labconfig
got {}
[I 2020-11-22 17:50:37.177 ServerApp] jupyterlab | extension was successfully linked.
getting jupyter_notebook_config from /home/weg/.jupyter
got {}
getting jupyter_notebook_config from /etc/jupyter
got {}
getting jupyter_notebook_config from /usr/local/etc/jupyter
got {}
getting jupyter_notebook_config from /home/weg/projects/jupyter_showconfig_/envs/default/etc/jupyter
Reading file /home/weg/projects/jupyter_showconfig_/envs/default/etc/jupyter/jupyter_notebook_config.d/jupyterlab.json
Reading file /home/weg/projects/jupyter_showconfig_/envs/default/etc/jupyter/jupyter_notebook_config.d/voila.json
got {'NotebookApp': {'nbserver_extensions': {'jupyterlab': True, 'voila.server_extension': True}}}
getting jupyter_notebook_config from /home/weg/.jupyter
got {}
[I 2020-11-22 17:50:37.322 ServerApp] nbclassic | extension was successfully linked.
[I 2020-11-22 17:50:37.322 ServerApp] voila.server_extension | extension was successfully linked.
[I 2020-11-22 17:50:37.339 LabApp] JupyterLab extension loaded from /home/weg/projects/jupyter_showconfig_/envs/default/lib/python3.7/site-packages/jupyterlab
[I 2020-11-22 17:50:37.339 LabApp] JupyterLab application directory is /home/weg/projects/jupyter_showconfig_/envs/default/share/jupyter/lab
[I 2020-11-22 17:50:37.342 ServerApp] jupyterlab | extension was successfully loaded.
[I 2020-11-22 17:50:37.345 ServerApp] nbclassic | extension was successfully loaded.
[I 2020-11-22 17:50:37.347 ServerApp] voila.server_extension | extension was successfully loaded.

Update: here's some better stuff, generated with rich:

 op     ┃ section_name                                      ┃ path                              ┃ old_value ┃ new_value                                                  
━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 stage  │                                                   │                                   │           │ before-init                                                
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 patch  │                                                   │ io.open                           │           │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 patch  │                                                   │ BaseJSONConfigManager.get         │           │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 stage  │                                                   │                                   │           │ before-constructor                                         
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 stage  │                                                   │                                   │           │ after-constructor                                          
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 get    │ jupyter_server_config                             │ /etc/jupyter                      │           │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 got    │ jupyter_server_config                             │ /etc/jupyter                      │           │ {}                                                         
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 get    │ jupyter_server_config                             │ /usr/local/etc/jupyter            │           │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 got    │ jupyter_server_config                             │ /usr/local/etc/jupyter            │           │ {}                                                         
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 get    │ jupyter_server_config                             │ $SYS_PREFIX/etc/jupyter           │           │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 open   │ $SYS_PREFIX/etc/jupyter/jupyter_server_config.d   │ jupyterlab.json                   │           │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 open   │ $SYS_PREFIX/etc/jupyter/jupyter_server_config.d   │ nbclassic.json                    │           │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 open   │ $SYS_PREFIX/etc/jupyter/jupyter_server_config.d   │ voila.json                        │           │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 got    │ jupyter_server_config                             │ $SYS_PREFIX/etc/jupyter           │           │ {                                                          
        │                                                   │                                   │           │   "ServerApp": {                                           
        │                                                   │                                   │           │     "jpserver_extensions": {                               
        │                                                   │                                   │           │       "jupyterlab": true,                                  
        │                                                   │                                   │           │       "nbclassic": true,                                   
        │                                                   │                                   │           │       "voila.server_extension": true                       
        │                                                   │                                   │           │     }                                                      
        │                                                   │                                   │           │   }                                                        
        │                                                   │                                   │           │ }                                                          
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 get    │ jupyter_server_config                             │ $HOME/.jupyter                    │           │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 got    │ jupyter_server_config                             │ $HOME/.jupyter                    │           │ {}                                                         
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 get    │ page_config                                       │ /etc/jupyter/labconfig            │           │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 got    │ page_config                                       │ /etc/jupyter/labconfig            │           │ {}                                                         
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 get    │ page_config                                       │ /usr/local/etc/jupyter/labconfig  │           │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 got    │ page_config                                       │ /usr/local/etc/jupyter/labconfig  │           │ {}                                                         
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 get    │ page_config                                       │ $SYS_PREFIX/etc/jupyter/labconfig │           │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 got    │ page_config                                       │ $SYS_PREFIX/etc/jupyter/labconfig │           │ {}                                                         
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 get    │ page_config                                       │ $HOME/.jupyter/labconfig          │           │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 got    │ page_config                                       │ $HOME/.jupyter/labconfig          │           │ {}                                                         
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 get    │ jupyter_notebook_config                           │ $HOME/.jupyter                    │           │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 got    │ jupyter_notebook_config                           │ $HOME/.jupyter                    │           │ {}                                                         
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 get    │ jupyter_notebook_config                           │ /etc/jupyter                      │           │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 got    │ jupyter_notebook_config                           │ /etc/jupyter                      │           │ {}                                                         
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 get    │ jupyter_notebook_config                           │ /usr/local/etc/jupyter            │           │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 got    │ jupyter_notebook_config                           │ /usr/local/etc/jupyter            │           │ {}                                                         
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 get    │ jupyter_notebook_config                           │ $SYS_PREFIX/etc/jupyter           │           │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 open   │ $SYS_PREFIX/etc/jupyter/jupyter_notebook_config.d │ jupyterlab.json                   │           │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 open   │ $SYS_PREFIX/etc/jupyter/jupyter_notebook_config.d │ voila.json                        │           │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 got    │ jupyter_notebook_config                           │ $SYS_PREFIX/etc/jupyter           │           │ {                                                          
        │                                                   │                                   │           │   "NotebookApp": {                                         
        │                                                   │                                   │           │     "nbserver_extensions": {                               
        │                                                   │                                   │           │       "jupyterlab": true,                                  
        │                                                   │                                   │           │       "voila.server_extension": true                       
        │                                                   │                                   │           │     }                                                      
        │                                                   │                                   │           │   }                                                        
        │                                                   │                                   │           │ }                                                          
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 get    │ jupyter_notebook_config                           │ $HOME/.jupyter                    │           │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 got    │ jupyter_notebook_config                           │ $HOME/.jupyter                    │           │ {}                                                         
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 change │ kernel_spec_manager                               │ ServerApp                         │           │                                             
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 change │ ssl_options                                       │ ServerApp                         │ {}        │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 stage  │                                                   │                                   │           │ after-init                                                 
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 stage  │                                                   │                                   │           │ started                                                    
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 stage  │                                                   │                                   │           │ done              

bollwyvl avatar Nov 22 '20 22:11 bollwyvl

This pull request has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/how-do-we-uninstall-extensions-that-have-been-installed-using-jupyter-labextension-develop-overwrite/7845/5

meeseeksmachine avatar Feb 09 '21 01:02 meeseeksmachine

@bollwyvl - I made a PR to your PR with a few changes I thought would be good: https://github.com/bollwyvl/jupyter_core/pull/1. What do you think?

jasongrout avatar Mar 04 '21 06:03 jasongrout

@jasongrout thanks for that! I merged in upstream, added a flit example and some tests.

bollwyvl avatar Mar 04 '21 17:03 bollwyvl

Overall, I think this is a good idea. My immediate usecase is using this for installing prebuilt extensions, and originally I was thinking that we just enable an entry point for jlab plugins specifically. I like that this solution is far more general and more foundational than a jlab-specific entry point!

jasongrout avatar Mar 04 '21 19:03 jasongrout

Yeah, federated extensions are a big motivator: as mentioned, the data_files approach yields duplicates in wheels, which really starts adding up on large builds (wasm, design templates, etc).

I really like how dense the flit example is... there's sane globbing/excluding, and the reproducible wheel building is a big step forward. I would move all of my labextensions to that toolchain as soon as this was available.

bollwyvl avatar Mar 04 '21 20:03 bollwyvl

@bollwyvl - do you mind if I edit the original description to provide an overview of this PR and its impact on how things work in Jupyter?

jasongrout avatar Mar 04 '21 21:03 jasongrout

Free copy editing!? This place has everything!

bollwyvl avatar Mar 04 '21 22:03 bollwyvl

Some totally unscientific timings with the two tiny example packages:

From the CLI:

Init jupyter_config_paths in 3.5849ms
Init jupyter_config_paths:entry-point-example-setuptools in 0.1771ms
Init jupyter_config_paths:entry-point-example-flit in 0.1349ms
Init jupyter_data_paths in 3.1528ms
Init jupyter_data_paths:entry-point-example-setuptools in 0.0212ms
Init jupyter_data_paths:entry-point-example-flit in 0.0160ms

From an interactive python session (not ipython):

Init jupyter_data_paths in 8.7347ms
Init jupyter_data_paths:entry-point-example-setuptools in 0.2825ms
Init jupyter_data_paths:entry-point-example-flit in 0.1893ms
Init jupyter_config_paths in 5.3611ms
Init jupyter_config_paths:entry-point-example-setuptools in 0.0370ms
Init jupyter_config_paths:entry-point-example-flit in 0.0262ms

I'd wager the baseline cost of finding all the entry_points is fixed, and the total will increase linearly with the number of entry_points loaded. Subsequent calls take the same amount of time.

If, as is likely, the two entry point targets are both loaded from __init__.py, the load import cost will only be paid the first time, as everything will be cached in the import machinery.
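
Timings in the format shown above could be gathered with a small context manager like the following sketch. The name timed and the report callback are hypothetical, and this is not necessarily how the numbers in this comment were produced:

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, report=print):
    """Report how long the wrapped block took, in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        report(f"Init {label} in {elapsed_ms:.4f}ms")
```

Wrapping the same import twice in timed(...) blocks is one way to check the claim that only the first load pays the import cost.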

However, if each GET request to a Jupyter server requires warming these up, this will add up rapidly. So we'd need to decide where it would be appropriate to control caching, such as (not mutually exclusive):

  • a sensible, environment-variable override-able default cache period (e.g. a minute)
  • offer a force=False flag for cache invalidation
    • e.g. an app's index.html-equivalent handler could invalidate the cache when it starts a template render, but subsequent requests for images/js/css would not
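A rough sketch of how those two options could fit together, assuming a wrapper around whatever function computes the paths. The class name, the environment variable JUPYTER_PATH_CACHE_SECONDS, and the injectable clock are all hypothetical; nothing like this exists in jupyter_core as far as this thread shows.

```python
import os
import time

# Hypothetical environment variable; the default period is one minute.
CACHE_SECONDS = float(os.environ.get("JUPYTER_PATH_CACHE_SECONDS", "60"))


class CachedPaths:
    """Cache the result of an expensive path lookup for a limited time."""

    def __init__(self, compute, ttl=CACHE_SECONDS, clock=time.monotonic):
        self._compute = compute  # callable returning an iterable of paths
        self._ttl = ttl
        self._clock = clock
        self._value = None
        self._expires = None  # None means nothing has been cached yet

    def get(self, force=False):
        """Return the cached paths, recomputing if expired or ``force`` is set."""
        now = self._clock()
        if force or self._expires is None or now >= self._expires:
            self._value = list(self._compute())
            self._expires = now + self._ttl
        return list(self._value)  # copy so callers cannot mutate the cache
```

With this shape, an index.html-style handler would call get(force=True), while handlers for images/js/css would call get() and reuse the cached list.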

bollwyvl avatar Mar 05 '21 01:03 bollwyvl

However, if each GET request to a Jupyter server requires warming these up, this will add up rapidly. So we'd need to decide where it would be appropriate to control caching, such as (not mutually exclusive):

For loading JLab extensions (IIRC), we cache the information once at application startup.

jasongrout avatar Mar 05 '21 09:03 jasongrout

Welp, if you're on binder, and "pip install" a widget library, you'd refresh the page... And not get your widgets. But your kernel-side stuff would work.

I ran into a similar issue in the licenses pr (also the federated extension info appears to be in a closure).

bollwyvl avatar Mar 05 '21 12:03 bollwyvl

Here's another crazy idea that works around the problem of entry points always importing a package's __init__.py, which may be expensive:

Using https://setuptools.readthedocs.io/en/latest/userguide/extension.html#adding-new-egg-info-files as a basis, I use custom package metadata. My changes are in this commit: https://github.com/jasongrout/jupyter_core/commit/66351b0978f21b275dd58ab13e6448ef3c610705. Since I'm poking around in setuptools, which I don't know much about, I'm sure the code could be cleaned up or made more general by someone more familiar with setuptools.

The tradeoff is that the paths are more declarative, and I assume that paths are relative to the package root, rather than being able to compute them on the fly like with entry points.

Thoughts about using custom package metadata vs entry points that require importing the package?
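To make the comparison concrete, here is a sketch of the reading side using only the standard-library importlib.metadata rather than pkg_resources. It is an illustration of the idea, not the code in the commit: it assumes each opted-in distribution ships a `<group>.txt` file and a top_level.txt in its metadata directory, and that declared paths are relative to the first top-level package. The function name and the search_path parameter are made up.

```python
import os
from importlib import metadata


def metadata_paths(group, search_path=None):
    """Collect directories declared in ``<group>.txt`` metadata files.

    ``search_path`` is a list of directories to scan; ``None`` means sys.path.
    No package is imported; only metadata files are read.
    """
    kwargs = {} if search_path is None else {"path": list(search_path)}
    paths = []
    for dist in metadata.distributions(**kwargs):
        declared = dist.read_text(f"{group}.txt")  # None if the file is absent
        top_level = dist.read_text("top_level.txt")
        names = top_level.split() if top_level else []
        if not declared or not names:
            continue  # this distribution does not opt in
        for line in declared.splitlines():
            rel = line.strip()
            if not rel:
                continue
            # locate_file resolves relative to the install location
            candidate = dist.locate_file(f"{names[0]}/{rel}")
            if os.path.isdir(candidate):
                paths.append(str(candidate))
    return sorted(paths)  # sorted for a deterministic order
```

Declared directories that do not exist on disk are skipped rather than reported.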

Patch copied here for completeness and archiving
Use custom package metadata for augmenting Jupyter paths.

This does not require packages to be imported in order to get the Jupyter paths, which are potentially costly steps. Instead, we rely strictly on scanning and getting package metadata.
---
 .../setup.cfg                                 |  8 +++++
 .../__init__.py                               |  6 ++--
 jupyter_core/paths.py                         | 31 +++++++++++++++----
 jupyter_core/utils/__init__.py                | 27 +++++++++++++++-
 setup.cfg                                     |  6 ++++
 5 files changed, 68 insertions(+), 10 deletions(-)

diff --git a/examples/jupyter_path_entrypoint_setuptools/setup.cfg b/examples/jupyter_path_entrypoint_setuptools/setup.cfg
index 71239d8..4dbecec 100644
--- a/examples/jupyter_path_entrypoint_setuptools/setup.cfg
+++ b/examples/jupyter_path_entrypoint_setuptools/setup.cfg
@@ -25,9 +25,17 @@ include_package_data = True
 zip_safe = False
 python_requires = >=3.6
 
+setup_requires =
+    jupyter_core
 install_requires =
     jupyter_core
 
+# Jupyter directories are relative to the package root
+jupyter_config_paths =
+    etc/jupyter
+    etc/another/jupyter
+jupyter_data_paths = share/jupyter
+
 [options.packages.find]
 where =
     src
diff --git a/examples/jupyter_path_entrypoint_setuptools/src/entry_point_example_setuptools/__init__.py b/examples/jupyter_path_entrypoint_setuptools/src/entry_point_example_setuptools/__init__.py
index 64013c8..dc836c5 100644
--- a/examples/jupyter_path_entrypoint_setuptools/src/entry_point_example_setuptools/__init__.py
+++ b/examples/jupyter_path_entrypoint_setuptools/src/entry_point_example_setuptools/__init__.py
@@ -3,7 +3,7 @@
 
 __version__ = "0.1.0"
 
-HERE = os.path.abspath(os.path.dirname(__file__))
+# HERE = os.path.abspath(os.path.dirname(__file__))
 
-JUPYTER_CONFIG_PATHS = [os.path.join(HERE, "etc", "jupyter")]
-JUPYTER_DATA_PATHS = [os.path.join(HERE, "share", "jupyter")]
+# JUPYTER_CONFIG_PATHS = [os.path.join(HERE, "etc", "jupyter")]
+# JUPYTER_DATA_PATHS = [os.path.join(HERE, "share", "jupyter")]
diff --git a/jupyter_core/paths.py b/jupyter_core/paths.py
index 4ec3668..7d8b311 100644
--- a/jupyter_core/paths.py
+++ b/jupyter_core/paths.py
@@ -19,6 +19,7 @@
 from contextlib import contextmanager
 
 import entrypoints
+import pkg_resources
 
 
 pjoin = os.path.join
@@ -49,6 +50,18 @@ def _entry_point_paths(ep_group):
             ))
     return paths
 
+def _package_metadata(group):
+    """Load extra jupyter paths from custom package metadata
+    """
+    paths = []
+    filename = f'{group}.txt'
+    for distribution in sorted(pkg_resources.working_set, key=lambda x: x.key):
+        if distribution.has_metadata(filename) and distribution.has_metadata('top_level.txt'):
+            top_level = list(distribution.get_metadata_lines('top_level.txt'))[0]
+            localpaths = [f'{top_level}/{p}' for p in distribution.get_metadata_lines(filename)]
+            paths.extend(distribution.get_resource_filename(distribution, p) for p in localpaths if distribution.resource_isdir(p))
+    return paths
+
 def envset(name):
     """Return True if the given environment variable is set
 
@@ -187,16 +200,19 @@ def jupyter_path(*subdirs):
     # Next is environment or user, depending on the JUPYTER_PREFER_ENV_PATH flag
     user = jupyter_data_dir()
     env = [p for p in ENV_JUPYTER_PATH if p not in SYSTEM_JUPYTER_PATH]
-    entry_points = [p for p in _entry_point_paths(JUPYTER_DATA_PATH_ENTRY_POINT) if p not in SYSTEM_JUPYTER_PATH]
+    # entry_points = [p for p in _entry_point_paths(JUPYTER_DATA_PATH_ENTRY_POINT) if p not in SYSTEM_JUPYTER_PATH]
+    package_metadata = [p for p in _package_metadata(JUPYTER_DATA_PATH_ENTRY_POINT) if p not in SYSTEM_JUPYTER_PATH]
 
     if envset('JUPYTER_PREFER_ENV_PATH'):
         paths.extend(env)
-        paths.extend(entry_points)
+        # paths.extend(entry_points)
+        paths.extend(package_metadata)
         paths.append(user)
     else:
         paths.append(user)
         paths.extend(env)
-        paths.extend(entry_points)
+        # paths.extend(entry_points)
+        paths.extend(package_metadata)
 
     # finally, system
     paths.extend(SYSTEM_JUPYTER_PATH)
@@ -244,16 +260,19 @@ def jupyter_config_path():
     # Next is environment or user, depending on the JUPYTER_PREFER_ENV_PATH flag
     user = jupyter_config_dir()
     env = [p for p in ENV_CONFIG_PATH if p not in SYSTEM_CONFIG_PATH]
-    entry_points = [p for p in _entry_point_paths(JUPYTER_CONFIG_PATH_ENTRY_POINT) if p not in SYSTEM_CONFIG_PATH]
+    # entry_points = [p for p in _entry_point_paths(JUPYTER_CONFIG_PATH_ENTRY_POINT) if p not in SYSTEM_CONFIG_PATH]
+    package_metadata = [p for p in _package_metadata(JUPYTER_CONFIG_PATH_ENTRY_POINT) if p not in SYSTEM_CONFIG_PATH]
 
     if envset('JUPYTER_PREFER_ENV_PATH'):
         paths.extend(env)
-        paths.extend(entry_points)
+        # paths.extend(entry_points)
+        paths.extend(package_metadata)
         paths.append(user)
     else:
         paths.append(user)
         paths.extend(env)
-        paths.extend(entry_points)
+        # paths.extend(entry_points)
+        paths.extend(package_metadata)
 
     # Finally, system path
     paths.extend(SYSTEM_CONFIG_PATH)
diff --git a/jupyter_core/utils/__init__.py b/jupyter_core/utils/__init__.py
index 6ef6d5c..43466e7 100644
--- a/jupyter_core/utils/__init__.py
+++ b/jupyter_core/utils/__init__.py
@@ -13,4 +13,29 @@ def ensure_dir_exists(path, mode=0o777):
         if e.errno != errno.EEXIST:
             raise
     if not os.path.isdir(path):
-        raise IOError("%r exists but is not a directory" % path)
\ No newline at end of file
+        raise IOError("%r exists but is not a directory" % path)
+
+# from setuptools.config.ConfigHandler
+def _parse_list(value, separator=','):
+    """Represents value as a list.
+    Value is split either by separator (defaults to comma) or by lines.
+    :param value:
+    :param separator: List items separator character.
+    :rtype: list
+    """
+    if isinstance(value, list):  # _get_parser_compound case
+        return value
+
+    if '\n' in value:
+        value = value.splitlines()
+    else:
+        value = value.split(separator)
+
+    return [chunk.strip() for chunk in value if chunk.strip()]
+
+def write_arg_list(cmd, basename, filename):
+    argname = os.path.splitext(basename)[0]
+    value = getattr(cmd.distribution, argname, None)
+    if value is not None:
+        value = "\n".join(_parse_list(value)) + "\n"
+    cmd.write_or_delete_file(argname, filename, value)
diff --git a/setup.cfg b/setup.cfg
index a065370..729c719 100644
--- a/setup.cfg
+++ b/setup.cfg
@@ -38,3 +38,9 @@ console_scripts =
     jupyter              = jupyter_core.command:main
     jupyter-migrate      = jupyter_core.migrate:main
     jupyter-troubleshoot = jupyter_core.troubleshoot:main
+distutils.setup_keywords =
+    jupyter_config_paths = setuptools.dist:assert_string_list
+    jupyter_data_paths = setuptools.dist:assert_string_list
+egg_info.writers =
+    jupyter_config_paths.txt = jupyter_core.utils:write_arg_list
+    jupyter_data_paths.txt = jupyter_core.utils:write_arg_list

jasongrout avatar Mar 05 '21 12:03 jasongrout

custom package metadata.

My issue is: I think if we're going to do something special for python packages (as opposed to julia, r, etc. to which this information will be pretty much opaque), my preference would be to stay as close to the spec (PEP 517, 639, ...) as possible, in a format that as many tools as possible support. Introducing new metadata fields, that can only be feasibly created with jupyter_packaging, will disincentivize maintainers creating packages that "casually" work with Jupyter tools.

I think with

  • sound documentation (that could be under jupyter_packaging, i suppose)
  • sensibly defaulted caching
  • providing profiling information through some means
  • guidance on making the packages that own these entry points dedicated modules (as in, each shows up in top_level.txt) with almost no imports

...entry_points is a fine choice, and is already used/documented in nbconvert and elsewhere.

And, back to the lab extensions point: I might just be thinking about some ability to dynamically generate federated extensions without nodejs, much less jupyter_packaging, and making this entry_point target a singleton with __slice__ method...

bollwyvl avatar Mar 05 '21 13:03 bollwyvl

custom package metadata.

But thank you @jasongrout for providing additional options. Please feel free to hoist that idea up to the description! I took [strawman] off, but the intent of the original issue (https://github.com/jupyter-server/jupyter_server/issues/351) still stands... I'm always sad when a snap decision gets made just because someone needs it/is getting paid for it today or whatever, rather than we think this is good for all Jupyter stakeholders, from the janitors to the hot-shot spaceship pilots.

bollwyvl avatar Mar 05 '21 13:03 bollwyvl

Are these two statements referring to the same concept?

because the import system is invoked, users of this system may wish to create a separate python_packages entry for these static assets, to avoid bringing in otherwise-unused runtime dependencies, e.g. pandas

guidance on making the packages that own these dedicated modules (as in, shows up in top_level.txt), with (almost no imports)

If so, can you elaborate? Is there a way to have a package that may have a relatively expensive top-level import, but still have a relatively lightweight entry point by having the entry point not requiring the expensive import?

jasongrout avatar Mar 05 '21 13:03 jasongrout

Introducing new metadata fields, that can only be feasibly created with jupyter_packaging, will disincentivize maintainers creating packages that "casually" work with Jupyter tools.

Note that the custom metadata here only relies on jupyter_core as a dependency, not jupyter_packaging. The example was updated to show that all a user would have to do is add one or two arguments to their setup.py file (or equivalent entries in setup.cfg), and have a setup_requires point to jupyter_core.

jasongrout avatar Mar 05 '21 13:03 jasongrout

If so, can you elaborate? Is there a way to have a package that may have a relatively expensive top-level import, but still have a relatively lightweight entry point by having the entry point not requiring the expensive import?

For completeness: I also explored a little bit using namespace packages to try to avoid importing the main package to get to a lightweight entry point, but gave it up as being too complicated/magical to recommend to everyone.

jasongrout avatar Mar 05 '21 14:03 jasongrout

my preference would be to stay as close to the spec (PEP 517, 639, ...) as possible, in a format that as many tools as possible support.

If other packaging tools do not support custom metadata, then that would be a showstopper for using custom metadata, I think.

I agree the entry points would be better theoretically, it just bothers me that you have to import the package to get at this static configuration data.

jasongrout avatar Mar 05 '21 14:03 jasongrout

a way to have a package that may have a relatively expensive top-level import, but still have a relatively lightweight entry point by having the entry point not requiring the expensive import?

to get slightly more precise on python terminology, a module can't do it, but a distribution can. consider:

src/
  my_widget/
    __init__.py     # imports pandas, ipywidgets, the kitchen and bathroom sink
    share/
  my_widget_config/
    __init__.py     # imports nothing but pathlib, points at paths of ../my_widget/share

then find_packages(where="src") would turn up my_widget and my_widget_config. People (inadvertently) do this all the time with tests, which means all of them are broken.
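A throwaway sketch of that layout, built in a temporary directory so it can be run as-is. The stand-in my_widget/__init__.py just sets a flag where the real one would import pandas and friends. JUPYTER_DATA_PATHS mirrors the name used in the example package; everything else here is illustrative.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path


def build_layout(src):
    """Create the two top-level packages from the tree above under ``src``."""
    heavy = src / "my_widget"
    (heavy / "share" / "jupyter").mkdir(parents=True)
    # Stand-in for the expensive imports (pandas, ipywidgets, ...).
    (heavy / "__init__.py").write_text("LOADED = True\n")

    light = src / "my_widget_config"
    light.mkdir()
    (light / "__init__.py").write_text(textwrap.dedent("""\
        from pathlib import Path

        # Points at the sibling package directory without importing it.
        _SHARE = Path(__file__).parent.parent / "my_widget" / "share"
        JUPYTER_DATA_PATHS = [str(_SHARE / "jupyter")]
    """))


src = Path(tempfile.mkdtemp()) / "src"
src.mkdir()
build_layout(src)
sys.path.insert(0, str(src))

config = importlib.import_module("my_widget_config")
print(config.JUPYTER_DATA_PATHS)
print("my_widget" in sys.modules)  # the heavy package was never imported
```

The entry point would then target my_widget_config, so resolving the paths never touches my_widget itself.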

Of course, with flit, this wouldn't work, as it only does one package at a time. poetry and setuptools could do it declaratively, though. but really... i'm coming more around to thinking that, at least in the case of big-ol'-lab-assets, they should be separate packages anyway, so that it's possible to support multiple versions of lab.

I'll add that to the setuptools example.

namespace packages

Ooh, yeah, magic names are.... not fun in the slightest. And namespace packages are vicious. I tolerate it on wxyz, but woe betide anyone else that wants to collaborate on that without coordinating.

bollwyvl avatar Mar 05 '21 14:03 bollwyvl

but a distribution can.... then find_packages(where="src") would turn up my_widget and my_widget_config

In this case, would we have:

  • A single tarball/wheel distributed via pypi, or multiple tarball/wheel files?
  • two top-level imports in python, i.e., import my_widget and import my_widget_config both would work?
  • two top-level directories in site-packages, or one?

jasongrout avatar Mar 05 '21 17:03 jasongrout

This pull request has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/jupyter-paths-priority-order/7771/3

meeseeksmachine avatar Mar 05 '21 17:03 meeseeksmachine

in the simplest case:

  • a single tar.gz, a single .whl
  • two top-level imports
  • two top-level directories in site-packages

again, when folk use find_packages today, this happens all the time with a site-packages/tests. Even jupyter_server did this for a while! But we can use the heck out of it. These things won't be zip_safe... but they never were.

anecdotally, on jupyterlab-lsp, we opted to split them, and jupyterlab-lsp has a dependency on jupyter-lsp, as it would work with any jupyter_server and not bring in jlab-specific deps like json5.

bollwyvl avatar Mar 05 '21 17:03 bollwyvl

this happens all the time with a site-packages/tests

Very interesting! Does pip not give any warning when a package installation overwrites an existing directory, i.e., two different packages stomp on each other?

jasongrout avatar Mar 05 '21 18:03 jasongrout

If other packaging systems support custom metadata, I think that's probably cleaner than having a separate top-level directory and top-level import just for an entry point, i.e., you only have one top-level import/directory. On the other hand, maybe that extra package is only done in situations where the original package import is expensive, and maybe that's not so common. On the other hand, maybe that second package contains the actual etc/ and share/ directories as well (e.g., that second package is what distributes the javascript assets for a lab extension) - perhaps it is a bit iffy to assume you can do ../mypackage/etc/jupyter to point to the first package from the second package?

jasongrout avatar Mar 05 '21 18:03 jasongrout

If other packaging systems support custom metadata

I haven't looked into it... i loathe making PRs to packaging systems to support stuff. I mean, the ground is littered with old Please make AMD module PRs to have stuff work with requirejs, plz publish language server on npm for LSP... but at least these made things better for all their downstreams. If custom metadata is not widespread (and/or doesn't have a formal PEP), i just dunno... it sure sounds like rewriting entry_points, But For Jupyter... which is my point in this whole exercise.

one top-level import/directory.

Welp... most folk don't want to tell people to import my_widget.widgets_foo. You can do magic __getattr__ and stuff to make them lazy, though. But if that widgets_foo.py imports pandas, then we're back at the initial expensive load.
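For reference, a small runnable sketch of the lazy __getattr__ trick (module-level __getattr__, PEP 562). The package name lazy_widget and its contents are hypothetical and written to a temporary directory purely so the example is self-contained. The widgets_foo stand-in is trivial, where the real one would pull in pandas.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

pkg = Path(tempfile.mkdtemp()) / "lazy_widget"  # hypothetical package name
pkg.mkdir()
(pkg / "__init__.py").write_text(textwrap.dedent("""\
    import importlib

    _LAZY_SUBMODULES = {"widgets_foo"}

    def __getattr__(name):
        # Called only when normal attribute lookup on the package fails.
        if name in _LAZY_SUBMODULES:
            return importlib.import_module(f"{__name__}.{name}")
        raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
"""))
# Stand-in for a submodule with expensive imports such as pandas.
(pkg / "widgets_foo.py").write_text("class FooWidget:\n    pass\n")

sys.path.insert(0, str(pkg.parent))
lazy_widget = importlib.import_module("lazy_widget")

print("lazy_widget.widgets_foo" in sys.modules)  # submodule not imported yet
widget_cls = lazy_widget.widgets_foo.FooWidget   # first access triggers the import
print("lazy_widget.widgets_foo" in sys.modules)  # now it is
```

This only defers the cost: importing the package stays cheap, but the first touch of widgets_foo still pays for whatever that submodule imports.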

Does pip not give any warning

Nope, otherwise old-style namespace packages like backports.* wouldn't work, but conda, apt, etc. sure do, and have to patch around these things! pip also ignores extras pins after the fact, yadda, yadda... it's not great, but it's what we have to deal with. We do not want A Package Manager, But For Jupyter (though I'll stand by jlpm being a good call :blush:).

second package contains the actual etc/ and share/ directories as well

Yeah, that pattern could work... certainly for etc. None of the _jupyter_this_and_that_extension magic functions need any imports.

But then, many of those enable --py, etc. won't even be needed for something that uses these tools, just pip install -e (or flit install --symlink).

With federated extensions, folks are reading more JSON than before, which is good for DRY, and that isn't free, but usually that's a one-time hit.

bit iffy to assume you can do ../mypackage/etc/jupyter to point to the first package from the second package?

I mean, that's what you buy when you publish packages that end up in site-packages. But sure. Especially with --symlink, stuff gets weird.

bollwyvl avatar Mar 05 '21 18:03 bollwyvl

i loathe making PRs to packaging systems to support stuff.

Yep, agreed. If other packaging systems don't support the custom metadata, I think the option is out.

jasongrout avatar Mar 05 '21 19:03 jasongrout