jupyter_core
data/config path entry_points with minimal examples
Background
Jupyter relies on a hierarchy of directories (user-level, environment-level, system-level, etc.) to store configuration and data. These directories are used by a number of Jupyter programs, for example:
- Most applications based on the traitlets Configurable application class store configuration in JSON files in the configuration directories. They also aggregate conf.d-style configuration from these directories to determine settings of options.
- Jupyter Notebook extensions copy their javascript assets into a data directory on installation for the server to serve
- JupyterLab extensions copy their javascript assets into a data directory on installation for the server to serve.
Problem
Currently the environment level of this directory hierarchy is a fixed location based on `sys.prefix`. This means that packages need to copy their files into this directory at install time, which has several issues:
- Copying files into a data directory uses the `data_files` feature of Python packages, which is deprecated in setuptools and is not supported in non-setuptools-based packagers like `flit`, `poetry` (see here), etc.
- Data files are duplicated in the package bundle (once for copying into the data directory, once for being included in the actual package to install into `site-packages`). For some extensions, this is huge (megabytes or tens of megabytes).
- Development installs (`pip install -e`) do not update data files when the source files change, so when developing a package, if something changes in the data files, you either have to copy them over again, or you have to run a command to make the appropriate data directory a symbolic link (not available on some platforms) to the source files.
(Also, it seems that sometimes these data file directories are not deleted. For example, in JupyterLab we actually create files at runtime in the data directory, and I think they don't get deleted when JupyterLab is uninstalled)
Proposed solution
Python has another mechanism that is explicitly designed for plugin systems called entry points. An entry point is a piece of metadata in a package that points to an arbitrary import from the package. This PR changes `jupyter_core` to look for two specific entry points in any installed package, each pointing to a list of paths, to augment the environment-level Jupyter config directories (the `jupyter_config_paths` entry point) and data directories (the `jupyter_data_paths` entry point). The result is:
- Any package can add new environment-level Jupyter config and data directories. In practice, this means that a package can contain data or configuration in a directory that is installed in its `site-packages` directory, and can use the entry point to point Jupyter to that internal directory. Since this directory is internal to the package:
  - the files are not duplicated in the package tarball
  - development (un)installs automatically work, since the directory points to an internal directory in the package
  - other python package managers can be used, like poetry using its include/exclude mechanism for files
- non-Python programs can access this (and all other paths) by shelling out to `jupyter --paths --json`
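A minimal sketch of the package side, assuming a package that ships `etc/jupyter` and `share/jupyter` directories next to its `__init__.py` (the layout mirrors the setuptools example in this PR). The helper `package_jupyter_paths` and the stand-in directory are hypothetical and only here for illustration; the exact entry point declaration lives in the linked example files.

```python
import os


def package_jupyter_paths(package_dir):
    """Return (config_paths, data_paths) for directories shipped inside a package."""
    here = os.path.abspath(package_dir)
    config_paths = [os.path.join(here, "etc", "jupyter")]
    data_paths = [os.path.join(here, "share", "jupyter")]
    return config_paths, data_paths


# In a real package, package_dir would typically be os.path.dirname(__file__)
# inside __init__.py; the directory below is a hypothetical stand-in.
JUPYTER_CONFIG_PATHS, JUPYTER_DATA_PATHS = package_jupyter_paths(
    os.path.join("site-packages", "my_package")
)
```

An entry point in the `jupyter_config_paths` group would then point at a name like `JUPYTER_CONFIG_PATHS`, and one in the `jupyter_data_paths` group at `JUPYTER_DATA_PATHS`. Because the lists are computed relative to the package, an editable install resolves to the source tree without copying anything.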
Problems with the proposed solution
- Entry points are based on importing a module to get a value, which could potentially be very expensive. We explore parsing the file first for literal values, and then importing as a last resort, which seems to alleviate this problem in the common case (setuptools does something similar for its `attr` handler for setup.cfg values).
- neither `entry_point` group is cached
  - an interactive installation with e.g. `pip install` or `conda install` would be able to update the search path, provided the application isn't doing its own caching...
    - this is important to maintain the observed behavior of `data_files`
  - because the import system is invoked, users of this system may wish to create a separate `python_packages` entry for these static assets, to avoid bringing in otherwise-unused runtime dependencies, e.g. `pandas`
  - [ ] adding some debug logging around this will help pinpoint slow startup times
    - turns out there is no logging this deep in the stack. we could either:
      - [ ] add a `log=None` argument to the various calls
      - [ ] add a logger controlled by a `JUPYTER_CORE_LOGLEVEL`
- if an `entry_point` is added (or its target is changed) in a package with an editable install, it must be reinstalled
  - however, if only the return value of an existing `entry_point` is changed, no re-install is required
- existing tools that were relying on indexing `jupyter_*paths()` may break
  - this occurs in the test suite for `jupyter_core` itself: if one of the example packages is installed, the tests break
  - [ ] these will have to be updated to inspect relative positions, e.g. was the user dir loaded before or after the env paths when `JUPYTER_PREFER_ENV_PATH` is set
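The "parse for literal values first, import as a last resort" idea from the first point above could look roughly like the following. This is a sketch under assumptions, not the code in this PR: the function names are hypothetical, and the caller is assumed to have already located the module source text without importing it.

```python
import ast
import importlib


def literal_module_attr(source, attr):
    """Return the literal value assigned to `attr` at module level.

    Raises LookupError if the name is missing or its value is not a literal.
    """
    found = None
    for node in ast.parse(source).body:
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id == attr:
                    found = node.value  # keep the last assignment, as at runtime
    if found is None:
        raise LookupError(f"{attr} not found")
    try:
        return ast.literal_eval(found)
    except (ValueError, TypeError) as err:
        raise LookupError(f"{attr} is not a literal") from err


def load_paths(module_name, attr, source=None):
    """Try the cheap literal parse first; import the module only if that fails."""
    if source is not None:
        try:
            return literal_module_attr(source, attr)
        except LookupError:
            pass
    return getattr(importlib.import_module(module_name), attr)
```

Note that a value computed with `os.path.join(HERE, ...)`, as in the example packages, is not a literal, so it would still fall through to the import path.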
Alternative solutions
setuptools also provides a way for a package to have custom metadata files in the egg or dist_info directories. This avoids the problems of importing or parsing an arbitrary python file to get the few strings that we need. However, it appears that this arbitrary metadata is not well supported outside of setuptools. See below for some experiments around this approach.
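For illustration, reading such a custom metadata file without importing the package might be sketched as below. This swaps in the standard library's `importlib.metadata` (Python 3.8+) rather than the `pkg_resources` calls used in the experiment; the function name is hypothetical, the file name follows the `jupyter_data_paths.txt` writer from the experimental patch, and resolving the package-relative paths to absolute ones is left out.

```python
from importlib import metadata


def metadata_paths(filename, distributions=None):
    """Collect path lines from a custom metadata file across distributions."""
    if distributions is None:
        distributions = metadata.distributions()
    paths = []
    for dist in distributions:
        text = dist.read_text(filename)  # None when the file is absent
        if text:
            paths.extend(line.strip() for line in text.splitlines() if line.strip())
    return paths
```

Calling `metadata_paths("jupyter_data_paths.txt")` scans every installed distribution's metadata directory, which only reads small text files and never runs package code.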
Example
See the setuptools example, specifically https://github.com/jupyter/jupyter_core/blob/38e3acd220153871ddd93d3e77a8b0af9e18c9db/examples/jupyter_path_entrypoint_setuptools/setup.cfg#L35-L39
- this approach requires a boilerplate `MANIFEST.in` and a `setup.py` in order to be installed from source
and the flit example, specifically https://github.com/jupyter/jupyter_core/blob/38e3acd220153871ddd93d3e77a8b0af9e18c9db/examples/jupyter_path_entrypoint_flit/pyproject.toml#L11-L15 for examples of how to use these entry points.
- `pyproject.toml` is the only boilerplate file needed, and generates a `setup.py`
- `flit` can also generate binary reproducible `whl` files (for python >=3.7) given the same version of `flit_core`
Original issue description
Hey folks! Thanks for keeping this foundational technology working.
`data_files` are making me sad enough that I'm willing to bring this up again.
This is a low-downstream-impact way we could allow python packages to not require the ill-supported `data_files` technique.
To test:
pip install -e .
cd examples/entry_point_example
pip install -e .
jupyter --paths
# should see that development environment in place
pip uninstall entry_point_example
jupyter --paths
# it's gone
I don't know if it really works yet, down to the n-th downstream, but it seems it should if they are relying on `jupyter_*_dir`, and handling multiple paths already.
I see this as a good alternative to using data_files without overhauling the config system. I am a bit worried that it's hard to debug when things go wrong (if 15 directories will be scanned). Could we maybe provide a richer debug facility to see a particular config key, and how each directory is changing it? Grepping in 15 directories will not be fun. Or do I see a problem that does not exist, and are the debug options sufficient?
> Grepping in 15 directories will not be fun

Yep, there will be a lot of directories beyond the Big Four. No doubt some combination of `jupyter --paths`, `jq`, and `xargs` would make `grep` plausible, but that's no fun!
A `JupyterApp` base flag like `--show-config` which every app would inherit is a whacking good idea, even outside of this little draft. It could probably use `difflib` to generate a decently-readable representation of the config before each file was loaded, and show the final config, perhaps something like:
$> jupyter foo --show-config
environment variables:
- JUPYTER_PREFER_ENV_PATH: not set
- ...
paths:
- /etc/jupyter/jupyter_config.json: not found
...
- ~/my-project/src/my_project/etc/jupyter_foo_config.d/my-project.json:
+ SomeHasTraits:
+ foo: bar
...
- ~/my-project/src/my_project/.venv/etc/jupyter_config.d/someone-elses-project.json:
SomeHasTraits:
- foo: bar
+ foo: baz
...
- ./jupyter_foo_config.json: not found
final:
SomeHasTraits:
foo: baz
sprinkle in some pygments (if available) and it would be pretty usable.
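A rough sketch of how the `difflib` part of that idea might work, assuming each config file has already been loaded into a plain dict. The function names and the recursive merge rule are assumptions for illustration and are not how traitlets merges config.

```python
import difflib
import json


def merge(base, update):
    """Recursively merge `update` into a copy of `base`."""
    merged = dict(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def show_config(layers):
    """Yield report lines for an ordered list of (path, config dict) pairs."""
    current = {}
    for path, config in layers:
        updated = merge(current, config)
        before = json.dumps(current, indent=2, sort_keys=True).splitlines()
        after = json.dumps(updated, indent=2, sort_keys=True).splitlines()
        yield f"- {path}:"
        diff = difflib.unified_diff(before, after, lineterm="", n=1)
        # Drop the two "---"/"+++" header lines that unified_diff emits first
        yield from list(diff)[2:]
        current = updated
    yield "final:"
    yield from json.dumps(current, indent=2, sort_keys=True).splitlines()
```

Usage would be along the lines of `print("\n".join(show_config(layers)))`, where `layers` lists the files in the order they are loaded.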
Indeed, exactly what I had in mind, that would help a lot
Gah, looking at it: a lot of the complexity is duplicated between `jupyter_server` and `notebook`... while both would work with this PR, there's no simple way to add the above config inspection.
Perhaps the better short-term approach would be to invert it, with a separate package/command, e.g. offered as `jupyter show-config notebook FooHasTraits.bar`. I guess this would work by overloading/monkeypatching `config_manager_class` (gaaah) with an instrumented subclass, and calling `initialize` but not `start`.
Because of that complexity, this could probably not land here, unless the ConfigManager pattern was brought upstream, which sounds hard to coordinate.
I have an unshareably bad version of this, but it kinda works with `notebook`, `jupyter_server`, `jupyterlab` and `voila` installed:
getting jupyter_server_config from /etc/jupyter
got {}
getting jupyter_server_config from /usr/local/etc/jupyter
got {}
getting jupyter_server_config from /home/weg/projects/jupyter_showconfig_/envs/default/etc/jupyter
Reading file /home/weg/projects/jupyter_showconfig_/envs/default/etc/jupyter/jupyter_server_config.d/jupyterlab.json
Reading file /home/weg/projects/jupyter_showconfig_/envs/default/etc/jupyter/jupyter_server_config.d/nbclassic.json
Reading file /home/weg/projects/jupyter_showconfig_/envs/default/etc/jupyter/jupyter_server_config.d/voila.json
got {'ServerApp': {'jpserver_extensions': {'jupyterlab': True, 'nbclassic': True, 'voila.server_extension': True}}}
getting jupyter_server_config from /home/weg/.jupyter
got {}
getting page_config from /etc/jupyter/labconfig
got {}
getting page_config from /usr/local/etc/jupyter/labconfig
got {}
getting page_config from /home/weg/projects/jupyter_showconfig_/envs/default/etc/jupyter/labconfig
got {}
getting page_config from /home/weg/.jupyter/labconfig
got {}
[I 2020-11-22 17:50:37.177 ServerApp] jupyterlab | extension was successfully linked.
getting jupyter_notebook_config from /home/weg/.jupyter
got {}
getting jupyter_notebook_config from /etc/jupyter
got {}
getting jupyter_notebook_config from /usr/local/etc/jupyter
got {}
getting jupyter_notebook_config from /home/weg/projects/jupyter_showconfig_/envs/default/etc/jupyter
Reading file /home/weg/projects/jupyter_showconfig_/envs/default/etc/jupyter/jupyter_notebook_config.d/jupyterlab.json
Reading file /home/weg/projects/jupyter_showconfig_/envs/default/etc/jupyter/jupyter_notebook_config.d/voila.json
got {'NotebookApp': {'nbserver_extensions': {'jupyterlab': True, 'voila.server_extension': True}}}
getting jupyter_notebook_config from /home/weg/.jupyter
got {}
[I 2020-11-22 17:50:37.322 ServerApp] nbclassic | extension was successfully linked.
[I 2020-11-22 17:50:37.322 ServerApp] voila.server_extension | extension was successfully linked.
[I 2020-11-22 17:50:37.339 LabApp] JupyterLab extension loaded from /home/weg/projects/jupyter_showconfig_/envs/default/lib/python3.7/site-packages/jupyterlab
[I 2020-11-22 17:50:37.339 LabApp] JupyterLab application directory is /home/weg/projects/jupyter_showconfig_/envs/default/share/jupyter/lab
[I 2020-11-22 17:50:37.342 ServerApp] jupyterlab | extension was successfully loaded.
[I 2020-11-22 17:50:37.345 ServerApp] nbclassic | extension was successfully loaded.
[I 2020-11-22 17:50:37.347 ServerApp] voila.server_extension | extension was successfully loaded.
Update: here's some better stuff, generated with `rich`:
op ┃ section_name ┃ path ┃ old_value ┃ new_value
━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
stage │ │ │ │ before-init
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
patch │ │ io.open │ │
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
patch │ │ BaseJSONConfigManager.get │ │
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
stage │ │ │ │ before-constructor
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
stage │ │ │ │ after-constructor
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
get │ jupyter_server_config │ /etc/jupyter │ │
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
got │ jupyter_server_config │ /etc/jupyter │ │ {}
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
get │ jupyter_server_config │ /usr/local/etc/jupyter │ │
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
got │ jupyter_server_config │ /usr/local/etc/jupyter │ │ {}
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
get │ jupyter_server_config │ $SYS_PREFIX/etc/jupyter │ │
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
open │ $SYS_PREFIX/etc/jupyter/jupyter_server_config.d │ jupyterlab.json │ │
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
open │ $SYS_PREFIX/etc/jupyter/jupyter_server_config.d │ nbclassic.json │ │
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
open │ $SYS_PREFIX/etc/jupyter/jupyter_server_config.d │ voila.json │ │
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
got │ jupyter_server_config │ $SYS_PREFIX/etc/jupyter │ │ {
│ │ │ │ "ServerApp": {
│ │ │ │ "jpserver_extensions": {
│ │ │ │ "jupyterlab": true,
│ │ │ │ "nbclassic": true,
│ │ │ │ "voila.server_extension": true
│ │ │ │ }
│ │ │ │ }
│ │ │ │ }
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
get │ jupyter_server_config │ $HOME/.jupyter │ │
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
got │ jupyter_server_config │ $HOME/.jupyter │ │ {}
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
get │ page_config │ /etc/jupyter/labconfig │ │
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
got │ page_config │ /etc/jupyter/labconfig │ │ {}
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
get │ page_config │ /usr/local/etc/jupyter/labconfig │ │
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
got │ page_config │ /usr/local/etc/jupyter/labconfig │ │ {}
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
get │ page_config │ $SYS_PREFIX/etc/jupyter/labconfig │ │
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
got │ page_config │ $SYS_PREFIX/etc/jupyter/labconfig │ │ {}
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
get │ page_config │ $HOME/.jupyter/labconfig │ │
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
got │ page_config │ $HOME/.jupyter/labconfig │ │ {}
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
get │ jupyter_notebook_config │ $HOME/.jupyter │ │
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
got │ jupyter_notebook_config │ $HOME/.jupyter │ │ {}
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
get │ jupyter_notebook_config │ /etc/jupyter │ │
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
got │ jupyter_notebook_config │ /etc/jupyter │ │ {}
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
get │ jupyter_notebook_config │ /usr/local/etc/jupyter │ │
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
got │ jupyter_notebook_config │ /usr/local/etc/jupyter │ │ {}
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
get │ jupyter_notebook_config │ $SYS_PREFIX/etc/jupyter │ │
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
open │ $SYS_PREFIX/etc/jupyter/jupyter_notebook_config.d │ jupyterlab.json │ │
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
open │ $SYS_PREFIX/etc/jupyter/jupyter_notebook_config.d │ voila.json │ │
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
got │ jupyter_notebook_config │ $SYS_PREFIX/etc/jupyter │ │ {
│ │ │ │ "NotebookApp": {
│ │ │ │ "nbserver_extensions": {
│ │ │ │ "jupyterlab": true,
│ │ │ │ "voila.server_extension": true
│ │ │ │ }
│ │ │ │ }
│ │ │ │ }
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
get │ jupyter_notebook_config │ $HOME/.jupyter │ │
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
got │ jupyter_notebook_config │ $HOME/.jupyter │ │ {}
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
change │ kernel_spec_manager │ ServerApp │ │
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
change │ ssl_options │ ServerApp │ {} │
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
stage │ │ │ │ after-init
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
stage │ │ │ │ started
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
stage │ │ │ │ done
This pull request has been mentioned on Jupyter Community Forum. There might be relevant details there:
https://discourse.jupyter.org/t/how-do-we-uninstall-extensions-that-have-been-installed-using-jupyter-labextension-develop-overwrite/7845/5
@bollwyvl - I made a PR to your PR with a few changes I thought would be good: https://github.com/bollwyvl/jupyter_core/pull/1. What do you think?
@jasongrout thanks for that! I merged in upstream, added a flit example and some tests.
Overall, I think this is a good idea. My immediate usecase is using this for installing prebuilt extensions, and originally I was thinking that we just enable an entry point for jlab plugins specifically. I like that this solution is far more general and more foundational than a jlab-specific entry point!
Yeah, federated extensions are a big motivator: as mentioned, the `data_files` approach yields duplicates in wheels, which really starts adding up on large builds (wasm, design templates, etc).
I really like how dense the `flit` example is... there's sane globbing/excluding, and the reproducible wheel building is a big step forward. I would move all of my labextensions to that toolchain as soon as this was available.
@bollwyvl - do you mind if I edit the original description to provide an overview of this PR and its impact on how things work in Jupyter?
Free copy editing!? This place has everything!
Some totally unscientific timings with the two tiny example packages:
From the CLI:
Init jupyter_config_paths in 3.5849ms
Init jupyter_config_paths:entry-point-example-setuptools in 0.1771ms
Init jupyter_config_paths:entry-point-example-flit in 0.1349ms
Init jupyter_data_paths in 3.1528ms
Init jupyter_data_paths:entry-point-example-setuptools in 0.0212ms
Init jupyter_data_paths:entry-point-example-flit in 0.0160ms
From an interactive `python` session (not ipython):
Init jupyter_data_paths in 8.7347ms
Init jupyter_data_paths:entry-point-example-setuptools in 0.2825ms
Init jupyter_data_paths:entry-point-example-flit in 0.1893ms
Init jupyter_config_paths in 5.3611ms
Init jupyter_config_paths:entry-point-example-setuptools in 0.0370ms
Init jupyter_config_paths:entry-point-example-flit in 0.0262ms
I'd wager the baseline cost of finding all the entry_points is fixed, and the total will increase linearly with the number of entry_points loaded. Subsequent calls take the same amount of time.
If, as is likely, the two entry point targets are both loaded from `__init__.py`, the import cost will only be paid the first time, as everything will be cached in the import machinery.
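Timings like the ones above could be collected with something along these lines. This is a sketch using the standard library's `importlib.metadata` rather than the `entrypoints` package that `jupyter_core/paths.py` imports, and both function names are hypothetical.

```python
import time
from importlib import metadata


def entry_points_in_group(group):
    """Return entry points for `group`, across importlib.metadata API versions."""
    eps = metadata.entry_points()
    if hasattr(eps, "select"):  # Python 3.10 and later
        return list(eps.select(group=group))
    return list(eps.get(group, []))  # Python 3.8/3.9: mapping of group to entry points


def timed_load(group):
    """Load every entry point in `group`, returning (paths, timings in ms)."""
    paths, timings = [], {}
    start = time.perf_counter()
    for ep in entry_points_in_group(group):
        ep_start = time.perf_counter()
        paths.extend(ep.load())  # the first load pays the package import cost
        timings[f"{group}:{ep.name}"] = (time.perf_counter() - ep_start) * 1000
    timings[group] = (time.perf_counter() - start) * 1000
    return paths, timings
```

Calling `timed_load("jupyter_config_paths")` twice in one process should show the per-entry-point cost dropping on the second call, since the target modules are already in `sys.modules`, while the scan for entry points is repeated each time.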
However, if each `GET` request to a Jupyter server requires warming these up, this will add up rapidly. So we'd need to decide where it would be appropriate to control caching, such as (not mutually exclusive):
- a sensible, environment-variable override-able default cache period (e.g. a minute)
- offer a `force=False` flag for cache invalidation
  - e.g. an app's `index.html`-equivalent handler could invalidate the cache when it starts a template render, but subsequent requests for images/js/css would not
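The two caching options could be combined roughly as follows. This is only a sketch: the environment variable name `JUPYTER_PATHS_CACHE_SECONDS`, the function name, and the module-level dict are all hypothetical, and nothing like this exists in `jupyter_core` as far as this PR shows.

```python
import os
import time

# Hypothetical variable name, used here only for illustration
CACHE_SECONDS_ENV = "JUPYTER_PATHS_CACHE_SECONDS"
_cache = {}


def cached_paths(group, compute, force=False, now=time.monotonic):
    """Return compute(group), reusing a recent result unless `force` is set."""
    max_age = float(os.environ.get(CACHE_SECONDS_ENV, "60"))
    entry = _cache.get(group)
    if not force and entry is not None and now() - entry[0] < max_age:
        return entry[1]
    value = compute(group)
    _cache[group] = (now(), value)
    return value
```

A page-rendering handler would pass `force=True` once per render, while handlers for static assets would use the default and hit the cache.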
> However, if each `GET` request to a Jupyter server requires warming these up, this will add up rapidly. So we'd need to decide where it would be appropriate to control caching, such as (not mutually exclusive):

For loading JLab extensions (IIRC), we cache the information once at application startup.
Welp, if you're on binder, and "pip install" a widget library, you'd refresh the page... And not get your widgets. But your kernel-side stuff would work.
I ran into a similar issue in the licenses pr (also the federated extension info appears to be in a closure).
Here's another crazy idea that works around the problem of entry points always importing a package's `__init__.py`, which may be expensive:
Using https://setuptools.readthedocs.io/en/latest/userguide/extension.html#adding-new-egg-info-files as a basis, I use custom package metadata. My changes are in this commit: https://github.com/jasongrout/jupyter_core/commit/66351b0978f21b275dd58ab13e6448ef3c610705. Since I'm poking around in setuptools, which I don't know much about, I'm sure the code could be cleaned up or made more general by someone more familiar with setuptools.
The tradeoff is that the paths are more declarative, and I assume that paths are relative to the package root, rather than being able to compute them on the fly like with entry points.
Thoughts about using custom package metadata vs entry points that require importing the package?
Patch copied here for completeness and archiving
Use custom package metadata for augmenting Jupyter paths.
This does not require packages to be imported in order to get the Jupyter paths, which are potentially costly steps. Instead, we rely strictly on scanning and getting package metadata.
---
.../setup.cfg | 8 +++++
.../__init__.py | 6 ++--
jupyter_core/paths.py | 31 +++++++++++++++----
jupyter_core/utils/__init__.py | 27 +++++++++++++++-
setup.cfg | 6 ++++
5 files changed, 68 insertions(+), 10 deletions(-)
diff --git a/examples/jupyter_path_entrypoint_setuptools/setup.cfg b/examples/jupyter_path_entrypoint_setuptools/setup.cfg
index 71239d8..4dbecec 100644
--- a/examples/jupyter_path_entrypoint_setuptools/setup.cfg
+++ b/examples/jupyter_path_entrypoint_setuptools/setup.cfg
@@ -25,9 +25,17 @@ include_package_data = True
zip_safe = False
python_requires = >=3.6
+setup_requires =
+ jupyter_core
install_requires =
jupyter_core
+# Jupyter directories are relative to the package root
+jupyter_config_paths =
+ etc/jupyter
+ etc/another/jupyter
+jupyter_data_paths = share/jupyter
+
[options.packages.find]
where =
src
diff --git a/examples/jupyter_path_entrypoint_setuptools/src/entry_point_example_setuptools/__init__.py b/examples/jupyter_path_entrypoint_setuptools/src/entry_point_example_setuptools/__init__.py
index 64013c8..dc836c5 100644
--- a/examples/jupyter_path_entrypoint_setuptools/src/entry_point_example_setuptools/__init__.py
+++ b/examples/jupyter_path_entrypoint_setuptools/src/entry_point_example_setuptools/__init__.py
@@ -3,7 +3,7 @@
__version__ = "0.1.0"
-HERE = os.path.abspath(os.path.dirname(__file__))
+# HERE = os.path.abspath(os.path.dirname(__file__))
-JUPYTER_CONFIG_PATHS = [os.path.join(HERE, "etc", "jupyter")]
-JUPYTER_DATA_PATHS = [os.path.join(HERE, "share", "jupyter")]
+# JUPYTER_CONFIG_PATHS = [os.path.join(HERE, "etc", "jupyter")]
+# JUPYTER_DATA_PATHS = [os.path.join(HERE, "share", "jupyter")]
diff --git a/jupyter_core/paths.py b/jupyter_core/paths.py
index 4ec3668..7d8b311 100644
--- a/jupyter_core/paths.py
+++ b/jupyter_core/paths.py
@@ -19,6 +19,7 @@
from contextlib import contextmanager
import entrypoints
+import pkg_resources
pjoin = os.path.join
@@ -49,6 +50,18 @@ def _entry_point_paths(ep_group):
))
return paths
+def _package_metadata(group):
+ """Load extra jupyter paths from custom package metadata
+ """
+ paths = []
+ filename = f'{group}.txt'
+ for distribution in sorted(pkg_resources.working_set, key=lambda x: x.key):
+ if distribution.has_metadata(filename) and distribution.has_metadata('top_level.txt'):
+ top_level = list(distribution.get_metadata_lines('top_level.txt'))[0]
+ localpaths = [f'{top_level}/{p}' for p in distribution.get_metadata_lines(filename)]
+ paths.extend(distribution.get_resource_filename(distribution, p) for p in localpaths if distribution.resource_isdir(p))
+ return paths
+
def envset(name):
"""Return True if the given environment variable is set
@@ -187,16 +200,19 @@ def jupyter_path(*subdirs):
# Next is environment or user, depending on the JUPYTER_PREFER_ENV_PATH flag
user = jupyter_data_dir()
env = [p for p in ENV_JUPYTER_PATH if p not in SYSTEM_JUPYTER_PATH]
- entry_points = [p for p in _entry_point_paths(JUPYTER_DATA_PATH_ENTRY_POINT) if p not in SYSTEM_JUPYTER_PATH]
+ # entry_points = [p for p in _entry_point_paths(JUPYTER_DATA_PATH_ENTRY_POINT) if p not in SYSTEM_JUPYTER_PATH]
+ package_metadata = [p for p in _package_metadata(JUPYTER_DATA_PATH_ENTRY_POINT) if p not in SYSTEM_JUPYTER_PATH]
if envset('JUPYTER_PREFER_ENV_PATH'):
paths.extend(env)
- paths.extend(entry_points)
+ # paths.extend(entry_points)
+ paths.extend(package_metadata)
paths.append(user)
else:
paths.append(user)
paths.extend(env)
- paths.extend(entry_points)
+ # paths.extend(entry_points)
+ paths.extend(package_metadata)
# finally, system
paths.extend(SYSTEM_JUPYTER_PATH)
@@ -244,16 +260,19 @@ def jupyter_config_path():
# Next is environment or user, depending on the JUPYTER_PREFER_ENV_PATH flag
user = jupyter_config_dir()
env = [p for p in ENV_CONFIG_PATH if p not in SYSTEM_CONFIG_PATH]
- entry_points = [p for p in _entry_point_paths(JUPYTER_CONFIG_PATH_ENTRY_POINT) if p not in SYSTEM_CONFIG_PATH]
+ # entry_points = [p for p in _entry_point_paths(JUPYTER_CONFIG_PATH_ENTRY_POINT) if p not in SYSTEM_CONFIG_PATH]
+ package_metadata = [p for p in _package_metadata(JUPYTER_CONFIG_PATH_ENTRY_POINT) if p not in SYSTEM_CONFIG_PATH]
if envset('JUPYTER_PREFER_ENV_PATH'):
paths.extend(env)
- paths.extend(entry_points)
+ # paths.extend(entry_points)
+ paths.extend(package_metadata)
paths.append(user)
else:
paths.append(user)
paths.extend(env)
- paths.extend(entry_points)
+ # paths.extend(entry_points)
+ paths.extend(package_metadata)
# Finally, system path
paths.extend(SYSTEM_CONFIG_PATH)
diff --git a/jupyter_core/utils/__init__.py b/jupyter_core/utils/__init__.py
index 6ef6d5c..43466e7 100644
--- a/jupyter_core/utils/__init__.py
+++ b/jupyter_core/utils/__init__.py
@@ -13,4 +13,29 @@ def ensure_dir_exists(path, mode=0o777):
if e.errno != errno.EEXIST:
raise
if not os.path.isdir(path):
- raise IOError("%r exists but is not a directory" % path)
\ No newline at end of file
+ raise IOError("%r exists but is not a directory" % path)
+
+# from setuptools.config.ConfigHandler
+def _parse_list(value, separator=','):
+ """Represents value as a list.
+ Value is split either by separator (defaults to comma) or by lines.
+ :param value:
+ :param separator: List items separator character.
+ :rtype: list
+ """
+ if isinstance(value, list): # _get_parser_compound case
+ return value
+
+ if '\n' in value:
+ value = value.splitlines()
+ else:
+ value = value.split(separator)
+
+ return [chunk.strip() for chunk in value if chunk.strip()]
+
+def write_arg_list(cmd, basename, filename):
+ argname = os.path.splitext(basename)[0]
+ value = getattr(cmd.distribution, argname, None)
+ if value is not None:
+ value = "\n".join(_parse_list(value)) + "\n"
+ cmd.write_or_delete_file(argname, filename, value)
diff --git a/setup.cfg b/setup.cfg
index a065370..729c719 100644
--- a/setup.cfg
+++ b/setup.cfg
@@ -38,3 +38,9 @@ console_scripts =
     jupyter = jupyter_core.command:main
     jupyter-migrate = jupyter_core.migrate:main
     jupyter-troubleshoot = jupyter_core.troubleshoot:main
+distutils.setup_keywords =
+    jupyter_config_paths = setuptools.dist:assert_string_list
+    jupyter_data_paths = setuptools.dist:assert_string_list
+egg_info.writers =
+    jupyter_config_paths.txt = jupyter_core.utils:write_arg_list
+    jupyter_data_paths.txt = jupyter_core.utils:write_arg_list
custom package metadata.
My issue is: I think if we're going to do something special for python packages (as opposed to julia, r, etc. to which this information will be pretty much opaque), my preference would be to stay as close to the spec (PEP 517, 639, ...) as possible, in a format that as many tools as possible support. Introducing new metadata fields, that can only be feasibly created with jupyter_packaging, will disincentivize maintainers creating packages that "casually" work with Jupyter tools.
I think with
- sound documentation (that could be under jupyter_packaging, i suppose)
- sensibly defaulted caching
- providing profiling information through some means
- guidance on making the packages that own these dedicated modules (as in, shows up in top_level.txt), with (almost no imports)
...entry_points is a fine choice, and are already used/documented in nbconvert and elsewhere.
And, back to the lab extensions point: I might just be thinking about some ability to dynamically generate federated extensions without nodejs, much less jupyter_packaging, and making this entry_point target a singleton with __slice__ method...
custom package metadata.
But thank you @jasongrout for providing additional options. Please feel free to hoist that idea up to the description! I took [strawman] off, but the intent of the original issue (https://github.com/jupyter-server/jupyter_server/issues/351) still stands... I'm always sad when a snap decision gets made just because someone needs it/is getting paid for it today or whatever, rather than we think this is good for all Jupyter stakeholders, from the janitors to the hot-shot spaceship pilots.
Are these two statements referring to the same concept?
because the import system is invoked, users of this system may wish to create a separate python_packages entry for these static assets, to avoid bringing in otherwise-unused runtime dependencies, e.g. pandas
guidance on making the packages that own these dedicated modules (as in, shows up in top_level.txt), with (almost no imports)
If so, can you elaborate? Is there a way to have a package that may have a relatively expensive top-level import, but still have a relatively lightweight entry point by having the entry point not requiring the expensive import?
Introducing new metadata fields, that can only be feasibly created with jupyter_packaging, will disincentivize maintainers creating packages that "casually" work with Jupyter tools.
Note that the custom metadata here only relies on jupyter_core as a dependency, not jupyter_packaging. The example was updated to show that all a user would have to do is add one or two arguments to their setup.py file (or equivalent entries in setup.cfg), and have a setup_requires point to jupyter_core.
If so, can you elaborate? Is there a way to have a package that may have a relatively expensive top-level import, but still have a relatively lightweight entry point by having the entry point not requiring the expensive import?
For completeness: I also explored a little bit using namespace packages to try to avoid importing the main package to get to a lightweight entry point, but gave it up as being too complicated/magical to recommend to everyone.
my preference would be to stay as close to the spec (PEP 517, 639, ...) as possible, in a format that as many tools as possible support.
If other packaging tools do not support custom metadata, then that would be a showstopper for using custom metadata, I think.
I agree the entry points would be better theoretically, it just bothers me that you have to import the package to get at this static configuration data.
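To make the import concern concrete, here is a minimal sketch of the custom-metadata side: a plain text file in a distribution's metadata directory can be read with importlib.metadata without importing the package at all. The helper name read_metadata_paths and the demo_pkg distribution are hypothetical, and this is not the _package_metadata function from the diff (which is not shown in this part of the thread); only the jupyter_config_paths.txt file name comes from the setup.cfg change above.

```python
import tempfile
from importlib import metadata
from pathlib import Path


def read_metadata_paths(filename, search_path=None):
    """Collect the non-empty lines of `filename` from every discovered distribution.

    Hypothetical helper: only metadata files are read, no package is imported.
    """
    kwargs = {} if search_path is None else {"path": list(search_path)}
    found = []
    for dist in metadata.distributions(**kwargs):
        text = dist.read_text(filename)  # None when the distribution lacks the file
        if not text:
            continue
        found.extend(line.strip() for line in text.splitlines() if line.strip())
    return found


# Build a throwaway "installed" distribution so the sketch runs anywhere.
site_dir = Path(tempfile.mkdtemp())
dist_info = site_dir / "demo_pkg-1.0.dist-info"
dist_info.mkdir()
(dist_info / "METADATA").write_text(
    "Metadata-Version: 2.1\nName: demo_pkg\nVersion: 1.0\n"
)
# Assumed content: one path per line, as write_arg_list in the diff would emit.
(dist_info / "jupyter_config_paths.txt").write_text("demo_pkg/etc/jupyter\n")

demo_paths = read_metadata_paths("jupyter_config_paths.txt", [str(site_dir)])
print(demo_paths)
```

By contrast, calling load() on an entry point imports the module it names, which is where the cost discussed here comes from. How the listed paths would be resolved to absolute directories is left open in this sketch.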
a way to have a package that may have a relatively expensive top-level import, but still have a relatively lightweight entry point by having the entry point not requiring the expensive import?
to get slightly more precise on python terminology, a module can't do it, but a distribution can. consider:
src/
  my_widget/
    __init__.py  # imports pandas, ipywidgets, the kitchen and bathroom sink
    share/
  my_widget_config/
    __init__.py  # imports nothing but pathlib, points at paths of ../my_widget/share
then find_packages(include="src") would turn up my_widget and my_widget_config. People (inadvertently) do this all the time with tests, which means all of them are broken.
Of course, with flit, this wouldn't work, as it only does one package at a time. poetry and setuptools could do it declaratively, though. but really... i'm coming more around to thinking that, at least in the case of big-ol'-lab-assets, they should be separate packages anyway, so that it's possible to support multiple versions of lab.
I'll add that to the setuptools example.
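As a rough, runnable illustration of the two-package layout above, the sketch below builds both packages in a temporary directory and checks that importing my_widget_config never triggers the import of my_widget. The SHARE_PATH name is hypothetical (the thread does not say what the entry point target would expose), and the "heavy" package here is only a stand-in that imports nothing expensive.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

root = Path(tempfile.mkdtemp())

# Stand-in for the heavy package; real code would import pandas etc. here.
(root / "my_widget" / "share").mkdir(parents=True)
(root / "my_widget" / "__init__.py").write_text("HEAVY_IMPORT_RAN = True\n")

# The lightweight sibling: pathlib only, never imports my_widget itself.
(root / "my_widget_config").mkdir()
(root / "my_widget_config" / "__init__.py").write_text(textwrap.dedent('''
    from pathlib import Path

    HERE = Path(__file__).parent
    # Hypothetical name: points at ../my_widget/share relative to this package.
    SHARE_PATH = HERE.parent / "my_widget" / "share"
'''))

# Simulate both top-level packages being installed side by side.
sys.path.insert(0, str(root))
importlib.invalidate_caches()
config = importlib.import_module("my_widget_config")

share_path = config.SHARE_PATH
heavy_loaded = "my_widget" in sys.modules
print(share_path, heavy_loaded)
```

The relative hop from one package directory to its sibling assumes both end up next to each other in site-packages, which is the assumption questioned later in the thread.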
namespace packages
Ooh, yeah, magic names are.... not fun in the slightest. And namespace packages are vicious. I tolerate it on wxyz, but woe betide anyone else that wants to collaborate on that without coordinating.
but a distribution can.... then find_packages(include="src") would turn up my_widget and my_widget_config
In this case, would we have:
- A single tarball/wheel distributed via pypi, or multiple tarball/wheel files?
- two top-level imports in python, i.e., import my_widget and import my_widget_config both would work?
- two top-level directories in site-packages, or one?
This pull request has been mentioned on Jupyter Community Forum. There might be relevant details there:
https://discourse.jupyter.org/t/jupyter-paths-priority-order/7771/3
in the simplest case:
- a single tar.gz, a single .whl
- two top-level imports
- two top-level directories in site-packages
again, when folk use find_packages today, this happens all the time with a site-packages/tests. Even jupyter_server did this for a while! But we can use the heck out of it. These things won't be zip_safe... but they never were.
anecdotally, on jupyterlab-lsp, we opted to split them, and jupyterlab-lsp has a dependency on jupyter-lsp, as it would work with any jupyter_server and not bring in jlab-specific deps like json5.
this happens all the time with a site-packages/tests
Very interesting! Does pip not give any warning when a package installation overwrites an existing directory, i.e., two different packages stomp on each other?
If other packaging systems support custom metadata, I think that's probably cleaner than having a separate top-level directory and top-level import just for an entry point, i.e., you only have one top-level import/directory. On the other hand, maybe that extra package is only done in situations where the original package import is expensive, and maybe that's not so common. On the other hand, maybe that second package contains the actual etc/ and share/ directories as well (e.g., that second package is what distributes the javascript assets for a lab extension) - perhaps it is a bit iffy to assume you can do ../mypackage/etc/jupyter to point to the first package from the second package?
If other packaging systems support custom metadata
I haven't looked into it... i loathe making PRs to packaging systems to support stuff. I mean, the ground is littered with old "Please make AMD module" PRs to have stuff work with requirejs, "plz publish language server on npm" for LSP... but at least these made things better for all their downstreams. If custom metadata is not wide-spread (and/or doesn't have a formal PEP), i just dunno... it sure sounds like rewriting entry_points, But For Jupyter... which is my point in this whole exercise.
one top-level import/directory.
Welp... most folk don't want to tell people to import my_widget.widgets_foo. You can do magic __getattr__ and stuff to make them lazy, though. But if that widgets_foo.py imports pandas, then we're back at the initial expensive load.
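For readers unfamiliar with the lazy __getattr__ trick, a minimal sketch follows. It relies on module-level __getattr__ (PEP 562); in a real package the function would simply be defined at the top level of my_widget/__init__.py. Here the module is built in memory so the example is self-contained, make_lazy_module is a hypothetical helper, and the stdlib colorsys module stands in for an expensive submodule such as widgets_foo.

```python
import importlib
import sys
import types


def make_lazy_module(name, lazy_submodules):
    """Build a module whose listed attributes are imported on first access.

    `lazy_submodules` maps an attribute name to the dotted module name to import.
    """
    module = types.ModuleType(name)

    def __getattr__(attr):
        # Only called when normal attribute lookup on the module fails.
        target = lazy_submodules.get(attr)
        if target is None:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        value = importlib.import_module(target)
        setattr(module, attr, value)  # cache, so later lookups skip this hook
        return value

    module.__getattr__ = __getattr__
    return module


# colorsys is a stand-in for the expensive widgets_foo submodule.
my_widget = make_lazy_module("my_widget", {"widgets_foo": "colorsys"})
loaded = my_widget.widgets_foo  # the import happens here, not at creation time
print(loaded.__name__)
```

As noted above, this only defers the cost: the first access to the lazy attribute still pays for whatever the target imports.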
Does pip not give any warning
Nope, otherwise old-style namespace packages like backports.* wouldn't work, but conda, apt, etc. sure do, and have to patch around these things! pip also ignores extras pins after the fact, yadda, yadda... it's not great, but it's what we have to deal with. We do not want A Package Manager, But For Jupyter (though I'll stand by jlpm being a good call :blush:).
second package contains the actual etc/ and share/ directories as well
Yeah, that pattern could work... certainly for etc. None of the _jupyter_this_and_that_extension magic functions need any imports.
But then, many of those enable --py, etc. won't even be needed for something that uses these tools, just pip install -e (or flit install --symlink).
With federated extensions, folks are reading more JSON than before, which is good for DRY, and that isn't free, but usually that's a one-time hit.
bit iffy to assume you can do ../mypackage/etc/jupyter to point to the first package from the second package?
I mean, that's what you buy when you publish packages that end up in site-packages. But sure. Especially with --symlink, stuff gets weird.
i loathe making PRs to packaging systems to support stuff.
Yep, agreed. If other packaging systems don't support the custom metadata, I think the option is out.