RFP for successor to data_files-based extension discovery?
@blink1073 I'm saying support both. I personally would be happy to never use data_files again unless their level of support changes. - https://github.com/jupyter-server/jupyter_server/issues/224#issuecomment-632730340
The threshold of "change" has been met, though not in the way suggested, and things aren't looking better in 2021, with changes coming to pip, setuptools, etc.
Maybe it's time to open something akin to a request for proposals for ways forward?
It would appear that the viability of data_files as a way for python projects to ship extension assets (e.g. js/css, kernelspecs) and configuration (e.g. jupyter_config) is falling. I think we need to think about some ways that we can balance:
- ease of distribution of core jupyter packages' assets
- ease of installation of the "official" jupyter packages (which I guess is a python sdist/whl)
- ease of re-distribution via "unixy" package managers (e.g.
conda
,brew
,apt
), etc. as that may be able to preserve some of the current end user experience
I've some ideas, but would love to hear some more thoughts! Oh, and if this belongs somewhere else, please let me know... I'm sure at some point this will end up needing a JEP-level clarification, but...
I think that data files is still the way to go, as it provides a very clear API to manage content under `PREFIX/share` and `PREFIX/etc`.
I'm wondering what the issue is: did something change recently? I don't see data_files disappearing (it would break the python ecosystem). If dev installs are a problem, I think we can solve that.
(from mobile phone)
This really only affects the mainline python packaging/installation... conda, etc., don't care at all about these issues, and those packages can continue to be shipped how they are today. And certainly other languages need to be able to rely on share/etc. But getting down to it, we're in python land, and are pretty much ruled by whatever comes down the pipe.
No doubt users will be able to continue to install artifacts that include data_files... for now. But creating the stuff is getting harder, and pinning to an older "pip" or even "setuptools" doesn't seem like a great option if one of them drops it, or requires an end user flag.
My concern is that the pace of breakage in the mainline tools (e.g. the new pip solver vs extras) is going to make it hard to keep it easy for packages to casually offer jupyter integration... Everything's going to have to be a "jupyter-" this and "jupyter_" that, because jupyter packages will look so different from what official documentation suggests.
For example, the suggested cookiecutter setup for lab3 extensions looks very complex, when without data_files we could likely encourage a single pyproject.toml (or setup.cfg plus a one-line setup.py) and be done with it... and get better metadata, e.g. data_files didn't make the cut in PEP 621: https://www.python.org/dev/peps/pep-0621/
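To make that concrete, here is a minimal sketch of such a single-file setup, using only PEP 621 `[project]` metadata plus a PEP 517 build backend (the package name, backend choice, and dependencies are placeholders, not recommendations):

```toml
# Hypothetical, minimal pyproject.toml for a package named "kitchen-sink".
[build-system]
requires = ["flit_core >=3.2,<4"]
build-backend = "flit_core.buildapi"

[project]
name = "kitchen-sink"
version = "0.1.0"
description = "A hypothetical jupyter server extension"
requires-python = ">=3.7"
dependencies = ["jupyter_server"]

# Note: PEP 621 defines no field for prefix data (share/, etc/), so that part
# still has to live in backend-specific [tool.*] tables.
```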
This draft PR shows one way forward which would be compatible with just about everything, require very few downstream changes, and use only PEP 621-compliant metadata.
I like it, Nick; indeed I agree with many of the points you bring up, and it's making me a bit sad (I see data_files as super duper fundamental), but your solution makes me a bit happier. I'll comment more on your PR.
Still feeling the pain on this.
Turns out having files both in-tree and in data_files leads to them being shipped twice. For "a little python" or whatever, this is no big deal. For shipping ipydrawio, however, which is a full data-driven design tool, the `whl` is currently sitting just under 70mb, and expands to ~200mb (lots of un-compressed XML, twice). The sdist is 30mb, because `tar.gz` is apparently smarter than `whl`, but it still unpacks to the same size.
Does `appdirs` apply here? Looks like Sublime is using it for packages.
:tada: `flit` might soon get support for `data_files`: https://github.com/pypa/flit/pull/510
The approach looks like a single data root, so a nominal jupyter-extending package might look like:
```
data/
  share/
    jupyter/
  etc/
    jupyter/
src/
  kitchen_sink/
    __init__.py
pyproject.toml
```
...and a single line in `pyproject.toml` would ensure all those files get deployed correctly. Big win.
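A sketch of that single line, assuming the PR lands roughly in its drafted shape (the `data` directory name matches the layout above; treat the exact table name as tentative until a release ships it):

```toml
# Tentative sketch of flit's proposed external-data support: everything under
# data/ would be installed relative to the environment prefix, e.g.
# data/share/jupyter/... -> $PREFIX/share/jupyter/...
[tool.flit.external-data]
directory = "data"
```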
Hmm, it seems like at that point we'd be better off wrapping flit to add a build step in jupyter-packaging. And server extensions with no build step could just use flit directly.
Closing this, since we've settled on using `shared_data` from `hatch`.
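For reference, a minimal sketch of how that looks with hatchling: the left-hand sides are paths in the repository, the right-hand sides are destinations relative to the install prefix (the specific files and names here are hypothetical):

```toml
# Sketch of hatchling's shared-data table: repo paths on the left,
# prefix-relative destinations on the right. Names are hypothetical.
[tool.hatch.build.targets.wheel.shared-data]
"jupyter-config/kitchen_sink.json" = "etc/jupyter/jupyter_server_config.d/kitchen_sink.json"
"labextension" = "share/jupyter/labextensions/kitchen-sink"
```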
Yes, we needn't change anything in this repo (or `jupyter_core`), as kernel (see below), extension, and other tool authors today have the option of declaring this in `pyproject.toml` for any number of PEP 517 build backends:
- `tool.hatch.build.targets.wheel.shared-data`, without support caveat
- `tool.flit.external-data`, without support caveat
- `tool.setuptools.data-files`, though marked Discouraged... but not Deprecated
A cursory check reveals `poetry` and `maturin` still lack this feature... the former bothers me not one bit, but the latter could eventually become a concern.
Perhaps we can dream of a future where PEP XXX: Prefix Data (as 621 has disowned this problem) clarifies this so it can move into a single `pyproject.toml#project` field (e.g. `project.prefix-data`) with defined `--editable` behavior, instead of 10 different things with different data models. :sleeping: :cloud:
Aside: about kernelspecs
On a partial tangent, regarding kernelspecs: `jupyter_*` (specifically `client`, perhaps) could improve the situation for reproducible, minimal distributions. Specifically, selecting data formats/syntaxes that are more cross-platform, and therefore tolerant to string replacement, would help. The worst case is JSON kernelspec files with respect to paths, especially on windows, which have been a long-standing source of problems.
In light of the above:
- use more normalized URIs to avoid windows paths, e.g. `file:///c:/prefix-placeholder`
- TOML might also be a reasonable format, as it supports python-style triple (single) quotes (`'''`), which sidestep backslash escaping; see the sketch below
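Purely for illustration (kernelspecs are JSON today, so this TOML rendering is hypothetical), a triple-quoted literal string lets a Windows interpreter path survive a naive text replacement of the prefix placeholder:

```toml
# Hypothetical TOML rendering of a kernelspec; kernel.json is the real format.
# The triple-single-quoted literal string needs no backslash escaping, so
# "prefix-placeholder" can be swapped out with plain string replacement.
display_name = "Python 3 (kitchen-sink)"
language = "python"
argv = [
    '''C:\prefix-placeholder\python.exe''',
    "-m", "ipykernel_launcher",
    "-f", "{connection_file}",
]
```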