jupyter_server RFP for successor to data_files-based extension discovery?

@blink1073 I'm saying support both. I personally would be happy to never use data_files again unless their level of support changes. - https://github.com/jupyter-server/jupyter_server/issues/224#issuecomment-632730340

The threshold of "change" has been met, but not in the way suggested, and things are not looking better in 2021, with changes coming to pip, setuptools, etc.

Maybe it's time to open something akin to a request for proposals for ways forward?

It would appear that the viability of data_files is falling as a way for python projects to ship extension assets (e.g. js/css, kernelspecs) and configuration (e.g. jupyter_config). I think we need to think about some ways that we can satisfy:

ease of distribution of core jupyter packages' assets
ease of installation of the "official" jupyter packages (which I guess is a python sdist/whl)
ease of re-distribution via "unixy" package managers (e.g. conda, brew, apt), etc. as that may be able to preserve some of the current end user experience

I've some ideas, but would love to hear out some more thoughts! Oh, and if this belongs somewhere else, please let me know... I'm sure at some point this will end up having to have a JEP-level clarification, but...

Nov 21 '20 04:11 bollwyvl

I think that data files is still the way to go, as it provides a very clear API to manage content under PREFIX/share and PREFIX/etc.

Nov 21 '20 06:11 SylvainCorlay

I'm wondering what the issue is, did something change recently? I don't see data_files disappearing (it would break the python ecosystem). If dev installs are a problem, I think we can solve that.

Nov 21 '20 06:11 maartenbreddels

This really only affects the mainline python packaging/installation... conda, etc. don't care at all about these issues, and can continue to be shipped how they do. And certainly other languages need to be able to rely on share/etc. But getting down to it, we're in python land, and are pretty much ruled by whatever comes down the pipe.

No doubt users will be able to continue to install artifacts that include data_files... for now. But creating the stuff is getting harder. And pinning to older "pip" or even "setuptools" doesn't seem like a great option if one of them drops it, or requires an end user flag.

My concern is that the pace of breakage in the mainline tools (e.g. new pip solver vs extras) is going to make it hard for packages to casually offer jupyter integration... Everything's going to have to be a "jupyter-" this and "jupyter_" that because jupyter packages will look so different from what official documentation suggests.

For example, the suggested cookiecutter setup for lab3 extensions seems very complex, when without data_files we could likely encourage a single pyproject.toml (or setup.cfg + 1 line setup.py) and be done with it... and get better metadata, e.g. data_files didn't make the cut on PEP 621: https://www.python.org/dev/peps/pep-0621/

Nov 21 '20 13:11 bollwyvl

This draft PR shows one way forward which would be compatible with just about everything, require very few downstream changes, and only use PEP 621-compliant metadata.

Nov 21 '20 21:11 bollwyvl

I like it Nick, indeed I agree with many points you bring up, and it's making me a bit sad (I see data_files as super duper fundamental), but your solution makes me a bit happier. I'll comment more in your PR.

Nov 21 '20 21:11 maartenbreddels

Still feeling the pain on this.

Turns out having files in-tree and in data_files leads to them being shipped twice. For "a little python" or whatever, this is no big deal. For shipping ipydrawio, however, which is a full data-driven design tool, the whl is currently sitting just under 70mb, and expands to ~200mb (lots of un-compressed XML, twice). The sdist is 30mb, because tar.gz is apparently smarter than whl, but still unpacks to the same size.
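
A minimal sketch of how one might check a built wheel for this kind of duplication, using only the standard library. A wheel is a zip archive, and data_files end up under a `{name}-{version}.data/data/` folder, so hashing every member and grouping by digest reveals payloads stored more than once. The function name `find_duplicate_payloads` is made up for illustration.

```python
import hashlib
import zipfile
from collections import defaultdict


def find_duplicate_payloads(wheel_path):
    """Group the members of a wheel (a zip archive) by content hash.

    Returns only the groups with more than one member, i.e. files whose
    bytes are stored more than once in the archive.
    """
    by_digest = defaultdict(list)
    with zipfile.ZipFile(wheel_path) as whl:
        for info in whl.infolist():
            if info.is_dir() or info.file_size == 0:
                continue  # skip directories and empty files (e.g. __init__.py)
            digest = hashlib.sha256(whl.read(info)).hexdigest()
            by_digest[digest].append(info.filename)
    return {digest: names for digest, names in by_digest.items() if len(names) > 1}
```

Calling it on a wheel path (or an open binary file object) returns a dict of digest to member names. Any group that pairs an in-package path with a `.data/data/` path is a file shipped twice.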

Jan 30 '21 17:01 bollwyvl

Does appdirs apply here? Looks like Sublime is using it for packages.

Mar 10 '21 17:03 layne-sadler

:tada: flit might soon get support for data_files: https://github.com/pypa/flit/pull/510

The approach looks like a single data root, so a nominal jupyter-extending package might be like:

data/
  share/
    jupyter/
  etc/
    jupyter/
src/
  kitchen_sink/
    __init__.py
pyproject.toml

...and a single line in pyproject.toml would ensure all those files get deployed correctly. Big win.
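
To make "deployed correctly" concrete, here is a rough sketch of the mirroring a single data root implies: every file under `data/` lands at the same relative path under the install prefix. This is not flit's implementation, only an illustration, and `planned_install_paths` is a hypothetical helper name.

```python
from pathlib import Path


def planned_install_paths(data_root, prefix):
    """Map each file under data_root to its mirrored location under prefix."""
    data_root, prefix = Path(data_root), Path(prefix)
    return {
        src: prefix / src.relative_to(data_root)
        for src in sorted(data_root.rglob("*"))
        if src.is_file()
    }
```

For the layout above, `planned_install_paths("data", sys.prefix)` would map anything in `data/share/jupyter/` to `PREFIX/share/jupyter/` and anything in `data/etc/jupyter/` to `PREFIX/etc/jupyter/`.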

Jan 15 '22 18:01 bollwyvl

Hmm, it seems like at that point we'd be better off wrapping flit to add a build step in jupyter-packaging. And server extensions with no build step could just use flit directly.

Jan 15 '22 22:01 blink1073

Closing this, since we've settled on using shared_data from hatch.

Jan 08 '23 20:01 blink1073

Yes, we needn't change anything on this repo (or jupyter_core), as kernel (see below), extension, and other tool authors today have the option of declaring this in pyproject.toml for any number of PEP 517 build backends:

tool.hatch.build.targets.wheel.shared-data without support caveat
tool.flit.external-data without support caveat
tool.setuptools.data-files, though marked Discouraged... but not Deprecated

A cursory check reveals poetry and maturin still lack this feature... the former bothers me not one bit, but the latter could eventually become a concern.

Perhaps we can dream of a future where PEP XXX: Prefix Data (as 621 has disowned this problem) clarifies this so it can move into a single pyproject.toml#project field (e.g. project.prefix-data) with defined --editable behavior, instead of 10 different things with different data models. :sleeping: :cloud:

Aside: about kernelspecs

On a partial tangent, regarding kernelspecs: jupyter_* (specifically client, perhaps) could improve the situation for reproducible, minimal distributions. Specifically, selecting data formats/syntaxes that are more cross-platform, and therefore tolerant of string replacement, would help. The worst case is paths in JSON kernelspec files, especially on Windows, which have been a long-standing source of problems.

In light of the above:

use more normalized URIs to avoid windows paths, e.g. file:///c:/prefix-placeholder
TOML might also be a reasonable format, as it supports python-style triple (single) quotes, e.g. '''
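
A small sketch of why JSON plus Windows paths resists string replacement, and how a normalized URI sidesteps it. The placeholder prefix and the `to_file_uri` helper are hypothetical (the helper does no percent-encoding), and today's kernelspec handling is not assumed to accept `file:` URIs in `argv`. This only illustrates the proposal.

```python
import json
from pathlib import PureWindowsPath

# Hypothetical placeholder prefix that a packaging tool would rewrite on install.
PLACEHOLDER = PureWindowsPath(r"c:\prefix-placeholder")
python_exe = PLACEHOLDER / "python.exe"


def to_file_uri(path):
    """Naive file URI for an absolute Windows path (no percent-encoding)."""
    return "file:///" + path.as_posix()


# Raw Windows path: JSON escapes every backslash, so the literal placeholder
# text no longer appears in the serialized kernelspec.
raw_spec = json.dumps({"argv": [str(python_exe), "-m", "ipykernel_launcher"]})

# URI form: forward slashes only, nothing for JSON to escape.
uri_spec = json.dumps({"argv": [to_file_uri(python_exe), "-m", "ipykernel_launcher"]})

raw_is_replaceable = str(PLACEHOLDER) in raw_spec          # False
uri_is_replaceable = to_file_uri(PLACEHOLDER) in uri_spec  # True
```

A tool doing plain text substitution on the prefix would miss the escaped form in `raw_spec` but find the URI form in `uri_spec` unchanged.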

Jan 08 '23 23:01 bollwyvl
