Treat jupyter as an editor (decouple jupyter and user environments)

Open mdeff opened this issue 4 years ago • 7 comments

I think jupyter should be treated as an editor, decoupled from user code and environment. As a start, it could be installed in its own environment, outside the environment where the user code runs. (This is true in general: desktop users are better off with a system-installed jupyter and kernelspecs pointing to environments.)
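To make the "kernelspecs pointing to environments" idea concrete, here is a minimal sketch of writing a kernelspec by hand. The `kernel.json` layout (`argv`, `display_name`, `language`) is the standard Jupyter kernelspec format; the helper name `write_kernelspec` and the example paths are hypothetical, and the target environment is assumed to already have ipykernel installed.

```shell
# Hypothetical helper: write a kernelspec that launches the kernel with the
# interpreter of a separate user environment.
write_kernelspec() {
    env_python="$1"    # interpreter of the user environment (ipykernel assumed installed there)
    spec_dir="$2"      # directory that will hold kernel.json
    display_name="$3"  # name shown in the jupyter UI
    mkdir -p "$spec_dir"
    # {connection_file} is a literal placeholder that jupyter fills in at launch.
    cat > "$spec_dir/kernel.json" <<EOF
{
  "argv": ["$env_python", "-m", "ipykernel_launcher", "-f", "{connection_file}"],
  "display_name": "$display_name",
  "language": "python"
}
EOF
}
```

For example, `write_kernelspec "$PWD/env/bin/python" "$HOME/.local/share/jupyter/kernels/frozen" "Frozen env"` would, on a typical Linux setup, make a system-installed jupyter offer a "Frozen env" kernel without jupyter itself living in that environment.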

The main issue with the current setup is running ancient code with a modern jupyter, as dependencies might conflict. That is especially true when their versions are specified by the user (in requirements.txt, environment.yml, etc.) for reproducibility's sake. I have in mind scientific experiments which, for reproducibility, shouldn't be updated (unlike living projects).

There is however a deeper issue, as ipykernel itself has many dependencies that must be installed in the user environment. Longer term, it would be good to decouple ipykernel and jupyter more (to execute code with an old ipykernel specified by the user environment from a modern jupyter editor). Even better would be to push the interface down to the notebook file to entirely decouple the editor from the user environment. After all, if jupyter is an editor, running notebooks should not require installing anything in the user environment.

This is potentially a far-fetched and long-term issue that I have little idea how to technically realize, if even possible. Please let me know your thoughts and if it has been discussed (elsewhere) already. As always, thanks for your amazing work!

mdeff avatar Mar 27 '20 17:03 mdeff

We need to do a bit of digging but there should be a few issues already around splitting the environment of the kernel and the environment of the notebook server. This is what we do for Python 2 already. The kernel runs in an environment with Python 2, but the notebook server (and other r2d infrastructure) runs on Python 3.

We have talked about adopting that approach for Python in general. One immediate hurdle is that you now need two files: one to specify dependencies for your kernel and one for the notebook server (for example to change the JupyterLab version or install an extension). How to solve this nicely is a problem for which we need a few attempts (I think).

Someone who wants to work on this or explore options via code examples would be super welcome.


As a general point we will continue to ship Jupyter as the default UI with repo2docker. We need some form of default UI that can be accessed over the web. Jupyter seems like a good fit for that. We also have a bit of infrastructure already to proxy other UIs. There are even examples like https://github.com/danlester/binderhub-voila-direct and/or https://github.com/danlester/binderhub-voila-native that run without installing Jupyter (well, they install voila which is Jupyter but ... they show how you could run something else :) ). So I think repo2docker will continue with shipping Jupyter as the UI.

betatim avatar Mar 27 '20 17:03 betatim

I didn't know it already worked this way for python 2. Great! It shouldn't be too much of a hurdle to do it for python3 then. What's the situation on non-python kernels? All the better if it unifies operations across kernels.

The notebook server environment could be specified in a .binder/requirements_ui.txt. It also makes sense from a reproducibility point-of-view, as you might want to update your UI (or use r2d's default) while preserving the experiment code and environment.

Completely agree with shipping Jupyter as the default UI. My point in general is to refrain from "polluting" (or altering) the user code environment.

mdeff avatar Mar 27 '20 21:03 mdeff

Some thoughts about a potential shorter-term fix. It's possible to pin the base environment (specified in repo2docker/buildpacks/conda/environment.yml) by pinning those packages (and their dependencies) in the repo's environment.yml or requirements.txt. But what if some packages shouldn't be installed (like jupyterlab on ancient environments)? Or if newer versions of repo2docker add packages there?

While ipykernel needs to alter the user environment through dependencies, separating the user and editor environments will only solve part of the issue. In the meantime, we need a way to pin the editor environment that is injected in the user environment.

mdeff avatar Jun 15 '20 18:06 mdeff

For reference, I ended up achieving the desired independence by creating a venv from postBuild, requiring the python version in runtime.txt, and deferring to the default conda env for the jupyter UI.

# .binder/postBuild
python3.6 -m venv ./env
# requirements.txt must include ipykernel for the install step below to work.
./env/bin/pip install -r requirements.txt
# Shadow the default kernelspec for jupyter to use our environment by default.
./env/bin/python -m ipykernel install --user
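
One way to check that the shadowing took effect is to look at which interpreter the user-level kernelspec launches. The sketch below is an assumption-laden illustration, not part of the original setup: the helper name `kernel_python` is hypothetical, and the path is where `ipykernel install --user` typically writes the default `python3` kernelspec on Linux.

```shell
# Hypothetical helper: print the interpreter (argv[0]) that a kernelspec launches.
kernel_python() {
    python3 -c 'import json, sys; print(json.load(open(sys.argv[1]))["argv"][0])' "$1"
}

# Assumed default location on Linux after `ipykernel install --user`.
spec="$HOME/.local/share/jupyter/kernels/python3/kernel.json"
if [ -f "$spec" ]; then
    # Should print the venv's interpreter rather than the conda env's python.
    kernel_python "$spec"
fi
```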

The problem for https://github.com/mdeff/fma is that the old computational environment (that I want to preserve for reproducibility) is not compatible with any jupyterlab (required by r2d's default conda env). I like this solution because my users can run the frozen env from the cloud in the latest jupyterlab! Happy to make a binder-example if you think that's a good (even if temporary) solution.

mdeff avatar Jun 18 '20 12:06 mdeff

Cross-referencing related issues:

  • Default urlpath/filepath configurable from a file in repo? #369
  • Fresh Python kernels for reliable builds. #741

manics avatar Sep 18 '20 16:09 manics

Another use-case: Disabling the mybinder.org jitsi extension https://github.com/jupyterhub/mybinder.org-deploy/issues/1562#issuecomment-700432663

manics avatar Sep 29 '20 11:09 manics

This issue has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/previous-built-binder-repo-suddenly-with-404-error/13047/5

meeseeksmachine avatar Feb 16 '22 19:02 meeseeksmachine