Option to launch a binder directly from a dockerhub image (bypass repo2docker completely)

Open rabernat opened this issue 3 years ago • 13 comments

Proposed change

In Pangeo, we use CI to build complex docker images with our full stack in https://github.com/pangeo-data/pangeo-docker-images. These images are used directly in various Pangeo JupyterHubs.

We also want to use the same images in Binder. We nearly always use the nbgitpuller trick to keep the Binder environment and the contents in separate repos. Currently this requires making a "passthrough" repo with a single-line Dockerfile pointing at the desired image on Docker Hub, e.g.: https://github.com/pangeo-gallery/default-binder/blob/master/binder/Dockerfile
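For readers unfamiliar with the nbgitpuller trick, here is a minimal Python sketch of how such a launch link is typically assembled: the Binder URL points at the environment repo, and its `urlpath` parameter carries an nbgitpuller `git-pull` link for the contents repo. The contents repo, branch names, and the helper name `binder_gitpull_url` are assumptions made up for illustration.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def binder_gitpull_url(binder_base, env_repo, env_ref,
                       content_repo, content_branch, content_path):
    """Build a Binder link whose environment and contents live in separate repos."""
    # Inner link: handled by nbgitpuller once the server is running.
    inner = "git-pull?" + urlencode({
        "repo": content_repo,
        "branch": content_branch,
        "urlpath": content_path,
    })
    # Outer link: Binder builds (or reuses) the environment repo, then
    # redirects to the inner link. urlencode escapes the inner link again.
    return f"{binder_base}/v2/gh/{env_repo}/{env_ref}?" + urlencode({"urlpath": inner})


# Hypothetical contents repo; the environment repo is the passthrough repo above.
url = binder_gitpull_url(
    "https://mybinder.org",
    "pangeo-gallery/default-binder",
    "master",
    "https://github.com/example-org/example-notebooks",
    "main",
    "lab/tree/example-notebooks",
)
print(url)
```

The passthrough repo only exists to give the outer link something to build, which is the extra step this issue asks to remove.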

Maintaining this "passthrough" repo is an extra step that leads to unnecessary complexity and also wastes binder resources rebuilding docker containers that are unchanged from the dockerhub version.

I would love to have an option to launch a binder directly from a dockerhub (or other container registry) image, completely bypassing repo2docker.

Alternative options

Just keep doing what we are doing now, which works fine but requires additional complexity.

Who would use this feature?

Pangeo Gallery and the entire Pangeo project would use this feature heavily. More generally, this feature would help bridge the gap between cloud-based JupyterHubs using pre-built docker images and Binders, improving interoperability between environments. It would make it trivial to launch a binder with an identical environment to a cloud-based JupyterHub, without requiring users to mess around with Dockerfiles.

(Optional): Suggest a solution

I don't know enough about how binderhub works to propose an implementation.

rabernat avatar May 14 '21 15:05 rabernat

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! :hugs:
If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other community members to contribute more effectively. You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! :wave:
Welcome to the Jupyter community! :tada:

welcome[bot] avatar May 14 '21 15:05 welcome[bot]

We probably need to add some code here to check if the image exists in the registry and mark it as found if so.
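As a rough illustration of what such a check involves (this is not BinderHub's actual code), the sketch below splits an image reference into registry, repository, and tag, and builds the manifest URL defined by the registry HTTP API. A `HEAD` or `GET` request against that URL tells you whether the image exists. The function names are hypothetical, digest references (`@sha256:...`) are not handled, and Docker Hub additionally requires a bearer token even for public images, which is omitted here.

```python
DOCKER_HUB_REGISTRY = "registry-1.docker.io"


def parse_image_ref(ref):
    """Split 'registry/repo:tag' into (registry, repository, tag)."""
    name, tag = ref, "latest"
    # Only a colon in the last path component is a tag separator;
    # a colon earlier on belongs to a registry port such as localhost:5000.
    if ":" in ref.rsplit("/", 1)[-1]:
        name, tag = ref.rsplit(":", 1)
    parts = name.split("/")
    first = parts[0]
    if len(parts) > 1 and ("." in first or ":" in first or first == "localhost"):
        registry, repository = first, "/".join(parts[1:])
    else:
        # No registry host given: assume Docker Hub, as `docker pull` does.
        registry, repository = DOCKER_HUB_REGISTRY, name
        if "/" not in repository:
            repository = "library/" + repository  # official images
    return registry, repository, tag


def manifest_url(ref):
    """URL whose HEAD/GET response indicates whether the image exists."""
    registry, repository, tag = parse_image_ref(ref)
    return f"https://{registry}/v2/{repository}/manifests/{tag}"


print(manifest_url("pangeo/pangeo-notebook:latest"))
```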

yuvipanda avatar May 14 '21 15:05 yuvipanda

I brought this up in a separate thread, https://github.com/jupyterhub/mybinder.org-deploy/issues/1474#issuecomment-649769488. I probably should have opened a Discourse forum post but never got around to it. In any case, I wanted to link to some relevant discussion from the past.

scottyhq avatar May 14 '21 23:05 scottyhq

What about having something like binderhub.example.org/v2/dockerhub/docker-org/image/tag as launch URLs for this kind of thing?
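To make the suggestion concrete, here is a minimal Python sketch of how a launch path of that shape could be mapped to an image reference. The `dockerhub` provider prefix and the helper name are purely hypothetical; no such provider is confirmed to exist in BinderHub.

```python
def image_from_launch_path(path):
    """Map '/v2/dockerhub/<org>/<image>/<tag>' to '<org>/<image>:<tag>'."""
    parts = path.strip("/").split("/")
    if len(parts) != 5 or parts[0] != "v2" or parts[1] != "dockerhub":
        raise ValueError(f"not a dockerhub launch path: {path!r}")
    _, _, org, image, tag = parts
    return f"{org}/{image}:{tag}"


print(image_from_launch_path("/v2/dockerhub/docker-org/image/tag"))
```

A fixed five-segment path like this cannot express official images without an org, or repository names with extra slashes on other registries, so a real provider would need a more flexible spec format.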

betatim avatar May 17 '21 06:05 betatim

> What about having something like binderhub.example.org/v2/dockerhub/docker-org/image/tag as launch URLs for this kind of thing?

Conceptually this sounds like a new content provider, especially as the tag can be the equivalent of a git branch that's updated, so you'd need to decide whether or not to check for an updated image. For efficiency it'd be nice to bypass repo2docker, but as a proof of concept, having repo2docker pull and push the image might be feasible?

Do you think it should be hardcoded to dockerhub, or something like docker with support for all public docker registries?

manics avatar May 17 '21 09:05 manics

> Do you think it should be hardcoded to dockerhub, or something like docker with support for all public docker registries?

Definitely support most public registries I think. Should have same features as passing it to docker pull.

> so you'd need to decide whether or not to check for an updated image.

Where would we store this information? I was instead thinking we'll pass it through to kubernetes and set imagePullPolicy to Always - which will do the 'right thing' I think.
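For illustration, here is a minimal sketch of what that looks like in a pod manifest, written as a plain Python dict. The pod name, container name, and helper function are made up. With `imagePullPolicy: Always`, the kubelet asks the registry to resolve the tag each time it starts the container and only downloads layers it does not already have cached.

```python
def notebook_pod_manifest(name, image):
    """Build a minimal Pod manifest that re-resolves the image tag on every start."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": "notebook",  # hypothetical container name
                    "image": image,
                    # Check the registry on each start instead of trusting
                    # whatever a node happens to have cached for this tag.
                    "imagePullPolicy": "Always",
                }
            ]
        },
    }


manifest = notebook_pod_manifest("binder-example", "pangeo/pangeo-notebook:latest")
print(manifest["spec"]["containers"][0]["imagePullPolicy"])
```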

yuvipanda avatar Jun 30 '21 03:06 yuvipanda

This issue has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/embed-binder-related-metadata-in-notebook/10329/1

meeseeksmachine avatar Aug 10 '21 10:08 meeseeksmachine

This issue has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/use-published-docker-image-for-binder/10333/3

meeseeksmachine avatar Sep 07 '21 13:09 meeseeksmachine

Just a quick 👍 for this feature. In the Pangeo Forge sandbox we define a custom image for Binder, to provide new users a pre-built environment to experiment with our tools. IIUC, if we could pull pre-built images from Docker Hub, it's reasonable to expect that these environments would load faster (thereby reducing friction for new users). Thanks to everyone working on this feature!

cisaacstern avatar Feb 28 '22 18:02 cisaacstern

This would be super useful to have, any status update on this issue?

betolink avatar Oct 12 '22 21:10 betolink

@betolink nobody has had any time to work on this yet :(

yuvipanda avatar Oct 12 '22 21:10 yuvipanda

This would be really useful for me too

ctr26 avatar Oct 28 '22 08:10 ctr26

A quick way of implementing this would be to ignore BinderHub, and customise KubeSpawner to show a form where a user can enter the required container registry image (and/or a dropdown of existing images, perhaps dynamically generated). If you're following the nbgitpuller trick you could even include the contents URL as a form parameter.
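A rough sketch of that idea, assuming JupyterHub's `options_form`, `options_from_form`, and `pre_spawn_hook` settings and KubeSpawner's `image` attribute. The form fields and function names are hypothetical, and a real deployment would want to restrict which images users may request.

```python
from types import SimpleNamespace

# HTML fragment shown on the spawn page (field names are made up).
OPTIONS_FORM = """
<label for="image">Container image</label>
<input name="image" id="image" placeholder="pangeo/pangeo-notebook:latest">
<label for="contents">Contents repo URL (optional, for nbgitpuller)</label>
<input name="contents" id="contents">
"""


def options_from_form(formdata):
    """Turn raw form data (a dict of lists of strings) into user_options."""
    image = formdata.get("image", [""])[0].strip()
    if not image or any(ch.isspace() for ch in image):
        raise ValueError("Please enter a container image reference")
    options = {"image": image}
    contents = formdata.get("contents", [""])[0].strip()
    if contents:
        options["contents"] = contents
    return options


def use_requested_image(spawner):
    """Pre-spawn hook: launch the image the user asked for."""
    spawner.image = spawner.user_options["image"]


# In jupyterhub_config.py this would be wired up roughly as:
#   c.KubeSpawner.options_form = OPTIONS_FORM
#   c.KubeSpawner.options_from_form = options_from_form
#   c.KubeSpawner.pre_spawn_hook = use_requested_image

# Stand-in for a real spawner, just to exercise the functions above.
fake_spawner = SimpleNamespace(
    image="default-image",
    user_options=options_from_form(
        {"image": ["pangeo/pangeo-notebook:latest"], "contents": [""]}
    ),
)
use_requested_image(fake_spawner)
print(fake_spawner.image)
```

The `contents` option is only collected here; turning it into an nbgitpuller redirect after spawn is left out of the sketch.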

manics avatar Oct 30 '22 16:10 manics