
Supporting Development of JupyterHub for Scalability

Open whatnick opened this issue 5 years ago • 7 comments

My organization (Geoscience Australia) would be interested in funding JupyterHub / KubeSpawner work to enhance:

  • HA using PostgreSQL DB backend + multiple hub pods
  • User package persistence and lightweight user pod images, using overlayfs support in KubeSpawner.

What would be the avenue to achieve this? We run JupyterHub deployments with a couple of hundred users on EKS.

Our userpod image is here and is deployed on DigitalEarth Australia and DigitalEarth Africa.

whatnick avatar Sep 18 '20 09:09 whatnick

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! :hugs:
If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other community members to contribute more effectively. You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! :wave:
Welcome to the Jupyter community! :tada:

welcome[bot] avatar Sep 18 '20 09:09 welcome[bot]

Hey there 👋 ! This sounds great!

For HA related stuff I'd talk with @minrk directly. There are a few issues and forum posts(?) on the jupyterhub repository about ideas and things to keep in mind. I think the short summary is "this is a lot trickier than most people think with a lot of trade-offs to be decided".

For the second bullet point: could you explain a bit more what you have in mind?

betatim avatar Sep 18 '20 13:09 betatim

This conversation is moving over from gitter because @whatnick brought up the possibility of being able to fund this work so I figured we should have people here who can talk about receiving funding. For instance, Simula, where I work, can effectively contract-out my time via Simula Consulting, or pay me (as I currently am paid) via grants at Simula Lab. The technical discussion of HA is open at https://github.com/jupyterhub/jupyterhub/issues/1932. It's a big project, but one I've been thinking about a lot and would love to be able to focus on.

minrk avatar Sep 18 '20 14:09 minrk

This might also be something we can help with via 2i2c if that is an attractive option for the community. I know that @yuvipanda has been interested in HA for a while as well.

choldgraf avatar Sep 18 '20 14:09 choldgraf

@betatim re the second point: we have a difficult time managing userpod environments. Our idea is to use overlayfs capabilities / bidirectional sync in Kubernetes to allocate a section of the user PVC for user-specific site packages, and to include a safe-mode profile in which the overlay is disabled, in case the user manages to pollute their environment. This way we get persistence of user-installed packages and keep userpod Docker images lightweight. However, the current KubeSpawner code does not support the special volume mount modes needed for this to work.
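
As a rough sketch of the idea (not an existing KubeSpawner feature), the pod volume settings could be built along these lines. The function name, claim name pattern, mount paths and sub-directory names are all hypothetical. Only the dictionary keys (`persistentVolumeClaim`, `claimName`, `mountPath`, `subPath`, `mountPropagation`) are real Kubernetes pod spec fields.

```python
def user_package_mounts(username, safe_mode=False, propagation=None):
    """Build pod volume settings that keep user-installed packages on the user's PVC.

    Hypothetical helper: the claim name pattern, mount paths and
    sub-directory names are illustrative assumptions only.
    """
    volumes = [
        {
            "name": "home",
            "persistentVolumeClaim": {"claimName": f"claim-{username}"},
        }
    ]
    # The regular home directory mount is always present.
    volume_mounts = [
        {"name": "home", "mountPath": "/home/jovyan", "subPath": "home"},
    ]
    if not safe_mode:
        # Expose a separate section of the same PVC for user-installed packages.
        packages_mount = {
            "name": "home",
            "mountPath": "/opt/user-packages",
            "subPath": "site-packages",
        }
        if propagation is not None:
            # Kubernetes accepts "None", "HostToContainer" or "Bidirectional";
            # "Bidirectional" is only allowed for privileged containers.
            packages_mount["mountPropagation"] = propagation
        volume_mounts.append(packages_mount)
    return {"volumes": volumes, "volume_mounts": volume_mounts}


normal = user_package_mounts("alice", propagation="HostToContainer")
safe = user_package_mounts("alice", safe_mode=True)
```

In safe mode the packages mount is simply left out, so a polluted environment cannot affect the base image. KubeSpawner's `volumes` and `volume_mounts` settings take lists of dictionaries shaped like these. Whether every field, such as `mountPropagation`, is passed through to the pod is the open question here.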

whatnick avatar Sep 21 '20 01:09 whatnick

Oh that sounds interesting @whatnick!

Is it correct that there is some part of the k8s specification within volumes or volumeMounts of a pod specification that would need to be supported by KubeSpawner, that currently isn't supported?

consideRatio avatar Sep 21 '20 08:09 consideRatio

We are looking to implement this pattern in JupyterHub userpods: https://itnext.io/using-overlay-mounts-with-kubernetes-960375c05959

whatnick avatar Sep 23 '20 06:09 whatnick