
Support spawning to different clusters

Open yuvipanda opened this issue 3 years ago • 10 comments

Proposed change

Right now, the user's Kubernetes pod is spawned in the same cluster as the hub pod. It would be great if we could configure it to be spawned in other, remote clusters. One hub could then spawn into different cloud regions, which is very helpful when working with cloud datasets.

The Kubernetes API can easily be accessed remotely, but the hub and proxy pods need a way to send traffic to the user pod. There are ways to tunnel this traffic through without much work. My favorite is kubectl port-forward, which I also used in my earlier experiments with accessing dask-kubernetes remotely, and which dask-kubernetes itself now uses.
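
A minimal sketch of the port-forward tunnel, assuming kubectl is installed where the hub runs and that a kubeconfig context (the hypothetical remote-eu below) points at the remote cluster. The helper names are illustrative and not part of KubeSpawner. Note that kubectl port-forward targets a pod by name (pod/<name>) rather than by pod IP.

```python
import socket
import subprocess
import time
from typing import List, Optional, Tuple


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


def port_forward_command(
    pod_name: str,
    namespace: str,
    remote_port: int,
    local_port: int,
    kube_context: Optional[str] = None,
) -> List[str]:
    """Build the argv for `kubectl port-forward` against a (possibly remote) cluster."""
    cmd = ["kubectl"]
    if kube_context is not None:
        # --context selects which kubeconfig context (i.e. which cluster) to talk to.
        cmd += ["--context", kube_context]
    cmd += [
        "port-forward",
        "--namespace", namespace,
        f"pod/{pod_name}",
        f"{local_port}:{remote_port}",  # LOCAL:REMOTE
    ]
    return cmd


def wait_for_port(port: int, host: str = "127.0.0.1", timeout: float = 30.0) -> None:
    """Poll until something accepts connections on host:port, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1):
                return
        except OSError:
            time.sleep(0.2)
    raise TimeoutError(f"nothing listening on {host}:{port} after {timeout}s")


def start_port_forward(
    pod_name: str, namespace: str, remote_port: int, kube_context: Optional[str] = None
) -> Tuple[subprocess.Popen, str]:
    """Start kubectl port-forward in the background; return (process, local URL)."""
    local_port = find_free_port()
    proc = subprocess.Popen(
        port_forward_command(pod_name, namespace, remote_port, local_port, kube_context),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        wait_for_port(local_port)
    except TimeoutError:
        proc.terminate()  # don't leak a half-started tunnel
        raise
    return proc, f"http://127.0.0.1:{local_port}"
```

There is a small race between find_free_port releasing the port and kubectl binding it; a production version would retry, and would also need to restart the tunnel if the kubectl process dies.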

Alternative options

  1. Deploy one hub per cluster users want to spawn into. This is more complicated, both logistically and for the user.
  2. Make a Service object for each pod, and expose it to the internet via a LoadBalancer, so that it can receive traffic from the hub and proxy pods.
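
For the LoadBalancer alternative, a rough sketch of the per-pod Service manifest as a Python dict. The label selector is hypothetical and the port is an assumption (8888 is a common single-user server port; check what your spawner uses). loadBalancerSourceRanges is included because exposing user pods to the whole internet is risky; restricting it to the hub and proxy egress IPs is one mitigation, though support depends on the cloud provider.

```python
import json


def loadbalancer_service_for_pod(pod_name, namespace, pod_labels, port=8888, source_ranges=None):
    """Return a Service manifest (as a dict) exposing one user pod through a LoadBalancer."""
    spec = {
        "type": "LoadBalancer",
        # Must match labels carried only by this user's pod, or traffic is spread across pods.
        "selector": dict(pod_labels),
        "ports": [{"name": "http", "protocol": "TCP", "port": port, "targetPort": port}],
    }
    if source_ranges:
        # Only these CIDRs (e.g. the hub and proxy egress IPs) may reach the load balancer.
        spec["loadBalancerSourceRanges"] = list(source_ranges)
    return {
        "apiVersion": "v1",
        "kind": "Service",
        # Service names must be valid DNS labels, so pod names like "jupyter-alice" work.
        "metadata": {"name": pod_name, "namespace": namespace},
        "spec": spec,
    }


if __name__ == "__main__":
    svc = loadbalancer_service_for_pod(
        "jupyter-alice",
        "jhub",
        {"example.org/user": "alice"},  # hypothetical label
        source_ranges=["203.0.113.10/32"],  # documentation-range IP, placeholder
    )
    print(json.dumps(svc, indent=2))
```

The printed JSON could be applied with kubectl apply -f; every user pod would then get its own public load balancer, which is part of why this option is costlier than tunnelling.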

Who would use this feature?

Anyone interested in accessing compute near datasets stored across multiple cloud providers or regions.

(Optional): Suggest a solution

  • [ ] Override get_pod_url to start a kubectl port-forward on a free port, to the pod IP on the remote cluster
  • [ ] Make sure that c.JupyterHub.hub_connect_url is something that the pod can connect to. This could be over https on the public internet, or something else.
  • [ ] Figure out how to specify which kubernetes cluster the API will need to connect to
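
One way to handle the third item is to give each cluster a kubeconfig context and let the hub pick one per spawn. The registry, context names, and kubectl_args helper below are hypothetical; they only sketch the shape such configuration might take and are not an existing KubeSpawner option.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class RemoteCluster:
    """Hypothetical per-cluster settings the hub would need."""

    display_name: str  # what users see when choosing where to spawn
    kube_context: str  # kubeconfig context used for API calls and port-forward
    namespace: str     # namespace user pods are created in


# Hypothetical registry; could be defined in jupyterhub_config.py.
CLUSTERS: Dict[str, RemoteCluster] = {
    "us-central": RemoteCluster("US Central (GCP)", "gke-us-central", "jhub"),
    "eu-west": RemoteCluster("EU West (AWS)", "eks-eu-west", "jhub"),
}


def cluster_for(choice: str) -> RemoteCluster:
    """Look up the cluster a user selected, failing loudly on unknown keys."""
    try:
        return CLUSTERS[choice]
    except KeyError:
        raise ValueError(
            f"unknown cluster {choice!r}; expected one of {sorted(CLUSTERS)}"
        ) from None


def kubectl_args(cluster: RemoteCluster) -> List[str]:
    """Common kubectl flags that pin a command to one cluster and namespace."""
    return ["kubectl", "--context", cluster.kube_context, "--namespace", cluster.namespace]
```

Keeping the cluster choice in one frozen record means the API client, the port-forward tunnel, and any per-cluster networking settings all read from the same source.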

yuvipanda avatar Jul 12 '21 15:07 yuvipanda

Had a very helpful conversation with @consideRatio about this! Since it might add additional complexity here, I think it'd be useful to start this off outside this repo, as a subclass of KubeSpawner. And then upstream what is needed, and hopefully merge them together eventually. This might necessitate refactoring here - particularly around the singleton reflectors. But all changes made here should be useful standalone.

We kinda do a version of this when we test with minikube, doing networking hacks to let the pods talk to the hub.

yuvipanda avatar Jul 12 '21 18:07 yuvipanda

@yuvipanda Curious how you have progressed on this one? We have a similar need to provide a single integrated experience for our jhub users, but across multiple clusters. Jupyter Enterprise Gateway is interesting, but it is fundamentally a totally different architecture. It spawns a pod per kernel (conda env) and doesn't allow custom kernels outside the whitelist, because each kernel maps to a single kernel image.

nreith avatar May 10 '22 21:05 nreith

@nreith I actually ended up building a separate spawner for this, and it works fairly well - https://github.com/yuvipanda/jupyterhub-multicluster-kubespawner.

yuvipanda avatar May 11 '22 11:05 yuvipanda

@yuvipanda I found that. We're testing it out, and will make some merge requests and contributions in the future if we are able :-)

nreith avatar May 13 '22 19:05 nreith

@nreith that would be super awesome!

yuvipanda avatar May 19 '22 08:05 yuvipanda

@yuvipanda, thanks for your great work, I appreciate it very much!

Currently KubeSpawner is only able to spawn into its own namespace (due to the reflectors). Is the multicluster work related to multiple namespaces in any way, or only to clusters?

I remember there is a configuration that gives the hub full cluster permissions, allowing it to create a namespace per user. But that is not what we need.

I would like to have a single hub that can spawn into multiple Kubernetes namespaces (other than the hub's own). I have a feature branch of KubeSpawner that changes how the reflectors work, and I granted the JupyterHub serviceAccount permissions in each namespace I want to spawn into.

I was curious whether your separate repo has a way to implement the above scenario, or whether my implementation would be useful to others, in which case I could open an issue and a PR for it.

We did it for multiple reasons:

  1. Single place for all users(instead of having a Jupyterhub per namespace)
  2. Minimal permission to the Jupyterhub, only have permissions on selected namespaces.
  3. Reflectors only watch the namespaces we spawn into for events, instead of the entire cluster, which is quite big.
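
For the multi-namespace setup described above, the per-namespace permissions in point 2 could be generated as a Role plus a RoleBinding in each target namespace, bound to the hub's ServiceAccount in the hub's own namespace. The resource and verb lists are an assumption loosely modeled on what a spawner needs (creating and watching pods, PVCs, and so on), and the names are hypothetical; trim or extend them for your deployment.

```python
import copy

RBAC_GROUP = "rbac.authorization.k8s.io"

# Assumed permissions for a spawner; adjust to the calls your spawner actually makes.
HUB_RULES = [
    {
        "apiGroups": [""],
        "resources": ["pods", "persistentvolumeclaims", "services", "secrets"],
        "verbs": ["get", "watch", "list", "create", "delete"],
    },
    {"apiGroups": [""], "resources": ["events"], "verbs": ["get", "watch", "list"]},
]


def hub_rbac_for_namespace(target_namespace, hub_service_account, hub_namespace, name="jupyterhub-spawner"):
    """Return [Role, RoleBinding] granting the hub's ServiceAccount access to one namespace."""
    role = {
        "apiVersion": f"{RBAC_GROUP}/v1",
        "kind": "Role",
        "metadata": {"name": name, "namespace": target_namespace},
        # Deep copy so tweaking one namespace's rules never mutates HUB_RULES.
        "rules": copy.deepcopy(HUB_RULES),
    }
    binding = {
        "apiVersion": f"{RBAC_GROUP}/v1",
        "kind": "RoleBinding",
        "metadata": {"name": name, "namespace": target_namespace},
        # The ServiceAccount lives in the hub's namespace, not in the target namespace.
        "subjects": [
            {"kind": "ServiceAccount", "name": hub_service_account, "namespace": hub_namespace}
        ],
        "roleRef": {"apiGroup": RBAC_GROUP, "kind": "Role", "name": name},
    }
    return [role, binding]
```

Calling it once per allowed namespace, for example for hypothetical namespaces team-a and team-b with a hub ServiceAccount hub in jhub, yields namespace-scoped grants instead of a ClusterRole, which also lets the reflectors watch only those namespaces.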

Thanks for your time!

TiPPeX2 avatar Jun 01 '22 12:06 TiPPeX2

Hi!

Is there any activity in this area? We'd really like to have this in place for our JupyterHub and would be happy to join efforts if there is something ready.

Thanks

enolfc avatar Apr 17 '23 14:04 enolfc

We wrote a multi-cluster KubeSpawner at my work, but ultimately ended up going with a different hub per cluster. We'll see if we can share it when we get a chance. It's inspired by yuvipanda's multicluster kubespawner mentioned above.

nreith avatar Apr 17 '23 18:04 nreith

@nreith I actually ended up building a separate spawner for this, and it works fairly well - https://github.com/yuvipanda/jupyterhub-multicluster-kubespawner.

I came here looking for exactly this functionality so it's great to see it already exists! :heart:

I think this could be very handy for spawning servers in our different environments.

dhirschfeld avatar Jun 08 '23 23:06 dhirschfeld