kubespawner
Support spawning to different clusters
Proposed change
Right now, the kubernetes pod is spawned in the same cluster as the hub pod. It would be great if we can configure it to be spawned in other remote clusters. One hub can then spawn into different cloud regions, which is very helpful when dealing with cloud datasets.
The kubernetes API can easily be accessed remotely, but the hub and proxy pod need to find a way to send traffic to the user pod. We can find ways to tunnel this traffic through without much work. My favorite way is to use `kubectl port-forward`, also used by my earlier experiments with accessing dask-kubernetes remotely and now dask-kubernetes itself.
Alternative options
- Deploy one hub per cluster users want to spawn into. This is more complicated logistically, and for the user.
- Make a `Service` object for each pod, and expose it to the internet via a `LoadBalancer`. This can receive traffic from the hub and proxy pod.
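For the second option, a per-pod `Service` could look roughly like the sketch below. This is only an illustration, not something KubeSpawner does today: the function name, the `-lb` suffix, the label selector and the port 8888 are all assumptions made for the example.

```python
def loadbalancer_service_manifest(pod_name, namespace, pod_labels, port=8888):
    """Build a Service manifest (as a plain dict) that exposes one user pod.

    Everything here is a hypothetical sketch: the "-lb" naming scheme and the
    default port are assumptions, and pod_labels must uniquely select the pod.
    """
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": f"{pod_name}-lb", "namespace": namespace},
        "spec": {
            # type LoadBalancer asks the cloud provider for an external address
            "type": "LoadBalancer",
            "selector": dict(pod_labels),
            "ports": [{"protocol": "TCP", "port": port, "targetPort": port}],
        },
    }


manifest = loadbalancer_service_manifest(
    "jupyter-alice", "jhub", {"hub.example/username": "alice"}
)
```

The resulting dict can be serialized to YAML or JSON and applied to the remote cluster. One load balancer per user pod may be slow to provision and costly, which is part of why this is listed as an alternative.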
Who would use this feature?
Anyone interested in accessing compute near datasets stored across multiple cloud providers or regions.
(Optional): Suggest a solution
- [ ] Override `get_pod_url` to start a `kubectl port-forward` on a free port, to the pod IP on the remote cluster
- [ ] Make sure that `c.JupyterHub.hub_connect_url` is something that the pod can connect to. This could be over https on the public internet, or something else.
- [ ] Figure out how to specify which kubernetes cluster the API will need to connect to
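The first item could be sketched roughly as follows. This is a minimal illustration and not KubeSpawner code: the helper names are hypothetical, the cluster is assumed to be selected through a kubeconfig context, and the single-user server is assumed to listen on port 8888. Note that `kubectl port-forward` addresses the pod by name rather than by IP.

```python
import socket


def find_free_port():
    """Ask the OS for a local TCP port that is unused right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))  # port 0 lets the OS choose
        return sock.getsockname()[1]


def port_forward_command(context, namespace, pod_name, local_port, pod_port=8888):
    """Build the kubectl command that tunnels local_port to the remote pod.

    `context` is assumed to be a kubeconfig context naming the remote cluster.
    """
    return [
        "kubectl",
        "--context", context,
        "--namespace", namespace,
        "port-forward",
        f"pod/{pod_name}",
        f"{local_port}:{pod_port}",
    ]


def local_pod_url(local_port):
    """URL an overridden get_pod_url could return once the tunnel is up."""
    return f"http://127.0.0.1:{local_port}"


port = find_free_port()
cmd = port_forward_command("remote-cluster", "jhub", "jupyter-alice", port)
# subprocess.Popen(cmd) would start the tunnel; it must be stopped with the pod.
```

The port is released before `kubectl` binds it, so another process could grab it in between; a real implementation would need to retry. The tunnel also listens on localhost by default, so only processes next to the hub can reach it.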
Had a very helpful conversation with @consideRatio about this! Since it might add additional complexity here, I think it'd be useful to start this off outside this repo, as a subclass of KubeSpawner. And then upstream what is needed, and hopefully merge them together eventually. This might necessitate refactoring here - particularly around the singleton reflectors. But all changes made here should be useful standalone.
We kinda do a version of this when we test with minikube, doing networking hacks to let the pods talk to the hub.
@yuvipanda Curious how you have progressed on this one? We have a similar need to provide a single integrated experience for our jhub users, but across multiple clusters. Jupyter Enterprise Gateway is interesting, but fundamentally a totally different architecture. They spawn pods per kernel (conda env), and don't allow custom kernels not in the whitelist, because each kernel is a single kernel image.
@nreith I actually ended up building a separate spawner for this, and it works fairly well - https://github.com/yuvipanda/jupyterhub-multicluster-kubespawner.
@yuvipanda I found that. We're testing it out, and will make some merge requests and contributions in the future if we are able :-)
@nreith that would be super awesome!
@yuvipanda, thanks for your great work, I appreciate it very much!
Currently the KubeSpawner is only able to spawn in its own namespace (due to reflectors). Is the multicluster work related to multiple namespaces in any way (or only to clusters)?
I remember there is a configuration to give full cluster permissions to the hub allowing to create namespaces per user. But this is not the case.
I would like to have a single hub that can spawn into multiple Kubernetes namespaces (other than the hub's own). I have a FB of KubeSpawner which changes how reflectors work, and I added permissions for each namespace I want to the JupyterHub serviceAccount.
I was curious whether your sub-repo has a way to implement the above scenario, or whether my implementation would be useful to others, so that I could maybe open a PR and an issue about it.
We did it for multiple reasons:
- Single place for all users (instead of having a JupyterHub per namespace).
- Minimal permissions for JupyterHub: it only has permissions on selected namespaces.
- Reflectors only watch the spawned namespaces for events, instead of the entire cluster, which is quite big.
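For illustration, namespace-scoped permissions of this kind could be generated along the following lines. This is a guess at the shape, not the configuration described above: the function name, the object name `hub-spawner`, and the exact resource and verb lists are assumptions.

```python
def namespace_rbac(target_namespace, hub_namespace, service_account="hub"):
    """Return a (Role, RoleBinding) pair granting the hub's ServiceAccount
    access to a single target namespace.

    The resource and verb lists are assumptions; trim them to what the
    spawner actually needs.
    """
    role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": "hub-spawner", "namespace": target_namespace},
        "rules": [
            {
                "apiGroups": [""],  # "" is the core API group
                "resources": ["pods", "persistentvolumeclaims", "services", "secrets"],
                "verbs": ["get", "list", "watch", "create", "delete"],
            },
            {
                "apiGroups": [""],
                "resources": ["events"],
                "verbs": ["get", "list", "watch"],
            },
        ],
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": "hub-spawner", "namespace": target_namespace},
        "subjects": [
            {
                "kind": "ServiceAccount",
                "name": service_account,
                "namespace": hub_namespace,  # the hub lives in another namespace
            }
        ],
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "Role",
            "name": "hub-spawner",
        },
    }
    return role, binding


role, binding = namespace_rbac("team-a", "jhub")
```

Applying one such pair per selected namespace avoids a ClusterRole, so the hub cannot touch namespaces it was not explicitly granted.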
Thanks for your time!
Hi!
Is there any activity in this area? We'd really like to have this in place for our JupyterHub and would be happy to join the effort on this if there is something ready.
Thanks
We wrote a multi cluster kubespawner at my work but ultimately ended up going with a different hub per cluster. Will see if we can share if we get a chance. It's inspired by yuvipanda's other multicluster kubespawner.
On Mon, Apr 17, 2023, 9:54 AM Enol Fernández @.***> wrote:
Hi!
Is there any activity in this area? We'd really like to have this in place for our JupyterHub and would be happy to join the effort on this if there is something ready.
Thanks
@nreith I actually ended up building a separate spawner for this, and it works fairly well - https://github.com/yuvipanda/jupyterhub-multicluster-kubespawner.
I came here looking for exactly this functionality so it's great to see it already exists! :heart:
I think this could be very handy for spawning servers in our different environments.