zero-to-jupyterhub-k8s
A better story for Authenticators not in the Hub image
Proposed change
I think we should have a better story for users who want to use an Authenticator not shipped with the Hub image (or any other pip-installed packages).
I don't know if this should be a technical solution or a documentation one. My initial inclination is that documentation should be enough, but we should test it out and make sure it works well and is sustainable across an updating hub.
Alternative options
Don't! Right now, folks can use very lightweight custom images, like:
FROM jupyterhub/k8s-hub:1.0.0
RUN python3 -m pip install my-authenticator-package
and register in config:
hub:
  image:
    name: my-hub-image
    tag: my-hub-tag
This works pretty well the first time you do it, and isn't so bad for folks sticking to tagged releases. But it's a pretty big pain to manage for anyone tracking chartpress-published dev versions, because the image versions aren't easily discoverable, leading to issues like #1015. And folks have to update the base tag and rebuild the image every time they bump the chart version (that's true no matter which version is installed, but at least the tag is easy to discover when using releases).
Who would use this feature?
- Anyone who wants to use a package (or version) not in the default Hub image
- Maintainers reviewing PRs that add little-used dependencies. If there's a good solution for external dependencies, it becomes easier to decline additions to the default image and point to this custom-package installation, rather than accepting every PR that adds a package to the default image (which implies supporting it even if it isn't tested, etc.).
(Optional): Suggest a solution
Possible ideas:
- support a pip-install step that works at runtime. This might be doable via documenting existing lifecycle hooks/volumes to allow a pip install to run at image start. Most of these are going to be super lightweight pip installs, so shouldn't be too bad. Support for a hash-including requirements.txt would be good, though. (A rough sketch of this idea follows after this list.)
- work out/document a workflow that makes it easier to keep a custom image extending the default up-to-date across chart upgrades
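For the first idea, a minimal sketch of what a runtime install could look like via hub.extraConfig (the package name, version, and authenticator entrypoint are made up, and I haven't validated this across chart upgrades):
# values.yaml sketch: run the pip install from within jupyterhub_config.py,
# so it happens in the same process that later loads the authenticator
hub:
  extraConfig:
    00-install-authenticator: |
      import subprocess
      import sys

      # hypothetical package, pinned so upgrades are deliberate
      subprocess.check_call([
          sys.executable, "-m", "pip", "install", "--no-cache-dir",
          "my-authenticator-package==1.2.3",
      ])

      # hypothetical entrypoint registered by the package
      c.JupyterHub.authenticator_class = "my-authenticator"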
For the second idea, to get the current image that a custom image should extend, this oneliner works:
helm show values jupyterhub/jupyterhub --version 1.0.0-n015.h95486ae6 | python3 -c 'import sys, json, yaml; json.dump(yaml.safe_load(sys.stdin), sys.stdout)' | jq -r '.hub.image | [.name, .tag] | join(":")'
(why am I using Python to turn yaml into json and then jq to print values instead of just Python to print values? 🤷 )
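For what it's worth, the jq step could be dropped and PyYAML alone could print the value; something like this (untested sketch) should be equivalent:
helm show values jupyterhub/jupyterhub --version 1.0.0-n015.h95486ae6 | python3 -c 'import sys, yaml; img = yaml.safe_load(sys.stdin)["hub"]["image"]; print(img["name"] + ":" + img["tag"])'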
Or to assume the repo and just get the tag with shyaml:
helm show values jupyterhub/jupyterhub --version 1.0.0-n015.h95486ae6 | shyaml get-value hub.image.tag
so you can do:
ARG HUB_IMAGE
FROM ${HUB_IMAGE}
RUN python3 -m pip install my-package
and build & push on every chart update:
CHART_VERSION=1.0.0-n015.h95486ae6
# get the hub image from the chart
HUB_IMAGE=$(helm show values jupyterhub/jupyterhub --version ${CHART_VERSION} | python3 -c 'import sys, json, yaml; json.dump(yaml.safe_load(sys.stdin), sys.stdout)' | jq -r '.hub.image | [.name, .tag] | join(":")')
MY_HUB_IMAGE="my-hub-image:${CHART_VERSION}"
# build our derivative image
docker build -t $MY_HUB_IMAGE --build-arg HUB_IMAGE=$HUB_IMAGE .
docker push $MY_HUB_IMAGE
This might be doable via documenting existing lifecycle hooks/volumes to allow a pip install to run at image start.
@minrk I've tried using a k8s lifecycle hook but failed. The lifecycle hook installed gql (a GraphQL client library) successfully, but for some reason that isn't apparent to me, it didn't end up being available for use in a function called during spawn. It probably relates to the JupyterHub process having started before the package was installed, with the install happening in a separate process or similar. So, for this to work, some dynamic import or similar may be needed? I'm not sure...
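By "dynamic import" I mean something along these lines (just a sketch; the module name is made up, and I haven't verified that it fixes this particular case):
# if the package was pip-installed after the hub process started, the import
# system's directory caches may be stale; invalidating them before importing
# can make a just-installed package visible to the running process
import importlib

importlib.invalidate_caches()
my_authenticator = importlib.import_module("my_authenticator_package")  # hypothetical module name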
Support for a hash-including requirements.txt would be good, though.
Can you elaborate on what you mean? I don't understand this.
(why am I using Python to turn yaml into json and then jq to print values instead of just Python to print values? 🤷 )
Because you don't have yq installed!
# example on getting latest chart version
latest_jupyterhub_charts_version=$(helm show chart --repo https://jupyterhub.github.io/helm-chart/ jupyterhub | yq e '.version' -)
# example on getting latest image tag
latest_jupyterhub_charts_hub_image_tag=$(helm show values --repo https://jupyterhub.github.io/helm-chart/ jupyterhub | yq e '.hub.image.tag' -)
Small improved workaround with a custom image
If you only update hub.image.name to your custom image, e.g. quay.io/consideratio/k8s-hub instead of jupyterhub/k8s-hub, and leave the tag unchanged, you won't need to modify your own values, but you are required to build and publish an image with the correct tag before deploying each new version.
If you have failed to publish an updated image, your deployment will fail early instead of failing in potentially unexpected ways like in #1015.
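In values, that means overriding only the image name (a sketch; the registry/repository is just the example from above):
# values.yaml sketch: only the image name is overridden; the tag keeps
# following the chart's default, so an image with that tag must already
# have been pushed to the registry before upgrading
hub:
  image:
    name: quay.io/consideratio/k8s-hub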
In a GitHub Workflow, getting the latest image information can look like this.
# install helm and yq
curl https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 | bash
sudo snap install yq
# get info about latest image tag
HUB_IMAGE_TAG=$(helm show values --repo https://jupyterhub.github.io/helm-chart/ jupyterhub | yq e '.hub.image.tag' -)
FROM_IMAGE=jupyterhub/k8s-hub:$HUB_IMAGE_TAG
TO_IMAGE=quay.io/consideratio/k8s-hub:$HUB_IMAGE_TAG
# optionally emit that info from a github workflow job step to be used in another job step
echo "::set-output name=from_image::$FROM_IMAGE"
# build and push image if needed
docker build --build-arg FROM_IMAGE=$FROM_IMAGE -t $TO_IMAGE .
# (push credentials assumed to be setup at this point)
docker push $TO_IMAGE
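The docker build step above assumes a Dockerfile along the lines of the earlier example, just with the build arg named FROM_IMAGE (the installed package is a placeholder):
# Dockerfile sketch used with --build-arg FROM_IMAGE=...
ARG FROM_IMAGE
FROM ${FROM_IMAGE}
RUN python3 -m pip install --no-cache-dir my-authenticator-package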
I'd say that the postStart lifecycle hook is not an option. The issue here is that status.podIP is not updated until the hook finishes its work (https://github.com/kubernetes/kubernetes/issues/85966). From the "Kubernetes in Action" book:
Until the hook completes, the container will stay in the Waiting state with the reason ContainerCreating.
Because of this, the pod’s status will be Pending instead of Running.
CNI plugins rely on this field to apply their network policies. E.g., when I tested it in a cluster with Weave, I observed the following behaviour (approximate description):
- the pod is created, and it stays in the Pending state;
- the postStart hook is triggered;
- traffic for the pod is blocked, because the CNI plugin waits for status.podIP to be updated;
- commands in the hook fail, because there's no network connectivity, which makes the main container crash;
- after the crash, status.podIP is finally updated;
- the CNI plugin sees the change and adjusts its network policy, so traffic from the pod is now allowed;
- the container starts again, and this time the hook succeeds.
With some CNIs (e.g. those that don't implement network policies) this probably won't cause any issues, but with others it potentially would (it may also depend on specific settings).
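For reference, the kind of hook being discussed would look roughly like this (a sketch; the exact install command is a placeholder):
# pod spec fragment: a postStart hook doing a pip install; with a CNI that
# enforces network policies, the command can run before the pod has network
# connectivity, since status.podIP isn't set until the hook returns
lifecycle:
  postStart:
    exec:
      command:
        - sh
        - -c
        - python3 -m pip install --no-cache-dir my-authenticator-package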
(upd): added more details.