
Update docs for Managed Digital Ocean K8

willingc opened this issue 6 years ago · 3 comments

Working with Pamela Wadhwa at Quansight, I have realized that there are some constraints of the DigitalOcean Managed Kubernetes service that prevent JupyterHub from running properly.

The problem manifests as an "insufficient memory" error. It appears that some aspects of DO's managed service limit our ability to control the cluster as needed: https://www.digitalocean.com/docs/kubernetes/resources/managed/. In addition, Let's Encrypt is not supported.

We should update some of the docs to reflect the above.

cc/ @dharhas

willingc · Sep 07 '19 08:09

How do you solve the 'insufficient memory' issue?

SPTKL · Dec 18 '19 21:12

I don't know yet how to solve the letsencrypt issue, but you can temporarily disable HTTPS for testing. In your config.yaml:

proxy:
  https:
    enabled: false  # Do not use this setting in production unless you know what you're doing!

Later you can switch to another method supported by the JupyterHub Helm chart (like obtaining certs manually). Some info on that is here: https://zero-to-jupyterhub.readthedocs.io/en/latest/reference/reference.html#proxy
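
For reference, here is a rough sketch of what the manual-certificate option can look like in config.yaml, based on the chart's proxy.https settings in the reference linked above. The key and cert bodies are placeholders; double-check the exact field names against the docs for your chart version:

proxy:
  https:
    enabled: true
    type: manual        # use certificates you obtained yourself
    manual:
      key: |            # placeholder: contents of your private key file
        -----BEGIN RSA PRIVATE KEY-----
        ...
        -----END RSA PRIVATE KEY-----
      cert: |           # placeholder: contents of your certificate (plus chain)
        -----BEGIN CERTIFICATE-----
        ...
        -----END CERTIFICATE-----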

For the insufficient memory issue, the problem is that the default node size on DO is not large enough to meet the memory guarantees requested by the JupyterHub Helm chart, so the pods cannot be scheduled. The fix is to bump the node size when you create the cluster:

doctl k8s cluster create jupyter-kubernetes --region nyc1 --node-pool="name=worker-pool;count=3;size=s-2vcpu-4gb"

Here s-2vcpu-4gb is one of the node sizes available from the list shown by doctl k8s options sizes.

In my testing this doubled the projected monthly cost ($30 -> $60) of running the cluster, but there may be tweaks to improve this depending on your use case.
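
If the larger nodes are not an option, another knob worth knowing about is the per-user memory guarantee that the chart requests, set via singleuser.memory in config.yaml. The values below are purely illustrative and untested on DO, so treat this as a sketch. You can confirm that memory is the blocker by running kubectl --namespace=jhub describe pod on a Pending pod and looking for "Insufficient memory" in the Events section.

singleuser:
  memory:
    guarantee: 512M  # illustrative: reserve less memory per user pod than the default
    limit: 1G        # illustrative: cap each user pod's memory usage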

richardotis · Aug 26 '20 15:08

I figured out the Let's Encrypt issue too. Despite what the DO Managed Kubernetes documentation says about Let's Encrypt, the implementation used here works just fine. As soon as you helm upgrade your JupyterHub deployment and get the external IP of your cluster, update the DNS A record of your domain to point to that IP. If the record already existed, wait for its Time-To-Live (TTL) to elapse. Then, to fix autohttps, you can do

kubectl --namespace=jhub get pod
kubectl --namespace=jhub delete pod autohttps-whatever-it-is-from-above

The cluster will restart the Let's Encrypt process. To check the logs, you can run kubectl --namespace=jhub logs -c traefik autohttps-whatever-the-new-name-is. In my deployment I get messages that look like:

time="2020-08-27T16:20:04Z" level=info msg="Configuration loaded from file: /etc/traefik/traefik.toml"
time="2020-08-27T16:20:04Z" level=info msg="Traefik version 2.1.9 built on 2020-03-23T17:23:17Z"
time="2020-08-27T16:20:04Z" level=info msg="\nStats collection is disabled.\nHelp us improve Traefik by turning this feature on :)\nMore details on: https://docs.traefik.io/contributing/data-collection/\n"
time="2020-08-27T16:20:04Z" level=info msg="Starting provider aggregator.ProviderAggregator {}"
time="2020-08-27T16:20:04Z" level=info msg="Starting provider *file.Provider {\"watch\":true,\"filename\":\"/etc/traefik/dynamic.toml\"}"
time="2020-08-27T16:20:04Z" level=info msg="Starting provider *acme.Provider {\"email\":\"[email protected]\",\"caServer\":\"https://acme-v02.api.letsencrypt.org/directory\",\"storage\":\"/etc/acme/acme.json\",\"keyType\":\"RSA4096\",\"httpChallenge\":{\"entryPoint\":\"http\"},\"ResolverName\":\"le\",\"store\":{},\"ChallengeStore\":{}}"
time="2020-08-27T16:20:04Z" level=info msg="Testing certificate renew..." providerName=le.acme
time="2020-08-27T16:20:04Z" level=info msg="Starting provider *traefik.Provider {}"

Depending on your domain, you may also get a bunch of messages like level=error msg="Error getting challenge for token retrying in 682.667507ms" providerName=le.acme. Your HTTPS should still work though. As far as I can tell, this pops up when you have a non-LE wildcard certificate issued for *.example.org but are trying to use LE for subdomain.example.org. I've ignored these error messages in my deployment so far.

There may be a way to get the cluster to update the DNS records automatically when the external IP is assigned, using something like https://www.digitalocean.com/community/tutorials/how-to-automatically-manage-dns-records-from-digitalocean-kubernetes-using-externaldns but I haven't figured that out yet.
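
In the meantime, if the domain's DNS is hosted on DigitalOcean, the A record can also be created by hand with doctl. Something like the following, where the domain, record name, and IP are placeholders (double-check the flags against doctl compute domain records create --help):

doctl compute domain records create example.org --record-type A --record-name cloud --record-data 203.0.113.10 --record-ttl 300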

This is a copy of the log of commands I used to bring my deployment up. I am very new to this and none of this should be taken as best practice. I reordered and deleted some items to get rid of things that didn't work:

# Need doctl, helm, kubectl and openssl in PATH before starting
doctl auth init
# API token from DO control panel
doctl k8s options sizes
doctl k8s cluster create jupyter-kubernetes --region nyc1 --node-pool="name=worker-pool;count=3;size=s-2vcpu-4gb"
# Wait about 5 minutes
doctl k8s cluster kubeconfig save jupyter-kubernetes
ssh-keygen -f ssh-key-jupyter-kubernetes
move ssh-key* .ssh
kubectl get node
# Helm/Tiller stuff starts here
kubectl --namespace kube-system create serviceaccount tiller
kubectl create clusterrolebinding tiller --clusterrole cluster-admin --serviceaccount=kube-system:tiller
helm init --service-account tiller --history-max 100 --upgrade --wait
# Tiller was now installed in the kubernetes cluster
# If we need to setup Helm on another machine with our existing cluster, then we run
# helm init --client-only
helm list
# result of list should be empty
kubectl create namespace jhub
# Helm charts contain templates that, given the provided values, render into Kubernetes resources to be installed in the cluster.
# Begin jupyterhub setup
openssl rand -hex 32
# copy and paste into new file config.yaml for secretToken value
helm repo add jupyterhub https://jupyterhub.github.io/helm-chart/
helm repo update
helm upgrade --install test1 jupyterhub/jupyterhub --namespace jhub --version=0.9.1 --values config.yaml
kubectl --namespace=jhub get pod
kubectl --namespace=jhub get svc proxy-public
# Now I have the external IP of my jupyterhub!
# Teardown
helm delete --purge test1
kubectl delete namespace jhub
doctl k8s cluster delete jupyter-kubernetes

Here is my lightly redacted config.yaml:

proxy:
  secretToken: "arandomstring"
  https:
    hosts:
      - cloud.example.org
    letsencrypt:
      contactEmail: "[email protected]"
auth:
  type: dummy
singleuser:
  defaultUrl: "/lab"
  image:
    name: dockerhub-username/my-custom-image
    tag: 3h6dd197

richardotis · Aug 28 '20 15:08