zero-to-jupyterhub-k8s
Docs: updates related to eksctl
Hi all,
With the help of a colleague in DevOps, I've recently started working to move our DS team's infra from JupyterHub and RStudio Server on a large shared AWS EC2 instance to JupyterHub on k8s (AWS EKS deployment). While the doc has substantially improved since I last looked at it a year ago, running through it over the last week revealed a couple of inconsistencies between the Z2JH doc and the current state of both EKS and Helm.
(following https://zero-to-jupyterhub.readthedocs.io/en/latest/amazon/step-zero-aws-eks.html as of 2020-04-27)
EKS:
I think the introduction of eksctl as a management tool and the default to managed node groups has substantially changed the structure of the EKS docs since the Z2JH AWS EKS guide was written. Step 8 in the Z2JH doc refers to "step 3 in Getting Started with EKS", but the AWS docs are now split between "Getting started with eksctl" and "Getting started with the AWS Management Console", so it takes some drilling down to find what's actually being referenced. I think eksctl is the preferred approach here, and it'd be better to simply provide .yml files for the cluster and autoscaler configs. Step 9 of the procedure discusses setting up the ConfigMap and again references the EKS Getting Started guide, but all references seem to have been removed from that portion, though it's treated here and here. I think the introduction of managed node groups perhaps removed the need for that?
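As a rough sketch of the kind of .yml file that could be provided (the cluster name, region, and instance sizes below are placeholders, not values from the docs), the autoScaler addon policy is one way to give a node group the IAM permissions the cluster autoscaler needs; the autoscaler itself still has to be deployed separately:
# hypothetical cluster.yaml sketch - names, region, and sizes are placeholders
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: z2jh-cluster
  region: us-east-2
nodeGroups:
  - name: user-nodes
    instanceType: m5.large
    minSize: 1
    maxSize: 4
    iam:
      withAddonPolicies:
        autoScaler: true   # grants the IAM permissions the cluster autoscaler needs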
Helm: Helm v3 has completely removed Tiller, so most of the "Setting up Helm" doc can be removed. EDIT: The docs now assume Helm v3.
Once I get our final setup finalized, I'll be happy to take a pass at updating the doc and submitting a PR - just thought I'd flag this for anyone else who takes a shot at this in the near future and try to save a few moments of confusion.
Cheers, Andy
"Yes please!" to helping keep the documentation up to date. Most people in the Z2JH team use GKE so that is what we have most experience with. for all other cloud providers we rely on community members to help out.
On Helm 2 vs Helm 3: I wouldn't update the docs to remove v2. In the main guide there is (I think) a section on setting up Helm with hints on how to do it for Helm 3. We are working towards having a helm chart that works with Helm 2 and (with minor tweaks) also with Helm 3. However, being able to use this chart with Helm v2 will continue to be a requirement for the foreseeable future (this means no Helm 3-only features).
@betatim got it - I'll add explicit notes on the differences between Helm 2 and Helm 3 along with updates for using eksctl. With Helm 2 => 3, it's not an issue with the chart (the chart worked fine); it's that Tiller doesn't exist in Helm 3 at all, so setting up Helm is dramatically simplified. The client-side usage syntax has changed a little bit, too.
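To illustrate the client-side difference (the release name jhub, namespace, and Tiller service account name below are just examples): Helm 2 needed Tiller initialized and took the release name as a flag, while Helm 3 drops Tiller and takes the release name as the first positional argument.
# Helm 2: initialize Tiller first, then install with --name
helm init --service-account tiller
helm install --name jhub jupyterhub/jupyterhub --namespace jhub --values config.yaml

# Helm 3: no Tiller; release name is a positional argument
helm install jhub jupyterhub/jupyterhub --namespace jhub --values config.yaml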
Thanks @andybrnr for noting these things. I am trying to set this up right now and running into issues with helm.
helm upgrade --install $RELEASE jupyterhub/jupyterhub \
  --namespace $NAMESPACE \
  --version=0.8.2 \
  --values config.yaml
Release "jhub" does not exist. Installing it now.
Error: create: failed to create: namespaces "jhub" not found
And also I expected to see the hub here
helm search hub jupyter
URL CHART VERSION APP VERSION DESCRIPTION
https://hub.helm.sh/charts/pangeo/pangeo 20.01.15-e3086c1 An extention of jupyterhub with extra Pangeo re...
https://hub.helm.sh/charts/stable/tensorflow-no... 0.1.3 1.6.0 A Helm chart for tensorflow notebook and tensor...
https://hub.helm.sh/charts/gradiant/jupyter 0.1.2 6.0.3 Helm for jupyter single server with pyspark su...
UPDATE: From reading the helm docs I figured out how to do this. First I created the namespace:
kubectl create namespace $NAMESPACE
Then I changed the install command:
helm install $RELEASE jupyterhub/jupyterhub --namespace $NAMESPACE --version=0.8.2 --values config.yaml
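A likely reason the chart didn't show up in helm search hub is that helm search hub only queries the Helm Hub index, while the JupyterHub chart lives in the project's own chart repository. As a sketch (repo name jupyterhub is just the conventional choice), adding that repo first makes the jupyterhub/jupyterhub reference resolvable and searchable locally:
helm repo add jupyterhub https://jupyterhub.github.io/helm-chart/
helm repo update
helm search repo jupyterhub   # searches locally added repos, unlike `helm search hub`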
Hi @valmack,
Glad you figured it out! You can also create the namespace within the helm call by including the --create-namespace flag.
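For example, reusing the same release, namespace, and chart version as above (--create-namespace is available in recent Helm 3 releases):
helm upgrade --install $RELEASE jupyterhub/jupyterhub \
  --namespace $NAMESPACE --create-namespace \
  --version=0.8.2 \
  --values config.yaml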
One other thing I'll note here that's a "gotcha" we ran into. The EKS docs say "make sure you include at least two availability zones for your cluster", but because the Persistent Volume Claims are backed by EBS, if a user logs in for the first time on a node in us-east-2b but on a following login is assigned to a node in us-east-2c, the PVC won't be able to mount, because EBS volumes only live within their own AZ. Not sure if this is different from how things work in GKE. We hacked around this by specifying multiple AZs in the cluster config but anchoring all the worker node groups to a single AZ, ensuring users' PVCs and nodes are always in the same spot. Not sure if there's a better way to handle this in the future - I saw some suggestions that backing the PVCs with EFS rather than EBS would solve this, as EFS is available across a region rather than within a specific AZ. This could be overridden in the spec for the storage_class, I think.
relevant k8s git issues for the PVC issue mentioned: https://github.com/kubernetes/autoscaler/issues/1431 https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/README.md (see the "Common Notes and Gotchas" at the bottom)
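A rough sketch of the workaround described above (names and zones are placeholders): the cluster spans multiple AZs to satisfy the EKS control-plane requirement, but the node group hosting user pods is pinned to one AZ so it always matches the EBS-backed PVCs.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: z2jh-cluster
  region: us-east-2
availabilityZones: ["us-east-2a", "us-east-2b", "us-east-2c"]  # control plane needs >= 2 AZs
nodeGroups:
  - name: user-nodes
    instanceType: m5.large
    availabilityZones: ["us-east-2b"]  # pin workers to one AZ so EBS PVCs can always attach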
Hi @valmack & @andybrnr
I also encountered the same problem with helm when setting this up on google cloud:
helm upgrade --install jhub jupyterhub/jupyterhub --namespace jhub --version=0.8.2 --values config.yaml
Release "jhub" does not exist. Installing it now.
Error: render error in "jupyterhub/templates/proxy/deployment.yaml": template: jupyterhub/templates/proxy/deployment.yaml:26:32: executing "jupyterhub/templates/proxy/deployment.yaml" at <include (print $.Template.BasePath "/hub/secret.yaml") .>: error calling include: template: jupyterhub/templates/hub/secret.yaml:9:19: executing "jupyterhub/templates/hub/secret.yaml" at <required "Proxy token must be a 32 byte random string generated with openssl rand -hex 32
!" .Values.proxy.secretToken>: error calling required: Proxy token must be a 32 byte random string generated with openssl rand -hex 32
!
However, now that I am about to try the suggested solution posted here, I instead see this error:
error: Get https://34.82.20.3/api/v1/namespaces/kube-system/pods?labelSelector=app%3Dhelm%2Cname%3Dtiller: error executing access token command "/google/google-cloud-sdk/bin/gcloud config config-helper --format=json": err=fork/exec /google/google-cloud-sdk/bin/gcloud: no such file or directory output= stderr=
I think my access token has expired? How do I fix this?
(I looked at the access token in the config file and its expiry is: "2020-05-08T05:25:40Z")
Sorry if this isn't the best place to post this; if you could direct me to the best place, I would appreciate it.
UPDATE: ...I solved my expired access token issue with this command:
gcloud container clusters get-credentials MY-CLUSTER --zone MY-ZONE
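For reference on the first error above: with chart version 0.8.2 the proxy token is required, so config.yaml needs a proxy.secretToken entry generated with openssl. A minimal config.yaml sketch (the placeholder obviously needs to be replaced with your own generated value):
# generate the token
openssl rand -hex 32

# config.yaml
proxy:
  secretToken: "<paste the 64-character hex output of `openssl rand -hex 32` here>"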
Dear @andybrnr, thanks for opening this issue! I'm going through the same process and I agree with you: the AWS EKS instructions need a major overhaul 💪 I am now trying to figure out how to install JupyterHub on AWS EKS using the most recent instructions in the AWS EKS User Guide. If you are willing to share your latest workflow, I would be happy to test it and merge it with mine before you submit a PR.
Best wishes, -Filippo
FYI, I am going the eksctl way!
@filippo82 Did you get anywhere with eksctl? I'd be happy to contribute too, but I can imagine you & @andybrnr might have already solved some of the initial challenges.
Hi @tracek, I was able to learn a lot about eksctl over the past weeks. I was then able to follow a couple of tutorials to set up EKS on Fargate. However, I still need to get to the JupyterHub part. I have been quite busy in the past weeks, and I am off on holiday in 2 weeks. I really want to spend more time on this, but I don't think I will be able to do so until mid-September. I'll try to put together my progress in a document and share it here.
Cheers, -F
I recently launched a BinderHub, part of which required me to consult the Z2JH docs to set up a cluster with EKS and I found them practically impossible to follow because of updates to the EKS docs (no disrespect to the original author of the Z2JH docs!).
I ended up just following the Amazon EKS guide using eksctl word-for-word, and once the cluster was set up, I could simply go to the next step in the Z2BH docs (Installing Helm). I'd be happy to contribute to a PR to update the current Z2JH docs - I think much of it could just be offloaded to the Amazon EKS docs.
Hi @TomasBeuzen, did you set up EKS on Fargate, or "standard" EKS?
@filippo82 - "standard" EKS :) I'm in a region that does not yet support Fargate + EKS
Hi,
Just wondering how far people have got with updating the docs for this? I'm completely new to K8s, EKS and JupyterHub, but managed to get it all working using a combination of the instructions here:
In summary, these are the steps I followed, and I hope they help. I use macOS, so the installs here are OS-dependent, but I've tried to provide links. As I say, I'm new to this, so there are likely to be mistakes. It also involved a lot of trial and error, so there may be things I installed that are redundant, or missing prerequisites.
1. Pre-requisites
AWS Access Keys
Create an access key and secret key pair for an IAM profile with the ability to create clusters. Now, I cheated here a bit: I just used the admin access key. I know this is not recommended in the slightest, but I just wanted to get something working to start with.
I believe the correct approach is to create an EKS service role, create an IAM user from that, and generate the access keys for that user; however, I haven't tested it. If I get the chance, I will try that and report back here.
Details on managing access keys are here.
Required packages
AWS CLI tool awscli
This was to allow me to set up the key pair created in step 1, as I was getting errors using eksctl without it. However, the AWS docs seem to suggest it isn't needed. I installed it using Homebrew:
brew install awscli
However, this may not be quite the right approach, as the AWS docs have a different setup here.
Once installed, I ran aws configure to set it up to use the access key pair created in step 1. I think that if this isn't done, you need to specify the IAM role details in the eksctl config file, but this is just guessing.
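For anyone following along, aws configure just prompts for the key pair plus a default region and output format, and writes them to ~/.aws/credentials and ~/.aws/config (the values shown here are placeholders):
aws configure
# the tool then prompts roughly like this:
# AWS Access Key ID [None]: <access key from step 1>
# AWS Secret Access Key [None]: <secret key from step 1>
# Default region name [None]: eu-west-1
# Default output format [None]: json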
AWS EKS CLI tool eksctl
This is the CLI for creating the cluster and nodes. Again, installed via homebrew:
brew install weaveworks/tap/eksctl
although other ways (and more detail on the installation) can be found here.
kubectl
The CLI for interacting with K8s. Again, a simple install through Homebrew:
brew install kubectl
although other install methods are found here.
2. Creating the cluster
This took a while to do, and there were repeated issues for seemingly no real reason, i.e. running it once would fail, and the second time, with no changes, it would work. It also takes ages to get running (around 20 mins), so you'll have time to put the kettle on!
I started out trying the steps on the AWS docs, but it just wasn't working for me, so I reverted to the eksctl site directly instead.
Create cluster.yaml
Firstly, I created a cluster.yaml file using the template on the eksctl site as follows:
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: Z2JHKubernetesCluster
  region: eu-west-1
nodeGroups:
  - name: ng-1
    instanceType: t2.medium
    volumeSize: 10
    ssh:
      allow: true
      publicKeyName: MyPublicKey
For some background on the choices:
- metadata.name: the cluster name. Z2JHKubernetesCluster can be any name you choose - this is the name given in the original documentation, so I left it the same here.
- metadata.region: the AWS region that the cluster will be deployed in. You can check this list to see if it's available in your region of choice.
- nodeGroups: based on the documentation on the eksctl website you can have more than one of these, but I've kept it simple for now.
- nodeGroups.instanceType: the EC2 instance type. Initially I used t2.micro as I was just testing things out, but this silently fails later on (I'm assuming it doesn't have enough resources to run JupyterHub?). However, using t2.medium appeared to work. You can find all the different options here.
- nodeGroups.volumeSize: in gigabytes.
- ssh: there are lots of options here, which are explained well in the eksctl docs. As I already had a key pair, and this was just development/testing, I went with the option to name the key in place of MyPublicKey. Details about creating and managing key pairs for SSH are here. Make sure to save the key pair .pem though, as it sounds like this can't be changed after it's been set up.
There are also examples and details in the eksctl docs for using existing VPCs and subnets, but as I was happy for it to create me some (again brevity over completeness), I left these out. The details can be found here.
Create the cluster
Next, I ran the command to create the cluster and required node groups:
eksctl create cluster -f cluster.yaml
This is the stage that takes ages, and would occasionally fail for no reason. You'll see that it repeatedly has messages such as
waiting for CloudFormation stack
These will appear again and again and make it look like it isn't working; it is, it's just slow. You can actually take a look at the progress in the CloudFormation console here. This is also the bit that took the most time to get working, as it would sometimes fail for no obvious reason, and the log output wasn't always helpful.
Eventually though, you'll get the message to say it's been created successfully:
EKS cluster "Z2JHKubernetesCluster" in "eu-west-1" region is ready
Verify it's working
To check everything was running as expected, I used:
kubectl get nodes
which showed a printout similar to the following:
NAME STATUS ROLES AGE VERSION
ip-192-168-00-000.eu-west-1.compute.internal Ready <none> 1m v1.19.6-eks-49a6c0
ip-192-168-00-00.eu-west-1.compute.internal Ready <none> 1m v1.19.6-eks-49a6c0
Once it was ready, I then followed the steps to install Helm and JupyterHub as normal.
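For completeness, the Helm part after the cluster is up is roughly the following sketch (the release and namespace names are examples, config.yaml is the Z2JH config file with proxy.secretToken set, and the chart version should be whatever the Z2JH docs currently recommend):
helm repo add jupyterhub https://jupyterhub.github.io/helm-chart/
helm repo update
helm upgrade --install jhub jupyterhub/jupyterhub \
  --namespace jhub --create-namespace \
  --values config.yaml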