
Docs: updates related to eksctl

Open andybrnr opened this issue 4 years ago • 14 comments

Hi all,

With the help of a colleague in DevOps, I've recently started working to move our DS team's infrastructure from JupyterHub and RStudio Server on a large shared AWS EC2 instance to JupyterHub on k8s (an AWS EKS deployment). While the docs have improved substantially since I looked at them a year ago, running through them over the last week revealed a couple of inconsistencies between the Z2JH docs and the current state of both EKS and Helm.

(following https://zero-to-jupyterhub.readthedocs.io/en/latest/amazon/step-zero-aws-eks.html as of 2020-04-27)

EKS: I think the introduction of eksctl as a management tool and the default to managed node groups has substantially changed the structure of the EKS docs since the Z2JH AWS EKS guide was written. Step 8 in the Z2JH doc refers to "step 3 in getting started with EKS", but the AWS docs are now split between "getting started with eksctl" and "getting started with the AWS Management Console", so it takes some drilling down to find what's actually being referenced. I think eksctl is the preferred approach here, and it'd be better to simply provide .yml files for the cluster and autoscaler configs. Step 9 of the procedure discusses setting up the ConfigMap and again references the EKS Getting Started guide, but all references seem to have been removed from that portion, though it's treated here and here. Perhaps the introduction of managed node groups removed the need for that step?
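
For context, I believe the ConfigMap in question is the aws-auth ConfigMap in kube-system, which is what lets worker nodes join the cluster; with eksctl or managed node groups it is created for you, which you can sanity-check with:

kubectl describe configmap -n kube-system aws-auth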

Helm: Helm v3 has completely removed Tiller, so most of the "Setting up Helm" doc can be removed. EDIT: The docs now assume Helm v3.

Once our setup is finalized, I'll be happy to take a pass at updating the docs and submitting a PR - just thought I'd flag this for anyone else who takes a shot at it in the near future, to save them a few moments of confusion.

Cheers, Andy

andybrnr avatar Apr 30 '20 07:04 andybrnr

"Yes please!" to helping keep the documentation up to date. Most people in the Z2JH team use GKE so that is what we have most experience with. for all other cloud providers we rely on community members to help out.

On Helm 2 vs Helm 3: I wouldn't update the docs to remove v2. In the main guide there is (I think) a section on setting up Helm with hints on how to do it for Helm 3. We are working towards a helm chart that works with Helm 2 and (with minor tweaks) also with Helm 3. However, being able to use this chart with Helm v2 will continue to be a requirement for the foreseeable future (which means no Helm 3-only features).

betatim avatar May 03 '20 08:05 betatim

@betatim got it - I will add explicit notes on the differences between Helm 2 and Helm 3, along with updates for using eksctl. With Helm 2 => 3, the chart itself isn't the issue (the chart worked fine); it's that Tiller doesn't exist in Helm 3 at all, so setting up Helm is dramatically simplified. Client-side usage syntax has changed a little, too.
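
For anyone following along, a rough (hedged) side-by-side of the syntax change, reusing the chart and placeholder names from this thread:

# Helm 2: Tiller must first be installed with `helm init`,
# and the release name is passed via a flag
helm init
helm install jupyterhub/jupyterhub --name $RELEASE --namespace $NAMESPACE --version=0.8.2 --values config.yaml

# Helm 3: no Tiller and no `helm init`; the release name is positional
helm install $RELEASE jupyterhub/jupyterhub --namespace $NAMESPACE --version=0.8.2 --values config.yaml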

andybrnr avatar May 04 '20 05:05 andybrnr

Thanks @andybrnr for noting these things. I am trying to set this up right now and running into issues with helm.

helm upgrade --install $RELEASE jupyterhub/jupyterhub \
  --namespace $NAMESPACE \
  --version=0.8.2 \
  --values config.yaml

Release "jhub" does not exist. Installing it now.
Error: create: failed to create: namespaces "jhub" not found

Also, I expected to see the JupyterHub chart here:

helm search hub jupyter
URL                                               	CHART VERSION   	APP VERSION	DESCRIPTION                                       
https://hub.helm.sh/charts/pangeo/pangeo          	20.01.15-e3086c1	           	An extention of jupyterhub with extra Pangeo re...
https://hub.helm.sh/charts/stable/tensorflow-no...	0.1.3           	1.6.0      	A Helm chart for tensorflow notebook and tensor...
https://hub.helm.sh/charts/gradiant/jupyter       	0.1.2           	6.0.3      	Helm for jupyter single server  with pyspark su...

UPDATE: From reading the Helm docs I figured out how to do this. First I created the namespace:

kubectl create namespace $NAMESPACE

Then I changed the install command:

helm install $RELEASE jupyterhub/jupyterhub --namespace $NAMESPACE --version=0.8.2 --values config.yaml
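
I also realized that helm search hub only searches charts indexed on the public Helm Hub; the JupyterHub chart lives in its own repository (URL taken from the Z2JH docs), so it only shows up once that repo is added:

helm repo add jupyterhub https://jupyterhub.github.io/helm-chart/
helm repo update
helm search repo jupyterhub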

v-stickykeys avatar May 08 '20 14:05 v-stickykeys

Hi @valmack,

Glad you figured it out! You can also create the namespace within the helm call by including the --create-namespace flag.
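
i.e. the same command as above, with one extra flag (assuming a recent Helm 3 release that supports it):

helm upgrade --install $RELEASE jupyterhub/jupyterhub \
  --namespace $NAMESPACE --create-namespace \
  --version=0.8.2 \
  --values config.yaml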

One other thing I'll note here that's a "gotcha" we ran into. The EKS docs say "make sure you include at least two availability zones for your cluster", but because the Persistent Volume Claims are built on EBS, if a user logs in for the first time in us-east-2b but on a following login is assigned to a node in us-east-2c, the PVC won't be able to mount, because EBS volumes only live within their own AZ. Not sure if this is different from how things work in GKE. We hacked around this by specifying multiple AZs in the cluster config but anchoring all the worker node groups to a single AZ, ensuring users' PVCs and nodes are always in the same spot. Not sure if there's a better way to handle this in the future - I saw some suggestions that building the PVCs on EFS rather than EBS would solve this, as EFS is available across a region rather than within a specific AZ. This could be overridden in the spec for the storage_class, I think.
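
For reference, a minimal sketch of that workaround, assuming the availabilityZones fields from the eksctl config schema (the cluster name and AZs here are just examples):

cat > cluster.yaml <<'EOF'
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: example-cluster        # hypothetical name
  region: us-east-2

# EKS wants at least two AZs for the control plane
availabilityZones: ["us-east-2b", "us-east-2c"]

nodeGroups:
  - name: ng-1
    instanceType: m5.large
    # anchor the workers (and therefore the EBS-backed PVCs) to one AZ
    availabilityZones: ["us-east-2b"]
EOF
eksctl create cluster -f cluster.yaml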

relevant k8s git issues for the PVC issue mentioned: https://github.com/kubernetes/autoscaler/issues/1431 https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/README.md (see the "Common Notes and Gotchas" at the bottom)

andybrnr avatar May 08 '20 17:05 andybrnr

Hi @valmack & @andybrnr

I also encountered the same problem with helm when setting this up on Google Cloud:

helm upgrade --install jhub jupyterhub/jupyterhub --namespace jhub --version=0.8.2 --values config.yaml

Release "jhub" does not exist. Installing it now.
Error: render error in "jupyterhub/templates/proxy/deployment.yaml": template: jupyterhub/templates/proxy/deployment.yaml:26:32: executing "jupyterhub/templates/proxy/deployment.yaml" at <include (print $.Template.BasePath "/hub/secret.yaml") .>: error calling include: template: jupyterhub/templates/hub/secret.yaml:9:19: executing "jupyterhub/templates/hub/secret.yaml" at <required "Proxy token must be a 32 byte random string generated with openssl rand -hex 32!" .Values.proxy.secretToken>: error calling required: Proxy token must be a 32 byte random string generated with openssl rand -hex 32!

However, now that I am about to try the suggested solution posted here, I instead see this error:

error: Get https://34.82.20.3/api/v1/namespaces/kube-system/pods?labelSelector=app%3Dhelm%2Cname%3Dtiller: error executing access token command "/google/google-cloud-sdk/bin/gcloud config config-helper --format=json": err=fork/exec /google/google-cloud-sdk/bin/gcloud: no such file or directory output= stderr=

I think my access token has expired? How do I fix this?

(I looked at the access token in the config file and its expiry is "2020-05-08T05:25:40Z".)

Sorry if this isn't the best place to post this; if you could direct me to the best place, I would appreciate it.

carat64 avatar May 09 '20 02:05 carat64

UPDATE: I solved my expired access token issue with this command:

gcloud container clusters get-credentials MY-CLUSTER --zone MY-ZONE
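
To confirm the refreshed credentials are being used, something like:

kubectl config current-context
kubectl get nodes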

carat64 avatar May 09 '20 16:05 carat64

Dear @andybrnr, thanks for opening this issue! I'm going through the same process and I agree with you: the AWS EKS instructions need a major overhaul 💪 I am now trying to figure out how to install JupyterHub on AWS EKS using the most recent instructions in the AWS EKS User Guide. If you are willing to share your latest workflow, I would be happy to test it and merge it with mine before you submit a PR.

Best wishes, -Filippo

filippo82 avatar Jun 14 '20 13:06 filippo82

FYI, I am going the eksctl way!

filippo82 avatar Jun 14 '20 13:06 filippo82

@filippo82 Did you get anywhere with eksctl? I'd be happy to contribute too, but I can imagine you & @andybrnr might have already solved some of the initial challenges.

tracek avatar Jul 17 '20 12:07 tracek

Hi @tracek, I was able to learn a lot about eksctl over the past weeks. I was then able to follow a couple of tutorials to set up EKS on Fargate. However, I still need to get to the JupyterHub part. I have been quite busy in the past weeks, and then I am off on holiday in 2 weeks. I really want to spend more time on this, but I don't think I will be able to do so until mid-September. I'll try to put together my progress in a document and share it here.

Cheers, -F

filippo82 avatar Aug 06 '20 21:08 filippo82

I recently launched a BinderHub, part of which required me to consult the Z2JH docs to set up a cluster with EKS and I found them practically impossible to follow because of updates to the EKS docs (no disrespect to the original author of the Z2JH docs!).

I ended up just following the Amazon EKS guide using eksctl word-for-word, and once the cluster was set up, I could simply move on to the next step in the Z2BH docs (Installing Helm). I'd be happy to contribute to a PR to update the current Z2JH docs - I think much of it could just be offloaded to the Amazon EKS docs.

TomasBeuzen avatar Aug 27 '20 03:08 TomasBeuzen

Hi @TomasBeuzen, did you set up EKS on Fargate, or "standard" EKS?

filippo82 avatar Sep 30 '20 23:09 filippo82

@filippo82 - "standard" EKS :) I'm in a region that does not yet support Fargate + EKS

TomasBeuzen avatar Oct 01 '20 00:10 TomasBeuzen

Hi,

Just wondering how far people have got with updating the docs for this? I'm completely new to K8s, EKS, and JupyterHub, but I managed to get it all working using a combination of the instructions from the AWS docs and the eksctl site.

In summary, these are the steps I followed, and I hope they help. I use Mac OSX, so the installs here are OS-dependent, but I've tried to provide links. As I say, I'm new to this, so there are likely to be mistakes. It also involved a lot of trial and error, so there may be things I installed that are redundant, or missing pre-requisites.

1. Pre-requisites

AWS Access Keys

Create an access key and secret key pair for an IAM profile with the ability to create clusters. I cheated here a bit: as I was just trying to get something to work, I used the admin access key. I know this is not recommended in the slightest, but I just wanted to get something working to start with.

I believe the correct approach is to create an EKS service role, create an IAM user from that, and generate the access keys for that user; however, I haven't tested it. If I get the chance, I will try that and report back here.

Details on managing access keys are here.

Required packages

AWS CLI tool awscli

This was to allow me to set up the access key pair created in step 1, as I was getting errors using eksctl without it. However, the AWS docs seem to suggest it isn't needed. I installed it using homebrew:

brew install awscli

However, this may not be quite the right approach, as the AWS docs describe a different setup here.

Once installed, I ran aws configure to set it up to use the access key pair created in step 1.
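
For reference, aws configure prompts for four values; I used the keys from step 1 and the region I planned to deploy in:

aws configure
AWS Access Key ID [None]: <access key from step 1>
AWS Secret Access Key [None]: <secret key from step 1>
Default region name [None]: eu-west-1
Default output format [None]: json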

I think that if this isn't done, you need to specify the IAM role details in the eksctl config file, but this is just guessing.

AWS EKS CLI tool eksctl

This is the CLI for creating the cluster and nodes. Again, installed via homebrew:

brew install weaveworks/tap/eksctl

although other installation methods (and more detail) can be found here.

kubectl

The CLI for interacting with K8s. Again, a simple install through

brew install kubectl 

although other install methods are found here.

2. Creating the cluster

This took a while to do, and there were repeated issues for seemingly no real reason, i.e. running it once would fail, and the second time, with no changes, it would work. It also takes ages to run (around 20 mins), so you'll have time to put the kettle on!

I started out trying the steps on the AWS docs, but it just wasn't working for me, so I reverted to the eksctl site directly instead.

Create cluster.yaml

Firstly, I created a cluster.yaml file using the template on the eksctl site as follows:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: Z2JHKubernetesCluster
  region: eu-west-1

nodeGroups:
  - name: ng-1
    instanceType: t2.medium
    volumeSize: 10
    ssh:
      allow: true
      publicKeyName: MyPublicKey

For some background on the choices:

  • metadata.name: the cluster name. Z2JHKubernetesCluster can be any name you choose - this is the name given in the original documentation, so I left it the same here.
  • metadata.region: the AWS region name that the cluster will be deployed in. You can check this list to see if it's available in your region of choice.
  • nodeGroups: based on the documentation on the eksctl website, you can have more than one of these, but I've kept it simple for now.
  • nodeGroups.instanceType: the EC2 instance type. Initially I used t2.micro as I was just testing things out, but this silently fails later on (I'm assuming it doesn't have enough resources to run JupyterHub?). However, using t2.medium appeared to work. You can find all the different options here.
  • nodeGroups.volumeSize: in gigabytes
  • ssh: there are lots of options here, which are explained well in the eksctl docs. As I already had a key pair, and this was just development/testing, I went with the option to name the key in place of MyPublicKey (if you need to create one, see the sketch after this list). Details about creating and managing key pairs for SSH are here. Make sure to save the key pair .pem though, as it sounds like this can't be changed after it's been set up.
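
If you don't already have a key pair, a hedged sketch of creating one with the AWS CLI (the key name is just a placeholder):

aws ec2 create-key-pair --key-name MyPublicKey --query 'KeyMaterial' --output text > MyPublicKey.pem
chmod 400 MyPublicKey.pem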

There are also examples and details in the eksctl docs for using existing VPCs and subnets, but as I was happy for it to create me some (again brevity over completeness), I left these out. The details can be found here.

Create the cluster

Next, I ran the command to create the cluster and required node groups:

eksctl create cluster -f cluster.yaml

This is the stage that takes ages, and it would occasionally fail for no obvious reason, with log output that wasn't always helpful. You'll see that it repeatedly prints messages such as

waiting for CloudFormation stack

These will appear again and again, and make it look like it isn't working. It is - it's just slow. You can actually follow the progress in the CloudFormation console here.
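
If you'd rather poll from the terminal than the console, something like this should show the stack status (untested sketch):

aws cloudformation describe-stacks --query 'Stacks[].{Name:StackName,Status:StackStatus}' --output table
eksctl get cluster --region eu-west-1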

Eventually though, you'll get the message to say it's been created successfully:

EKS cluster "Z2JHKubernetesCluster" in "eu-west-1" region is ready

Verify it's working

To check everything was running as expected, I used:

kubectl get nodes

to show a print out similar to the following:

NAME                                           STATUS   ROLES    AGE   VERSION
ip-192-168-00-000.eu-west-1.compute.internal   Ready    <none>   1m    v1.19.6-eks-49a6c0
ip-192-168-00-00.eu-west-1.compute.internal    Ready    <none>   1m    v1.19.6-eks-49a6c0

Once it was ready, I followed the steps to install Helm and JupyterHub as normal.
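
In case it saves anyone a lookup, the Helm steps I followed were roughly the following (repo URL from the Z2JH docs; release/namespace names reused from earlier in this thread):

helm repo add jupyterhub https://jupyterhub.github.io/helm-chart/
helm repo update
helm upgrade --install jhub jupyterhub/jupyterhub --namespace jhub --create-namespace --values config.yaml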

glsdown avatar Apr 22 '21 19:04 glsdown