Explore CurveNote AWS credits for mybinder.org

Open choldgraf opened this issue 2 years ago • 13 comments

At JupyterCon I was in touch with @stevejpurves who works with the curvenote organization and the executable books project. I noted that we have an AWS deployment-in-waiting and are mostly waiting for credits to be able to power it. @stevejpurves noted that CurveNote may have O($1,000 / month) credits to provide on AWS specifically.

This is an issue to see if we can connect Binder's AWS deployment with CurveNote's AWS credits. I believe @manics may be the one to connect on this one and get things set up if possible. @minrk may also have thoughts on the best way to set up the cloud infrastructure so that our team has the right combination of permissions / access (e.g. in gke.mybinder.org we have a Binder-wide project).

choldgraf avatar May 16 '23 11:05 choldgraf

If this is something you want to progress @manics I would be interested in figuring out how to do the initial AWS IAM / Organisation setup with you.

stevejpurves avatar May 16 '23 13:05 stevejpurves

@stevejpurves Yes! I'll contact you separately (I've found your LinkedIn).

... best way to set up the cloud infrastructure so that our team has the right combination of permissions / access

@choldgraf Definitely possible in several ways (all with their pros and cons), depending on how the AWS org is set up. Given it's potentially sensitive information, I think it's best to discuss in private, then summarise here when we've agreed what we're happy to make public.

manics avatar May 16 '23 14:05 manics

We had a chat today. @stevejpurves has a single shared AWS account, and he's set me up with a limited IAM user with console and CLI/API access to us-east-2. The IAM users/groups/permissions are configured in a private Terraform repo, and assuming this all works, more team members can be added to the group.

There'll be some back-and-forth to sort out missing IAM permissions for users, and to set up the roles that BinderHub will need. This will need to be done separately from the mybinder.org-deploy Terraform, otherwise an unprivileged IAM user could create a full admin user/role. My plan is to figure this out with a temporary dev deployment, and if it works, rip it all down and deploy it properly. To make the budget easier to manage, I was thinking a fixed-size cluster would be easiest to start with, so the main variable cost would be ECR storage.
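The budget reasoning above can be sketched roughly as follows. This is only an illustration: the function name and every price and size passed to it are hypothetical placeholders, not real AWS rates or figures from this deployment.

```python
def monthly_cost(node_count, node_hourly_usd, ecr_storage_gb,
                 ecr_usd_per_gb_month, control_plane_hourly_usd=0.0,
                 hours_per_month=730):
    """Split an estimated monthly bill into fixed and variable parts.

    All rates are caller-supplied assumptions; look up current AWS
    pricing for the region before relying on any result.
    """
    # Fixed part: a fixed-size cluster runs the same nodes all month.
    fixed = (node_count * node_hourly_usd + control_plane_hourly_usd) * hours_per_month
    # Variable part: ECR storage grows as more images are built and kept.
    variable = ecr_storage_gb * ecr_usd_per_gb_month
    return fixed, variable, fixed + variable


if __name__ == "__main__":
    # Placeholder numbers purely for illustration.
    fixed, variable, total = monthly_cost(
        node_count=2, node_hourly_usd=0.5,
        ecr_storage_gb=100, ecr_usd_per_gb_month=0.1,
    )
    print(f"fixed={fixed:.2f} variable={variable:.2f} total={total:.2f}")
```

With the cluster size fixed, only the `ecr_storage_gb` term changes from month to month, which is what makes the spend easier to track against the credits.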

manics avatar May 17 '23 14:05 manics

Short update: Using a restricted IAM user I've deployed a lot of the supporting infrastructure for EKS using the EKS Terraform module, but I'm still working through the IAM permissions for the critical step of deploying the actual EKS cluster. I'll update here as soon as I have something.

manics avatar May 31 '23 13:05 manics

I've deployed an EKS cluster, and GitHub OIDC is working! https://github.com/jupyterhub/mybinder.org-deploy/pull/2652 I'll now work on a manual BinderHub deployment, including AWS-specific K8s infrastructure such as load balancer and storage controllers.

manics avatar Jun 12 '23 21:06 manics

I've got a first version partially working. I can build and run https://github.com/binderhub-ci-repos/minimal-dockerfile but there's a weird problem with some Conda packages: some executable files are installed -rw-r--r-- instead of -rwxr-xr-x, and the image fails to run.
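One way to spot this kind of problem is to scan a directory for files that have no execute bit set. The following is a minimal sketch using only the Python standard library; the function names and the example directory are hypothetical, and it is not how the issue was actually diagnosed.

```python
import os
import stat


def mode_string(path):
    """Return the ls-style mode string for a path, e.g. '-rw-r--r--'."""
    return stat.filemode(os.stat(path).st_mode)


def find_non_executable(directory):
    """List regular files in `directory` with no execute bit for anyone."""
    exec_bits = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    missing = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        mode = os.stat(path).st_mode
        # Skip directories and special files; flag files lacking any x bit.
        if stat.S_ISREG(mode) and not mode & exec_bits:
            missing.append(path)
    return missing


if __name__ == "__main__":
    # Hypothetical location; point this at the environment's bin directory.
    for path in find_non_executable("/srv/conda/envs/notebook/bin"):
        print(mode_string(path), path)
```

Running it against a `bin` directory inside the built image would list any files that were installed as -rw-r--r-- when they should have been executable.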

manics avatar Jun 23 '23 22:06 manics

@manics just checking in on this! I realise you're a bit further on than your last comment above and was wondering what remaining steps there are in getting the deployment into the federation? (I'm out after the end of next week in case you need anything from me)

stevejpurves avatar Jul 11 '23 10:07 stevejpurves

I'm happy with the Terraform deployment; it's waiting for review: https://github.com/jupyterhub/mybinder.org-deploy/pull/2652 I'll open a follow-up PR for the BinderHub deployment. After that it should be fairly straightforward to add it to the federation (in practice there will most likely be bugs as soon as it scales up to take a production load, which we'll have to work through).

I have noticed a couple of minor bugs in the private IAM roles, but they're not blockers for the mybinder.org deployment.

manics avatar Jul 11 '23 12:07 manics

@manics @choldgraf I wanted to check in on this issue. As far as I know we have the hub up and running and @manics is happy with the deployment, but we've yet to add anything into the federation? What can we do to move this along?

stevejpurves avatar Sep 28 '23 10:09 stevejpurves

The EKS/Terraform PR was merged this week, which meant I could move on to the final deployment (including adding the mybinder.org secret config which I've now got access to, https://github.com/jupyterhub/mybinder.org-deploy/pull/2698).

One thing I discovered is that the EKS network controller needs updating for full NetworkPolicy support. I'm planning to look at this. There's a risk of downtime, so I'd like to test things on a temporary second EKS cluster.

Once that's done I can finish off https://github.com/jupyterhub/mybinder.org-deploy/pull/2698, and one more PR after that will let us direct production mybinder.org traffic to AWS.

manics avatar Sep 28 '23 12:09 manics

Oh wow. Thanks for the huge effort @manics 🙌 I hadn't realised you were still busy at configuration 😅.

stevejpurves avatar Sep 28 '23 20:09 stevejpurves

I hadn't realised you were still busy at configuration

Neither did I until I checked something 🤣

manics avatar Sep 29 '23 08:09 manics

I've added the Curvenote deployment to our CI/CD system, and it seems to be successful. I need to do more work before it's ready to receive mybinder.org traffic, but the CD workflow https://github.com/jupyterhub/mybinder.org-deploy/actions/workflows/cd.yml should hopefully stay green now.

manics avatar Nov 02 '23 10:11 manics