pangeo-cloud-federation

refactor circleci config to do each hub in parallel

Open rabernat opened this issue 5 years ago • 3 comments

As we add more and more hubs to this repo, I worry about one interfering with the others.

Currently our script is set up like this

  • build
    • dev.pangeo.io
    • ocean.pangeo.io
    • etc.
  • deploy
    • dev.pangeo.io
    • ocean.pangeo.io
    • etc.

It should be possible to use circleci workflows to put this in parallel, i.e.

  • dev.pangeo.io
    • build
    • deploy
  • ocean.pangeo.io
    • build
    • deploy
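The per-hub layout above could be expressed as a CircleCI workflow in which each hub's deploy job requires only that hub's build job. Below is a minimal sketch in Python that generates such a workflow mapping. The job names (`build-<hub>`, `deploy-<hub>`), the short hub names, and the workflow name `hubs` are hypothetical and not taken from this repo's config; the matching `jobs:` definitions are not shown.

```python
import json


def per_hub_workflow(hubs):
    """Return a CircleCI-style `workflows` mapping with one build -> deploy chain per hub."""
    jobs = []
    for hub in hubs:
        build = f"build-{hub}"    # hypothetical job name
        deploy = f"deploy-{hub}"  # hypothetical job name
        # Build jobs have no `requires`, so all hubs can start building at once.
        jobs.append(build)
        # Each deploy waits only for its own hub's build, not for the other hubs.
        jobs.append({deploy: {"requires": [build]}})
    return {"workflows": {"hubs": {"jobs": jobs}}}


if __name__ == "__main__":
    # JSON is printed here only to show the structure of the mapping.
    print(json.dumps(per_hub_workflow(["dev", "ocean"]), indent=2))
```

Because no job in one hub's chain depends on a job in another hub's chain, a failure in one hub would not block the others.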

Ideally, nothing would happen at all for ocean on either build or deploy if its directory was not touched in the commit.

Perhaps I am overthinking things, but I wanted to share this idea and get feedback. cc @jhamman, @scottyhq, @yuvipanda.

rabernat, Jun 24 '19 13:06

We already do this for images - we only rebuild when changed. We could do that for deploys too - only deploy if anything in config, secrets or the meta chart directory changed.
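A rough sketch of how that decision could be made from the list of files changed in a commit. The directory layout (`deployments/<hub>/{image,config,secrets}/` plus a shared `pangeo-deploy/` meta chart) and the function name `plan` are assumptions for illustration. So is the rule that a rebuilt image also triggers a deploy.

```python
from pathlib import PurePosixPath

# Assumed (hypothetical) layout: deployments/<hub>/{image,config,secrets}/...
# with a meta chart shared by all hubs in pangeo-deploy/.
META_CHART_DIR = "pangeo-deploy"


def plan(changed_files, hubs):
    """Map each hub to the set of actions ("build", "deploy") a commit needs."""
    actions = {hub: set() for hub in hubs}
    for name in changed_files:
        parts = PurePosixPath(name).parts
        if parts and parts[0] == META_CHART_DIR:
            # A change to the shared chart affects every hub.
            for hub in hubs:
                actions[hub].add("deploy")
        elif len(parts) >= 3 and parts[0] == "deployments" and parts[1] in actions:
            hub, subdir = parts[1], parts[2]
            if subdir == "image":
                # Assumption: a rebuilt image also has to be rolled out.
                actions[hub].update({"build", "deploy"})
            elif subdir in ("config", "secrets"):
                actions[hub].add("deploy")
    return actions
```

In CI, `changed_files` could come from something like `git diff --name-only <base> <head>`. A hub whose action set is empty would be skipped entirely.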

yuvipanda, Jun 24 '19 21:06

Sounds good @yuvipanda. I wonder if a bit of logic to check the cloud provider might also be good somewhere in this toolchain? For example, when building we install a bunch of dependencies for the different cloud providers, but none of the AWS stuff is required for dev or ocean. See this section: https://github.com/pangeo-data/pangeo-cloud-federation/blob/38c313056a984358a3d03da4bb5309492a0d6921/.circleci/config.yml#L110

I'm also thinking at some point different clusters will be running different versions of tiller and therefore we might need different helm versions? https://github.com/pangeo-data/pangeo-cloud-federation/blob/38c313056a984358a3d03da4bb5309492a0d6921/.circleci/config.yml#L188

To illustrate, we've only had icesat2.pangeo.io up for about 3 months (deployed on a Kubernetes 1.11 cluster with eksctl). Already EKS clusters now default to 1.13! It's hard to stay on top of this stuff with a single cloud provider :(

scottyhq, Jun 24 '19 21:06

@yuvipanda posted on our gitter channel how he has done this on the Berkeley data hubs. This is what he wrote:

I spent some time parallelizing JupyterHub image building / deploy for UC Berkeley so image builds happen in parallel now, and deploy waits for them to finish

image

Although a bit before that I had actually parallelized every deploy as well. It isn't obvious because the lines are crossed, but each deploy only depends on the image that's being used to build it

image

I moved it back because we only have 4 concurrent containers available (on the free CircleCI plan), but the fully parallel version is definitely much faster. I've heard a lot of folks in the Pangeo community want this, so you can steal it from https://github.com/berkeley-dsep-infra/datahub/blob/staging/.circleci/config.yml if needed.

An important change is the use of CircleCI orbs (https://circleci.com/orbs/), which let you abstract away some details from your config file into an external 'orb' that can be published independently. What I really have in that config.yml is a hubploy build / deploy orb, so it would be great to eventually publish that as part of hubploy, so Pangeo can just use it rather than having to copy and paste.

jhamman, Oct 08 '19 21:10