Consistent CI environment

Open · kkaempf opened this issue on Aug 31 '22

Our current CI / workers / runners setup is somewhat 'spread' across internal and AWS machines. We should try to have it all in one place and properly documented.

  • Paul, Itxaka, Julien, and Loic: phrase this issue properly and add acceptance criteria

kkaempf, Aug 31 '22

  • elemental-toolkit uses arm64 workers (1 on AWS, 2 internal SUSE machines) to build arm64 packages; the rest of the jobs use GitHub workers.
  • elemental-cli uses ONLY GitHub workers
  • elemental-operator uses ONLY GitHub workers
  • elemental uses GitHub workers, plus internal workers for the end2end test (https://github.com/rancher/elemental/blob/main/.github/workflows/e2e.yaml) and the release job (https://github.com/rancher/elemental/blob/main/.github/workflows/release.yaml), but I think the release job should NOT use the build-host. It's just that the release job is not doing anything currently, so it may need to either be dropped or reworked to release something... not sure what.

So I'm guessing this is about consolidating everything to use either GitHub workers or cloud workers when needed. This can be done easily for the toolkit; I'm not sure about the elemental end2end tests, as those use VMs to test everything...

Itxaka, Aug 31 '22

> but I think the release job should NOT use the build-host.

It was there to speed up the build process, but yes, we can use a GH runner instead of a self-hosted one.

ldevulder, Sep 01 '22

> It was there to speed up the build process, but yes, we can use a GH runner instead of a self-hosted one.

I think we first need to check what we're gonna release as part of the elemental releases. If it's just the OCI artifacts, then we can simply use GitHub workers, as that would take about 5 minutes.

Itxaka, Sep 01 '22

arm64 workers are available in GCE. I created an instance template called elemental-ci-runner-arm64-v2 which contains the bare minimum needed to create VMs that can run the runner. The template has a script attached that installs dependencies, and my, @davidcassany's and @fgiudici's keys are injected into every machine created from it.

After creating an instance from that template, the only thing left is to SSH in, then download and run the runner service. I tested one instance with those steps (available on GitHub -> runners -> add runner) and it resulted in a worker that runs the build jobs properly.
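A minimal Python sketch of the provisioning steps above, assuming the `gcloud` CLI is installed and authenticated, and that the runner package from the GitHub "add runner" page has already been downloaded and extracted on the VM. It only builds and prints the commands. The instance name, zone, repository URL, labels and token are hypothetical placeholders; only the template name comes from this thread.

```python
import shlex

# Template name from this thread; every other value below is a placeholder.
TEMPLATE = "elemental-ci-runner-arm64-v2"


def create_instance_cmd(name: str, zone: str) -> list[str]:
    """gcloud command that creates a VM from the runner instance template."""
    return [
        "gcloud", "compute", "instances", "create", name,
        f"--source-instance-template={TEMPLATE}",
        f"--zone={zone}",
    ]


def register_runner_cmds(repo_url: str, token: str, labels: list[str]) -> list[list[str]]:
    """Commands to run on the VM, inside the extracted actions-runner directory."""
    return [
        # Non-interactive registration of the runner against the repository.
        ["./config.sh", "--url", repo_url, "--token", token,
         "--labels", ",".join(labels), "--unattended"],
        # Install and start the runner as a system service.
        ["sudo", "./svc.sh", "install"],
        ["sudo", "./svc.sh", "start"],
    ]


if __name__ == "__main__":
    print(shlex.join(create_instance_cmd("elemental-arm64-runner-1", "europe-west4-a")))
    for cmd in register_runner_cmds(
        "https://github.com/rancher/elemental-toolkit", "REGISTRATION_TOKEN", ["arm64"]
    ):
        print(shlex.join(cmd))
```

Building argument lists instead of shell strings keeps the token and URL safely quoted; `shlex.join` is only used for display.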

From my point of view, GCE supports our arm64 worker use case should we decide to move there, which I know @ldevulder was interested in.

The machine would cost $108 per month.

Itxaka, Sep 07 '22

Looks like GKE clusters are also available, which could be a good way to deploy workers and save money, as the price is per pod per hour, which seems much cheaper than a full VM.
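Whether per-pod billing actually beats an always-on VM depends on how many hours the runners are busy. The sketch below does that comparison, assuming a 730-hour billing month and pods billed only while they run. The $108/month VM figure is the GCE quote above; the $0.20 per pod-hour rate is a hypothetical placeholder, not a real GKE price.

```python
# Assumed 730-hour billing month; the $108 figure is the GCE quote in this thread.
HOURS_PER_MONTH = 730
VM_MONTHLY_USD = 108.0


def vm_hourly_usd(monthly_usd: float = VM_MONTHLY_USD) -> float:
    """Effective hourly rate of an always-on VM."""
    return monthly_usd / HOURS_PER_MONTH


def pods_monthly_usd(pod_hourly_usd: float, busy_hours: float, pods: int = 1) -> float:
    """Pay-per-use cost, assuming pods are billed only while they run."""
    return pod_hourly_usd * busy_hours * pods


def break_even_busy_hours(pod_hourly_usd: float, pods: int = 1,
                          vm_monthly_usd: float = VM_MONTHLY_USD) -> float:
    """Busy hours per month beyond which the always-on VM is the cheaper option."""
    return vm_monthly_usd / (pod_hourly_usd * pods)


if __name__ == "__main__":
    pod_rate = 0.20  # hypothetical $/pod-hour, not a real GKE price
    print(f"VM: ${vm_hourly_usd():.3f}/h")
    print(f"break-even: {break_even_busy_hours(pod_rate):.0f} busy hours/month")
```

With that hypothetical rate, pods stay cheaper as long as a runner is busy for fewer than 540 hours a month (108 / 0.20), roughly 74% of a 730-hour month.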

The problem, as usual, is that we need to set a TOKEN_ID for the GitHub runner, so we either add it manually or build automation into a custom image that fetches the TOKEN_ID automatically. That requires a GitHub PAT in the cluster config, but it has the potential to let us scale up when there is a lot of traffic to the workers and scale down when there is none...
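The "auto-get the TOKEN_ID" step could look roughly like the sketch below, which exchanges a PAT for a short-lived runner registration token through GitHub's REST endpoint `POST /repos/{owner}/{repo}/actions/runners/registration-token`. It assumes repository-level runners (organization runners use `/orgs/{org}/actions/runners/registration-token` instead) and a PAT with sufficient permissions. The `GITHUB_PAT` environment variable and the owner/repo values are hypothetical placeholders.

```python
import json
import os
import urllib.request

API = "https://api.github.com"


def registration_token_request(owner: str, repo: str, pat: str) -> urllib.request.Request:
    """Build the POST request that asks GitHub for a runner registration token."""
    url = f"{API}/repos/{owner}/{repo}/actions/runners/registration-token"
    return urllib.request.Request(
        url,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {pat}",
        },
    )


def fetch_registration_token(owner: str, repo: str, pat: str) -> str:
    """Send the request and return the token to pass to ./config.sh --token."""
    req = registration_token_request(owner, repo, pat)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["token"]


if __name__ == "__main__":
    # Hypothetical: the PAT is injected into the pod/VM as an environment variable.
    pat = os.environ.get("GITHUB_PAT")
    if pat:
        token = fetch_registration_token("rancher", "elemental-toolkit", pat)
        print(f"registration token obtained ({len(token)} chars)")
```

Registration tokens expire after one hour, so in a custom image this fetch would have to run when the runner starts rather than when the image is built.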

Azure containers seem to work the same way with AKS.

These options seem to be more expensive (they seem better suited to bringing pods up and down on demand, i.e. not sustained use), and they require development on our side to bring those pods up on demand properly.
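The on-demand part could start from a scaling rule as simple as the hypothetical one below: one runner per busy or queued job, clamped between a floor and a budget cap. Where the queued and busy counts come from (for example the GitHub API or webhooks) and how pods are actually started or stopped are left out, and the limits are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalePolicy:
    min_runners: int = 0  # allow scaling to zero when there is no traffic
    max_runners: int = 4  # hypothetical cap to bound the monthly bill


def desired_runners(queued_jobs: int, busy_runners: int, policy: ScalePolicy) -> int:
    """One runner per job that is running or waiting, within the policy limits."""
    wanted = busy_runners + queued_jobs
    return max(policy.min_runners, min(policy.max_runners, wanted))


def scale_delta(current_runners: int, desired: int) -> int:
    """Positive: pods to start. Negative: idle pods to stop."""
    return desired - current_runners


if __name__ == "__main__":
    policy = ScalePolicy()
    for queued, busy, current in [(0, 0, 2), (3, 1, 1), (10, 4, 4)]:
        target = desired_runners(queued, busy, policy)
        print(queued, busy, current, "->", target, scale_delta(current, target))
```

Setting `min_runners` above zero trades some idle cost for faster job pickup, since a warm runner avoids waiting for a pod to start.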

Itxaka, Sep 07 '22

Azure Arm instances are also available, so it's mostly up to us to decide where to move everything. I have no preference one way or another.

Itxaka, Sep 07 '22

@ldevulder could you comment on your preferred cloud provider in case the end2end tests need to move later on? Same with @juadk for the UI tests.

I need to create a new arm64 runner and would like to know which provider it should go to :)

Itxaka, Sep 08 '22

> @ldevulder could you comment on your preferred cloud provider in case the end2end tests need to move later on? Same with @juadk for the UI tests.

Personally, I prefer GCP over Azure. I saw a lot of sporadic issues on Azure compared to GCP.

ldevulder, Sep 08 '22

Same for me, I'm not in love with Azure... I would go with GCP as well.

juadk, Sep 08 '22

Nice, that settles it: GCE it is. Thanks folks!

Itxaka, Sep 08 '22

The AWS runner has been torn down and the GCE runner has been set up. Several jobs have been triggered and all of them passed.

Itxaka, Sep 08 '22

@juadk @ldevulder I'm wondering if you folks are gonna deploy the needed VMs for the e2e/UI jobs, or am I supposed to do so?

If you want me to do it, I would need some specs here: OS, vCPU, memory, disk space and disk speed. Cheers!

Itxaka, Sep 09 '22

> @juadk @ldevulder I'm wondering if you folks are gonna deploy the needed VMs for the e2e/UI jobs, or am I supposed to do so?

No, we will take care of this. But as I told @davidcassany yesterday, it's not a high priority for me: we still have some E2E tests to (re)add and we have a deadline ;-). I will try to do this in maybe 2 weeks.

ldevulder, Sep 09 '22

ok cool!

Itxaka, Sep 09 '22

FYI, I will work on this for the E2E tests in week 38.

ldevulder, Sep 19 '22

This will be followed up in issue https://github.com/rancher/elemental/issues/336.

ldevulder, Sep 20 '22