(GitHub CI) Productise containerised build and release pipeline
Issue
- Build and release automation is largely implemented in custom, product-specific logic (Jenkins / pipelines)
- Changes to coreos-overlay / portage-stable need to be manually tested on Jenkins
Impact
- Reproducible builds of releases outside of Jenkins are nigh impossible
- Running vendor tests depends on access to build automation
- Test results can only be (truly) reviewed on Jenkins
- Tests can’t be run by external contributors, which creates friction: "Flatcar is OSS, but I need Jenkins access to trigger my tests?"
Ideal future state
- CI product agnostic build logic
- Scripts for local stateless builds that allow re-using the SDK and binary packages through all build “stages”, with parameter propagation across build steps via simple text files
- Documentation in place on testing PRs: what do we want to test (vendors etc), how to trigger a test?
- We use GitHub Actions and / or bot labels in PR comments to build and test PRs
- test results are made available on the respective PRs
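The "parameter propagation via simple text files" idea from the list above could look roughly like the following. This is only a minimal sketch under assumptions: the `KEY=VALUE` line format, the file name `sdk_stage.txt` and the parameter names are hypothetical and not taken from the actual Flatcar scripts.

```python
import tempfile
from pathlib import Path


def write_params(path: Path, params: dict[str, str]) -> None:
    """Write KEY=VALUE lines, one per parameter, for a later stage to read."""
    lines = [f"{key}={value}" for key, value in sorted(params.items())]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")


def read_params(path: Path) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    params = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        params[key.strip()] = value.strip()
    return params


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        # Hypothetical file name; a real pipeline would pick its own.
        param_file = Path(workdir) / "sdk_stage.txt"
        # Stage 1 (e.g. the SDK build) records what it produced.
        write_params(param_file, {"SDK_VERSION": "0.0.0-example", "ARCH": "amd64"})
        # Stage 2 (e.g. the package build) picks the values up again.
        print(read_params(param_file))
```

Because the file is plain text, the same hand-over would work unchanged in a local build, in Jenkins, or in a GitHub Actions job.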
The Jenkins tasks are tracked in https://github.com/flatcar-linux/Flatcar/issues/632
The GitHub Action tasks for build automation are:
- [x] https://github.com/flatcar-linux/Flatcar/issues/844
The GitHub Action tasks left for PR builds are:
- [x] https://github.com/flatcar-linux/Flatcar/issues/789
- [ ] https://github.com/flatcar-linux/Flatcar/issues/790
Hi,
I attempted to create a workflow for building Flatcar packages and images. The hope was to create a series of workflows that would migrate what is currently running in Jenkins (not really accessible to contributors) into GitHub workflows.
Unfortunately, the GitHub-hosted runner uses a Standard_DS2_V2 Azure VM size, which has only 2 CPU cores. The first attempt failed because GitHub cancels a job after 6 hours:
https://github.com/gabriel-samfira/scripts/runs/5976212586?check_suite_focus=true
Just the packages took a little over 5 hours to build on this instance. The workflow for this job is here: https://github.com/gabriel-samfira/scripts/blob/main/.github/workflows/ci.yaml
There are two options to mitigate the lack of CPU power:
- Use the workflow to spin up a larger instance and run the tests via SSH remotely. This can be a VM in Azure for example.
- Use self-hosted runners https://docs.github.com/en/actions/hosting-your-own-runners/about-self-hosted-runners
I used an ephemeral self-hosted runner on a VM spun up on an Equinix Metal c3.medium.x86. The VM had the following specs:
- CPU: 8 cores (model: AMD EPYC 7402P 24-Core Processor)
- Memory: 8 GB
- Disk space: 100 GB
The job finished in 1 hour and 21 minutes:
https://github.com/gabriel-samfira/scripts/runs/6468981960?check_suite_focus=true
The workflow for that job is here: https://github.com/gabriel-samfira/scripts/blob/main/.github/workflows/ci-garm.yaml
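To put the two runs side by side, here is a small back-of-the-envelope calculation. It assumes the hosted-runner package build took exactly 5 hours (the real figure was "a little over" that, and covered only the packages), so the resulting speedup is a lower bound rather than a measurement.

```python
# Figures from the two runs above; the 5 h value is rounded down (assumption).
HOSTED_PACKAGES_MIN = 5 * 60        # packages only, 2-core hosted runner
SELF_HOSTED_JOB_MIN = 1 * 60 + 21   # whole job, 8-core self-hosted runner
JOB_LIMIT_MIN = 6 * 60              # GitHub cancels the job after 6 hours


def speedup(slow_minutes: float, fast_minutes: float) -> float:
    """Return how many times faster the second run was than the first."""
    return slow_minutes / fast_minutes


def headroom(limit_minutes: int, job_minutes: int) -> int:
    """Return the minutes left before the job limit would be hit."""
    return limit_minutes - job_minutes


if __name__ == "__main__":
    ratio = speedup(HOSTED_PACKAGES_MIN, SELF_HOSTED_JOB_MIN)
    print(f"speedup: at least {ratio:.1f}x with 4x the cores")
    print(f"headroom: {headroom(JOB_LIMIT_MIN, SELF_HOSTED_JOB_MIN)} min under the limit")
```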
I have not tested option 1, but it should work as well. It needs a public cloud account and a few secrets set in the repository where the workflows will run.
Option 2 means managing your own runners, but that can be done automatically, and it is well worth it if you already have dedicated hardware available. I wrote a small app to manage pools of self-hosted runners on bare metal and other clouds.
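For a rough idea of what automating option 2 involves, the sketch below prepares the two steps needed to register one ephemeral runner: a request to the GitHub REST API for a registration token, and the `config.sh` invocation on the runner host. It is not how the app mentioned above works internally; the function names and the example values are hypothetical, and nothing is sent over the network.

```python
import urllib.request

API_ROOT = "https://api.github.com"


def registration_token_request(owner: str, repo: str, pat: str) -> urllib.request.Request:
    """Build (but do not send) the request for a runner registration token."""
    url = f"{API_ROOT}/repos/{owner}/{repo}/actions/runners/registration-token"
    headers = {
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {pat}",
    }
    return urllib.request.Request(url, method="POST", headers=headers)


def runner_config_command(repo_url: str, token: str, name: str) -> list[str]:
    """Return the config.sh call that registers a single-use (ephemeral) runner."""
    return [
        "./config.sh",
        "--url", repo_url,
        "--token", token,
        "--name", name,
        "--ephemeral",   # the runner deregisters itself after one job
        "--unattended",  # no interactive prompts
    ]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    request = registration_token_request("example-org", "scripts", "PAT-PLACEHOLDER")
    print(request.get_method(), request.full_url)
    print(" ".join(runner_config_command(
        "https://github.com/example-org/scripts", "TOKEN-PLACEHOLDER", "runner-1")))
```

A pool manager would repeat these two steps for every fresh VM and tear the VM down once its single job has finished.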
Which option do you think would be best?
Using your garm tool to set up a provisioner that creates ephemeral runners on Equinix Metal (or maybe Azure) is good; thanks for pioneering in this area. We now need to split the work into tasks such as setting up the provisioner and coming up with a workflow that builds the image, publishes the image and build logs, runs the kola qemu tests, and publishes the test logs.
PR testing is implemented in https://github.com/flatcar-linux/scripts/pull/354 for basic builds. I'll create new issues for including qemu kola test workflows and for PR workflows in coreos-overlay/portage-stable that spawn the scripts workflow.
Moving this to "done" because it's largely accomplished. The "PR builds + tests" should go into a separate issue.
Tracked here: https://github.com/flatcar/Flatcar/issues/790 which is part of https://github.com/flatcar/Flatcar/issues/632