Move nightly builds to GHA and reduce frequency

Open afrittoli opened this issue 6 months ago • 7 comments

Nightly builds currently run on Azure, but credits will run out at the end of July, so we need to move them to an alternative location.

Moving them back to GCP would be too costly, so we should:

  • move them to GHA instead
  • reduce frequency to weekly (or less), unless a specific project requires a higher frequency

As part of the move, we could consider publishing the manifests to GitHub and deleting very old nightly build manifests to save on cloud storage too.
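The manifest cleanup could be sketched roughly as follows. This is only an illustration: the function name, the manifest names, and the 90-day cutoff are hypothetical, and a real script would list and delete objects through whatever storage API ends up being used.

```python
from datetime import date, timedelta


def manifests_to_delete(manifests: dict[str, date], today: date, keep_days: int = 90) -> list[str]:
    """Return the names of nightly manifests built more than keep_days ago.

    `manifests` maps a manifest name to its build date. The 90-day default
    is an arbitrary placeholder, not a value agreed on in this issue.
    """
    cutoff = today - timedelta(days=keep_days)
    # Anything built strictly before the cutoff is considered "very old".
    return sorted(name for name, built in manifests.items() if built < cutoff)


if __name__ == "__main__":
    sample = {
        "nightly-2025-01-01.yaml": date(2025, 1, 1),
        "nightly-2025-06-20.yaml": date(2025, 6, 20),
    }
    print(manifests_to_delete(sample, today=date(2025, 7, 1)))
```

Keeping the selection separate from the deletion makes it easy to do a dry run before anything is removed.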

afrittoli avatar Jul 01 '25 12:07 afrittoli

@vdemeester @afrittoli - I would like to pick this up. Could you share some pointers to help me get started?

anithapriyanatarajan avatar Jul 02 '25 12:07 anithapriyanatarajan

Thank you @anithapriyanatarajan.

Nightly builds are implemented as Kubernetes CronJobs that trigger the build by sending an HTTP payload with the required parameters to an EventListener. Some more details are available at https://github.com/tektoncd/plumbing/blob/main/tekton/cronjobs/README.md.
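As a rough illustration of that mechanism, the sketch below builds the kind of HTTP request a cron job could send to the listener. The URL, the payload field names, and the helper name are all hypothetical; the real parameters are the ones described in the README linked above.

```python
import json
import urllib.request

# Hypothetical in-cluster address of the listener; not taken from the plumbing repo.
LISTENER_URL = "http://el-nightly.example.svc.cluster.local:8080"


def build_trigger_request(project: str, git_revision: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the build parameters."""
    # The field names below are illustrative assumptions only.
    payload = {"project": project, "gitRevision": git_revision}
    return urllib.request.Request(
        LISTENER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_trigger_request("pipeline", "main")
    # Sending would be urllib.request.urlopen(req); here we only inspect the request.
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

A GHA-based replacement would need to produce an equivalent request (or call the build directly) on its own schedule.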

In this commit you can see the cronjob definitions that were moved from the dogfooding cluster to Azure.

We'll need to move those builds away from Azure, but we need to do so without spending more money on GCP, so we cannot use the dogfooding cluster for them anymore. Using GHA is probably the only alternative. There are a few options:

  1. Write a GHA workflow that spins up Tekton (similar to E2E jobs) and runs the existing builds there
  2. Convert release pipelines from Tekton to GHA native format

Option (1) may be easier to achieve, but we'll need to consider collecting the execution history and logs for troubleshooting purposes. The resulting user experience will be degraded but the impact in terms of changes should be limited.

Option (2) may require much more work since the release definitions are spread across all repos, with some common parts in plumbing. The implementation could be changed to have the release workflow defined in each repo, reusing some shared action defined in plumbing. An issue with this option is also that the code for full releases and nightly builds is shared. If we move away from Tekton then we need to rework full releases as well (which we'll have to do eventually).

Finally, another issue to consider is that today our nightly builds and releases are signed by Tekton Chains. If we set up a temporary cluster (option 1), we would have to install and configure Chains there as well. If we go for a GHA-native approach (option 2), we'll need to find a new way to sign releases; I don't know whether there is a sigstore/GitHub integration that we could use.

afrittoli avatar Jul 02 '25 13:07 afrittoli

Hi @anithapriyanatarajan, thanks for your work on this so far. What's the current status? Is the approach that you implemented for Pipelines ready to be replicated to other projects? Or are there other changes needed first?

AlanGreene avatar Aug 26 '25 11:08 AlanGreene

From a timeline POV, Azure credits expire at the end of August, so I will turn off nightly builds there soon.

We have a new environment available on OCI, so we have two options:

  • temporarily run nightly builds on OCI
  • miss some nightly builds until this issue is resolved for all repos

There is also another issue to be addressed: we won't have any budget for GCP after September, so the release file won't be hosted there anymore. A few alternatives:

  • Use GHA Artifacts (they have a 90d retention I think, which should be good enough for nightlies)
  • Use OCI object storage
  • Stop doing nightlies, and do bi-weekly minor releases instead, fully automated

afrittoli avatar Aug 26 '25 11:08 afrittoli

@afrittoli, @AlanGreene - The pipeline workflow is working fine. I shall submit similar PRs for the other repos.

Could you help with the two questions below?

  1. Shall we remove the latest tag on all nightly builds?
  2. Shall we defer signing the published nightly images until after we migrate the nightlies?

anithapriyanatarajan avatar Aug 26 '25 11:08 anithapriyanatarajan

I just took a look at the Dashboard PR you opened and there seems to be a lot of duplication between this and the existing Pipelines workflow. Was the plan not to have a composite action or reusable workflow in the plumbing repo that could be shared across projects, similar to the approach in https://github.com/tektoncd/plumbing/pull/2671?

Duplicating all of the setup increases the maintenance overhead, as future changes would need to be repeated for each project. It also makes it harder to identify the relevant differences. Perhaps there was a reason for this, though, and I may be missing some context.

AlanGreene avatar Aug 26 '25 17:08 AlanGreene

@AlanGreene - That was the original approach, but I ran into issues when trying to abstract the workflow, for example differences in pipeline names, release file names, and parameters across repositories. Because of that, I focused on fixing the pipeline first and was settling on a repo-level approach.
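One way to picture the abstraction problem is a shared routine driven by a small per-project table, so that only the differences live in each entry. The sketch below is purely illustrative: the project entries, pipeline names, file names, and parameter keys are hypothetical and not taken from the actual workflows.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class NightlyConfig:
    """Per-project values that differ between repositories."""
    pipeline_name: str
    release_file: str
    extra_params: dict[str, str] = field(default_factory=dict)


# Hypothetical entries; the real names would come from each repository.
PROJECTS = {
    "pipeline": NightlyConfig("pipeline-release", "release.yaml"),
    "dashboard": NightlyConfig("dashboard-release", "release-full.yaml", {"exampleParam": "value"}),
}


def build_params(project: str, version_tag: str) -> dict[str, str]:
    """Merge the shared parameters with the project-specific ones."""
    cfg = PROJECTS[project]
    params = {
        "pipeline": cfg.pipeline_name,
        "releaseFile": cfg.release_file,
        "versionTag": version_tag,
    }
    params.update(cfg.extra_params)
    return params


if __name__ == "__main__":
    print(build_params("dashboard", "v20250826"))
```

In a reusable workflow the same idea would show up as workflow inputs, with each repository passing only its own values.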

Your observation is valid, and I will rework #2671. Until then, I am marking https://github.com/tektoncd/dashboard/pull/4399 as on hold.

anithapriyanatarajan avatar Aug 26 '25 18:08 anithapriyanatarajan