fedora-coreos-pipeline
fedora-coreos-pipeline copied to clipboard
jobs/build: Add a release lock to the job
We're hitting a subtle issue here where newer x86_64 pipelines are running (and early uploading their builds.json) before the release job runs for the previous build. That release job then fails.
This commit adds a new lock to try to prevent newer x86_64 (main) pipeline jobs from running before the fleet of jobs from the previous run are complete.
I think this should fix issues like seen in https://jenkins-fedora-coreos-pipeline.apps.ocp.fedoraproject.org/blue/organizations/jenkins/release/detail/release/465/pipeline
Before this change:
job starts:
- takes build lock
job 2 starts:
- wait on build lock
job approaches end:
- start release job
release job starts:
- takes release lock (waits on multi-arch builds)
job ends
- release build lock
job 2 starts:
- takes build lock
release job still waiting on multi-arch builds:
Now:
job starts:
- takes build lock
- takes release lock
job 2 starts:
- wait on build lock
job approaches end:
- start release job
release job starts:
- wait on release lock
job ends
- release build lock
- release release lock
release job continues:
- takes release lock
job 2 continues and then waits again:
- takes build lock
- wait on release lock
release job finishes:
- release release lock
job 2 continues:
- take release lock
Hmm, I think the problem here is that cosa push-container doesn't support taking the build ID and always tries to use the latest build. Independently of what we do here, we definitely should fix that.
I think this PR could still make sense, but it needs more thought. I think our release bits should be completely impervious to the build being released not being the latest, so we shouldn't need this at a technical level. But I can imagine wanting it as a semantic anyway for non-production streams to regulate how quickly we build and release changes.
Hmm, I think the problem here is that
cosa push-containerdoesn't support taking the build ID and always tries to use the latest build. Independently of what we do here, we definitely should fix that.
Right. I'm planning to fix that because I'm going to re-work that entire file soon.
I think this PR could still make sense, but it needs more thought. I think our release bits should be completely impervious to the build being released not being the latest, so we shouldn't need this at a technical level. But I can imagine wanting it as a semantic anyway for non-production streams to regulate how quickly we build and release changes.
Yes, I was worried less about the push-container specific part, and more about: "what does it mean to run a release out of order like this. I guess it can't happen that a later release runs before a previous release because of existing locking, which was what I was partially worried about.
So.. Do you want to just go with the surgical fix for push-container for now (I'll fix soon) and leave this for if we find other issues later? I'm cool with that.