
jobs/build: Add a release lock to the job

Open · dustymabe opened this issue 3 years ago • 3 comments

We're hitting a subtle issue where newer x86_64 pipelines run (and upload their builds.json early) before the release job for the previous build runs. That release job then fails.

This commit adds a new lock to try to prevent newer x86_64 (main) pipeline jobs from running before the fleet of jobs from the previous run is complete.

dustymabe · Jul 29 '22 13:07

I think this should fix issues like the one seen in https://jenkins-fedora-coreos-pipeline.apps.ocp.fedoraproject.org/blue/organizations/jenkins/release/detail/release/465/pipeline

Before this change:

job starts:
  - takes build lock

job 2 starts:
  - waits on build lock

job approaches end:
  - starts release job

release job starts:
  - takes release lock (waits on multi-arch builds)

job ends:
  - releases build lock

job 2 continues:
  - takes build lock

release job is still waiting on multi-arch builds while job 2 runs (and uploads its builds.json early)
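The race in this timeline can be replayed in a few lines of Python. This is a toy model under stated assumptions, not pipeline code: the `Pipeline` class and `replay_before` function are hypothetical, builds.json is reduced to a list of build IDs with the newest first, and the release step is assumed to pick the latest build (the `cosa push-container` behavior described later in the thread).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Pipeline:
    """Toy model of the 'before' timeline; all names are hypothetical."""
    build_lock: Optional[str] = None                 # job holding the build lock
    builds: List[str] = field(default_factory=list)  # builds.json stand-in, newest first

    def take_build_lock(self, job: str) -> None:
        # In this sequential replay, a held lock would mean "job must wait".
        assert self.build_lock is None, f"{job} would have to wait"
        self.build_lock = job

    def drop_build_lock(self, job: str) -> None:
        assert self.build_lock == job
        self.build_lock = None

    def upload(self, build_id: str) -> None:
        self.builds.insert(0, build_id)  # early builds.json upload


def replay_before():
    p = Pipeline()
    p.take_build_lock("job1")
    p.upload("build-1")
    target = "build-1"           # the build the release job was started for
    # The release job now waits on multi-arch builds but holds no lock
    # that job 2 needs, so nothing below is blocked.
    p.drop_build_lock("job1")    # job 1 ends
    p.take_build_lock("job2")    # job 2 starts
    p.upload("build-2")
    # Multi-arch builds finish; a release step that reads "latest" sees build-2.
    return target, p.builds[0]


target, latest = replay_before()
print(target, latest)  # build-1 build-2
```

Because the waiting release job holds nothing job 2 needs, the build lock alone cannot stop build-2 from becoming "latest" before build-1 is released.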

Now:

job starts:
  - takes build lock
  - takes release lock

job 2 starts:
  - waits on build lock

job approaches end:
  - starts release job

release job starts:
  - waits on release lock

job ends:
  - releases build lock
  - releases release lock

release job continues:
  - takes release lock

job 2 continues and then waits again:
  - takes build lock
  - waits on release lock

release job finishes:
  - releases release lock

job 2 continues:
  - takes release lock
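The new ordering can be sketched with Python threads. Assumptions: `FifoLock`, `build_job`, `release_job`, and `run` are hypothetical names; the two locks are modeled as FIFO ticket locks so the run is deterministic (the real pipeline uses Jenkins locks, and this does not claim to reproduce how Jenkins orders waiters); and the multi-arch wait and actual release work inside the release job are elided.

```python
import threading


class FifoLock:
    """Ticket lock: callers get the lock in the order they asked for it.

    Assumption: FIFO ordering keeps this simulation deterministic; it is
    not a claim about how Jenkins orders lock waiters.
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._next_ticket = 0  # next ticket to hand out
        self._serving = 0      # ticket that currently owns the lock

    def acquire(self):
        with self._cond:
            ticket = self._next_ticket
            self._next_ticket += 1
            self._cond.notify_all()  # wake anyone counting waiters
            while ticket != self._serving:
                self._cond.wait()

    def release(self):
        with self._cond:
            self._serving += 1
            self._cond.notify_all()

    def wait_for_waiters(self, n):
        """Called by the holder: block until n other callers are queued."""
        with self._cond:
            while self._next_ticket - self._serving - 1 < n:
                self._cond.wait()


build_lock = FifoLock()
release_lock = FifoLock()
events = []
release_threads = []
guard = threading.Lock()  # protects events and release_threads


def log(msg):
    with guard:
        events.append(msg)


def release_job(build_id):
    log(f"release {build_id}: waits on release lock")
    release_lock.acquire()
    log(f"release {build_id}: takes release lock")
    # ... wait on multi-arch builds and do the release (elided) ...
    log(f"release {build_id}: done")
    release_lock.release()


def build_job(build_id, holding=None):
    build_lock.acquire()
    release_lock.acquire()
    log(f"build {build_id}: takes build and release locks")
    if holding is not None:
        holding.set()
    t = threading.Thread(target=release_job, args=(build_id,))
    with guard:
        release_threads.append(t)
    t.start()
    release_lock.wait_for_waiters(1)  # end only once the release job is queued
    log(f"build {build_id}: ends")
    build_lock.release()
    release_lock.release()


def run():
    events.clear()
    release_threads.clear()
    holding = threading.Event()
    job1 = threading.Thread(target=build_job, args=(1, holding))
    job1.start()
    holding.wait()  # job 2 starts only after job 1 holds both locks
    job2 = threading.Thread(target=build_job, args=(2,))
    job2.start()
    job1.join()
    job2.join()
    for t in release_threads:
        t.join()
    return list(events)


if __name__ == "__main__":
    for line in run():
        print(line)
```

In this model, job 2 gets the build lock as soon as job 1 ends but blocks on the release lock until the release for build 1 reports done, which is the behavior the timeline describes.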

dustymabe · Jul 29 '22 13:07

Hmm, I think the problem here is that cosa push-container doesn't support taking the build ID and always tries to use the latest build. Independently of what we do here, we definitely should fix that.
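Releasing an explicit build ID rather than whatever is newest can be sketched as a lookup in builds.json. This is not the coreos-assembler implementation: `select_build` is a hypothetical helper, and the builds.json layout used here (a `builds` list of objects with `id` and `arches`, newest first) is an assumption for illustration.

```python
import json
from typing import Optional


def select_build(builds_json: dict, build_id: Optional[str] = None) -> dict:
    """Return the builds.json entry for build_id, or the newest if None.

    Assumed layout: {"builds": [{"id": ..., "arches": [...]}, ...]},
    newest build first. This is an assumption for the sketch.
    """
    builds = builds_json.get("builds", [])
    if not builds:
        raise ValueError("builds.json lists no builds")
    if build_id is None:
        return builds[0]  # implicit "latest": a racing newer build can change this
    for entry in builds:
        if entry.get("id") == build_id:
            return entry
    raise KeyError(f"build {build_id!r} not found in builds.json")


# Hypothetical builds.json after a newer build was uploaded early.
doc = json.loads("""
{"builds": [
  {"id": "build-2", "arches": ["x86_64"]},
  {"id": "build-1", "arches": ["x86_64", "aarch64"]}
]}
""")

print(select_build(doc)["id"])             # build-2
print(select_build(doc, "build-1")["id"])  # build-1
```

Passing the ID through from the job that triggered the release makes the result independent of how many newer builds have landed in the meantime.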

I think this PR could still make sense, but it needs more thought. I think our release bits should work correctly even when the build being released isn't the latest, so we shouldn't need this at a technical level. But I can imagine wanting it anyway as a semantic for non-production streams, to regulate how quickly we build and release changes.

jlebon · Jul 29 '22 14:07

> Hmm, I think the problem here is that cosa push-container doesn't support taking the build ID and always tries to use the latest build. Independently of what we do here, we definitely should fix that.

Right. I'm planning to fix that because I'm going to re-work that entire file soon.

> I think this PR could still make sense, but it needs more thought. I think our release bits should be completely impervious to the build being released not being the latest, so we shouldn't need this at a technical level. But I can imagine wanting it as a semantic anyway for non-production streams to regulate how quickly we build and release changes.

Yes, I was worried less about the push-container-specific part and more about "what does it mean to run a release out of order like this?" I guess a later release can't run before a previous one because of existing locking, which was part of what I was worried about.

So... do you want to just go with the surgical fix for push-container for now (I'll fix it soon) and leave this PR for later, in case we find other issues? I'm cool with that.

dustymabe · Jul 29 '22 14:07