
incident: iojs+release is spinning up new jobs in a loop

Open sam-github opened this issue 4 years ago • 13 comments

https://ci-release.nodejs.org/job/iojs+release/

About 10 jobs a minute. Possibly triggered by me cancelling a job I just started?

I disabled the job in order to stop the viral reproduction.

I'll try to stop all the jobs, and once that's done, maybe enable it again, and see how it goes.

If anyone has an idea of what went wrong or how to fix it, please chime in.

sam-github avatar Apr 09 '20 19:04 sam-github

https://ci-release.nodejs.org/job/iojs+release/5882/

Re-enabled, running master CI, :crossed_fingers:

sam-github avatar Apr 09 '20 19:04 sam-github

Back to normal :partying_face:

sam-github avatar Apr 09 '20 21:04 sam-github

very weird, "started by SCM change"??

And now we have a bunch of jobs in limbo because they're waiting for 10.15 https://ci-release.nodejs.org/computer/release-nearform-macos10.15-x64-1/

Disk space is too low. Only 0.029GB left on /Users/iojs/build.

Filesystem      Size   Used  Avail Capacity iused     ifree %iused  Mounted on
/dev/disk1s5    51Gi   10Gi   39Mi   100%  483463 536625897    0%   /

huh? this is that crazy macos cache thing isn't it? anyone know how to fix it? @AshCripps @richardlau ?
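
(Aside: an agent's free space can also be checked from the Jenkins script console without shelling into the machine; a minimal sketch, assuming only the node name taken from the URL above:)

// Sketch: query an agent's free disk space from the Jenkins script console.
def node = Jenkins.instance.getNode('release-nearform-macos10.15-x64-1')
def root = node?.rootPath  // FilePath rooted at the agent's remote FS root; null if the node is offline
if (root) {
    println String.format('%.2f GiB free on %s', root.freeDiskSpace / (1024d * 1024 * 1024), node.nodeName)
}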

rvagg avatar Apr 10 '20 12:04 rvagg

Reopening because I'm repurposing this for a related issue.

rvagg avatar Apr 10 '20 12:04 rvagg

> huh? this is that crazy macos cache thing isn't it? anyone know how to fix it? @AshCripps @richardlau ?

https://github.com/nodejs/build/issues/2173#issuecomment-584892051? I'm not really a macOS person -- the first time I tried to recreate what @AshCripps did, I didn't realize you needed to be root (and not the default Administrator or iojs accounts) to take a look at .fseventsd (if that's the issue this time).

richardlau avatar Apr 10 '20 12:04 richardlau

I'll also log onto the host and see if it's at fault.

AshCripps avatar Apr 10 '20 12:04 AshCripps

Made #2284 to deal with this new thing specifically, sorry

rvagg avatar Apr 10 '20 12:04 rvagg

This happened again today when I accidentally started a job with an empty commit sha; from what @richardlau and I could tell, it picked a commit from 2015 and then started building every change since then.

I disabled the job and cancelled all the running jobs; I'll re-enable it and see if we're fixed.

AshCripps avatar Apr 27 '20 11:04 AshCripps

Can't we enforce a value for certain fields in Jenkins?

targos avatar Apr 27 '20 11:04 targos

Part of the problem looks to be this: [image]

i.e. having $commit be blank means "all branches will be examined for changes and built". So the question is: what changes are being spotted? I picked one of the jobs at random, https://ci-release.nodejs.org/job/iojs+release/5979/console, and see this in the log:

12:06:52  > git fetch --tags --progress https://github.com/nodejs/node.git +refs/heads/*:refs/remotes/origin/* # timeout=30
12:06:55 Seen branch in repository origin/1.8.x-changelogs
12:06:55 Seen branch in repository origin/10.x-libuv-1.28.0
12:06:55 Seen branch in repository origin/11.8.0-changelog
12:06:55 Seen branch in repository origin/2020-03-17-hide-duplicated-error-properties
12:06:55 Seen branch in repository origin/2020-03-17-highlight-classes-while-inspecting
12:06:55 Seen branch in repository origin/Bartosz-ApacheBenchForHTTPBenchmarks
12:06:55 Seen branch in repository origin/Bartosz-ChildProcessBenchmarkFix
12:06:55 Seen branch in repository origin/Bartosz-MakeWindowsZipPackage
12:06:55 Seen branch in repository origin/Bartosz-MakeWindowsZipPackage-4.x
12:06:55 Seen branch in repository origin/CVE-2016-5180-v0.10
12:06:55 Seen branch in repository origin/CompilerError
12:06:55 Seen branch in repository origin/GitHub-Actions-CI
12:06:55 Seen branch in repository origin/TLA
12:06:55 Seen branch in repository origin/Test-Python3-on-Travis-CI
12:06:55 Seen branch in repository origin/Travis-test-Py36-or-Py37
12:06:55 Seen branch in repository origin/V8-6.4.388.44
12:06:55 Seen branch in repository origin/abi-compat-note
12:06:55 Seen branch in repository origin/add-alpine-check
12:06:55 Seen branch in repository origin/add-assert-order-rule
12:06:55 Seen branch in repository origin/add-assert-undefined-property
12:06:55 Seen branch in repository origin/add-assertion-error-docs
12:06:55 Seen branch in repository origin/add-circular-reference-anchors
12:06:55 Seen branch in repository origin/add-color-enforcing-docs
...

+many more branches, most of which don't appear in the current https://github.com/nodejs/node/branches/all. So are these stale branches in the local repository? Or in the reference repository (/home/iojs/.ccache/node.shared.reference)? Or does the git plugin record these somewhere else?

If requesting a build with the default parameters (e.g. no value for commit) is going to be problematic then we should try to avoid that footgun, either by detecting an empty commit and aborting or by providing a sensible default value for the parameter.
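
A minimal sketch of the detect-and-abort option, assuming the Groovy plugin's "Execute system Groovy script" build step (where build is bound to the running build); everything here apart from the commit parameter name is hypothetical:

// Hypothetical fail-fast guard: stop the build before the git plugin
// falls back to scanning every branch when "commit" is empty.
def commit = build.buildVariableResolver.resolve('commit')
if (!commit?.trim()) {
    throw new hudson.AbortException('Refusing to build: the "commit" parameter is empty')
}
println "Building commit ${commit}"

(Alternatively, something like the Validating String Parameter plugin could enforce a non-empty value at submission time, without a build step.)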

richardlau avatar Apr 27 '20 15:04 richardlau

This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.

github-actions[bot] avatar Feb 22 '21 00:02 github-actions[bot]

This recurred today via https://ci-release.nodejs.org/job/iojs+release/7241/, which left the commit parameter empty and ended up spawning a lot of builds (we were up to build number 8005 (!)).

I killed all the currently queued/running jobs via this Groovy script (pasted into the "Script Console" under "Manage Jenkins" in the Jenkins web UI): https://gist.github.com/sasjo/6c0159d2a438f256b1127d1ef69b522d

// Cancel everything in the build queue (the "Extenda" filter is carried over from the original gist).
Jenkins.instance.queue.items.findAll { !it.task.name.contains("Extenda") }.each {
  println "Cancel ${it.task.name}"
  Jenkins.instance.queue.cancel(it.task)
}
// Walk every top-level item and abort any in-progress builds.
Jenkins.instance.items.each {
  stopJobs(it)
}
// Recurse into folders and multibranch projects, aborting running Pipeline builds.
def stopJobs(job) {
  if (job in jenkins.branch.OrganizationFolder) {
    // Git behaves well so no need to traverse it.
    return
  } else if (job in com.cloudbees.hudson.plugins.folder.Folder) {
    job.items.each { stopJobs(it) }
  } else if (job in org.jenkinsci.plugins.workflow.multibranch.WorkflowMultiBranchProject) {
    job.items.each { stopJobs(it) }
  } else if (job in org.jenkinsci.plugins.workflow.job.WorkflowJob) {
    if (job.isBuilding() || job.isInQueue() || job.isBuildBlocked()) {
      job.builds.findAll { it.inProgress || it.building }.each { build ->
        println "Kill $build"
        build.finish(hudson.model.Result.ABORTED, new java.io.IOException("Aborted from Script Console"))
      }
    }
  }
}

return true

This may have also cancelled today's nightly and v8-canary builds.
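
In hindsight, a variant scoped to the one runaway job would have spared those. A rough, untested sketch; only the iojs+release job name comes from this thread:

import hudson.model.Result

// Sketch: cancel queued items and abort running builds of the runaway job only.
def jobName = 'iojs+release'
Jenkins.instance.queue.items.findAll { it.task.name == jobName }.each {
    println "Cancel queued ${it.task.name}"
    Jenkins.instance.queue.cancel(it.task)
}
def job = Jenkins.instance.getItemByFullName(jobName)
job?.builds?.findAll { it.isBuilding() }?.each { build ->
    println "Abort ${build}"
    build.executor?.interrupt(Result.ABORTED)
}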

richardlau avatar Oct 14 '21 15:10 richardlau

This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.

github-actions[bot] avatar Aug 11 '22 00:08 github-actions[bot]