google-compute-engine-plugin

Builds are not being retried after preemptible VM failure

Open victorboissiere opened this issue 5 years ago • 4 comments

I checked the logs from the installation of the slave agent and saw that a preemption listener is set up, so if the instance is terminated, the build should be retried automatically on another slave.

However, a build failed due to a communication issue:

Cannot contact jenkins-slave-base-ozsqst: java.lang.InterruptedException
Could not connect to jenkins-slave-base-ozsqst to send interrupt signal to process

Master logs:

Also:   hudson.remoting.Channel$CallSiteStackTrace: Remote call to jenkins-slave-base-ozsqst
                at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1741)
                at hudson.remoting.UserRequest$ExceptionResponse.retrieve(UserRequest.java:356)
                at hudson.remoting.Channel.call(Channel.java:955)
                at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler.execute(RemoteGitImpl.java:146)
                at jdk.internal.reflect.GeneratedMethodAccessor680.invoke(Unknown Source)
                at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
                at java.base/java.lang.reflect.Method.invoke(Method.java:566)
                at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler.invoke(RemoteGitImpl.java:132)
                at com.sun.proxy.$Proxy92.execute(Unknown Source)
                at io.jenkins.blueocean.autofavorite.FavoritingScmListener.getChangeSet(FavoritingScmListener.java:159)
                at io.jenkins.blueocean.autofavorite.FavoritingScmListener.onCheckout(FavoritingScmListener.java:84)
                at org.jenkinsci.plugins.workflow.steps.scm.SCMStep.checkout(SCMStep.java:140)
                at org.jenkinsci.plugins.workflow.steps.scm.SCMStep$StepExecutionImpl.run(SCMStep.java:93)
                at org.jenkinsci.plugins.workflow.steps.scm.SCMStep$StepExecutionImpl.run(SCMStep.java:80)
                at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution.lambda$start$0(SynchronousNonBlockingStepExecution.java:47)
                at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
                at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
                at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
                at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
                at java.base/java.lang.Thread.run(Thread.java:834)

This may be because the instance was terminated without warning. If so, would it be possible to detect that the termination notice was never received by either the slave or the Jenkins master and, in that case, retry every build that failed because of the termination?

It does not happen often, but often enough to break some of our builds from time to time.
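To illustrate the kind of check I have in mind: a VM can ask the GCE metadata server whether it has been preempted, using the `instance/preempted` key, which returns `TRUE` or `FALSE`. Below is a minimal Python sketch of that check. It is not the plugin's actual listener (which I have not read in detail), and the helper names `fetch_preempted_flag` and `is_preempted` are hypothetical.

```python
import urllib.request

# GCE metadata key that reports preemption; the value is "TRUE" or "FALSE".
PREEMPTED_URL = (
    "http://metadata.google.internal/computeMetadata/v1/instance/preempted"
)


def fetch_preempted_flag(url=PREEMPTED_URL, timeout=2.0):
    """Return the raw metadata value. Only reachable from inside a GCE VM."""
    request = urllib.request.Request(url, headers={"Metadata-Flavor": "Google"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


def is_preempted(fetch=fetch_preempted_flag):
    """Return True only if the metadata server reports preemption.

    Assumption for this sketch: any I/O error (server unreachable,
    timeout) is treated as "unknown" and reported as False.
    """
    try:
        return fetch().strip().upper() == "TRUE"
    except OSError:
        return False
```

The `fetch` parameter is injectable so the decision logic can be exercised outside GCE, for example `is_preempted(lambda: "TRUE")`. Note that this check runs on the VM itself, so it cannot help once the VM is already gone, which is exactly the case described above; the master would need its own way to learn that the instance was preempted.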

Version: 4.0.0
Jenkins version: 2.201

victorboissiere, Oct 23 '19 11:10