infrastructure icon indicating copy to clipboard operation
infrastructure copied to clipboard

Investigate / clear up exceptions in jenkins system log

Open sxa opened this issue 1 year ago • 4 comments

A quick search on the recent ones (today and yesterday) can be done with grep -h '^[^\ ]*Exception' /var/log/jenkins/jenkins.log /var/log/jenkins/jenkins.log.1 | sort | uniq -c

The two largest culprits at the moment are

  • 31 java.io.EOFException: unexpected stream termination Looks to be consistently following SEVERE c.m.a.v.r.AzureVMAgentSSHLauncher#launch: Got exception on agent azurebuildagentxxxxxx
  • 62 java.io.IOException: Agent failed to connect, even though the launcher didn't report it. See the log output for details. Looks consistently following WARNING hudson.slaves.NodeProvisioner#update: Unexpected exception encountered while provisioning agent azurebuildagent so both are with the deprecated Azure plugin
  • 657 java.lang.IllegalStateException: completed or broken execution These seem to be exclusively on a particular job from Feb 20 which we can probably delete WARNING o.j.p.w.cps.CpsStepContext$2#onFailure: Failed to proceed after CpsStepContext[287:build]:Owner[build-scripts/jobs/jdk8u/jdk8u-windows-x64-temurin/493:build-scripts/jobs/jdk8u/jdk8u-windows-x64-temurin #493]
  • 1329 java.lang.reflect.InaccessibleObjectException: Unable to make field private static final long java.nio.channels.ClosedChannelException.serialVersionUID accessible: module java.base does not "opens java.nio.channels" to unnamed module @6be968ce Seems Orka related as it's consistently preceded by INFO o.j.p.cloudstats.CloudStatistics#getIdFor: No support for cloud-stats-plugin by class io.jenkins.plugins.orka.OrkaProvisionedAgent
  • 63 java.util.concurrent.TimeoutException: Idle timeout expired: 30000/30000 ms No constant cause on first look but many are No node in READY state is available to deploy to with a mention of orka. The others generally have WARNING h.i.i.InstallUncaughtExceptionHandler#handleException as the preceding line

Based on this clearing up things from the Azure and Orka provisioning would help with bullets 1, 2, 4 with 3 being one specific job that I can delete and the final one is partially with orka (Noting that as of 1142 today 100% of those were orka)

sxa avatar Apr 10 '24 10:04 sxa

I've deleted job 493 from bullet 3 - if anyone wants to look into it, here is a tarball with the job directory from it, plus the two at either side: windows-build-493.tar.gz

sxa avatar Apr 12 '24 12:04 sxa

Azure cloud definition has been removed

sxa avatar Apr 25 '24 11:04 sxa

Adding these which are also occurring:

Caused: java.io.IOException
java.lang.reflect.InaccessibleObjectException: Unable to make field private static final long java.nio.channels.ClosedChannelException.serialVersionUID accessible: module java.base does not "opens java.nio.channels" to unnamed module @6be968ce
Caused: java.lang.RuntimeException: Failed to serialize hudson.slaves.OfflineCause$ChannelTermination#cause for class hudson.slaves.OfflineCause$ChannelTermination
Caused: java.lang.RuntimeException: Failed to serialize hudson.model.Node#temporaryOfflineCause for class hudson.slaves.DumbSlave
javax.crypto.IllegalBlockSizeException: Input length must be multiple of 16 when decrypting with padded cipher
Caused: java.io.IOException
Caused: java.lang.reflect.InvocationTargetException

(Final three are fairly common but potentially related to the new Debian container that was done for https://github.com/adoptium/infrastructure/issues/3043 which seems to have been producing some exceptions - the others less common) Input length one had 1354 instances yesterday,

sxa avatar Apr 30 '24 14:04 sxa

Running this again today, we have 18 occurences of lines like this, all on different build jobs: java.io.IOException: Cannot save actions for StepAtomNode[id=622, exec=CpsFlowExecution[Owner[build-scripts/jobs/jdk17u/jdk17u-mac-x64-temurin/517:build-scripts/jobs/jdk17u/jdk17u-mac-x64-temurin #517]]] for completed execution CpsFlowExecution[Owner[build-scripts/jobs/jdk17u/jdk17u-mac-x64-temurin/517:build-scripts/jobs/jdk17u/jdk17u-mac-x64-temurin #517]]: [org.jenkinsci.plugins.workflow.cps.actions.ArgumentsActionImpl@2e4d12f0, org.jenkinsci.plugins.workflow.actions.TimingAction@4a163c4a, org.jenkinsci.plugins.workflow.support.actions.LogStorajava.io.IOException: Cannot save actions for StepAtomNode[id=622, exec=CpsFlowExecution[Owner[build-scripts/jobs/jdk17u/jdk17u-mac-x64-temurin/517:build-scripts/jobs/jdk17u/jdk17u-mac-x64-temurin #517]]] for completed execution CpsFlowExecution[Owner[build-scripts/jobs/jdk17u/jdk17u-mac-x64-temurin/517:build-scripts/jobs/jdk17u/jdk17u-mac-x64-temurin #517]]: [org.jenkinsci.plugins.workflow.cps.actions.ArgumentsActionImpl@2e4d12f0, org.jenkinsci.plugins.workflow.actions.TimingAction@4a163c4a, org.jenkinsci.plugins.workflow.support.actions.LogStorageAction@c0a7d6a, org.jenkinsci.plugins.workflow.actions.LabelAction@1640e7d3, io.jenkins.blueocean.listeners.NodeDownstreamBuildAction@7d535a15] geAction@c0a7d6a, org.jenkinsci.plugins.workflow.actions.LabelAction@1640e7d3, io.jenkins.blueocean.listeners.NodeDownstreamBuildAction@7d535a15]

Other than that we have the following that are in double figures from yesterday's log:

  • 21 java.lang.IllegalStateException: completed or broken execution All seem to be following these warnings (Differing jobs): 2024-05-02 19:54:46.843+0000 [id=64] WARNING o.j.p.w.cps.CpsStepContext$2#onFailure: Failed to proceed after CpsStepContext[69:build]:Owner[build-scripts/evaluation-openjdk22-pipeline/26:build-scripts/evaluation-openjdk22-pipeline #26]
  • 26 java.lang.NullPointerException No consistent cause immediately obvious
  • 1803 java.lang.reflect.InaccessibleObjectException: Unable to make field private static final long java.nio.channels.ClosedChannelException.serialVersionUID accessible: module java.base does not "opens java.nio.channels" to unnamed module @6be968ce Likely orka related - covered in https://github.com/adoptium/infrastructure/issues/2536#issuecomment-2047894149
  • 38 java.util.concurrent.TimeoutException: Idle timeout expired: 30000/30000 ms 24 of these are from orka
  • 17 org.kohsuke.stapler.framework.adjunct.NoSuchAdjunctException: Neither lib.form.hetero-list.hetero-list.css, .js, .html, nor .jelly were found Possibly access/crumb issue

Significantly fewer than before, and most of the remaining ones seem to be relating to the orka plugin

sxa avatar May 03 '24 11:05 sxa