ci-jenkins-pipelines
Pipeline status not accurate when a test job hits a timeout and enters `ABORTED` state
Related: Earlier fix applied to set the pipeline status more accurately - https://github.com/adoptium/ci-jenkins-pipelines/issues/1068
The problem was identified after a user in Slack reported that 23+36-ea was missing for Alpine/x64 in the release.
- https://ci.adoptium.net/job/build-scripts/job/openjdk23-pipeline/68/ (showing as `ABORTED`)
- The Alpine/x64 subjob hit a failure state due to running on a broken machine (dockerhost-skytap):
  20:01:26 Build [build-scripts » jobs » jdk23 » jdk23-alpine-linux-x64-temurin #15](https://ci.adoptium.net/job/build-scripts/job/jobs/job/jdk23/job/jdk23-alpine-linux-x64-temurin/15/) completed: FAILURE
- It looks like the riscv64 pipeline had an `ABORTED` state: `00:05:06 Propagating downstream job result: build-scripts/jobs/jdk23/jdk23-linux-riscv64-temurin, Result: ABORTED CopyArtifactsSuccess: true`, which is presumably what set the overall pipeline status to aborted.
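A minimal sketch of why one `ABORTED` downstream job can mask a `FAILURE` elsewhere, assuming the pipeline combines downstream results on a "worst result wins" basis. The ordering mirrors Jenkins' `hudson.model.Result` (SUCCESS, UNSTABLE, FAILURE, NOT_BUILT, ABORTED). The function names are hypothetical, and the second function is only one possible policy, not what the pipeline code currently does.

```python
from enum import IntEnum


class Result(IntEnum):
    # Same order as Jenkins' hudson.model.Result, from best to worst.
    SUCCESS = 0
    UNSTABLE = 1
    FAILURE = 2
    NOT_BUILT = 3
    ABORTED = 4


def worst_of(results):
    """Combine downstream results so that the worst one wins."""
    return max(results, default=Result.SUCCESS)


def worst_of_treating_abort_as_failure(results):
    """Hypothetical policy: a downstream ABORTED (e.g. a timeout) counts as FAILURE."""
    mapped = [Result.FAILURE if r is Result.ABORTED else r for r in results]
    return max(mapped, default=Result.SUCCESS)


# Downstream results taken from the two log lines quoted above.
downstream = {
    "jdk23-alpine-linux-x64-temurin": Result.FAILURE,
    "jdk23-linux-riscv64-temurin": Result.ABORTED,
}

print(worst_of(downstream.values()).name)
print(worst_of_treating_abort_as_failure(downstream.values()).name)
```

Under the first policy the overall status is `ABORTED` even though another platform failed outright; under the second it would be reported as `FAILURE`.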
The BlueOcean view of the pipeline did not pick up on the failures on either Alpine/x64 or riscv64:
Two things:
- The overall pipeline state was `ABORTED` rather than `FAILED`, which may not give the best impression of the status for the purposes of reporting in the Slack channel and elsewhere.
- The riscv64 extended.openjdk jobs appear to be hitting a 25-hour timeout, so that will need to be addressed. jdk22 is taking 21-23 hours. jdk23+24 are hitting the timeout.
Interestingly, it appears that the job cannot retrieve estimated test duration data to calculate how long the targets take, and in the most recent runs it splits the target into 1 list:
17:14:04 TEST DURATION
17:14:04 ====================================================================================
17:14:04 Total number of tests searched: 83
17:14:04 Number of test durations found: 0
17:14:04 No test duration data found.
17:14:04 (Default duration assigned, executed tests: 40s; not executed tests: 0s.)
17:14:04 ====================================================================================
17:14:04
17:14:04 Test target is split into 1 lists.
17:14:04 Reducing estimated test running time from 26m40s to 26m40s.
Previous runs, for example Test_openjdk23_hs_extended.openjdk_riscv64_linux/13, split the target into 3 lists when no test duration data could be found:
17:01:26 TEST DURATION
17:01:26 ====================================================================================
17:01:26 Total number of tests searched: 93
17:01:26 Number of test durations found: 0
17:01:26 No test duration data found.
17:01:26 (Default duration assigned, executed tests: 40s; not executed tests: 0s.)
17:01:26 ====================================================================================
17:01:26
17:01:26 Test target is split into 3 lists.
17:01:26 Reducing estimated test running time from 30m40s to 10m40s.
17:01:26
I will check the test code to see whether the number of lists is based on which nodes are idle, versus which ones are online.
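The figures in the two logs above are consistent with a simple estimate: each executed test gets the default 40s, and the target is divided evenly across the lists. The sketch below reproduces that arithmetic. The executed-test counts (40 and 46) are not printed in the logs; they are inferred from the reported totals (26m40s and 30m40s). The even split is an assumption, not necessarily how the test code divides targets.

```python
import math

DEFAULT_EXECUTED_SECONDS = 40  # from the log: "executed tests: 40s"


def fmt(seconds):
    """Format seconds in the same style as the log, e.g. 1600 -> '26m40s'."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes}m{secs}s"


def estimate(executed_tests, lists):
    """Return (total, longest_list) estimated durations in seconds.

    Assumes tests are spread as evenly as possible, so the longest
    list holds ceil(executed_tests / lists) tests.
    """
    total = executed_tests * DEFAULT_EXECUTED_SECONDS
    longest_list = math.ceil(executed_tests / lists) * DEFAULT_EXECUTED_SECONDS
    return total, longest_list


# (inferred executed tests, number of lists) for the two runs quoted above
for executed, lists in [(40, 1), (46, 3)]:
    total, longest = estimate(executed, lists)
    print(f"{lists} list(s): reducing from {fmt(total)} to {fmt(longest)}")
```

With 1 list there is no reduction at all, so the whole target runs serially on one machine; with 3 lists the estimate drops to roughly a third.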