aqa-test-tools

allTestsInfo may be incorrect

Open sophia-guo opened this issue 10 months ago • 7 comments

@smlambert noticed that in the recent release run, allTestsInfo may be incorrect for some builds.

Example: the jdk22 release mac extended.openjdk job should have around 16+39+38 tests, but TRSS only shows 3 (Pre and Post tests are not counted as tests).

Screenshot 2024-03-27 at 4 27 31 PM

https://trss.adoptium.net/allTestsInfo?buildId=65fb17d643ff67006ef89f92&limit=5&hasChildren=true

This happens for all jobs with a rerun: only the tests from the rerun are shown. The expected behaviour is that tests from the original run and the rerun are combined.
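The expected combining behaviour could be sketched roughly as below. This is only an illustration, not TRSS code: the function name `combine_runs`, the test names, and the rule that a rerun result replaces the original result for the same test are all assumptions.

```python
from typing import Dict


def combine_runs(original: Dict[str, str], rerun: Dict[str, str]) -> Dict[str, str]:
    """Merge test results so every original test is kept.

    Assumption: when a test appears in both runs, the rerun result wins.
    """
    combined = dict(original)  # start from the full original run
    combined.update(rerun)     # overlay only the tests that were rerun
    return combined


# Hypothetical test names and results, for illustration only.
original_run = {"test_a": "PASSED", "test_b": "FAILED", "test_c": "PASSED"}
rerun = {"test_b": "PASSED"}

combined = combine_runs(original_run, rerun)
print(len(combined), combined["test_b"])
```

With this rule the combined view keeps all three tests, whereas showing only the rerun would list a single test.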

Screenshot 2024-03-27 at 4 44 15 PM

This might be related to the recent update in TKG: https://github.com/adoptium/TKG/issues/510.

sophia-guo avatar Mar 27 '24 20:03 sophia-guo

> It happened to all jobs with rerun.

sanity.openjdk looks correct to me https://trss.adoptium.net/allTestsInfo?buildId=65fb17d643ff67006ef89f93&limit=5

image

llxia avatar Mar 27 '24 21:03 llxia

Indeed (hence this issue was raised, to check what makes the extended jobs behave differently from sanity). One difference is that there are 3 child jobs under extended.openjdk, whereas the sanity.openjdk results would be parsed from a single console log.

smlambert avatar Mar 27 '24 21:03 smlambert

Actually, it seems hasChildren is the trigger: the problem happens when hasChildren=true. For extended.openjdk the link is https://trss.adoptium.net/allTestsInfo?buildId=65fb17d643ff67006ef89f92&limit=5&hasChildren=true. For sanity.openjdk, hasChildren=false and the link is https://trss.adoptium.net/allTestsInfo?buildId=65fb17d643ff67006ef89f93&limit=5&hasChildren=false. It might be worth checking how the hasChildren flag or variable is defined and changed.

sophia-guo avatar Mar 28 '24 02:03 sophia-guo

From TRSS history, the Jenkins test build https://ci.adoptium.net/job/Test_openjdk22_hs_extended.openjdk_x86-64_mac_testList_0/3/ got created on Sep 22, 2023. The root build was https://ci.adoptium.net/job/build-scripts/job/openjdk22-pipeline/69 (no longer exists).

Then Test_openjdk22_hs_extended.openjdk_x86-64_mac_testList_0/3 got deleted from Jenkins. When the build was triggered again, Test_openjdk22_hs_extended.openjdk_x86-64_mac_testList_0/3 got re-created on Mar 19, 2024. However, TRSS still has the old build history. Because the build already exists in the DB, TRSS treats the new build as an update of the old record. As a result, Test_openjdk22_hs_extended.openjdk_x86-64_mac_testList_0/3 is referenced/linked to openjdk22-pipeline/69 in TRSS, not to the new build.

https://trss.adoptium.net/api/getData?buildName=Test_openjdk22_hs_extended.openjdk_x86-64_mac_testList_0&buildNum=3

_id: "650d1a6de1aaa4007424f149",
url: "https://ci.adoptium.net/",
buildName: "Test_openjdk22_hs_extended.openjdk_x86-64_mac_testList_0",
buildNameStr: "Test_openjdk22_hs_extended.openjdk_x86-64_mac_testList_0",
buildNum: 3,
rootBuildId: "650cd2f1e1aaa400742434fc",
parentId: "650d186de1aaa4007424e836",
type: "Test",
status: "Done",
...
timestamp: 1695342033261,
versions: { }
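The timestamp in this record can be decoded to check that it belongs to the September 2023 build rather than the March 2024 re-creation. A minimal sketch in Python, assuming the `timestamp` field is epoch milliseconds in UTC; the variable names are made up for illustration.

```python
from datetime import datetime, timezone

record_timestamp_ms = 1695342033261  # timestamp field from the record above

# Assumption: the value is milliseconds since the Unix epoch, in UTC.
created = datetime.fromtimestamp(record_timestamp_ms / 1000, tz=timezone.utc)

# Date on which the Jenkins job build was re-created, per the comment above.
recreated_on = datetime(2024, 3, 19, tzinfo=timezone.utc)

# True when the stored record predates the re-created build.
is_stale = created < recreated_on

print(created.date().isoformat(), is_stale)
```

A record whose timestamp predates the re-created build is a hint that it is still tied to the old parent and root build.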

I think we had this situation before.

llxia avatar Mar 28 '24 02:03 llxia

Aw sucky. I am not sure we can guarantee that jobs won't be deleted underneath Jenkins. I remember answering the question "can we delete these", and saying yes, but I had assumed the deletion would happen in the Jenkins GUI, which would have meant that the job ID count would not have been lost. They must have been removed by logging on to the Jenkins server and deleting the workspaces.

Should we consider using a key that includes the parent IDs (for the TRSS DB index)? sigh...
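To make the idea concrete, here is a small in-memory sketch of the difference between the two keys. It is not the TRSS implementation: the assumption that the current lookup key is (url, buildName, buildNum), the `upsert` helper, and the `NEW_PARENT` placeholder are all hypothetical.

```python
from typing import Dict, Tuple

Key = Tuple[str, ...]


def upsert(db: Dict[Key, dict], key: Key, record: dict) -> str:
    """Insert the record, or merge it into the existing one if the key exists."""
    action = "update" if key in db else "insert"
    db.setdefault(key, {}).update(record)
    return action


URL = "https://ci.adoptium.net/"
NAME = "Test_openjdk22_hs_extended.openjdk_x86-64_mac_testList_0"
OLD_PARENT = "650d186de1aaa4007424e836"  # parentId from the record above
NEW_PARENT = "NEW_PARENT_ID"             # placeholder; the real ID is not in this thread

# Assumed current key (url, buildName, buildNum): the re-created build 3
# matches the 2023 record and is treated as an update of it.
by_build: Dict[Key, dict] = {}
first = upsert(by_build, (URL, NAME, "3"), {"parentId": OLD_PARENT})
second = upsert(by_build, (URL, NAME, "3"), {"status": "Done"})

# Key that also includes the parent ID: the re-created build gets its own record.
by_build_and_parent: Dict[Key, dict] = {}
upsert(by_build_and_parent, (URL, NAME, "3", OLD_PARENT), {"parentId": OLD_PARENT})
third = upsert(by_build_and_parent, (URL, NAME, "3", NEW_PARENT), {"parentId": NEW_PARENT})

print(second, third, len(by_build), len(by_build_and_parent))
```

With the shorter key, the second write lands on the old record and keeps its old parentId; with the parent in the key, the old and new builds stay separate.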

smlambert avatar Mar 28 '24 02:03 smlambert

> remember answering the question "can we delete these", and saying yes, but I had assumed the deletion would happen in the Jenkins GUI

Ah gotcha - having read this, this issue is potentially related to the removal of the testList jobs as per https://github.com/adoptium/infrastructure/issues/2774#issuecomment-1954462286 - that would make sense, and it would have reset the counters to zero as the jobs will have been regenerated on demand.

The issue was not the individual job runs (so removing those would have made no difference to solving the problem) but the testList jobs themselves, which needed to be regenerated because they were the cause of the parameter errors in the logs. That is why they were deleted: I wasn't aware of a way to refresh them all. (From what I could see, they were causing warnings regardless of whether they were being invoked.) While in this case I did the work directly on the filesystem to avoid a lot of clicking, I believe that deleting the job definition via the UI (which is what would have been necessary to force regeneration) would have had the same effect in terms of "losing" the last build number, since it would have removed all trace of the job, including that information.

sxa avatar Apr 03 '24 13:04 sxa

> the testList jobs themselves which needed to be regenerated as they were the cause of the parameter errors in the logs, which is why they were deleted as I wasn't aware of a way to refresh them all (They seemed to be causing warnings regardless of whether they were being invoked from what I could see)

Just to clarify: if a Jenkins test job is regenerated, does the previously executed job history cause warnings? If so, that would be a Jenkins issue. The previously executed job history should be static.

llxia avatar Apr 05 '24 14:04 llxia