TriBITS
cdash_analyze_and_report.py: Make robust to duplicate tests
Currently, the tool cdash_analyze_and_report.py will error out if there are duplicate test results. In theory this should never happen, but in practice it can happen because of problems with the way the automated builds and tests are driven. For example, consider the ATDM Trilinos build Trilinos-atdm-tlcc2-intel-opt-openmp shown here, which shows:
| Site | Build Name | Update | Conf Err | Conf Warn | Build Err | Build Warn | Test Not Run | Test Fail | Test Pass | Start Time | Labels |
|---|---|---|---|---|---|---|---|---|---|---|---|
| skybridge | Trilinos-atdm-tlcc2-intel-opt-openmp | b52c49 | 0 | 20 | 0 | 50 | 3424 | 18 | 2552 | 9 hours ago | (24 labels) |
| skybridge | Trilinos-atdm-tlcc2-intel-opt-openmp | ab779d | 0 | 20 | 0 | 30 | | | | Feb 19, 2020 - 04:03 MST | (24 labels) |
| skybridge | Trilinos-atdm-tlcc2-intel-opt-openmp | 2888f0 | 0 | 20 | 0 | 0 | | | | Feb 18, 2020 - 04:03 MST | (24 labels) |
| skybridge | Trilinos-atdm-tlcc2-intel-opt-openmp | 1b40ec | 0 | 20 | 0 | 0 | 0 | 0 | 1956 | Feb 17, 2020 - 04:03 MST | (24 labels) |
| skybridge | Trilinos-atdm-tlcc2-intel-opt-openmp | 5e7cdf | 0 | 20 | 0 | 50 | 0 | 0 | 1956 | Feb 16, 2020 - 04:03 MST | (24 labels) |
What happened here is that the SLURM jobs to run the tests for the testing days 2020-02-18 and 2020-02-19 got held up and did not start running until the testing day 2020-02-20, and then all three ctest -S jobs ran on top of each other. That is not supposed to happen, but Jenkins is not killing those testing jobs. In any case, running:
cdash_analyze_and_report.py \
--date='2020-02-20' \
--cdash-project-testing-day-start-time='04:01' \
--cdash-project-name='Trilinos' \
--build-set-name='Promoted ATDM Trilinos Builds' \
--cdash-site-url='http://testing.sandia.gov/cdash' \
--cdash-builds-filters='filtercount=2&showfilters=1&filtercombine=and&field1=groupname&compare1=61&value1=ATDM&field2=buildname&compare2=65&value2=Trilinos-atdm-' \
--cdash-nonpassed-tests-filters='filtercount=5&showfilters=1&filtercombine=and&field1=groupname&compare1=61&value1=ATDM&field2=buildname&compare2=65&value2=Trilinos-atdm-&field3=status&compare3=62&value3=passed&field4=testoutput&compare4=94&value4=Error%3A%20Remote%20JSM%20server%20is%20not%20responding%20on%20host%20vortex&field5=testoutput&compare5=94&value5=csm_net_unix_Connect%3A%20Resource%20temporarily%20unavailable' \
--expected-builds-file='/home/rabartl/Trilinos.base/TrilinosATDMStatus/promotedAtdmTrilinosExpectedBuilds.csv' \
--tests-with-issue-trackers-file='/home/rabartl/Trilinos.base/TrilinosATDMStatus/promotedAtdmTrilinosTestsWithIssueTrackers.csv' \
--cdash-queries-cache-dir='/home/rabartl/Trilinos.base/ATDMStatusEmails' \
--cdash-base-cache-files-prefix='promotedAtdmTrilinosBuilds_' \
--use-cached-cdash-data='off' \
--limit-test-history-days='30' \
--limit-table-rows='20' \
--require-test-history-match-nonpassing-tests='off' \
--print-details='off' \
--write-failing-tests-without-issue-trackers-to-file='promotedAtdmTrilinosTwoif.csv' \
--write-email-to-file='promotedAtdmTrilinosBuilds.html' \
--email-from-address='' \
--send-email-to=''
produces the error:
***
*** Query and analyze CDash results for Promoted ATDM Trilinos Builds for testing day 2020-02-20
***
Num expected builds = 33
Num tests with issue trackers = 93
CDash builds browser URL:
http://testing.sandia.gov/cdash/index.php?project=Trilinos&date=2020-02-20&filtercount=2&showfilters=1&filtercombine=and&field1=groupname&compare1=61&value1=ATDM&field2=buildname&compare2=65&value2=Trilinos-atdm-
Downloading CDash data from:
http://testing.sandia.gov/cdash/api/v1/index.php?project=Trilinos&date=2020-02-20&filtercount=2&showfilters=1&filtercombine=and&field1=groupname&compare1=61&value1=ATDM&field2=buildname&compare2=65&value2=Trilinos-atdm-
Caching data downloaded from CDash to file:
/home/rabartl/Trilinos.base/ATDMStatusEmails/promotedAtdmTrilinosBuilds_fullCDashIndexBuilds.json
Num builds = 30
Getting list of nonpassing tests from CDash ...
CDash nonpassing tests browser URL:
http://testing.sandia.gov/cdash/queryTests.php?project=Trilinos&date=2020-02-20&filtercount=5&showfilters=1&filtercombine=and&field1=groupname&compare1=61&value1=ATDM&field2=buildname&compare2=65&value2=Trilinos-atdm-&field3=status&compare3=62&value3=passed&field4=testoutput&compare4=94&value4=Error%3A%20Remote%20JSM%20server%20is%20not%20responding%20on%20host%20vortex&field5=testoutput&compare5=94&value5=csm_net_unix_Connect%3A%20Resource%20temporarily%20unavailable
Downloading CDash data from:
http://testing.sandia.gov/cdash/api/v1/queryTests.php?project=Trilinos&date=2020-02-20&filtercount=5&showfilters=1&filtercombine=and&field1=groupname&compare1=61&value1=ATDM&field2=buildname&compare2=65&value2=Trilinos-atdm-&field3=status&compare3=62&value3=passed&field4=testoutput&compare4=94&value4=Error%3A%20Remote%20JSM%20server%20is%20not%20responding%20on%20host%20vortex&field5=testoutput&compare5=94&value5=csm_net_unix_Connect%3A%20Resource%20temporarily%20unavailable
Caching data downloaded from CDash to file:
/home/rabartl/Trilinos.base/ATDMStatusEmails/promotedAtdmTrilinosBuilds_fullCDashNonpassingTests.json
Num nonpassing tests direct from CDash query = 3463
Traceback (most recent call last):
File "/home/rabartl/Trilinos.base/TrilinosATDMStatus/TriBITS/tribits/ci_support/cdash_analyze_and_report.py", line 613, in <module>
checkDictsAreSame_in=CDQAR.checkCDashTestDictsAreSame )
File "/home/rabartl/Trilinos.base/TrilinosATDMStatus/TriBITS/tribits/ci_support/CDashQueryAnalyzeReport.py", line 994, in createSearchableListOfTests
checkDictsAreSame_in=checkDictsAreSame_in )
File "/home/rabartl/Trilinos.base/TrilinosATDMStatus/TriBITS/tribits/ci_support/CDashQueryAnalyzeReport.py", line 929, in __init__
checkDictsAreSame_in=checkDictsAreSame_in)
File "/home/rabartl/Trilinos.base/TrilinosATDMStatus/TriBITS/tribits/ci_support/CDashQueryAnalyzeReport.py", line 795, in createLookupDictForListOfDicts
" "+str(dictDiffErrorMsg))
Exception: Error, The element
listOfDicts[21] =
{u'buildName': u'Trilinos-atdm-tlcc2-intel-opt-openmp', u'buildSummaryLink': u'buildSummary.php?buildid=6440363', u'buildstarttime': u'2020-02-20T04:07:38 MST', u'details': u'Completed (Failed)\n', u'matchingoutput': u'==============================================\n\nOVERALL FINAL RESULT: TEST FAILED (SEACASAprepro_aprepro_array_test)\n\nXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX\n\n', u'nprocs': 1, u'prettyProcTime': u' 170ms', u'prettyTime': u' 170ms', u'procTime': 0.17, u'site': u'skybridge', u'siteLink': u'viewSite.php?siteid=317', u'status': u'Failed', u'statusclass': u'error', u'testDetailsLink': u'testDetails.php?test=102685529&build=6440363', u'testname': u'SEACASAprepro_aprepro_array_test', u'time': 0.17}
has duplicate values for the list of keys
['site', 'buildName', 'testname']
with the element already added
listOfDicts[20] =
{u'buildName': u'Trilinos-atdm-tlcc2-intel-opt-openmp', u'buildSummaryLink': u'buildSummary.php?buildid=6440363', u'buildstarttime': u'2020-02-20T04:07:38 MST', u'details': u'Completed (Failed)\n', u'matchingoutput': u'==============================================\n\nOVERALL FINAL RESULT: TEST FAILED (SEACASAprepro_aprepro_array_test)\n\nXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX\n\n', u'nprocs': 1, u'prettyProcTime': u' 320ms', u'prettyTime': u' 320ms', u'procTime': 0.32, u'site': u'skybridge', u'siteLink': u'viewSite.php?siteid=317', u'status': u'Failed', u'statusclass': u'error', u'testDetailsLink': u'testDetails.php?test=102685529&build=6440363', u'testname': u'SEACASAprepro_aprepro_array_test', u'time': 0.32}
and differs by at least the key/value pair
listOfDicts[21]['prettyTime'] = ' 170ms' != listOfDicts[20]['prettyTime'] = ' 320ms'
Error, could not compute the analysis due to above error so return failed!
Writing HTML file 'promotedAtdmTrilinosBuilds.html' ...
FAILED (SCRIPT CRASHED): Promoted ATDM Trilinos Builds on 2020-02-20
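The failure mode in the traceback above can be reproduced with a small sketch. This is not the actual code in CDashQueryAnalyzeReport.py; `build_lookup` is a hypothetical stand-in for `createLookupDictForListOfDicts`, and the only things taken from the error message are the key list `['site', 'buildName', 'testname']` and the fact that two entries with the same keys but different contents trigger an exception.

```python
def build_lookup(tests, keys=("site", "buildName", "testname")):
    """Index test dicts by a key tuple, raising on conflicting duplicates.

    Hypothetical helper for illustration only.
    """
    lookup = {}
    for idx, test in enumerate(tests):
        key = tuple(test[k] for k in keys)
        if key in lookup:
            prev_idx, prev = lookup[key]
            if prev != test:
                # Report which fields differ between the two entries.
                diff = sorted(k for k in test if test.get(k) != prev.get(k))
                raise ValueError(
                    "tests[%d] duplicates tests[%d] for keys %s and differs in %s"
                    % (idx, prev_idx, list(keys), diff))
            continue  # Exact duplicate: keep the first copy.
        lookup[key] = (idx, test)
    return lookup
```

With two runs of SEACASAprepro_aprepro_array_test that differ only in their timing fields, as in the log above, this sketch raises on the second entry, which is the behavior that crashes the script.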
It would be nice if this script would just discard duplicate tests, either automatically or when an option is passed in. I think it should likely discard duplicate tests according to the logic:
- First: Take a passing test
- Second: Take a failing test
- Third: Take a not-run test
- Otherwise: If the test status is the same, take the test with the smaller runtime.
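The preference order above could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names `pick_preferred` and `drop_duplicates` are hypothetical, the status strings "Passed" and "Not Run" are assumed (only "Failed" appears in the output above), and the ordering of the result relies on Python 3.7+ dict ordering.

```python
# Assumed CDash status strings; a lower rank is preferred.
STATUS_RANK = {"Passed": 0, "Failed": 1, "Not Run": 2}

def pick_preferred(a, b):
    """Return whichever of two duplicate test dicts should be kept."""
    rank_a = STATUS_RANK.get(a["status"], len(STATUS_RANK))
    rank_b = STATUS_RANK.get(b["status"], len(STATUS_RANK))
    if rank_a != rank_b:
        return a if rank_a < rank_b else b
    # Same status: keep the run with the smaller runtime.
    return a if a.get("time", 0.0) <= b.get("time", 0.0) else b

def drop_duplicates(tests, keys=("site", "buildName", "testname")):
    """Collapse duplicates, keeping first-seen order of the unique keys."""
    kept = {}
    for test in tests:
        key = tuple(test[k] for k in keys)
        kept[key] = pick_preferred(kept[key], test) if key in kept else test
    return list(kept.values())
```

Applied to the two SEACASAprepro_aprepro_array_test entries from the error above (both 'Failed', with times 0.32 and 0.17), this would keep the 0.17 entry.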
The question is how to apply this? Here are the options:
- Silently filter out duplicate tests using the above logic
- Filter out duplicate tests but warn the user there were duplicate tests detected
- Require an option like --require-unique-tests=off to allow duplicate tests to be removed
I am thinking that it may be best to have an option --require-unique-tests but to make it off by default and allow a user to enable it if they want to validate unique tests (and therefore validate their data on CDash).
This happened again yesterday in the build Trilinos-atdm-tlcc2-intel-opt-openmp on skybridge and it crashed the script. Therefore, I am putting this in progress and I will get this done.
On second thought, the option --require-unique-tests should be on by default and we should create a good error that the user should consider setting --require-unique-tests=off in order to get the script to process the data and return something.
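A minimal sketch of what that default-on behavior might look like is below. It assumes `argparse` purely for illustration (the real tool's option handling may differ), and `parse_args` and `check_unique` are hypothetical names; only the option name --require-unique-tests and the idea of pointing the user at =off come from this issue.

```python
import argparse

def parse_args(argv):
    """Parse only the proposed option; on by default, as suggested above."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--require-unique-tests", choices=("on", "off"), default="on",
        help="Fail if CDash returns duplicate tests (default: on).")
    return parser.parse_args(argv)

def check_unique(num_duplicates, require_unique_tests):
    """Raise a descriptive error for duplicates unless the check is off."""
    if num_duplicates and require_unique_tests == "on":
        raise RuntimeError(
            "Found %d duplicate test(s) on CDash. Consider setting"
            " --require-unique-tests=off to process the data anyway."
            % num_duplicates)
    return num_duplicates
```

The point of the design is that the default run still validates the data on CDash, while the error text itself tells the user how to get a report out of imperfect data.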
Had another one of these failures yesterday due to doubling up on test results. The error was:
Traceback (most recent call last):
File "/home/rabartl/Trilinos.base/TrilinosATDMStatus/TriBITS/tribits/ci_support/cdash_analyze_and_report.py", line 627, in <module>
checkDictsAreSame_in=CDQAR.checkCDashTestDictsAreSame )
File "/home/rabartl/Trilinos.base/TrilinosATDMStatus/TriBITS/tribits/ci_support/CDashQueryAnalyzeReport.py", line 997, in createSearchableListOfTests
checkDictsAreSame_in=checkDictsAreSame_in )
File "/home/rabartl/Trilinos.base/TrilinosATDMStatus/TriBITS/tribits/ci_support/CDashQueryAnalyzeReport.py", line 932, in __init__
checkDictsAreSame_in=checkDictsAreSame_in)
File "/home/rabartl/Trilinos.base/TrilinosATDMStatus/TriBITS/tribits/ci_support/CDashQueryAnalyzeReport.py", line 782, in createLookupDictForListOfDicts
lookedUpDict, dictDiffErrorMsg)
File "/home/rabartl/Trilinos.base/TrilinosATDMStatus/TriBITS/tribits/ci_support/CDashQueryAnalyzeReport.py", line 809, in raiseDuplicateDictEleException
" "+str(dictDiffErrorMsg))
Exception: Error, The element
listOfDicts[2] =
{u'buildName': u'Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-2019.06.24_static_dbg', u'buildSummaryLink': u'buildSummary.php?buildid=5311174', u'buildstarttime': u'2020-03-19T03:07:51 MDT', u'details': u'Completed (Failed)\n', u'matchingoutput': u'tex50:92004] -----------------------\n[vortex50:92004] -----------------------\n[vortex50:92004] *** End of error message ***\nERROR: One or more process (first noticed rank 1) terminated with signal 6\n', u'nprocs': 4, u'prettyProcTime': u'7m 28s 680ms', u'prettyTime': u'1m 52s 170ms', u'procTime': 448.68, u'site': u'vortex', u'siteLink': u'viewSite.php?siteid=341', u'status': u'Failed', u'statusclass': u'error', u'testDetailsLink': u'testDetails.php?test=86269094&build=5311174', u'testname': u'MueLu_Maxwell3D-Tpetra_MPI_4', u'time': 112.17}
has duplicate values for the list of keys
['site', 'buildName', 'testname']
with the element already added
listOfDicts[1] =
{u'buildName': u'Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-2019.06.24_static_dbg', u'buildSummaryLink': u'buildSummary.php?buildid=5311174', u'buildstarttime': u'2020-03-19T03:07:51 MDT', u'details': u'Completed (Failed)\n', u'matchingoutput': u'tex50:61649] -----------------------\n[vortex50:61649] -----------------------\n[vortex50:61649] *** End of error message ***\nERROR: One or more process (first noticed rank 0) terminated with signal 6\n', u'nprocs': 4, u'prettyProcTime': u'6m 10s 160ms', u'prettyTime': u'1m 32s 540ms', u'procTime': 370.16, u'site': u'vortex', u'siteLink': u'viewSite.php?siteid=341', u'status': u'Failed', u'statusclass': u'error', u'testDetailsLink': u'testDetails.php?test=86208296&build=5311174', u'testname': u'MueLu_Maxwell3D-Tpetra_MPI_4', u'time': 92.54}
and differs by at least the key/value pair
listOfDicts[2]['prettyTime'] = '1m 52s 170ms' != listOfDicts[1]['prettyTime'] = '1m 32s 540ms'
Error, could not compute the analysis due to above error so return failed!
This seems to have been happening more often recently.
For now, I am just going to delete that build off CDash.
NOTE: One thing that is true about these tests is that they all have the same "Build Time", so you can't filter tests based on that. And since the tests are all running on the same executables (perhaps while they are still being built), I think we really just want to prefer taking the passing tests first.
This happened again today. The running of tests on 'eclipse' was held up for 3 days until they all ran at once toward the end of the day. This caused the cdash_analyze_and_report.py tool to generate the following error message:
Traceback (most recent call last):
File "/jenkins/slave/workspace/Trilinos-atdm-send-email/TrilinosATDMStatus/TriBITS/tribits/ci_support/cdash_analyze_and_report.py", line 627, in <module>
checkDictsAreSame_in=CDQAR.checkCDashTestDictsAreSame )
File "/jenkins/slave/workspace/Trilinos-atdm-send-email/TrilinosATDMStatus/TriBITS/tribits/ci_support/CDashQueryAnalyzeReport.py", line 997, in createSearchableListOfTests
checkDictsAreSame_in=checkDictsAreSame_in )
File "/jenkins/slave/workspace/Trilinos-atdm-send-email/TrilinosATDMStatus/TriBITS/tribits/ci_support/CDashQueryAnalyzeReport.py", line 932, in __init__
checkDictsAreSame_in=checkDictsAreSame_in)
File "/jenkins/slave/workspace/Trilinos-atdm-send-email/TrilinosATDMStatus/TriBITS/tribits/ci_support/CDashQueryAnalyzeReport.py", line 782, in createLookupDictForListOfDicts
lookedUpDict, dictDiffErrorMsg)
File "/jenkins/slave/workspace/Trilinos-atdm-send-email/TrilinosATDMStatus/TriBITS/tribits/ci_support/CDashQueryAnalyzeReport.py", line 809, in raiseDuplicateDictEleException
" "+str(dictDiffErrorMsg))
Exception: Error, The element
listOfDicts[43] =
{u'buildName': u'Trilinos-atdm-cts1-intel-19.0.5_openmpi-4.0.1_openmp_static_opt', u'buildSummaryLink': u'buildSummary.php?buildid=5320902', u'buildstarttime': u'2020-03-23T03:07:14 MDT', u'details': u'Completed (Failed)\n', u'matchingoutput': u'/Trilinos-atdm-cts1-intel-19.0.5_openmpi-4.0.1_openmp_static_opt/SRC_AND_BUILD/BUILD/packages/rol/example/tempus/ROL_example_tempus_example_02.exe[0x4100e9]\n[ec392:41133] *** End of error message ***\n', u'nprocs': 1, u'prettyProcTime': u'1s 290ms', u'prettyTime': u'1s 290ms', u'procTime': 1.29, u'site': u'eclipse', u'siteLink': u'viewSite.php?siteid=336', u'status': u'Failed', u'statusclass': u'error', u'testDetailsLink': u'testDetails.php?test=86495027&build=5320902', u'testname': u'ROL_example_tempus_example_02_MPI_1', u'time': 1.29}
has duplicate values for the list of keys
['site', 'buildName', 'testname']
with the element already added
listOfDicts[36] =
{u'buildName': u'Trilinos-atdm-cts1-intel-19.0.5_openmpi-4.0.1_openmp_static_opt', u'buildSummaryLink': u'buildSummary.php?buildid=5320902', u'buildstarttime': u'2020-03-23T03:07:14 MDT', u'details': u'Completed (Failed)\n', u'matchingoutput': u'nmp_static_opt/SRC_AND_BUILD/BUILD/packages/rol/example/tempus/ROL_example_tempus_example_02.exe[0x4100e9]\n[ec1476:118026] *** End of error message ***\nsrun: error: ec1476: task 0: Segmentation fault\n', u'nprocs': 1, u'prettyProcTime': u'1s 930ms', u'prettyTime': u'1s 930ms', u'procTime': 1.93, u'site': u'eclipse', u'siteLink': u'viewSite.php?siteid=336', u'status': u'Failed', u'statusclass': u'error', u'testDetailsLink': u'testDetails.php?test=86492464&build=5320902', u'testname': u'ROL_example_tempus_example_02_MPI_1', u'time': 1.93}
and differs by at least the key/value pair
listOfDicts[43]['prettyTime'] = '1s 290ms' != listOfDicts[36]['prettyTime'] = '1s 930ms'
This is getting pretty old.
It happened again:
From: [email protected] [email protected]
Sent: Thursday, April 30, 2020 3:15 AM
To: Bartlett, Roscoe A [email protected]
Subject: FAILED (SCRIPT CRASHED): Promoted ATDM Trilinos Builds on 2020-04-29
Build and Test results for Promoted ATDM Trilinos Builds on 2020-04-29
Builds on CDash (num/expected=36/40)
Non-passing Tests on CDash (num=224)
Traceback (most recent call last):
File "/jenkins/slave/workspace/Trilinos-atdm-send-email/TrilinosATDMStatus/TriBITS/tribits/ci_support/cdash_analyze_and_report.py", line 627, in <module>
checkDictsAreSame_in=CDQAR.checkCDashTestDictsAreSame )
File "/jenkins/slave/workspace/Trilinos-atdm-send-email/TrilinosATDMStatus/TriBITS/tribits/ci_support/CDashQueryAnalyzeReport.py", line 997, in createSearchableListOfTests
checkDictsAreSame_in=checkDictsAreSame_in )
File "/jenkins/slave/workspace/Trilinos-atdm-send-email/TrilinosATDMStatus/TriBITS/tribits/ci_support/CDashQueryAnalyzeReport.py", line 932, in __init__
checkDictsAreSame_in=checkDictsAreSame_in)
File "/jenkins/slave/workspace/Trilinos-atdm-send-email/TrilinosATDMStatus/TriBITS/tribits/ci_support/CDashQueryAnalyzeReport.py", line 782, in createLookupDictForListOfDicts
lookedUpDict, dictDiffErrorMsg)
File "/jenkins/slave/workspace/Trilinos-atdm-send-email/TrilinosATDMStatus/TriBITS/tribits/ci_support/CDashQueryAnalyzeReport.py", line 809, in raiseDuplicateDictEleException
" "+str(dictDiffErrorMsg))
Exception: Error, The element
listOfDicts[83] =
{u'buildName': u'Trilinos-atdm-tlcc2-intel-debug-openmp', u'buildSummaryLink': u'build/5413930', u'buildstarttime': u'2020-04-29T15:10:23 MDT', u'details': u'Completed (Failed)\n', u'matchingoutput': u' Version 4.4.1\n\t\tHDF5 enabled (1.8.12)\n\t\tParallel IO enabled via HDF5 and/or PnetCDF\n\t\tParallel IO enabled via PnetCDF (1.8.1 of 28 Jan 2017)\n\t\tNC_HAVE_META_H defined\n\t\tAPI Version 2 support enabled\n\n', u'nprocs': 1, u'prettyProcTime': u'2s 330ms', u'prettyTime': u'2s 330ms', u'procTime': 2.33, u'site': u'chama', u'siteLink': u'viewSite.php?siteid=259', u'status': u'Failed', u'statusclass': u'error', u'testDetailsLink': u'testDetails.php?test=8266&build=5413930', u'testname': u'SEACASIoss_io_info_config_has_zoltan_MPI_1', u'time': 2.33}
has duplicate values for the list of keys
['site', 'buildName', 'testname']
with the element already added
listOfDicts[82] =
{u'buildName': u'Trilinos-atdm-tlcc2-intel-debug-openmp', u'buildSummaryLink': u'build/5413930', u'buildstarttime': u'2020-04-29T15:10:23 MDT', u'details': u'Completed (Failed)\n', u'matchingoutput': u' Version 4.4.1\n\t\tHDF5 enabled (1.8.12)\n\t\tParallel IO enabled via HDF5 and/or PnetCDF\n\t\tParallel IO enabled via PnetCDF (1.8.1 of 28 Jan 2017)\n\t\tNC_HAVE_META_H defined\n\t\tAPI Version 2 support enabled\n\n', u'nprocs': 1, u'prettyProcTime': u'1s 480ms', u'prettyTime': u'1s 480ms', u'procTime': 1.48, u'site': u'chama', u'siteLink': u'viewSite.php?siteid=259', u'status': u'Failed', u'statusclass': u'error', u'testDetailsLink': u'testDetails.php?test=8266&build=5413930', u'testname': u'SEACASIoss_io_info_config_has_zoltan_MPI_1', u'time': 1.48}
and differs by at least the key/value pair
listOfDicts[83]['prettyTime'] = '2s 330ms' != listOfDicts[82]['prettyTime'] = '1s 480ms'
And it happened again:
Traceback (most recent call last):
File "/nscratch/jenkins/eclipse-slave/workspace/Trilinos-atdm-send-email/TrilinosATDMStatus/TriBITS/tribits/ci_support/cdash_analyze_and_report.py", line 564, in <module>
checkDictsAreSame_in=CDQAR.checkCDashTestDictsAreSame )
File "/nscratch/jenkins/eclipse-slave/workspace/Trilinos-atdm-send-email/TrilinosATDMStatus/TriBITS/tribits/ci_support/CDashQueryAnalyzeReport.py", line 1143, in createSearchableListOfTests
checkDictsAreSame_in=checkDictsAreSame_in )
File "/nscratch/jenkins/eclipse-slave/workspace/Trilinos-atdm-send-email/TrilinosATDMStatus/TriBITS/tribits/ci_support/CDashQueryAnalyzeReport.py", line 1078, in __init__
checkDictsAreSame_in=checkDictsAreSame_in)
File "/nscratch/jenkins/eclipse-slave/workspace/Trilinos-atdm-send-email/TrilinosATDMStatus/TriBITS/tribits/ci_support/CDashQueryAnalyzeReport.py", line 928, in createLookupDictForListOfDicts
lookedUpDict, dictDiffErrorMsg)
File "/nscratch/jenkins/eclipse-slave/workspace/Trilinos-atdm-send-email/TrilinosATDMStatus/TriBITS/tribits/ci_support/CDashQueryAnalyzeReport.py", line 955, in raiseDuplicateDictEleException
" "+str(dictDiffErrorMsg))
Exception: Error, The element
listOfDicts[31] =
{u'buildName': u'Trilinos-atdm-cts1-intel-19.0.4_openmpi-4.0.3_openmp_static_opt', u'buildSummaryLink': u'build/5521034', u'buildstarttime': u'2020-06-09T03:04:50 MDT', u'details': u'Completed (Failed)\n', u'matchingoutput': u'slurmstepd: error: *** STEP 924994.3 ON swa1324 CANCELLED AT 2020-06-09T16:27:04 ***\n 474 1.280183e-srun: error: swa1324: tasks 0-1,3: Killed\nsrun: error: swa1324: task 2: Exited with exit code 15\n', u'nprocs': 4, u'prettyProcTime': u'12s 400ms', u'prettyTime': u'3s 100ms', u'procTime': 12.4, u'site': u'attaway', u'siteLink': u'viewSite.php?siteid=337', u'status': u'Failed', u'statusclass': u'error', u'testDetailsLink': u'test/18733901', u'testname': u'ROL_adapters_tpetra_test_sol_TpetraSimulatedConstraintInterfaceCVaR_MPI_4', u'time': 3.1}
has duplicate values for the list of keys
['site', 'buildName', 'testname']
with the element already added
listOfDicts[30] =
{u'buildName': u'Trilinos-atdm-cts1-intel-19.0.4_openmpi-4.0.3_openmp_static_opt', u'buildSummaryLink': u'build/5521034', u'buildstarttime': u'2020-06-09T03:04:50 MDT', u'details': u'Completed (Failed)\n', u'matchingoutput': u'lurmstepd: error: *** STEP 917121.1620 ON swa1152 CANCELLED AT 2020-06-09T04:47:27 ***\nsrun: Job step aborted: Waiting up to 32 seconds for job step to finish.\nsrun: error: swa1152: tasks 1-2: Killed\n', u'nprocs': 4, u'prettyProcTime': u'13s 400ms', u'prettyTime': u'3s 350ms', u'procTime': 13.4, u'site': u'attaway', u'siteLink': u'viewSite.php?siteid=337', u'status': u'Failed', u'statusclass': u'error', u'testDetailsLink': u'test/18661911', u'testname': u'ROL_adapters_tpetra_test_sol_TpetraSimulatedConstraintInterfaceCVaR_MPI_4', u'time': 3.35}
and differs by at least the key/value pair
listOfDicts[31]['prettyTime'] = '3s 100ms' != listOfDicts[30]['prettyTime'] = '3s 350ms'
And again:
FAILED (SCRIPT CRASHED): Promoted ATDM Trilinos Builds on 2020-06-16
Build and Test results for Promoted ATDM Trilinos Builds on 2020-06-16
Builds on CDash (num/expected=32/40)
Non-passing Tests on CDash (num=396)
Traceback (most recent call last):
File "/home/rabartl/Trilinos.base/TrilinosATDMStatus/TriBITS/tribits/ci_support/cdash_analyze_and_report.py", line 564, in <module>
checkDictsAreSame_in=CDQAR.checkCDashTestDictsAreSame )
File "/home/rabartl/Trilinos.base/TrilinosATDMStatus/TriBITS/tribits/ci_support/CDashQueryAnalyzeReport.py", line 1143, in createSearchableListOfTests
checkDictsAreSame_in=checkDictsAreSame_in )
File "/home/rabartl/Trilinos.base/TrilinosATDMStatus/TriBITS/tribits/ci_support/CDashQueryAnalyzeReport.py", line 1078, in __init__
checkDictsAreSame_in=checkDictsAreSame_in)
File "/home/rabartl/Trilinos.base/TrilinosATDMStatus/TriBITS/tribits/ci_support/CDashQueryAnalyzeReport.py", line 928, in createLookupDictForListOfDicts
lookedUpDict, dictDiffErrorMsg)
File "/home/rabartl/Trilinos.base/TrilinosATDMStatus/TriBITS/tribits/ci_support/CDashQueryAnalyzeReport.py", line 955, in raiseDuplicateDictEleException
" "+str(dictDiffErrorMsg))
Exception: Error, The element
listOfDicts[95] =
{u'buildName': u'Trilinos-atdm-cts1-intel-19.0.4_openmpi-4.0.3_openmp_static_dbg', u'buildSummaryLink': u'build/5541173', u'buildstarttime': u'2020-06-16T03:03:52 MDT', u'details': u'Completed (Failed)\n', u'matchingoutput': u'NCELLED AT 2020-06-16T09:50:17 ***\n 2 1.193944e-02 2.057289e-11 1.242588e-02 9.591923e-06 1.00e+02 1.02e-09 9.59e-06 5 5 1 0 srun: error: swa188: tasks 0-2: Killed\n', u'nprocs': 4, u'prettyProcTime': u'1m 9s 320ms', u'prettyTime': u'17s 330ms', u'procTime': 69.32, u'site': u'attaway', u'siteLink': u'viewSite.php?siteid=337', u'status': u'Failed', u'statusclass': u'error', u'testDetailsLink': u'test/19960337', u'testname': u'ROL_adapters_tpetra_test_sol_TpetraSimulatedConstraintInterfaceCVaR_MPI_4', u'time': 17.33}
has duplicate values for the list of keys
['site', 'buildName', 'testname']
with the element already added
listOfDicts[20] =
{u'buildName': u'Trilinos-atdm-cts1-intel-19.0.4_openmpi-4.0.3_openmp_static_dbg', u'buildSummaryLink': u'build/5541173', u'buildstarttime': u'2020-06-16T03:03:52 MDT', u'details': u'Completed (Failed)\n', u'matchingoutput': u' 1.187707e-02 6.668195e-08 1.797901e-04 2.130000e-02 5.58e+00 3.75e-10 2.13e-02 61 61 3 4 1 198/ RCPNode address = srun: error: swa1257: tasks 0-3: Killed\n', u'nprocs': 4, u'prettyProcTime': u'1m 11s', u'prettyTime': u'17s 750ms', u'procTime': 71, u'site': u'attaway', u'siteLink': u'viewSite.php?siteid=337', u'status': u'Failed', u'statusclass': u'error', u'testDetailsLink': u'test/19941180', u'testname': u'ROL_adapters_tpetra_test_sol_TpetraSimulatedConstraintInterfaceCVaR_MPI_4', u'time': 17.75}
and differs by at least the key/value pair
listOfDicts[95]['prettyTime'] = '17s 330ms' != listOfDicts[20]['prettyTime'] = '17s 750ms'
I will remove the build results for this build :-(
With the merge of commit https://github.com/bartlettroscoe/TriBITS/commit/280d1860f234009eec4437c4b2bbfe33d8a61ee4, the cdash_analyze_and_report.py tool now allows and handles duplicate tests that have the same 'status' and 'details' fields. Hopefully that will resolve most of the problems we have been having with this.
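To illustrate the idea (this is not the code from that commit, and `same_status_and_details` is a hypothetical name), a comparison that tolerates such duplicates only needs to look at the two fields named above and ignore things like timing and output snippets:

```python
def same_status_and_details(a, b):
    """Treat two duplicate test dicts as equivalent if these fields agree.

    Hypothetical helper; fields such as 'time', 'prettyTime' and
    'matchingoutput' are deliberately ignored.
    """
    return all(a.get(k) == b.get(k) for k in ("status", "details"))

# Fields copied from the MueLu_Maxwell3D-Tpetra_MPI_4 duplicates above.
first = {"status": "Failed", "details": "Completed (Failed)\n", "time": 92.54}
second = {"status": "Failed", "details": "Completed (Failed)\n", "time": 112.17}
duplicates_ok = same_status_and_details(first, second)
```

Under this comparison, every duplicate pair shown in this issue would be accepted, since each pair differs only in timing and matching output.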