cdash_analyze_and_report.py: Make robust to duplicate tests

Open bartlettroscoe opened this issue 5 years ago • 9 comments

Currently, the tool cdash_analyze_and_report.py will error out if there are duplicate test results. In theory, this should never happen, but in practice it can happen because of problems with the way the automated builds and tests are driven. For example, consider the ATDM Trilinos build Trilinos-atdm-tlcc2-intel-opt-openmp shown here, which shows:

| Site | Build Name | Update | Conf Err | Conf Warn | Build Err | Build Warn | Test Not Run | Test Fail | Test Pass | Start Time | Labels |
|---|---|---|---|---|---|---|---|---|---|---|---|
| skybridge | Trilinos-atdm-tlcc2-intel-opt-openmp | b52c49 | 0 | 20 | 0 | 50 | 3424 | 18 | 2552 | 9 hours ago | (24 labels) |
| skybridge | Trilinos-atdm-tlcc2-intel-opt-openmp | ab779d | 0 | 20 | 0 | 30 | | | | Feb 19, 2020 - 04:03 MST | (24 labels) |
| skybridge | Trilinos-atdm-tlcc2-intel-opt-openmp | 2888f0 | 0 | 20 | 0 | 0 | | | | Feb 18, 2020 - 04:03 MST | (24 labels) |
| skybridge | Trilinos-atdm-tlcc2-intel-opt-openmp | 1b40ec | 0 | 20 | 0 | 0 | 0 | 0 | 1956 | Feb 17, 2020 - 04:03 MST | (24 labels) |
| skybridge | Trilinos-atdm-tlcc2-intel-opt-openmp | 5e7cdf | 0 | 20 | 0 | 50 | 0 | 0 | 1956 | Feb 16, 2020 - 04:03 MST | (24 labels) |

What happened here is that the SLURM jobs to run the tests for the testing days 2020-02-18 and 2020-02-19 got held up and did not start running until the testing day 2020-02-20, and then all three ctest -S jobs ran on top of each other. That is not supposed to happen, but Jenkins is not killing those testing jobs. In any case, running:

cdash_analyze_and_report.py \
  --date='2020-02-20' \
  --cdash-project-testing-day-start-time='04:01' \
  --cdash-project-name='Trilinos' \
  --build-set-name='Promoted ATDM Trilinos Builds' \
  --cdash-site-url='http://testing.sandia.gov/cdash' \
  --cdash-builds-filters='filtercount=2&showfilters=1&filtercombine=and&field1=groupname&compare1=61&value1=ATDM&field2=buildname&compare2=65&value2=Trilinos-atdm-' \
  --cdash-nonpassed-tests-filters='filtercount=5&showfilters=1&filtercombine=and&field1=groupname&compare1=61&value1=ATDM&field2=buildname&compare2=65&value2=Trilinos-atdm-&field3=status&compare3=62&value3=passed&field4=testoutput&compare4=94&value4=Error%3A%20Remote%20JSM%20server%20is%20not%20responding%20on%20host%20vortex&field5=testoutput&compare5=94&value5=csm_net_unix_Connect%3A%20Resource%20temporarily%20unavailable' \
  --expected-builds-file='/home/rabartl/Trilinos.base/TrilinosATDMStatus/promotedAtdmTrilinosExpectedBuilds.csv' \
  --tests-with-issue-trackers-file='/home/rabartl/Trilinos.base/TrilinosATDMStatus/promotedAtdmTrilinosTestsWithIssueTrackers.csv' \
  --cdash-queries-cache-dir='/home/rabartl/Trilinos.base/ATDMStatusEmails' \
  --cdash-base-cache-files-prefix='promotedAtdmTrilinosBuilds_' \
  --use-cached-cdash-data='off' \
  --limit-test-history-days='30' \
  --limit-table-rows='20' \
  --require-test-history-match-nonpassing-tests='off' \
  --print-details='off' \
  --write-failing-tests-without-issue-trackers-to-file='promotedAtdmTrilinosTwoif.csv' \
  --write-email-to-file='promotedAtdmTrilinosBuilds.html' \
  --email-from-address='' \
  --send-email-to=''

produces the error:

***
*** Query and analyze CDash results for Promoted ATDM Trilinos Builds for testing day 2020-02-20
***

Num expected builds = 33

Num tests with issue trackers = 93

CDash builds browser URL:

  http://testing.sandia.gov/cdash/index.php?project=Trilinos&date=2020-02-20&filtercount=2&showfilters=1&filtercombine=and&field1=groupname&compare1=61&value1=ATDM&field2=buildname&compare2=65&value2=Trilinos-atdm-

  Downloading CDash data from:
    http://testing.sandia.gov/cdash/api/v1/index.php?project=Trilinos&date=2020-02-20&filtercount=2&showfilters=1&filtercombine=and&field1=groupname&compare1=61&value1=ATDM&field2=buildname&compare2=65&value2=Trilinos-atdm-
  Caching data downloaded from CDash to file:
    /home/rabartl/Trilinos.base/ATDMStatusEmails/promotedAtdmTrilinosBuilds_fullCDashIndexBuilds.json

Num builds = 30

Getting list of nonpassing tests from CDash ...


CDash nonpassing tests browser URL:

  http://testing.sandia.gov/cdash/queryTests.php?project=Trilinos&date=2020-02-20&filtercount=5&showfilters=1&filtercombine=and&field1=groupname&compare1=61&value1=ATDM&field2=buildname&compare2=65&value2=Trilinos-atdm-&field3=status&compare3=62&value3=passed&field4=testoutput&compare4=94&value4=Error%3A%20Remote%20JSM%20server%20is%20not%20responding%20on%20host%20vortex&field5=testoutput&compare5=94&value5=csm_net_unix_Connect%3A%20Resource%20temporarily%20unavailable

  Downloading CDash data from:
    http://testing.sandia.gov/cdash/api/v1/queryTests.php?project=Trilinos&date=2020-02-20&filtercount=5&showfilters=1&filtercombine=and&field1=groupname&compare1=61&value1=ATDM&field2=buildname&compare2=65&value2=Trilinos-atdm-&field3=status&compare3=62&value3=passed&field4=testoutput&compare4=94&value4=Error%3A%20Remote%20JSM%20server%20is%20not%20responding%20on%20host%20vortex&field5=testoutput&compare5=94&value5=csm_net_unix_Connect%3A%20Resource%20temporarily%20unavailable
  Caching data downloaded from CDash to file:
    /home/rabartl/Trilinos.base/ATDMStatusEmails/promotedAtdmTrilinosBuilds_fullCDashNonpassingTests.json

Num nonpassing tests direct from CDash query = 3463

Traceback (most recent call last):
  File "/home/rabartl/Trilinos.base/TrilinosATDMStatus/TriBITS/tribits/ci_support/cdash_analyze_and_report.py", line 613, in <module>
    checkDictsAreSame_in=CDQAR.checkCDashTestDictsAreSame )
  File "/home/rabartl/Trilinos.base/TrilinosATDMStatus/TriBITS/tribits/ci_support/CDashQueryAnalyzeReport.py", line 994, in createSearchableListOfTests
    checkDictsAreSame_in=checkDictsAreSame_in )
  File "/home/rabartl/Trilinos.base/TrilinosATDMStatus/TriBITS/tribits/ci_support/CDashQueryAnalyzeReport.py", line 929, in __init__
    checkDictsAreSame_in=checkDictsAreSame_in)
  File "/home/rabartl/Trilinos.base/TrilinosATDMStatus/TriBITS/tribits/ci_support/CDashQueryAnalyzeReport.py", line 795, in createLookupDictForListOfDicts
    "    "+str(dictDiffErrorMsg))
Exception: Error, The element

    listOfDicts[21] =

      {u'buildName': u'Trilinos-atdm-tlcc2-intel-opt-openmp', u'buildSummaryLink': u'buildSummary.php?buildid=6440363', u'buildstarttime': u'2020-02-20T04:07:38 MST', u'details': u'Completed (Failed)\n', u'matchingoutput': u'==============================================\n\nOVERALL FINAL RESULT: TEST FAILED (SEACASAprepro_aprepro_array_test)\n\nXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX\n\n', u'nprocs': 1, u'prettyProcTime': u' 170ms', u'prettyTime': u' 170ms', u'procTime': 0.17, u'site': u'skybridge', u'siteLink': u'viewSite.php?siteid=317', u'status': u'Failed', u'statusclass': u'error', u'testDetailsLink': u'testDetails.php?test=102685529&build=6440363', u'testname': u'SEACASAprepro_aprepro_array_test', u'time': 0.17}

  has duplicate values for the list of keys

    ['site', 'buildName', 'testname']

  with the element already added

    listOfDicts[20] =

      {u'buildName': u'Trilinos-atdm-tlcc2-intel-opt-openmp', u'buildSummaryLink': u'buildSummary.php?buildid=6440363', u'buildstarttime': u'2020-02-20T04:07:38 MST', u'details': u'Completed (Failed)\n', u'matchingoutput': u'==============================================\n\nOVERALL FINAL RESULT: TEST FAILED (SEACASAprepro_aprepro_array_test)\n\nXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX\n\n', u'nprocs': 1, u'prettyProcTime': u' 320ms', u'prettyTime': u' 320ms', u'procTime': 0.32, u'site': u'skybridge', u'siteLink': u'viewSite.php?siteid=317', u'status': u'Failed', u'statusclass': u'error', u'testDetailsLink': u'testDetails.php?test=102685529&build=6440363', u'testname': u'SEACASAprepro_aprepro_array_test', u'time': 0.32}

  and differs by at least the key/value pair

    listOfDicts[21]['prettyTime'] = ' 170ms' != listOfDicts[20]['prettyTime'] = ' 320ms'

Error, could not compute the analysis due to above error so return failed!

Writing HTML file 'promotedAtdmTrilinosBuilds.html' ...

FAILED (SCRIPT CRASHED): Promoted ATDM Trilinos Builds on 2020-02-20
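
The failure above can be reproduced in miniature. The following is only a sketch of the general technique (indexing test dicts by the key tuple `['site', 'buildName', 'testname']` and raising when two differing entries collide); `build_lookup` is a hypothetical helper, not the actual `createLookupDictForListOfDicts` code in TriBITS, and skipping exact duplicates is an assumption. The sample data is trimmed from the two dicts in the error message.

```python
def build_lookup(list_of_dicts, keys=("site", "buildName", "testname")):
    """Index dicts by a tuple of key values (hypothetical helper, not TriBITS code)."""
    lookup = {}
    for idx, entry in enumerate(list_of_dicts):
        key = tuple(entry[k] for k in keys)
        if key not in lookup:
            lookup[key] = (idx, entry)
            continue
        prev_idx, prev = lookup[key]
        if prev == entry:
            continue  # Assumption: exact duplicates are harmless and skipped.
        # Report the first field (in sorted order) on which the two entries differ.
        field = next(f for f in sorted(set(prev) | set(entry))
                     if prev.get(f) != entry.get(f))
        raise ValueError(
            "listOfDicts[%d] has duplicate values for keys %s with "
            "listOfDicts[%d] and differs by at least %r: %r != %r"
            % (idx, list(keys), prev_idx, field,
               entry.get(field), prev.get(field)))
    return lookup


# Two results for the same (site, buildName, testname), trimmed from the log.
tests = [
    {"site": "skybridge", "buildName": "Trilinos-atdm-tlcc2-intel-opt-openmp",
     "testname": "SEACASAprepro_aprepro_array_test", "status": "Failed",
     "prettyTime": " 320ms", "time": 0.32},
    {"site": "skybridge", "buildName": "Trilinos-atdm-tlcc2-intel-opt-openmp",
     "testname": "SEACASAprepro_aprepro_array_test", "status": "Failed",
     "prettyTime": " 170ms", "time": 0.17},
]

try:
    build_lookup(tests)
except ValueError as err:
    print(err)
```

Because the two results share a build ID and differ only in timing fields, any strict uniqueness check keyed on site, build name, and test name will trip on them.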

It would be nice if this script would just discard duplicate tests, either automatically or when an option is passed in. I think it should likely discard duplicate tests according to the logic:

  • First: Take a passing test
  • Second: Take a failing test
  • Third: Take a not-run test
  • Otherwise: If the test status is the same, take the test with the smaller runtime.
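
A minimal sketch of this preference order, assuming the CDash status strings are 'Passed', 'Failed', and 'Not Run' (only 'Failed' appears in the output above) and that each test dict carries a numeric 'time' field. The names `STATUS_RANK`, `pick_preferred`, and `dedupe_tests` are hypothetical and not part of TriBITS.

```python
# Lower rank wins. Assumed CDash status strings; unknown statuses sort last.
STATUS_RANK = {"Passed": 0, "Failed": 1, "Not Run": 2}


def _preference(test):
    """Sort key: status rank first, then runtime (smaller is better)."""
    rank = STATUS_RANK.get(test["status"], len(STATUS_RANK))
    return (rank, float(test.get("time", 0.0)))


def pick_preferred(a, b):
    """Return whichever of two duplicate test dicts should be kept."""
    return min(a, b, key=_preference)


def dedupe_tests(tests):
    """Keep one test dict per (site, buildName, testname)."""
    best = {}
    for test in tests:
        key = (test["site"], test["buildName"], test["testname"])
        best[key] = pick_preferred(best[key], test) if key in best else test
    return list(best.values())


sample = [
    {"site": "s", "buildName": "b", "testname": "t1", "status": "Failed", "time": 0.32},
    {"site": "s", "buildName": "b", "testname": "t1", "status": "Failed", "time": 0.17},
    {"site": "s", "buildName": "b", "testname": "t2", "status": "Not Run", "time": 0.0},
    {"site": "s", "buildName": "b", "testname": "t2", "status": "Passed", "time": 5.0},
]
for test in dedupe_tests(sample):
    print(test["testname"], test["status"], test["time"])
```

Encoding the rule as a single sort key keeps the status preference and the runtime tie-break in one place, so the order is easy to change if a different policy is chosen.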

The question is how to apply this? Here are the options:

  1. Silently filter out duplicate tests using the above logic
  2. Filter out duplicate tests but warn the user there were duplicate tests detected
  3. Require an option like --require-unique-tests=off to allow duplicate tests to be removed

I am thinking that it may be best to have an option --require-unique-tests but to make it off by default and allow a user to enable it if they want to validate unique tests (and therefore validate their data on CDash).

bartlettroscoe avatar Feb 20 '20 20:02 bartlettroscoe

This happened again yesterday in the build Trilinos-atdm-tlcc2-intel-opt-openmp on skybridge, and it crashed the script. Therefore, I am putting this in progress and I will get this done.

bartlettroscoe avatar Feb 25 '20 22:02 bartlettroscoe

On second thought, the option --require-unique-tests should be on by default, and we should produce a good error message suggesting that the user consider setting --require-unique-tests=off in order to get the script to process the data and return something.
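
A rough sketch of what such an option and error message could look like. This uses argparse purely for illustration; how cdash_analyze_and_report.py actually parses its options is not shown in this thread, and `parse_args` and `check_unique_tests` are hypothetical names.

```python
import argparse


def parse_args(argv=None):
    """Parse only the proposed flag; on/off strings mirror the tool's other options."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--require-unique-tests", choices=["on", "off"], default="on",
        help="Error out if duplicate tests are found [default: on]")
    return parser.parse_args(argv)


def check_unique_tests(tests, require_unique):
    """Return the duplicated keys, or raise a helpful error if uniqueness is required."""
    seen, duplicates = set(), []
    for test in tests:
        key = (test["site"], test["buildName"], test["testname"])
        if key in seen:
            duplicates.append(key)
        seen.add(key)
    if duplicates and require_unique:
        raise ValueError(
            "Found %d duplicate test(s), e.g. %r. Consider setting "
            "--require-unique-tests=off to process the data anyway."
            % (len(duplicates), duplicates[0]))
    return duplicates


args = parse_args(["--require-unique-tests=off"])
tests = [
    {"site": "s", "buildName": "b", "testname": "t"},
    {"site": "s", "buildName": "b", "testname": "t"},
]
print(check_unique_tests(tests, args.require_unique_tests == "on"))
```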

bartlettroscoe avatar Feb 26 '20 16:02 bartlettroscoe

Had another one of these failures yesterday due to doubling up on test results. The error was:

Traceback (most recent call last):
  File "/home/rabartl/Trilinos.base/TrilinosATDMStatus/TriBITS/tribits/ci_support/cdash_analyze_and_report.py", line 627, in <module>
    checkDictsAreSame_in=CDQAR.checkCDashTestDictsAreSame )
  File "/home/rabartl/Trilinos.base/TrilinosATDMStatus/TriBITS/tribits/ci_support/CDashQueryAnalyzeReport.py", line 997, in createSearchableListOfTests
    checkDictsAreSame_in=checkDictsAreSame_in )
  File "/home/rabartl/Trilinos.base/TrilinosATDMStatus/TriBITS/tribits/ci_support/CDashQueryAnalyzeReport.py", line 932, in __init__
    checkDictsAreSame_in=checkDictsAreSame_in)
  File "/home/rabartl/Trilinos.base/TrilinosATDMStatus/TriBITS/tribits/ci_support/CDashQueryAnalyzeReport.py", line 782, in createLookupDictForListOfDicts
    lookedUpDict, dictDiffErrorMsg)
  File "/home/rabartl/Trilinos.base/TrilinosATDMStatus/TriBITS/tribits/ci_support/CDashQueryAnalyzeReport.py", line 809, in raiseDuplicateDictEleException
    "    "+str(dictDiffErrorMsg))
Exception: Error, The element

    listOfDicts[2] =

      {u'buildName': u'Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-2019.06.24_static_dbg', u'buildSummaryLink': u'buildSummary.php?buildid=5311174', u'buildstarttime': u'2020-03-19T03:07:51 MDT', u'details': u'Completed (Failed)\n', u'matchingoutput': u'tex50:92004] -----------------------\n[vortex50:92004] -----------------------\n[vortex50:92004] *** End of error message ***\nERROR:  One or more process (first noticed rank 1) terminated with signal 6\n', u'nprocs': 4, u'prettyProcTime': u'7m 28s 680ms', u'prettyTime': u'1m 52s 170ms', u'procTime': 448.68, u'site': u'vortex', u'siteLink': u'viewSite.php?siteid=341', u'status': u'Failed', u'statusclass': u'error', u'testDetailsLink': u'testDetails.php?test=86269094&build=5311174', u'testname': u'MueLu_Maxwell3D-Tpetra_MPI_4', u'time': 112.17}

  has duplicate values for the list of keys

    ['site', 'buildName', 'testname']

  with the element already added

    listOfDicts[1] =

      {u'buildName': u'Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-2019.06.24_static_dbg', u'buildSummaryLink': u'buildSummary.php?buildid=5311174', u'buildstarttime': u'2020-03-19T03:07:51 MDT', u'details': u'Completed (Failed)\n', u'matchingoutput': u'tex50:61649] -----------------------\n[vortex50:61649] -----------------------\n[vortex50:61649] *** End of error message ***\nERROR:  One or more process (first noticed rank 0) terminated with signal 6\n', u'nprocs': 4, u'prettyProcTime': u'6m 10s 160ms', u'prettyTime': u'1m 32s 540ms', u'procTime': 370.16, u'site': u'vortex', u'siteLink': u'viewSite.php?siteid=341', u'status': u'Failed', u'statusclass': u'error', u'testDetailsLink': u'testDetails.php?test=86208296&build=5311174', u'testname': u'MueLu_Maxwell3D-Tpetra_MPI_4', u'time': 92.54}

  and differs by at least the key/value pair

    listOfDicts[2]['prettyTime'] = '1m 52s 170ms' != listOfDicts[1]['prettyTime'] = '1m 32s 540ms'

Error, could not compute the analysis due to above error so return failed!

This seems to have started happening recently.

For now, I am going to just delete that build off CDash.

bartlettroscoe avatar Mar 20 '20 14:03 bartlettroscoe

NOTE: One thing that is true about these tests is that they all have the same "Build Time", so you can't filter tests based on that. And since the tests are all running on the same executables (perhaps while they are still being built), I think we really just want to prefer taking the passing tests first.

bartlettroscoe avatar Mar 22 '20 15:03 bartlettroscoe

This happened again today. The running of tests on 'eclipse' was held up for 3 days until they all ran at once toward the end of the day. This caused the cdash_analyze_and_report.py tool to generate the following error message:

Traceback (most recent call last):
  File "/jenkins/slave/workspace/Trilinos-atdm-send-email/TrilinosATDMStatus/TriBITS/tribits/ci_support/cdash_analyze_and_report.py", line 627, in <module>
    checkDictsAreSame_in=CDQAR.checkCDashTestDictsAreSame )
  File "/jenkins/slave/workspace/Trilinos-atdm-send-email/TrilinosATDMStatus/TriBITS/tribits/ci_support/CDashQueryAnalyzeReport.py", line 997, in createSearchableListOfTests
    checkDictsAreSame_in=checkDictsAreSame_in )
  File "/jenkins/slave/workspace/Trilinos-atdm-send-email/TrilinosATDMStatus/TriBITS/tribits/ci_support/CDashQueryAnalyzeReport.py", line 932, in __init__
    checkDictsAreSame_in=checkDictsAreSame_in)
  File "/jenkins/slave/workspace/Trilinos-atdm-send-email/TrilinosATDMStatus/TriBITS/tribits/ci_support/CDashQueryAnalyzeReport.py", line 782, in createLookupDictForListOfDicts
    lookedUpDict, dictDiffErrorMsg)
  File "/jenkins/slave/workspace/Trilinos-atdm-send-email/TrilinosATDMStatus/TriBITS/tribits/ci_support/CDashQueryAnalyzeReport.py", line 809, in raiseDuplicateDictEleException
    "    "+str(dictDiffErrorMsg))
Exception: Error, The element

    listOfDicts[43] =

      {u'buildName': u'Trilinos-atdm-cts1-intel-19.0.5_openmpi-4.0.1_openmp_static_opt', u'buildSummaryLink': u'buildSummary.php?buildid=5320902', u'buildstarttime': u'2020-03-23T03:07:14 MDT', u'details': u'Completed (Failed)\n', u'matchingoutput': u'/Trilinos-atdm-cts1-intel-19.0.5_openmpi-4.0.1_openmp_static_opt/SRC_AND_BUILD/BUILD/packages/rol/example/tempus/ROL_example_tempus_example_02.exe[0x4100e9]\n[ec392:41133] *** End of error message ***\n', u'nprocs': 1, u'prettyProcTime': u'1s 290ms', u'prettyTime': u'1s 290ms', u'procTime': 1.29, u'site': u'eclipse', u'siteLink': u'viewSite.php?siteid=336', u'status': u'Failed', u'statusclass': u'error', u'testDetailsLink': u'testDetails.php?test=86495027&build=5320902', u'testname': u'ROL_example_tempus_example_02_MPI_1', u'time': 1.29}

  has duplicate values for the list of keys

    ['site', 'buildName', 'testname']

  with the element already added

    listOfDicts[36] =

      {u'buildName': u'Trilinos-atdm-cts1-intel-19.0.5_openmpi-4.0.1_openmp_static_opt', u'buildSummaryLink': u'buildSummary.php?buildid=5320902', u'buildstarttime': u'2020-03-23T03:07:14 MDT', u'details': u'Completed (Failed)\n', u'matchingoutput': u'nmp_static_opt/SRC_AND_BUILD/BUILD/packages/rol/example/tempus/ROL_example_tempus_example_02.exe[0x4100e9]\n[ec1476:118026] *** End of error message ***\nsrun: error: ec1476: task 0: Segmentation fault\n', u'nprocs': 1, u'prettyProcTime': u'1s 930ms', u'prettyTime': u'1s 930ms', u'procTime': 1.93, u'site': u'eclipse', u'siteLink': u'viewSite.php?siteid=336', u'status': u'Failed', u'statusclass': u'error', u'testDetailsLink': u'testDetails.php?test=86492464&build=5320902', u'testname': u'ROL_example_tempus_example_02_MPI_1', u'time': 1.93}

  and differs by at least the key/value pair

    listOfDicts[43]['prettyTime'] = '1s 290ms' != listOfDicts[36]['prettyTime'] = '1s 930ms'

This is getting pretty old.

bartlettroscoe avatar Mar 24 '20 14:03 bartlettroscoe

It happened again:


From: [email protected] [email protected]
Sent: Thursday, April 30, 2020 3:15 AM
To: Bartlett, Roscoe A [email protected]
Subject: FAILED (SCRIPT CRASHED): Promoted ATDM Trilinos Builds on 2020-04-29

Build and Test results for Promoted ATDM Trilinos Builds on 2020-04-29

Builds on CDash (num/expected=36/40)
Non-passing Tests on CDash (num=224)

Traceback (most recent call last):
  File "/jenkins/slave/workspace/Trilinos-atdm-send-email/TrilinosATDMStatus/TriBITS/tribits/ci_support/cdash_analyze_and_report.py", line 627, in <module>
    checkDictsAreSame_in=CDQAR.checkCDashTestDictsAreSame )
  File "/jenkins/slave/workspace/Trilinos-atdm-send-email/TrilinosATDMStatus/TriBITS/tribits/ci_support/CDashQueryAnalyzeReport.py", line 997, in createSearchableListOfTests
    checkDictsAreSame_in=checkDictsAreSame_in )
  File "/jenkins/slave/workspace/Trilinos-atdm-send-email/TrilinosATDMStatus/TriBITS/tribits/ci_support/CDashQueryAnalyzeReport.py", line 932, in __init__
    checkDictsAreSame_in=checkDictsAreSame_in)
  File "/jenkins/slave/workspace/Trilinos-atdm-send-email/TrilinosATDMStatus/TriBITS/tribits/ci_support/CDashQueryAnalyzeReport.py", line 782, in createLookupDictForListOfDicts
    lookedUpDict, dictDiffErrorMsg)
  File "/jenkins/slave/workspace/Trilinos-atdm-send-email/TrilinosATDMStatus/TriBITS/tribits/ci_support/CDashQueryAnalyzeReport.py", line 809, in raiseDuplicateDictEleException
    "    "+str(dictDiffErrorMsg))
Exception: Error, The element

    listOfDicts[83] =

      {u'buildName': u'Trilinos-atdm-tlcc2-intel-debug-openmp', u'buildSummaryLink': u'build/5413930', u'buildstarttime': u'2020-04-29T15:10:23 MDT', u'details': u'Completed (Failed)\n', u'matchingoutput': u' Version 4.4.1\n\t\tHDF5 enabled (1.8.12)\n\t\tParallel IO enabled via HDF5 and/or PnetCDF\n\t\tParallel IO enabled via PnetCDF (1.8.1 of 28 Jan 2017)\n\t\tNC_HAVE_META_H defined\n\t\tAPI Version 2 support enabled\n\n', u'nprocs': 1, u'prettyProcTime': u'2s 330ms', u'prettyTime': u'2s 330ms', u'procTime': 2.33, u'site': u'chama', u'siteLink': u'viewSite.php?siteid=259', u'status': u'Failed', u'statusclass': u'error', u'testDetailsLink': u'testDetails.php?test=8266&build=5413930', u'testname': u'SEACASIoss_io_info_config_has_zoltan_MPI_1', u'time': 2.33}

  has duplicate values for the list of keys

    ['site', 'buildName', 'testname']

  with the element already added

    listOfDicts[82] =

      {u'buildName': u'Trilinos-atdm-tlcc2-intel-debug-openmp', u'buildSummaryLink': u'build/5413930', u'buildstarttime': u'2020-04-29T15:10:23 MDT', u'details': u'Completed (Failed)\n', u'matchingoutput': u' Version 4.4.1\n\t\tHDF5 enabled (1.8.12)\n\t\tParallel IO enabled via HDF5 and/or PnetCDF\n\t\tParallel IO enabled via PnetCDF (1.8.1 of 28 Jan 2017)\n\t\tNC_HAVE_META_H defined\n\t\tAPI Version 2 support enabled\n\n', u'nprocs': 1, u'prettyProcTime': u'1s 480ms', u'prettyTime': u'1s 480ms', u'procTime': 1.48, u'site': u'chama', u'siteLink': u'viewSite.php?siteid=259', u'status': u'Failed', u'statusclass': u'error', u'testDetailsLink': u'testDetails.php?test=8266&build=5413930', u'testname': u'SEACASIoss_io_info_config_has_zoltan_MPI_1', u'time': 1.48}

  and differs by at least the key/value pair

    listOfDicts[83]['prettyTime'] = '2s 330ms' != listOfDicts[82]['prettyTime'] = '1s 480ms'

bartlettroscoe avatar Apr 30 '20 15:04 bartlettroscoe

And it happened again:

Traceback (most recent call last):
  File "/nscratch/jenkins/eclipse-slave/workspace/Trilinos-atdm-send-email/TrilinosATDMStatus/TriBITS/tribits/ci_support/cdash_analyze_and_report.py", line 564, in <module>
    checkDictsAreSame_in=CDQAR.checkCDashTestDictsAreSame )
  File "/nscratch/jenkins/eclipse-slave/workspace/Trilinos-atdm-send-email/TrilinosATDMStatus/TriBITS/tribits/ci_support/CDashQueryAnalyzeReport.py", line 1143, in createSearchableListOfTests
    checkDictsAreSame_in=checkDictsAreSame_in )
  File "/nscratch/jenkins/eclipse-slave/workspace/Trilinos-atdm-send-email/TrilinosATDMStatus/TriBITS/tribits/ci_support/CDashQueryAnalyzeReport.py", line 1078, in __init__
    checkDictsAreSame_in=checkDictsAreSame_in)
  File "/nscratch/jenkins/eclipse-slave/workspace/Trilinos-atdm-send-email/TrilinosATDMStatus/TriBITS/tribits/ci_support/CDashQueryAnalyzeReport.py", line 928, in createLookupDictForListOfDicts
    lookedUpDict, dictDiffErrorMsg)
  File "/nscratch/jenkins/eclipse-slave/workspace/Trilinos-atdm-send-email/TrilinosATDMStatus/TriBITS/tribits/ci_support/CDashQueryAnalyzeReport.py", line 955, in raiseDuplicateDictEleException
    "    "+str(dictDiffErrorMsg))
Exception: Error, The element

    listOfDicts[31] =

      {u'buildName': u'Trilinos-atdm-cts1-intel-19.0.4_openmpi-4.0.3_openmp_static_opt', u'buildSummaryLink': u'build/5521034', u'buildstarttime': u'2020-06-09T03:04:50 MDT', u'details': u'Completed (Failed)\n', u'matchingoutput': u'slurmstepd: error: *** STEP 924994.3 ON swa1324 CANCELLED AT 2020-06-09T16:27:04 ***\n  474   1.280183e-srun: error: swa1324: tasks 0-1,3: Killed\nsrun: error: swa1324: task 2: Exited with exit code 15\n', u'nprocs': 4, u'prettyProcTime': u'12s 400ms', u'prettyTime': u'3s 100ms', u'procTime': 12.4, u'site': u'attaway', u'siteLink': u'viewSite.php?siteid=337', u'status': u'Failed', u'statusclass': u'error', u'testDetailsLink': u'test/18733901', u'testname': u'ROL_adapters_tpetra_test_sol_TpetraSimulatedConstraintInterfaceCVaR_MPI_4', u'time': 3.1}

  has duplicate values for the list of keys

    ['site', 'buildName', 'testname']

  with the element already added

    listOfDicts[30] =

      {u'buildName': u'Trilinos-atdm-cts1-intel-19.0.4_openmpi-4.0.3_openmp_static_opt', u'buildSummaryLink': u'build/5521034', u'buildstarttime': u'2020-06-09T03:04:50 MDT', u'details': u'Completed (Failed)\n', u'matchingoutput': u'lurmstepd: error: *** STEP 917121.1620 ON swa1152 CANCELLED AT 2020-06-09T04:47:27 ***\nsrun: Job step aborted: Waiting up to 32 seconds for job step to finish.\nsrun: error: swa1152: tasks 1-2: Killed\n', u'nprocs': 4, u'prettyProcTime': u'13s 400ms', u'prettyTime': u'3s 350ms', u'procTime': 13.4, u'site': u'attaway', u'siteLink': u'viewSite.php?siteid=337', u'status': u'Failed', u'statusclass': u'error', u'testDetailsLink': u'test/18661911', u'testname': u'ROL_adapters_tpetra_test_sol_TpetraSimulatedConstraintInterfaceCVaR_MPI_4', u'time': 3.35}

  and differs by at least the key/value pair

    listOfDicts[31]['prettyTime'] = '3s 100ms' != listOfDicts[30]['prettyTime'] = '3s 350ms'

bartlettroscoe avatar Jun 10 '20 14:06 bartlettroscoe

And again:

FAILED (SCRIPT CRASHED): Promoted ATDM Trilinos Builds on 2020-06-16

Build and Test results for Promoted ATDM Trilinos Builds on 2020-06-16

Builds on CDash (num/expected=32/40)
Non-passing Tests on CDash (num=396)


Traceback (most recent call last):
  File "/home/rabartl/Trilinos.base/TrilinosATDMStatus/TriBITS/tribits/ci_support/cdash_analyze_and_report.py", line 564, in <module>
    checkDictsAreSame_in=CDQAR.checkCDashTestDictsAreSame )
  File "/home/rabartl/Trilinos.base/TrilinosATDMStatus/TriBITS/tribits/ci_support/CDashQueryAnalyzeReport.py", line 1143, in createSearchableListOfTests
    checkDictsAreSame_in=checkDictsAreSame_in )
  File "/home/rabartl/Trilinos.base/TrilinosATDMStatus/TriBITS/tribits/ci_support/CDashQueryAnalyzeReport.py", line 1078, in __init__
    checkDictsAreSame_in=checkDictsAreSame_in)
  File "/home/rabartl/Trilinos.base/TrilinosATDMStatus/TriBITS/tribits/ci_support/CDashQueryAnalyzeReport.py", line 928, in createLookupDictForListOfDicts
    lookedUpDict, dictDiffErrorMsg)
  File "/home/rabartl/Trilinos.base/TrilinosATDMStatus/TriBITS/tribits/ci_support/CDashQueryAnalyzeReport.py", line 955, in raiseDuplicateDictEleException
    "    "+str(dictDiffErrorMsg))
Exception: Error, The element

    listOfDicts[95] =

      {u'buildName': u'Trilinos-atdm-cts1-intel-19.0.4_openmpi-4.0.3_openmp_static_dbg', u'buildSummaryLink': u'build/5541173', u'buildstarttime': u'2020-06-16T03:03:52 MDT', u'details': u'Completed (Failed)\n', u'matchingoutput': u'NCELLED AT 2020-06-16T09:50:17 ***\n  2     1.193944e-02   2.057289e-11   1.242588e-02   9.591923e-06   1.00e+02  1.02e-09  9.59e-06  5       5       1       0   srun: error: swa188: tasks 0-2: Killed\n', u'nprocs': 4, u'prettyProcTime': u'1m 9s 320ms', u'prettyTime': u'17s 330ms', u'procTime': 69.32, u'site': u'attaway', u'siteLink': u'viewSite.php?siteid=337', u'status': u'Failed', u'statusclass': u'error', u'testDetailsLink': u'test/19960337', u'testname': u'ROL_adapters_tpetra_test_sol_TpetraSimulatedConstraintInterfaceCVaR_MPI_4', u'time': 17.33}

  has duplicate values for the list of keys

    ['site', 'buildName', 'testname']

  with the element already added

    listOfDicts[20] =

      {u'buildName': u'Trilinos-atdm-cts1-intel-19.0.4_openmpi-4.0.3_openmp_static_dbg', u'buildSummaryLink': u'build/5541173', u'buildstarttime': u'2020-06-16T03:03:52 MDT', u'details': u'Completed (Failed)\n', u'matchingoutput': u' 1.187707e-02   6.668195e-08   1.797901e-04   2.130000e-02   5.58e+00  3.75e-10  2.13e-02  61      61      3       4       1       198/       RCPNode address = srun: error: swa1257: tasks 0-3: Killed\n', u'nprocs': 4, u'prettyProcTime': u'1m 11s', u'prettyTime': u'17s 750ms', u'procTime': 71, u'site': u'attaway', u'siteLink': u'viewSite.php?siteid=337', u'status': u'Failed', u'statusclass': u'error', u'testDetailsLink': u'test/19941180', u'testname': u'ROL_adapters_tpetra_test_sol_TpetraSimulatedConstraintInterfaceCVaR_MPI_4', u'time': 17.75}

  and differs by at least the key/value pair

    listOfDicts[95]['prettyTime'] = '17s 330ms' != listOfDicts[20]['prettyTime'] = '17s 750ms'

I will remove the build results for this build :-(

bartlettroscoe avatar Jun 16 '20 16:06 bartlettroscoe

With the merge of commit https://github.com/bartlettroscoe/TriBITS/commit/280d1860f234009eec4437c4b2bbfe33d8a61ee4, the cdash_analyze_and_report.py tool now allows and handles duplicate tests that have the same 'status' and 'details' fields. Hopefully that will resolve most of the problems we have been having with this.
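
For illustration only, a sketch of that relaxed rule: duplicates are tolerated when their 'status' and 'details' fields agree and rejected otherwise. This is not the code from the commit. `collapse_compatible_duplicates` is a hypothetical name, and keeping the first occurrence is an assumption, since the thread does not say which duplicate is retained.

```python
def collapse_compatible_duplicates(tests, same_fields=("status", "details")):
    """Keep one dict per (site, buildName, testname) if duplicates agree on same_fields."""
    kept = {}
    for test in tests:
        key = (test["site"], test["buildName"], test["testname"])
        prev = kept.get(key)
        if prev is None:
            kept[key] = test
        elif any(prev.get(f) != test.get(f) for f in same_fields):
            # Duplicates that disagree on status/details are still an error.
            raise ValueError(
                "Duplicate test %r differs in one of %s" % (key, list(same_fields)))
        # Otherwise: compatible duplicate; keep the first one (assumption).
    return list(kept.values())


tests = [
    {"site": "attaway", "buildName": "b", "testname": "t",
     "status": "Failed", "details": "Completed (Failed)\n", "time": 3.35},
    {"site": "attaway", "buildName": "b", "testname": "t",
     "status": "Failed", "details": "Completed (Failed)\n", "time": 3.1},
]
print(len(collapse_compatible_duplicates(tests)))
```

Under this rule, every duplicate pair reported earlier in this thread would have been accepted, since each pair has status 'Failed' and details 'Completed (Failed)\n' and differs only in timing, output, or link fields.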

bartlettroscoe avatar Jun 16 '20 21:06 bartlettroscoe