TriBITS
Add usage of STOP_TIME to ctest_test() for TriBITS Dashboard Driver and TriBITS CTest Driver
Currently, if a build run with the TriBITS Dashboard Driver system times out (due to the CTest TIMEOUT property on the test that runs ctest -S ...), it aborts all of the tests that are currently running for a TriBITS package, and any test results for that package that had already completed are lost and never uploaded to CDash. This is a problem for the CASL VERA WEEKLY build, since it loses the test results for every test in the package being processed at the time. It typically happens when a test in a test suite hangs and the hard timeout for that test is too long. So far this has mostly affected tests in the CASL VERA VUQDemos packages in WEEKLY testing, but it has also affected the Tiamat package tests in WEEKLY testing.
To make this work well, you would set the STOP_TIME option in the ctest_test() command that drives the individual builds (run as ctest -S ... for each build case). For CASL VERA, for example, testing starts at 00:05:00 (i.e. 12:05 am), so you would set the outer stop time to 23:50:00 (i.e. 11:50 pm) in the TDD driver file TribitsDriverDashboard.cmake, using a variable TDD_CTEST_STOP_TIME. To allow some time to kill the inner individual tests before the outer stop time is reached, you would pass a slightly earlier stop time to the ctest_test() command in the TribitsCTestDriverCore.cmake file. You would set that with a variable CTEST_STOP_TIME, which for VERA would be 23:45:00.
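As a rough sketch of the inner stop time described above, a ctest -S script could forward a stop time to ctest_test() along these lines. The variable name CTEST_STOP_TIME is the one proposed in this issue, not an existing TriBITS option, and the helper variable names (stopTimeArgs, testRtn) are hypothetical. The fragment assumes ctest_start(), ctest_configure() and ctest_build() have already run earlier in the same script, and the exact time-of-day format accepted by STOP_TIME should be checked against the CMake documentation.

```cmake
# Fragment of a ctest -S driver script (not standalone).
# CTEST_STOP_TIME is the variable proposed in this issue; here it is assumed
# to be set by the caller, e.g. to "23:45:00" for CASL VERA.
if (NOT DEFINED CTEST_STOP_TIME)
  set(CTEST_STOP_TIME "")
endif()

# Only pass STOP_TIME when a stop time was actually requested.
set(stopTimeArgs)
if (NOT "${CTEST_STOP_TIME}" STREQUAL "")
  set(stopTimeArgs STOP_TIME "${CTEST_STOP_TIME}")
endif()

ctest_test(
  BUILD "${CTEST_BINARY_DIRECTORY}"
  ${stopTimeArgs}
  RETURN_VALUE testRtn
  )

# Submit whatever test results exist, including a partial set.
ctest_submit(PARTS Test)
```

Leaving CTEST_STOP_TIME empty keeps the current behavior, so builds that do not need an absolute stop time would be unaffected. What the dashboard shows for tests cut off by STOP_TIME is exactly what the experiments in the tasks below are meant to find out.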
Tasks:
- Add CTEST_STOP_TIME to TribitsCTestDriverCore.cmake and experiment with what happens when STOP_TIME is triggered at the end of testing (see what it looks like on the dashboard) ...
- Add TDD_CTEST_STOP_TIME to TribitsDriverDashboard.cmake and do some experimental testing to make sure that it kills the test independent of the relative timeout that is set ...
This is also an issue for the Trilinos CI server (see trilinos/Trilinos#482). We would like to run a CI server all day long but have it restart every day with a fresh build. Setting absolute stop times would ensure that the last CI build is (gracefully) killed before the fresh CI build for the next testing day starts.
This is also an issue for the ATDM Trilinos builds on the ATS-2 system 'vortex' (see ATDV-396). We need to get partial test results to see what is happening. I think I am going to bite the bullet and try to get this done.