Failure of regression test setup with race condition
Bug description Regression tests occasionally fail due to a race condition when copying shared test files to the build directory.
To Reproduce This failure is difficult to reproduce because it depends on the timing of the test runs. The most likely way to reproduce it is to run all the regression tests manually with CTest (currently around 80 tests) using a large number of parallel jobs, more than 4 (flag --parallel).
Expected behavior Regression tests should not fail when copying files.
OpenFAST Version The failure was more likely to happen before commit 0f8237e was merged but is still possible.
Additional context The race condition is discussed in PR #1199 and PR #1244, but not resolved. The recommended solution is to move the copying of shared test files from the individual test scripts to a main script which runs before the tests are executed.
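A minimal sketch of what such a main setup step might look like is given below. It is not the OpenFAST implementation: the function name `copy_shared_inputs` and the directory names in the usage note are hypothetical, and the sketch assumes Python 3.8+ for `dirs_exist_ok`. The point is only that the shared files are copied once, serially, before any test starts, so no two tests ever copy the same files at the same time.

```python
import shutil
from pathlib import Path


def copy_shared_inputs(source_root, build_root, shared_dirs):
    """Copy each shared input directory into the build tree, once and serially.

    Intended to run as a single setup step before any test is launched.
    """
    source_root = Path(source_root)
    build_root = Path(build_root)
    copied = []
    for name in shared_dirs:
        src = source_root / name
        dst = build_root / name
        if not src.is_dir():
            raise FileNotFoundError(f"shared input directory not found: {src}")
        # dirs_exist_ok=True (Python 3.8+) makes the step safe to rerun.
        shutil.copytree(src, dst, dirs_exist_ok=True)
        copied.append(dst)
    return copied
```

A call could look like `copy_shared_inputs("r-test/glue-codes/openfast", "build/reg_tests/glue-codes/openfast", ["5MW_Baseline"])`, where both paths and the directory name are placeholders. With CTest, one option would be to register such a script as a setup test through the `FIXTURES_SETUP` and `FIXTURES_REQUIRED` test properties, so that it always runs before the tests that need the files.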
I can confirm that I still have sporadic failures of regression testing locally with CTest. What I observed is that the 5MW_Land_DLL_WTurb_py case failed promptly when it was the first one that CTest ran with the ctest -j12 command (all 81 cases in parallel batches of 12). This case then passes if I rerun it. I'm not sure if it is due to a race condition in the file copying, or a different ordering.
The 5MW_Land_DLL_WTurb_py case uses the input files from 5MW_Land_DLL_WTurb to run. Perhaps these were not first copied.
See discussion #731 for additional ideas on what we can improve. Fixing this should be done with an overall improvement of the automated testing process.
> Perhaps these were not first copied.
@andrew-platt, I think you are correct. I noticed those files weren't copied in the script that runs the non-Python HD driver of those test cases, so I added a copy of those files in #1222. (here: https://github.com/OpenFAST/openfast/pull/1222/files#diff-c1483b46f685cead3f87872ced29a4e253120dea6e7761ad19690719f30f966f)
I didn't modify the script for the Python driver version, though, so if it happens to run before the glue code tests or the HD Fortran driver, the files won't be there.
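For illustration only, here is one way a test script could make sure its inputs exist without racing against other tests that copy the same files. This is a sketch under assumptions, not what the OpenFAST scripts do: the helper name `ensure_inputs` is hypothetical, and the technique (copy into a private staging directory, then rename it into place) is a swapped-in alternative to copying directly into the shared destination.

```python
import os
import shutil
import tempfile
from pathlib import Path


def ensure_inputs(src, dst):
    """Make sure dst holds a complete copy of src.

    Returns True if this call created dst, False if it already existed
    (for example because another test got there first).
    """
    src, dst = Path(src), Path(dst)
    if dst.is_dir():
        return False
    dst.parent.mkdir(parents=True, exist_ok=True)
    # Copy into a private staging directory next to dst, so other tests
    # never see a half-copied destination.
    staging = Path(tempfile.mkdtemp(prefix=dst.name + ".", dir=dst.parent))
    try:
        shutil.copytree(src, staging, dirs_exist_ok=True)
        # Publish the finished copy in a single rename step.
        os.rename(staging, dst)
        return True
    except OSError:
        # If dst appeared in the meantime, another process won the race;
        # any other failure is a real error and is re-raised.
        if not dst.is_dir():
            raise
        return False
    finally:
        # Remove the staging directory if it was not renamed into place.
        shutil.rmtree(staging, ignore_errors=True)
```

The trade-off is that each test pays for a possibly redundant copy when several start at once, which is why copying once in a single setup step, as recommended in the issue, is still the simpler fix.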
@bjonkman, I think you are right. On GitHub Actions we might just be getting lucky that the build for rtest-interfaces takes longer than the build for the regression tests. With a local build and test, all the builds with make are completed before running ctest.
