Failure of regression test setup with race condition

Open deslaughter opened this issue 2 years ago • 4 comments

Bug description Regression tests occasionally fail due to a race condition when copying shared test files to the build directory.

To Reproduce This failure is difficult to reproduce because it depends on the timing of the test runs. The most likely way to reproduce it is to run all the regression tests manually with CTest (currently around 80 tests) with a large number of parallel threads, greater than 4 (flag --parallel).

Expected behavior Regression tests should not fail when copying files.

OpenFAST Version The failure was more likely to happen before commit 0f8237e was merged but is still possible.

Additional context The race condition is discussed in PR #1199 and PR #1244, but not resolved. The recommended solution is to move the copying of shared test files from the individual test scripts to a main script which runs before the tests are executed.
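A minimal sketch of the "main script" idea described above, not the project's actual implementation. The function name `copy_shared_files` and its arguments are hypothetical; the point is only that the shared files are copied once, serially, before any test starts, so the individual test scripts never write to the same destination at the same time.

```python
import shutil
from pathlib import Path


def copy_shared_files(source_dir, build_dir, shared_names):
    """Copy each shared directory from source_dir into build_dir.

    Intended to run once, before CTest launches the tests in parallel.
    All names here are hypothetical placeholders, not OpenFAST paths.
    Returns the list of names that were actually copied.
    """
    source_dir = Path(source_dir)
    build_dir = Path(build_dir)
    build_dir.mkdir(parents=True, exist_ok=True)

    copied = []
    for name in shared_names:
        src = source_dir / name
        dst = build_dir / name
        if dst.exists():
            # Already present from an earlier run; leave it alone.
            continue
        shutil.copytree(src, dst)
        copied.append(name)
    return copied


# Example call (paths are placeholders):
# copy_shared_files("reg_tests/r-test", "build/reg_tests", ["shared_inputs"])
```

One possible way to make CTest run such a script before everything else is a fixture (the `FIXTURES_SETUP` and `FIXTURES_REQUIRED` test properties), though whether that fits the existing test layout is an open question.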

deslaughter avatar Sep 07 '22 16:09 deslaughter

I can confirm that I still have sporadic failures of regression testing locally with CTest. What I observed is that the 5MW_Land_DLL_WTurb_py case failed immediately when it was the first case CTest ran with the ctest -j12 command (all 81 cases in parallel batches of 12). This case then passes if I rerun it. I'm not sure if it is due to a race condition in the file copying, or a different ordering.

The 5MW_Land_DLL_WTurb_py case uses the input files from 5MW_Land_DLL_WTurb to run. Perhaps these were not first copied.

andrew-platt avatar Sep 07 '22 17:09 andrew-platt

See discussion #731 for additional ideas on what we can improve. Fixing this should be done with an overall improvement of the automated testing process.

andrew-platt avatar Sep 07 '22 17:09 andrew-platt

> Perhaps these were not first copied.

@andrew-platt, I think you are correct. I noticed those files weren't copied in the script that runs the non-python HD driver of those test cases, so I added a copy of those files in #1222. (here: https://github.com/OpenFAST/openfast/pull/1222/files#diff-c1483b46f685cead3f87872ced29a4e253120dea6e7761ad19690719f30f966f)

I didn't modify the script for the python driver version, though, so if it happens to run before the glue code tests or the HD Fortran driver, the files won't be there.
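As an illustration of how a test script could make sure its inputs exist regardless of run order, here is a rough sketch. The helper `ensure_case_inputs` is hypothetical and is not what #1222 does. It copies into a private staging directory and then renames the result into place, so a concurrently running test sees either no directory or a complete one, never a half-copied one.

```python
import os
import shutil
import tempfile
from pathlib import Path


def ensure_case_inputs(src, dst):
    """Make sure dst holds a complete copy of src.

    Returns True if this call created dst, False if it already existed
    (for example because another test copied it first).
    """
    src, dst = Path(src), Path(dst)
    if dst.is_dir():
        return False

    dst.parent.mkdir(parents=True, exist_ok=True)
    # Private staging directory on the same filesystem as dst,
    # so the final rename does not have to move data across devices.
    staging = Path(tempfile.mkdtemp(dir=dst.parent, prefix=dst.name + ".tmp-"))
    try:
        shutil.copytree(src, staging / dst.name)
        os.rename(staging / dst.name, dst)
        return True
    except OSError:
        # If another process finished its copy first, dst now exists
        # and our copy is simply discarded; otherwise re-raise.
        if dst.is_dir():
            return False
        raise
    finally:
        shutil.rmtree(staging, ignore_errors=True)
```

With a helper along these lines, the python driver case could request the 5MW_Land_DLL_WTurb inputs itself instead of relying on another test having run first.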

bjonkman avatar Sep 07 '22 18:09 bjonkman

@bjonkman, I think you are right. On GitHub Actions we might just be getting lucky that the build for rtest-interfaces takes longer than the build for the regression tests. With a local build and test, all the builds with make are completed before running ctest.

andrew-platt avatar Sep 07 '22 18:09 andrew-platt