The PyTorch tests and tutorials are (likely) using the same model files.
Doing:
ctest -R tmva -j 32
will result in an arbitrary outcome (sometimes pass, sometimes fail) for
gtest-tmva-pymva-TestRModelParserKeras
gtest-tmva-pymva-TestRModelParserPyTorch
Re-running just those tests (whether they succeeded or not) leads to both of them failing. The failure report indicates that they 'now' need the BLAS library (which is not available on the system).
As a possible clue (or not), the following 3 tests fail systematically on the system due to the missing BLAS library:
996 - tutorial-tmva-TMVA_SOFIE_GNN_Application (Failed)
1000 - tutorial-tmva-TMVA_SOFIE_RDataFrame (Failed)
1002 - tutorial-tmva-TMVA_SOFIE_RSofieReader (Failed)
It is confirmed that one of these files:
-rw-r--r--. 1 pcanal us_cms 8962 Oct 19 17:56 ./tmva/pymva/test/PyTorchModelSequential.pt
-rw-r--r--. 1 pcanal us_cms 11913 Oct 19 17:56 ./runtutorials/modelClassification.pt
-rw-r--r--. 1 pcanal us_cms 10564 Oct 19 17:56 ./runtutorials/PyTorchModel.pt
-rw-r--r--. 1 pcanal us_cms 10941 Oct 19 17:56 ./runtutorials/modelMultiClass.pt
-rw-r--r--. 1 pcanal us_cms 11330 Oct 19 17:56 ./runtutorials/trainedModelMultiClass.pt
-rw-r--r--. 1 pcanal us_cms 12110 Oct 19 17:56 ./runtutorials/trainedModelClassification.pt
-rw-r--r--. 1 pcanal us_cms 7853 Oct 19 17:57 ./runtutorials/modelRegression.pt
-rw-r--r--. 1 pcanal us_cms 7972 Oct 19 17:57 ./runtutorials/trainedModelRegression.pt
-rw-r--r--. 1 pcanal us_cms 11044 Oct 19 18:02 ./tmva/pymva/test/PyTorchModelModule.pt
-rw-r--r--. 1 pcanal us_cms 8337 Oct 19 18:02 ./tmva/pymva/test/PyTorchModelConvolution.pt
-rw-r--r--. 1 pcanal us_cms 684930 Oct 19 18:02 ./runtutorials/PyTorchTrainedModelCNN.pt
-rw-r--r--. 1 pcanal us_cms 684658 Oct 19 18:02 ./runtutorials/PyTorchModelCNN.pt
is making gtest-tmva-pymva-TestRModelParserPyTorch fail.
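To narrow down which of the files above is responsible, one option is to move them aside one at a time and re-run the test. The following is only a rough sketch, not something that was run for this report: `files_that_break_test` and the `run_test` callback are hypothetical names, and it assumes the test does not depend on more than one of the files at once.

```python
import os


def files_that_break_test(candidates, run_test):
    """Return the candidate files whose temporary removal makes run_test() succeed.

    candidates: paths of existing files to try hiding, one at a time.
    run_test:   hypothetical callback returning True when the test passes.
    """
    culprits = []
    for path in candidates:
        hidden = path + ".aside"
        os.rename(path, hidden)  # hide the file for this attempt
        try:
            if run_test():
                culprits.append(path)
        finally:
            # Restore the original file; os.replace overwrites a copy that
            # the test may have regenerated in the meantime.
            os.replace(hidden, path)
    return culprits
```

Here `run_test` could wrap a call to `ctest -R gtest-tmva-pymva-TestRModelParserPyTorch` and return whether its exit status was zero.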
However gtest-tmva-pymva-TestRModelParserKeras fails with or without those files.
Apparently it is the test itself that is not runnable a second time :(:(
jupyter-pcanal-rootdevel:quick-devel pcanal$ ctest -R gtest-tmva-pymva-TestRModelParserPyTorch
Test project /home/pcanal/root_working/build/quick-devel
Start 349: gtest-tmva-pymva-TestRModelParserPyTorch
1/1 Test #349: gtest-tmva-pymva-TestRModelParserPyTorch ... Passed 15.87 sec
100% tests passed, 0 tests failed out of 1
Total Test time (real) = 16.13 sec
jupyter-pcanal-rootdevel:quick-devel pcanal$ ctest -R gtest-tmva-pymva-TestRModelParserPyTorch
Test project /home/pcanal/root_working/build/quick-devel
Start 349: gtest-tmva-pymva-TestRModelParserPyTorch
1/1 Test #349: gtest-tmva-pymva-TestRModelParserPyTorch ...***Failed 9.29 sec
0% tests passed, 1 tests failed out of 1
Total Test time (real) = 9.55 sec
The following tests FAILED:
349 - gtest-tmva-pymva-TestRModelParserPyTorch (Failed)
Errors while running CTest
Output from these tests are in: /home/pcanal/root_working/build/quick-devel/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.
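The pass-then-fail pattern above can be checked automatically by running the same command several times in a row and comparing exit codes. This is a minimal sketch with a hypothetical helper name (`run_repeatedly`); it assumes the command exits with a non-zero status when a test fails.

```python
import subprocess


def run_repeatedly(cmd, times=2, cwd=None):
    """Run cmd `times` times back to back and return the list of exit codes."""
    codes = []
    for _ in range(times):
        result = subprocess.run(
            cmd,
            cwd=cwd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        codes.append(result.returncode)
    return codes


# Example call from the build directory (not executed here):
# run_repeatedly(["ctest", "-R", "gtest-tmva-pymva-TestRModelParserPyTorch"])
```

A result whose first entry is zero and whose second is non-zero would correspond to the behaviour shown in the transcript.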
Re-assigning to @lmoneta
@pcanal , could you please re-summarise the status also given the better understanding we have of https://github.com/root-project/root/issues/16720 ?
The summary is simple (and still the same after applying 38b0d88 (#16722)):
On first run in a clean directory with BLAS missing, we get:
ctest -R gtest-tmva-pymva-TestRModelParserPyTorch
Test project /home/pcanal/root_working/build/quick-devel
Start 349: gtest-tmva-pymva-TestRModelParserPyTorch
1/1 Test #349: gtest-tmva-pymva-TestRModelParserPyTorch ... Passed 16.11 sec
100% tests passed, 0 tests failed out of 1
Total Test time (real) = 16.37 sec
and if we immediately re-run we get:
ctest -R gtest-tmva-pymva-TestRModelParserPyTorch
Test project /home/pcanal/root_working/build/quick-devel
Start 349: gtest-tmva-pymva-TestRModelParserPyTorch
1/1 Test #349: gtest-tmva-pymva-TestRModelParserPyTorch ...***Failed 9.10 sec
and the error is:
[ RUN ] RModelParser_PyTorch.SEQUENTIAL_MODEL
IncrementalExecutor::executeFunction: symbol 'sgemm_' unresolved while linking [cling interface function]!
which indicates that on the 2nd run, the test wants symbols from the BLAS library.
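When going through a longer log such as LastTest.log, the unresolved symbols can be pulled out with a small script. The sketch below only assumes the message format shown in the error line above; `unresolved_symbols` is a hypothetical helper name.

```python
import re

# Matches the cling message format seen in the failure output above.
UNRESOLVED = re.compile(r"symbol '([^']+)' unresolved while linking")


def unresolved_symbols(log_text):
    """Return the unique unresolved symbol names, in order of first appearance."""
    seen = []
    for name in UNRESOLVED.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen
```

Applied to the error line quoted above, this yields a list containing only `sgemm_`.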
@guitargeek @pcanal is this issue maybe fixed by https://github.com/root-project/root/pull/18257 ?
Also related: https://github.com/root-project/root/issues/16553
This issue appears to indeed be solved.
Hi @pcanal, @lmoneta,
It appears this issue is closed, but wasn't yet added to a project. Please add upcoming versions that will include the fix, or 'not applicable' otherwise.
Sincerely, :robot: