Tests in QS/regtest-as-3 fail with four ompthreads and mpiranks
Running tests/do_regtest.py with the command line parameters --mpiranks 4 --ompthreads 4 --maxtasks 32 causes critical errors even though the CP2K installation is correct (I'm quite sure by now at least).
Running it like this with CP2K version 2024.2 produces the following errors:
- in QS/regtest-as-3/ the following critical abort is produced:
*******************************************************************************
* ___ *
* / \ *
* [ABORT] *
* \___/ sum of local cols not equal global cols *
* | *
* O/| *
* /| | *
* / \ cp_fm_struct.F:261 *
*******************************************************************************
- and in other tests, also in QS/regtest-as-3/, normal failures:
h2_gpw_nostore_group2.inp - TIMED OUT ( 400.28 sec)
h2_gpw_nostore_group2.inp - TIMED OUT ( 400.26 sec)
h2_gpw_ht_group2.inp - RUNTIME FAIL ( 1.40 sec)
h2_gpw_ht_group2.inp - RUNTIME FAIL ( 1.50 sec)
h2_gpw_ht_nostore_group2.inp - RUNTIME FAIL ( 1.47 sec)
h2_gpw_ht_nostore_group2.inp - RUNTIME FAIL ( 1.45 sec)
CP2K was installed via spack in one case and via the developer's toolchain script in the other case.
In the end I found out, by trial and error, that I need --maxtasks 32 when running the tests on an HPC cluster in an interactive job. Otherwise, do_regtest.py starts too many instances at the same time and most of them time out.
With that setting, do_regtest.py --maxtasks 32 <path> <version> ran entirely without errors for the same CP2K binaries (psmp in all cases).
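A rough way to see why --maxtasks matters is sketched below. It assumes that each test instance occupies mpiranks × ompthreads cores and that the script runs as many instances concurrently as fit into the task budget. This is an assumption about the scheduling, not something taken from the do_regtest.py source, and `concurrent_instances` is a hypothetical helper.

```python
def concurrent_instances(maxtasks: int, mpiranks: int, ompthreads: int) -> int:
    """Number of test instances that fit into the task budget at once.

    Assumption (not verified against do_regtest.py): one instance needs
    mpiranks * ompthreads cores, and at least one instance always runs.
    """
    if mpiranks < 1 or ompthreads < 1 or maxtasks < 1:
        raise ValueError("maxtasks, mpiranks and ompthreads must be positive")
    cores_per_instance = mpiranks * ompthreads
    return max(1, maxtasks // cores_per_instance)


if __name__ == "__main__":
    # 4 ranks x 4 threads = 16 cores per instance: 32 tasks allow 2 at a time.
    print(concurrent_instances(maxtasks=32, mpiranks=4, ompthreads=4))
    # 2 ranks x 2 threads = 4 cores per instance: 32 tasks allow 8 at a time.
    print(concurrent_instances(maxtasks=32, mpiranks=2, ompthreads=2))
```

Under these assumptions, a budget larger than the cores actually granted to the interactive job would oversubscribe the node, which would be consistent with the timeouts described above.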
Yet adding --mpiranks 4 --ompthreads 4 triggers the failures shown above. I believe this should not happen. Instead, the regression test script should skip tests that conflict with the --mpiranks and --ompthreads specifications.
I hope this report helps to make the installation and regression testing clearer for the users.
I think @fstein93 added these tests when he implemented the half-transformed integral calculation in the AS module.
Ah, very good, thank you because then I can talk to him directly in the next office ... once he is back from vacation.
I've changed the title of this issue as it really seems to be about the AS code rather than the do_regtest.py script.
We have the ability to restrict test directories to certain combinations, e.g. https://github.com/cp2k/cp2k/blob/2de90c3493f93ab5519e4c813f6f1ead4f3a97fc/tests/TEST_DIRS#L250
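As a rough illustration of what such a restriction amounts to, the sketch below filters test directories by a predicate on the parallel layout. The table layout and the names `RESTRICTIONS` and `select_dirs` are hypothetical. This is not the actual TEST_DIRS syntax or do_regtest.py code, and the condition attached to QS/regtest-as-3 is a placeholder, since this thread does not establish which combinations that directory supports.

```python
from typing import Callable, Dict, List

# Predicate on (mpiranks, ompthreads); True means the directory may run.
Restriction = Callable[[int, int], bool]

# Hypothetical restriction table. The condition below is a placeholder
# chosen for illustration only.
RESTRICTIONS: Dict[str, Restriction] = {
    "QS/regtest-as-3": lambda mpiranks, ompthreads: mpiranks <= 2,
}


def select_dirs(dirs: List[str], mpiranks: int, ompthreads: int) -> List[str]:
    """Keep only the directories whose restriction accepts this layout."""
    selected = []
    for test_dir in dirs:
        allowed = RESTRICTIONS.get(test_dir)
        # Directories without an entry are unrestricted.
        if allowed is None or allowed(mpiranks, ompthreads):
            selected.append(test_dir)
    return selected


if __name__ == "__main__":
    all_dirs = ["QS/regtest-as-3", "QS/regtest-mp2-block"]
    print(select_dirs(all_dirs, mpiranks=4, ompthreads=4))
    print(select_dirs(all_dirs, mpiranks=2, ompthreads=2))
```

With the placeholder condition, the 4 × 4 run skips QS/regtest-as-3 while the 2 × 2 run keeps both directories.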
For the dashboard we currently only test with 2 ompthreads and 2 mpiranks. Maybe we should add a weekly test for larger runs?
I am back from holidays. The feature tested here is similar to the GROUP_SIZE feature in the WF_CORRELATION section tested in QS/regtest-mp2-block. Still, it is strange that the tests in QS/regtest-as-3 fail or run indefinitely whereas the ones in QS/regtest-mp2-block do not. I will get in touch with @knuedd directly to discuss ways to solve it.