
Tests in QS/regtest-as-3 fail with four ompthreads and mpiranks

Open knuedd opened this issue 1 year ago • 4 comments

Running tests/do_regtest.py with the command line parameters --mpiranks 4 --ompthreads 4 --maxtasks 32 causes critical errors even though the CP2K installation is correct (I'm quite sure by now at least).

Running it like this with CP2K version 2024.2 produces the following errors:

  • in QS/regtest-as-3/, a critical abort is produced:
 *******************************************************************************
 *   ___                                                                       *
 *  /   \                                                                      *
 * [ABORT]                                                                     *
 *  \___/                 sum of local cols not equal global cols              *
 *    |                                                                        *
 *  O/|                                                                        *
 * /| |                                                                        *
 * / \                                                      cp_fm_struct.F:261 *
 *******************************************************************************
  • and other tests, also in QS/regtest-as-3/, show ordinary failures:
    h2_gpw_nostore_group2.inp                                -    TIMED OUT ( 400.28 sec)
    h2_gpw_nostore_group2.inp                                -    TIMED OUT ( 400.26 sec)
    h2_gpw_ht_group2.inp                                     - RUNTIME FAIL (   1.40 sec)
    h2_gpw_ht_group2.inp                                     - RUNTIME FAIL (   1.50 sec)
    h2_gpw_ht_nostore_group2.inp                             - RUNTIME FAIL (   1.47 sec)
    h2_gpw_ht_nostore_group2.inp                             - RUNTIME FAIL (   1.45 sec)
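
The abort message says that the per-process column counts no longer add up to the global column count. As a minimal sketch of that invariant, assuming a ScaLAPACK-style block-cyclic column distribution (an assumption about what the check guards, not CP2K's actual code), the function below reimplements the counting rule of ScaLAPACK's NUMROC in Python. The helper `local_cols_per_process` and the example sizes are hypothetical.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Number of rows/columns of a block-cyclic matrix owned by one process.

    Mirrors the counting rule of ScaLAPACK's NUMROC:
    n        global number of rows or columns
    nb       block size
    iproc    coordinate of this process in the process row/column
    isrcproc coordinate of the process that owns the first block
    nprocs   number of processes in the process row/column
    """
    mydist = (nprocs + iproc - isrcproc) % nprocs  # distance from the source process
    nblocks = n // nb                              # number of complete blocks
    count = (nblocks // nprocs) * nb               # complete rounds of blocks
    extra_blocks = nblocks % nprocs                # leftover complete blocks
    if mydist < extra_blocks:
        count += nb                                # one more complete block
    elif mydist == extra_blocks:
        count += n % nb                            # the final partial block
    return count


def local_cols_per_process(ncol_global, ncol_block, npcol):
    """Hypothetical helper: local column counts for every process column."""
    return [numroc(ncol_global, ncol_block, ip, 0, npcol) for ip in range(npcol)]


if __name__ == "__main__":
    counts = local_cols_per_process(ncol_global=10, ncol_block=3, npcol=4)
    # The invariant named in the abort message: local counts sum to the global count.
    print(counts, sum(counts) == 10)
```

With a consistent process grid the sum always matches. A mismatch would therefore suggest that the processes involved disagree about the grid or the matrix size, for example after a communicator has been split into subgroups.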

CP2K was installed via spack in one case and via the developer's toolchain script in the other case.

In the end I found out by trial and error that I do need --maxtasks 32 when running the tests on an HPC cluster in an interactive job. Otherwise, do_regtest.py starts too many instances at the same time and most of them time out.

With that, do_regtest.py --maxtasks 32 <path> <version> ran entirely without errors for the same CP2K binaries (psmp in all cases).
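
To see why the task limit matters, it helps to work out how many tests can run side by side. The sketch below assumes that each test occupies mpiranks × ompthreads cores and that the limit is shared evenly among tests. That is an assumption about the scheduling, not a description of what do_regtest.py does internally, and the function name is hypothetical.

```python
def concurrent_tests(maxtasks: int, mpiranks: int, ompthreads: int) -> int:
    """Hypothetical helper: how many tests fit under a core budget.

    Assumes every test uses mpiranks * ompthreads cores and that at
    least one test is always allowed to run.
    """
    cores_per_test = mpiranks * ompthreads
    return max(1, maxtasks // cores_per_test)


if __name__ == "__main__":
    # The configuration from this report: 4 ranks x 4 threads, 32 tasks.
    print(concurrent_tests(maxtasks=32, mpiranks=4, ompthreads=4))
    # The same budget with 2 ranks x 2 threads leaves room for more tests.
    print(concurrent_tests(maxtasks=32, mpiranks=2, ompthreads=2))
```

Under that assumption, --maxtasks 32 with 4 ranks and 4 threads leaves room for two tests at a time, so a limit sized for the whole node would oversubscribe a smaller interactive allocation.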

Yet --mpiranks 4 --ompthreads 4 seem to be detrimental parameters. I believe this should not happen. Instead, the regression test script should skip any tests that conflict with the given --mpiranks and --ompthreads values.

I hope this report helps to make the installation and regression testing clearer for the users.

knuedd avatar Dec 02 '24 07:12 knuedd

I think @fstein93 added these tests when he implemented the half-transformed integral calculation in the AS module.

stefabat avatar Dec 02 '24 11:12 stefabat

Ah, very good, thank you. Then I can talk to him directly in the next office ... once he is back from vacation.

knuedd avatar Dec 02 '24 11:12 knuedd

I've changed the title of this issue as it really seems to be about the AS code rather than the do_regtest.py script.

We have the ability to restrict test directories to certain combinations, e.g. https://github.com/cp2k/cp2k/blob/2de90c3493f93ab5519e4c813f6f1ead4f3a97fc/tests/TEST_DIRS#L250
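
As a rough illustration of restricting a test directory to certain combinations, the sketch below evaluates a small condition string against the requested rank and thread counts. The condition syntax (for example `mpiranks%2==0`) and the function name are hypothetical and chosen for illustration only. They are not taken from TEST_DIRS or from do_regtest.py.

```python
import operator
import re

# Comparison operators accepted by this hypothetical condition syntax.
_OPS = {
    "==": operator.eq,
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
}

_CONDITION = re.compile(r"(mpiranks|ompthreads)(?:%(\d+))?(==|>=|<=|>|<)(\d+)")


def condition_holds(condition: str, mpiranks: int, ompthreads: int) -> bool:
    """Return True if a directory restricted by `condition` should run."""
    match = _CONDITION.fullmatch(condition)
    if match is None:
        raise ValueError(f"unsupported condition: {condition!r}")
    name, modulus, op, rhs = match.groups()
    value = {"mpiranks": mpiranks, "ompthreads": ompthreads}[name]
    if modulus is not None:
        value %= int(modulus)  # e.g. "mpiranks%2==0" tests for an even rank count
    return _OPS[op](value, int(rhs))


if __name__ == "__main__":
    # With 4 ranks and 4 threads, a directory limited to 2 ranks is skipped.
    for cond in ("mpiranks==2", "mpiranks%2==0", "ompthreads<4"):
        verdict = "run" if condition_holds(cond, mpiranks=4, ompthreads=4) else "skip"
        print(f"{cond}: {verdict}")
```

A restriction of this kind would let the script skip QS/regtest-as-3 for unsupported combinations instead of reporting failures, although it would not fix the underlying abort.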

For the dashboard we currently only test with 2 ompthreads and 2 mpiranks. Maybe we should add a weekly test for larger runs?

oschuett avatar Dec 03 '24 11:12 oschuett

I am back from holidays. The feature tested here is similar to the GROUP_SIZE feature in the WF_CORRELATION section tested in QS/regtest-mp2-block. Still, it is strange that the tests in QS/regtest-as-3 fail or run indefinitely whereas the ones in QS/regtest-mp2-block do not. I will get in touch with @knuedd directly to discuss ways to solve it.

fstein93 avatar Dec 16 '24 08:12 fstein93