
GROMACS (GROMACS-2020-fosscuda-2019b.eb) fails to install due to failed HardwareTopologyTest

Status: Open • opened by jkwmoore 4 years ago • 3 comments

Hi all,

We are encountering an error on our HPC cluster (Bessemer) while compiling GROMACS with EasyBuild using the GROMACS-2020-fosscuda-2019b.eb easyconfig (and others).

The details for this cluster can be found here: https://docs.hpc.shef.ac.uk/en/latest/bessemer/cluster_specs.html

The docs should contain all the information needed; the EasyBuild version is 4.3.1.

The failure seems to be limited to hwloc, but the log output is not particularly helpful. I will attach our build log; could you assist?

In the short term, I will probably try to bypass the test step of the build and see what happens.
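One way to try that, as a minimal sketch: add the line below to a local copy of GROMACS-2020-fosscuda-2019b.eb. This assumes that the generic EasyBuild easyconfig parameter skipsteps is honored by the GROMACS easyblock in EasyBuild 4.3.1, which we have not verified. Note that it skips the entire test step, not only HardwareTopologyTest, so neither the MPI nor the non-MPI build gets tested.

```python
# Fragment for a local copy of GROMACS-2020-fosscuda-2019b.eb (easyconfigs are Python syntax).
# Assumption: the generic 'skipsteps' easyconfig parameter is respected by the GROMACS easyblock.
# This skips the whole test step, so no unit tests run at all, not just the hwloc one.
skipsteps = ['test']
```

Because this hides every test failure, it is best treated as a temporary workaround rather than a fix.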

Relevant excerpt:


[==========] Running 5 tests from 2 test cases.
[----------] Global test environment set-up.
[----------] 1 test from CpuInfoTest
[ RUN      ] CpuInfoTest.SupportLevel
[       OK ] CpuInfoTest.SupportLevel (0 ms)
[----------] 1 test from CpuInfoTest (0 ms total)

[----------] 4 tests from HardwareTopologyTest
[ RUN      ] HardwareTopologyTest.Execute
[       OK ] HardwareTopologyTest.Execute (116 ms)
[ RUN      ] HardwareTopologyTest.HwlocExecute
/scratch/1526855/slurm_build_1526855/GROMACS/2020/fosscuda-2019b/gromacs-2020/src/gromacs/hardware/tests/hardwaretopology.cpp:88: Failure
Expected: (hwTop.supportLevel()) >= (gmx::HardwareTopology::SupportLevel::Basic), actual: 4-byte object <01-00 00-00> vs 4-byte object <02-00 00-00>
Cannot determine basic hardware topology from hwloc. GROMACS will still
work, but it might affect your performance for large nodes.
Please mail [email protected] so we can try to fix it.
[  FAILED  ] HardwareTopologyTest.HwlocExecute (64 ms)
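The "Expected" line is easier to read once the raw bytes are decoded. GoogleTest prints values it has no printer for (such as an enum class) as raw bytes in memory order, so on a little-endian x86-64 machine <01-00 00-00> is the integer 1 and <02-00 00-00> is 2. The sketch below decodes them, assuming the enumerator order of gmx::HardwareTopology::SupportLevel listed in the code; that order comes from our reading of the GROMACS 2020 header and should be checked against your source tree. Under that assumption, hwloc only reported a logical processor count, one level short of the Basic level the test requires.

```python
import re
from typing import List

# Assumed enumerator order of gmx::HardwareTopology::SupportLevel in GROMACS 2020
# (verify against src/gromacs/hardware/hardwaretopology.h in your tree).
SUPPORT_LEVELS = ["None", "LogicalProcessorCount", "Basic", "Full", "FullWithDevices"]


def decode_gtest_bytes(dump: str) -> int:
    """Convert a GoogleTest raw byte dump such as '01-00 00-00' into an integer.

    The bytes are printed in memory order; x86-64 is little-endian, so the
    first byte is the least significant one.
    """
    raw = bytes.fromhex(dump.replace("-", "").replace(" ", ""))
    return int.from_bytes(raw, byteorder="little")


def explain(line: str) -> List[str]:
    """Map every '<..>' byte dump on a gtest failure line to a SupportLevel name."""
    dumps = re.findall(r"<([0-9A-Fa-f -]+)>", line)
    return [SUPPORT_LEVELS[decode_gtest_bytes(d)] for d in dumps]


if __name__ == "__main__":
    failure = ("actual: 4-byte object <01-00 00-00> "
               "vs 4-byte object <02-00 00-00>")
    print(explain(failure))  # ['LogicalProcessorCount', 'Basic']
```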

Has anyone seen this before? Since ours is an x86-64 cluster and errors like this usually involve other architectures, I am a bit stumped. OS: CentOS Linux release 7.9.2009 (Core).

Attaching the log: gromacscompilefailure.log

jkwmoore, Nov 17 '20 12:11

Check the GROMACS mailing list; I know I've seen this mentioned before, I just can't remember where.

akesandgren, Nov 17 '20 13:11

Aye, I saw the following: https://www.mail-archive.com/[email protected]/msg40764.html

That is why I planned to compile while skipping that test.

jkwmoore, Nov 17 '20 13:11

Having now skipped the hwloc test, we are running into issues with OpenMPI rather than thread-MPI (the usempi:false build works): during the second iteration of the build, the MPI tests fail to complete. From line 29043 of the log:

PMI2_Init failed to intialize. Return code: 14

Our easyconfig for the OpenMPI in question (we use Slurm): https://github.com/RSE-Sheffield/bessemer-eb-cfg/blob/staging/etc/bessemer/easyconfigs/o/OpenMPI/OpenMPI-3.1.4-gcccuda-2019b.eb

Since the hwloc version installed on the cluster differs from the one used in the build, that may be the cause of the hwloc test failure, but we are clearly also having issues with MPI.

We are also seeing similar issues when building Rmpi, although these may be unrelated: https://www.mail-archive.com/[email protected]/msg05558.html

Our build log from a forced rebuild (srun --export=all -c 4 eb -drl --rebuild GROMACS-2020-fosscuda-2019b.eb):

gromacslogging.build.log

jkwmoore, Nov 20 '20 16:11