software-layer
Non-optimal CPU detection of neoverse_v1 using archspec
When setting up the EESSI environment on the neoverse_v1 nodes on aws/citc, archspec detects neoverse_n1 instead of neoverse_v1.
@fair-mastodon-c7g-2xlarge-0002 ~]$ source /cvmfs/pilot.eessi-hpc.org/versions/2023.06/init/bash
Found EESSI pilot repo @ /cvmfs/pilot.eessi-hpc.org/versions/2023.06!
archspec says aarch64/neoverse_n1
Using aarch64/neoverse_n1 as software subdirectory.
Using /cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/aarch64/neoverse_n1/modules/all as the directory to be added to MODULEPATH.
Found Lmod configuration file at /cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/aarch64/neoverse_n1/.lmod/lmodrc.lua
Initializing Lmod...
Prepending /cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/aarch64/neoverse_n1/modules/all to $MODULEPATH...
Environment set up to use EESSI pilot software stack, have fun!
cat /proc/cpuinfo on c7g-2xlarge
processor : 0
BogoMIPS : 2100.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm ssbs dcpodp svei8mm svebf16 i8mm bf16 dgh rng
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x1
CPU part : 0xd40
CPU revision : 1
Two CPU features that archspec expects for neoverse_v1 are missing on c7g-2xlarge: paca and pacg.
You can check the list here
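To make the comparison reproducible, here is a minimal sketch that reports which expected features are absent from a Features line. The helper name `missing_features` and the short `required` list are assumptions for illustration only; the full list archspec requires for neoverse_v1 is not reproduced here.

```shell
# Hypothetical helper (not part of archspec or EESSI): print the entries of
# the space-separated list $2 that do not occur in the space-separated list $1.
# The body runs in a subshell, so its variables do not leak to the caller.
missing_features() (
    available=" $1 "
    missing=""
    for feat in $2; do
        case "$available" in
            *" $feat "*) ;;                              # feature is present
            *) missing="${missing:+$missing }$feat" ;;   # feature is absent
        esac
    done
    printf '%s\n' "$missing"
)

# Features line copied from the c7g-2xlarge output above.
c7g_features="fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm ssbs dcpodp svei8mm svebf16 i8mm bf16 dgh rng"

# Illustrative subset of required features (assumption, not the real list).
required="sve paca pacg"

missing_features "$c7g_features" "$required"   # prints: paca pacg
```

On a live node, the first argument could instead be taken from `/proc/cpuinfo`, for example with `grep -m1 '^Features' /proc/cpuinfo | cut -d: -f2`.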
Together with #322, this is enough motivation to switch to using our own minimal archdetect implementation rather than relying on archspec for EESSI pilot 2023.06, I think...
@boegel There's an error in archdetect that is fixed as part of https://github.com/EESSI/software-layer/pull/264
@laraPPr Can you check whether archdetect correctly detects both neoverse_v1 and zen3 (cfr. #322), using:
EESSI_USE_ARCHDETECT=1 source /cvmfs/pilot.eessi-hpc.org/versions/2023.06/init/bash
Results on zen3: https://github.com/EESSI/software-layer/issues/322#issuecomment-1702640074
- c7g.2xlarge
[laraPPr@fair-mastodon-c7g-2xlarge-0001 ~]$ EESSI_USE_ARCHDETECT=1 source /cvmfs/pilot.eessi-hpc.org/versions/2023.06/init/bash
Found EESSI pilot repo @ /cvmfs/pilot.eessi-hpc.org/versions/2023.06!
2023-09-01 12:10:56 [INFO] cpupath: best match for host CPU: aarch64/arm/neoverse-v1
archdetect says aarch64/arm/neoverse-v1
Using aarch64/arm/neoverse-v1 as software subdirectory.
ERROR: EESSI software layer at /cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/aarch64/arm/neoverse-v1 not found!
The error occurs because archdetect reports neoverse-v1 instead of neoverse_v1.
The same applies to neoverse_n1, which archdetect reports as neoverse-n1.
There is also an extra arm/ subdirectory in the path, which doesn't exist.
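The two differences can be illustrated with a small sketch that maps the archdetect output shown above onto the directory layout archspec uses. The function `normalize_subdir` is hypothetical and only demonstrates the mismatch; it is not how archdetect itself was fixed.

```shell
# Hypothetical illustration (not EESSI code): drop the extra "arm/" path
# component and replace hyphens with underscores.
normalize_subdir() {
    printf '%s\n' "$1" | sed -e 's|/arm/|/|' -e 's|-|_|g'
}

normalize_subdir "aarch64/arm/neoverse-v1"   # prints: aarch64/neoverse_v1
normalize_subdir "aarch64/arm/neoverse-n1"   # prints: aarch64/neoverse_n1
```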
As part of #264, I've fixed the Arm detection and added an additional check in CI that ensures that whatever archdetect spits out actually exists as an option.
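A check along those lines might look like the sketch below. It is an assumption about the general shape of such a test, not the actual CI code from #264; the function name `software_subdir_exists` and its arguments are made up for illustration.

```shell
# Hypothetical check (not the CI code from the PR): succeed only if the
# subdirectory reported by the detection script exists under the prefix.
#   $1: software prefix, e.g. .../versions/2023.06/software/linux
#   $2: detected subdirectory, e.g. aarch64/neoverse_v1
software_subdir_exists() {
    [ -d "$1/$2" ]
}

# Print an error and return non-zero when the detected target is unknown.
require_software_subdir() {
    if ! software_subdir_exists "$1" "$2"; then
        printf 'ERROR: %s/%s not found\n' "$1" "$2" >&2
        return 1
    fi
}
```

With the output shown earlier, `require_software_subdir "$prefix" "aarch64/arm/neoverse-v1"` would fail, where `$prefix` stands for the software/linux directory of the repository.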