
Non-optimal CPU detection of neoverse_v1 using archspec

Open laraPPr opened this issue 2 years ago • 9 comments

When setting up the EESSI environment on the neoverse_v1 nodes on aws/citc, archspec detects neoverse_n1 instead of neoverse_v1.

@fair-mastodon-c7g-2xlarge-0002 ~]$ source /cvmfs/pilot.eessi-hpc.org/versions/2023.06/init/bash

Found EESSI pilot repo @ /cvmfs/pilot.eessi-hpc.org/versions/2023.06!

archspec says aarch64/neoverse_n1

Using aarch64/neoverse_n1 as software subdirectory.

Using /cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/aarch64/neoverse_n1/modules/all as the directory to be added to MODULEPATH.

Found Lmod configuration file at /cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/aarch64/neoverse_n1/.lmod/lmodrc.lua

Initializing Lmod...

Prepending /cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/aarch64/neoverse_n1/modules/all to $MODULEPATH...

Environment set up to use EESSI pilot software stack, have fun!

laraPPr avatar Aug 29 '23 15:08 laraPPr

Output of cat /proc/cpuinfo on c7g-2xlarge:

processor	: 0

BogoMIPS	: 2100.00

Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm ssbs dcpodp svei8mm svebf16 i8mm bf16 dgh rng

CPU implementer	: 0x41

CPU architecture: 8

CPU variant	: 0x1

CPU part	: 0xd40

CPU revision	: 1
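For reference, the implementer/part pair in this output (0x41, 0xd40) is the Arm identifier for Neoverse V1, while Neoverse N1 reports part 0xd0c. A minimal sketch of identifying the core from those two fields alone, without looking at the feature flags, could look like this. The table and the function name `identify_cpu` are hypothetical and only cover the two cores discussed here; this is not how archspec or archdetect are implemented.

```python
# Hypothetical lookup table: (CPU implementer, CPU part) -> microarchitecture name.
# Only the two cores discussed in this issue are listed.
ARM_PARTS = {
    (0x41, 0xD0C): "neoverse_n1",
    (0x41, 0xD40): "neoverse_v1",
}


def identify_cpu(cpuinfo_text):
    """Return a microarchitecture name from /proc/cpuinfo text, or None if unknown."""
    fields = {}
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            # Keep the first occurrence of each key (i.e. the first core listed).
            fields.setdefault(key.strip(), value.strip())
    try:
        implementer = int(fields["CPU implementer"], 16)
        part = int(fields["CPU part"], 16)
    except (KeyError, ValueError):
        return None
    return ARM_PARTS.get((implementer, part))


sample = (
    "CPU implementer\t: 0x41\n"
    "CPU architecture: 8\n"
    "CPU variant\t: 0x1\n"
    "CPU part\t: 0xd40\n"
)
print(identify_cpu(sample))  # neoverse_v1
```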

laraPPr avatar Aug 29 '23 15:08 laraPPr

Two of the CPU features that archspec lists for neoverse_v1 are missing on c7g-2xlarge: paca and pacg. You can check the list here.
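A rough sketch of why two missing flags can be enough to cause a fallback, assuming a detector that only accepts a target when every feature it requires is present on the host. The `REQUIRED` set below is an illustrative subset chosen for this example, not the actual neoverse_v1 feature list; `FEATURES_LINE` is copied from the /proc/cpuinfo output above.

```python
# Features line as reported by /proc/cpuinfo on c7g-2xlarge (from this issue).
FEATURES_LINE = (
    "fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid "
    "asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm "
    "dit uscat ilrcpc flagm ssbs dcpodp svei8mm svebf16 i8mm bf16 dgh rng"
)

# Assumption: an illustrative subset of required flags, NOT the real target definition.
REQUIRED = {"sve", "i8mm", "bf16", "paca", "pacg"}


def missing_features(required, features_line):
    """Return the required flags that the host does not report, sorted by name."""
    present = set(features_line.split())
    return sorted(set(required) - present)


print(missing_features(REQUIRED, FEATURES_LINE))  # ['paca', 'pacg']
```

With a strict "all required features present" rule, any non-empty result rules the target out, and the detector falls back to an older compatible target.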

laraPPr avatar Aug 29 '23 15:08 laraPPr

Together with #322, this is enough motivation to switch to using our own minimal archdetect implementation rather than relying on archspec for EESSI pilot 2023.06, I think...

boegel avatar Aug 31 '23 17:08 boegel

@boegel There's an error in archdetect that is fixed as part of https://github.com/EESSI/software-layer/pull/264

ocaisa avatar Aug 31 '23 19:08 ocaisa

@laraPPr Can you check whether archdetect correctly detects both neoverse_v1 and zen3 (cf. #322), using:

EESSI_USE_ARCHDETECT=1 source /cvmfs/pilot.eessi-hpc.org/versions/2023.06/init/bash

boegel avatar Sep 01 '23 09:09 boegel

Results on zen3: https://github.com/EESSI/software-layer/issues/322#issuecomment-1702640074

  • c7g.2xlarge
[laraPPr@fair-mastodon-c7g-2xlarge-0001 ~]$ EESSI_USE_ARCHDETECT=1 source /cvmfs/pilot.eessi-hpc.org/versions/2023.06/init/bash

Found EESSI pilot repo @ /cvmfs/pilot.eessi-hpc.org/versions/2023.06!

2023-09-01 12:10:56 [INFO] cpupath: best match for host CPU: aarch64/arm/neoverse-v1

archdetect says aarch64/arm/neoverse-v1

Using aarch64/arm/neoverse-v1 as software subdirectory.

ERROR: EESSI software layer at /cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/aarch64/arm/neoverse-v1 not found!

laraPPr avatar Sep 01 '23 12:09 laraPPr

The error occurs because archdetect reports neoverse-v1 instead of neoverse_v1. The same applies to neoverse_n1, which archdetect reports as neoverse-n1.

laraPPr avatar Sep 01 '23 12:09 laraPPr

And there's the extra arm/ subdirectory which doesn't exist.
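To make the two mismatches concrete (hyphen vs. underscore, plus the extra arm/ component), here is a small sketch that maps the archdetect output shown above onto the directory name that exists in the repository. The function `normalise_subdir` is hypothetical and purely illustrative; it is not the fix that went into archdetect.

```python
def normalise_subdir(detected):
    """Map an archdetect-style string to the software subdirectory naming.

    Assumptions (for illustration only):
    - for aarch64, a middle "arm" vendor component is dropped;
    - hyphens in the last component become underscores.
    """
    parts = detected.split("/")
    if len(parts) == 3 and parts[0] == "aarch64" and parts[1] == "arm":
        parts = [parts[0], parts[2]]
    parts[-1] = parts[-1].replace("-", "_")
    return "/".join(parts)


print(normalise_subdir("aarch64/arm/neoverse-v1"))  # aarch64/neoverse_v1
print(normalise_subdir("aarch64/arm/neoverse-n1"))  # aarch64/neoverse_n1
```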

boegel avatar Sep 01 '23 13:09 boegel

As part of #264, I've fixed the Arm detection and added an additional check in CI that ensures that whatever archdetect spits out actually exists as an option.
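The kind of check described here can be sketched as follows: take whatever string the detector prints and verify that a matching directory exists under the software root. This is only an illustration of the idea; the function name `software_subdir_exists` and the temporary directory layout are made up for the example and do not reflect the actual CI implementation in #264.

```python
import os
import tempfile


def software_subdir_exists(software_root, subdir):
    """Return True if <software_root>/<subdir> is an existing directory."""
    return os.path.isdir(os.path.join(software_root, *subdir.split("/")))


# Demo with a throwaway directory tree standing in for .../software/linux.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "aarch64", "neoverse_v1"))
    ok = software_subdir_exists(root, "aarch64/neoverse_v1")
    bad = software_subdir_exists(root, "aarch64/arm/neoverse-v1")

print(ok, bad)  # True False
```

A CI job built on this idea would fail as soon as the detector returned a string such as aarch64/arm/neoverse-v1 that has no corresponding directory.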

ocaisa avatar Sep 03 '23 15:09 ocaisa