software-layer
Add CUDA software check to stack comparison CI
Instance eessi-bot-mc-aws is configured to build for:
- architectures: x86_64/generic,x86_64/intel/haswell,x86_64/intel/sapphirerapids,x86_64/intel/skylake_avx512,x86_64/intel/cascadelake,x86_64/intel/icelake,x86_64/amd/zen2,x86_64/amd/zen3,aarch64/generic,aarch64/neoverse_n1,aarch64/neoverse_v1
- repositories: eessi.io-2023.06-compat,eessi.io-2023.06-software
Instance eessi-bot-deucalion is configured to build for:
- architectures: aarch64/a64fx
- repositories: eessi.io-2023.06-software
Instance rt-Grace-jr is configured to build for:
- architectures: aarch64/nvidia/grace
- repositories: eessi.io-2023.06-software
Instance eessi-bot-surf is configured to build for:
- architectures: x86_64/amd/zen4,x86_64/amd/zen2
- repositories: eessi.io-2023.06-software,eessi-hpc.org-2023.06-compat,eessi.io-2023.06-compat,eessi-hpc.org-2023.06-software
This will have to remain a Draft PR until the CUDA stacks are in sync
@casparvl This is now working, but it compares against generic (which is not really a problem, since all the stacks are supposed to be the same).
@ocaisa Can you retarget this PR and see if anything should be moved to https://github.com/EESSI/software-layer-scripts?
Retriggering CI
Retriggering this to see where we are
It needs https://github.com/EESSI/software-layer-scripts/pull/23 to be merged first.
I'm not relying on easystacks for this CI, I'm directly comparing the module files/directories.
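For illustration, here is a minimal sketch of what comparing module trees directly (rather than via easystacks) could look like: list the module files under a reference stack (e.g. generic) and a target stack, then report what is missing or extra. It assumes Lmod-style `.lua` module files under a per-architecture modules directory. That layout and the function names `module_set` and `compare_stacks` are hypothetical, not the actual CI script.

```python
from __future__ import annotations

from pathlib import Path


def module_set(modules_dir: Path) -> set[str]:
    """Collect module names such as 'CUDA/12.1.1' from *.lua files.

    Assumes (hypothetically) an Lmod layout of <modules_dir>/<name>/<version>.lua.
    """
    return {
        p.relative_to(modules_dir).with_suffix("").as_posix()
        for p in modules_dir.rglob("*.lua")
    }


def compare_stacks(reference: Path, target: Path) -> tuple[set[str], set[str]]:
    """Return (missing_in_target, extra_in_target) relative to the reference stack."""
    ref = module_set(reference)
    tgt = module_set(target)
    return ref - tgt, tgt - ref
```

A CI job built this way would fail whenever either returned set is non-empty, which is why the missing software had to be deployed for all architectures before the check could pass.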
Sorry for the confusion, I thought this was a different PR.
Just discussed with Alan: in principle this is ready to go once the CI passes. For that to happen, the missing software needs to be deployed for the missing architectures.
https://github.com/EESSI/software-layer/pull/1147 should bring things in sync, so that the CI passes here. We may want to rebuild GROMACS later to tell it to use NEON rather than SVE for certain ARM architectures.
https://github.com/EESSI/software-layer-scripts/pull/59 is an intermediate PR that at least installed CUDA/cuDNN across the board. Let's retrigger CI and see if those differences indeed went away.
Replaced by #1205, where it should get merged.