software-layer

Add CUDA software check to stack comparison CI

Open ocaisa opened this issue 6 months ago • 6 comments

ocaisa, May 20 '25 13:05

Instance eessi-bot-mc-aws is configured to build for:

  • architectures: x86_64/generic, x86_64/intel/haswell, x86_64/intel/sapphirerapids, x86_64/intel/skylake_avx512, x86_64/intel/cascadelake, x86_64/intel/icelake, x86_64/amd/zen2, x86_64/amd/zen3, aarch64/generic, aarch64/neoverse_n1, aarch64/neoverse_v1
  • repositories: eessi.io-2023.06-compat, eessi.io-2023.06-software

eessi-bot[bot], May 20 '25 13:05

Instance eessi-bot-deucalion is configured to build for:

  • architectures: aarch64/a64fx
  • repositories: eessi.io-2023.06-software

Instance rt-Grace-jr is configured to build for:

  • architectures: aarch64/nvidia/grace
  • repositories: eessi.io-2023.06-software

Instance eessi-bot-surf is configured to build for:

  • architectures: x86_64/amd/zen4, x86_64/amd/zen2
  • repositories: eessi.io-2023.06-software, eessi-hpc.org-2023.06-compat, eessi.io-2023.06-compat, eessi-hpc.org-2023.06-software

eessi-bot-surf[bot], May 20 '25 13:05

This will have to remain a draft PR until the CUDA stacks are in sync.

ocaisa, May 20 '25 14:05

@casparvl This is now working, but it compares against the generic stack (which is not really a problem, since all stacks are supposed to be the same).

ocaisa, May 20 '25 14:05

@ocaisa Can you retarget this PR and see if anything should be moved to https://github.com/EESSI/software-layer-scripts?

laraPPr, Jun 27 '25 11:06

Retriggering CI.

ocaisa, Jun 27 '25 11:06

Retriggering this to see where we are.

ocaisa, Jul 31 '25 12:07

This needs the following PR to be merged first: https://github.com/EESSI/software-layer-scripts/pull/23

laraPPr, Jul 31 '25 13:07

I'm not relying on easystacks for this CI, I'm directly comparing the module files/directories.

ocaisa, Jul 31 '25 13:07
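A minimal sketch of this kind of directory-based comparison follows. It assumes each architecture's module tree is available as a local directory. The helpers `module_set` and `compare_stacks` are hypothetical and are not the actual CI script from this PR. The sketch collects module files relative to each root, then reports what each target architecture is missing, or has in excess, compared with a reference tree such as generic.

```python
from pathlib import Path


def module_set(modules_root: Path) -> set[str]:
    """Collect module files under modules_root as relative POSIX paths.

    Hypothetical layout: files such as 'CUDA/12.1.1.lua' somewhere below
    modules_root; only regular files are counted, directories are skipped.
    """
    return {
        p.relative_to(modules_root).as_posix()
        for p in modules_root.rglob("*")
        if p.is_file()
    }


def compare_stacks(reference: Path, targets: dict[str, Path]) -> dict[str, dict[str, set[str]]]:
    """Compare each target module tree with the reference tree.

    Returns only the targets that differ, mapping each one to the module
    files it is missing and the extra files the reference does not have.
    """
    expected = module_set(reference)
    report: dict[str, dict[str, set[str]]] = {}
    for arch, root in targets.items():
        found = module_set(root)
        missing, extra = expected - found, found - expected
        if missing or extra:
            report[arch] = {"missing": missing, "extra": extra}
    return report
```

In a CI job, a non-empty report would be printed and turned into a non-zero exit status, so that the check fails until the stacks are back in sync.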

Sorry for the confusion, I thought this was a different PR.

laraPPr, Jul 31 '25 14:07

Just discussed with Alan: in principle this is ready to go once the CI passes. For that, the missing software needs to be deployed for the architectures that don't have it yet.

casparvl, Aug 05 '25 10:08

https://github.com/EESSI/software-layer/pull/1147 should bring things in sync so that the CI passes here. We may want to rebuild GROMACS later so that it uses NEON rather than SVE on certain Arm architectures.

casparvl, Aug 05 '25 14:08

https://github.com/EESSI/software-layer-scripts/pull/59 is an intermediate PR that at least installed CUDA/cuDNN across the board. Let's retrigger CI and see if those differences have indeed gone away.

casparvl, Aug 19 '25 12:08

Replaced by #1205, where it should get merged.

ocaisa, Sep 26 '25 14:09