software-layer
{2023.06}[NVHPC/25.1-CUDA-12.6] add hook for nvhpc
This adds a pre_configure_hook for NVHPC. It performs search-and-replace operations on the "localrc" file that NVHPC uses to detect information about the system. In particular, it points the sysroot flag at the EESSI EPREFIX variable and appends two variable definitions that specify where to look for system libraries.
The content of the hook is extracted from: https://github.com/ComputeCanada/easybuild-computecanada-config/blob/main/2023/cc_hooks.py#L544-L547
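The text transformation described above can be sketched as a small pure function. This is a minimal illustration, not the code from the linked hook: the function name `patch_localrc`, the `--sysroot=` flag spelling, the variable names `DEFLIBDIR` and `DEFSTDOBJDIR`, and the `usr/lib64` subdirectory are all assumptions made for the example, and the wiring into an EasyBuild hook is left out.

```python
import re


def patch_localrc(text: str, eprefix: str) -> str:
    """Return localrc text with the sysroot redirected and library dirs appended.

    Hypothetical helper: flag spelling, variable names and lib path are assumptions.
    """
    # Point every '--sysroot=<path>' flag at the compatibility-layer prefix.
    patched = re.sub(r'--sysroot=[^\s;"]*', lambda m: f"--sysroot={eprefix}", text)

    # Make sure appended definitions start on their own line.
    if patched and not patched.endswith("\n"):
        patched += "\n"

    # Append two definitions telling the compiler where system libraries live.
    # Skipping lines that already exist keeps the function idempotent.
    for var in ("DEFLIBDIR", "DEFSTDOBJDIR"):
        line = f"set {var}={eprefix}/usr/lib64;"
        if line not in patched:
            patched += line + "\n"
    return patched
```

Keeping the transformation separate from file I/O makes it easy to check against a sample localrc before it is applied to a real installation.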
Instance eessi-bot-deucalion is configured to build for:
- architectures: aarch64/a64fx
- repositories: eessi.io-2023.06-software
Instance eessi-bot-mc-azure is configured to build for:
- architectures: x86_64/amd/zen4
- repositories: eessi.io-2023.06-compat, eessi.io-2023.06-software
Instance eessi-bot-vsc-ugent is configured to build for:
- architectures: x86_64/amd/zen3
- repositories: eessi-hpc.org-2023.06-software, eessi.io-2023.06-compat, eessi-hpc.org-2023.06-compat, eessi.io-2023.06-software
Instance rt-Grace-jr is configured to build for:
- architectures: aarch64/nvidia/grace
- repositories: eessi.io-2023.06-software
Instance eessi-bot-surf is configured to build for:
- architectures: x86_64/amd/zen4, x86_64/amd/zen2
- repositories: eessi-hpc.org-2023.06-software, eessi.io-2023.06-software, eessi.io-2023.06-compat, eessi-hpc.org-2023.06-compat
Instance eessi-bot-mc-aws is configured to build for:
- architectures: x86_64/generic, x86_64/intel/haswell, x86_64/intel/sapphirerapids, x86_64/intel/skylake_avx512, x86_64/intel/cascadelake, x86_64/intel/icelake, x86_64/amd/zen2, x86_64/amd/zen3, aarch64/generic, aarch64/neoverse_n1, aarch64/neoverse_v1
- repositories: eessi.io-2023.06-compat, eessi.io-2023.06-software
Hi @adammccartney ,
Since this PR has no generated build with corresponding tests, could you kindly share the steps you used to test it so that we can test and reproduce it as well?
Thank you!
Sure, would be happy to. Would you mind giving a few points of guidance? Let me know what would be useful to see. I haven't had the time to look into ReFrame at all yet, so there are no tests (yet) apart from the sanity checks in the EasyBuild. The build last week was done in the eessi-container from the "software layer" repo, which was slightly adapted to suit our own build environment. The build command looks like the following:
#!/bin/bash
# Resolve the project root: four directory levels above this script.
project_root="$(realpath "$(dirname "$(dirname "$(dirname "$(dirname "${BASH_SOURCE[0]}")")")")")"
# Build NVHPC with the site-specific EasyBuild config and hooks.
eb "${project_root}/easyconfigs/2025/NVHPC-25.1-CUDA-12.6.0.eb" \
-r --cuda-compute-capabilities=9.0 \
--configfiles="${project_root}/easybuild-asc-config/2025/config.cfg" \
--hooks="${project_root}/easybuild-asc-config/2025/eb_hooks.py"
As you can see, we are referencing an explicit config for EasyBuild, and I think the easyconfig for NVHPC is slightly adapted to include the "accept-eula" variable or something similar. I'll backport this today to a "vanilla" eessi-extend environment that can be used to install software to host-injections. I guess it would be useful to have a command that can be run in the standard container started by eessi_container.sh?
So, interestingly, the sanity check now fails if I try to build this directly on a compute node (the architecture is x86_64/amd/zen4, by the way). The initial build was done in the EESSI container, as I mentioned, set up to use the EESSI_PROJECT_INSTALL variable pointing at a writeable /cvmfs/software.asc.ac.at directory.
The build now fails when I try to use EESSI_SITE_INSTALL. Maybe there is something leaking in via the ld cache on the host as was previously observed. Makes me wonder about how usable the compiler is if we load it from the custom cvmfs repo...
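One way to check for the kind of host-library leakage suspected above is to inspect which shared libraries a built binary resolves to. The sketch below is an assumption-laden illustration, not something used in this PR: the helper name `foreign_libs` is hypothetical, and it only parses text in the usual `name => /path (address)` layout printed by `ldd`, reporting any resolved path outside a set of allowed prefixes.

```python
def foreign_libs(ldd_output, allowed_prefixes):
    """List resolved library paths that fall outside the allowed prefixes.

    Hypothetical helper: expects text in the common `ldd` output layout.
    """
    prefixes = tuple(allowed_prefixes)
    foreign = []
    for line in ldd_output.splitlines():
        if "=>" not in line:
            continue  # vdso and dynamic-loader lines have no resolved path
        _, _, rest = line.partition("=>")
        parts = rest.split()
        if not parts or parts[0] == "not":
            continue  # unresolved entries are reported as "not found"
        path = parts[0]
        if not path.startswith(prefixes):
            foreign.append(path)
    return foreign
```

Feeding it the output of `ldd` on one of the installed compiler binaries, with `("/cvmfs/",)` as the allowed prefix, would list any library picked up from the host instead of from the CVMFS repositories.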
bot: build inst:eessi-bot-mc-azure arch:x86_64/amd/zen4 repo:eessi.io-2023.06-software accelerator:nvidia/cc90
Updates by the bot instance eessi-bot-mc-aws
- account adammccartney has NO permission to send commands to the bot
Updates by the bot instance eessi-bot-mc-azure
- account adammccartney has NO permission to send commands to the bot
Updates by the bot instance eessi-bot-surf
- account adammccartney has NO permission to send commands to the bot
Updates by the bot instance eessi-bot-vsc-ugent
- account adammccartney has NO permission to send commands to the bot
Okay, so I got this working by paying careful attention to what the linker was doing. See the commit message at https://github.com/EESSI/software-layer/pull/1043/commits/a2fe8bec8e70561e34e92273b696632fdb30f5d5 for a bit more info.
@adammccartney Thank you for your contribution. We have recently split up the software-layer repository. The changes made in this PR should target the new repository, https://github.com/EESSI/software-layer-scripts, which is why we will close this PR.
We are also currently reworking how NVHPC is handled upstream in EasyBuild. For that reason, we recommend holding off on implementing this until that work in EasyBuild is finished.