{2023.06}[NVHPC/25.1-CUDA-12.6] add hook for nvhpc

Open adammccartney opened this issue 7 months ago • 15 comments

This adds a pre_configure_hook for NVHPC. It performs some search-and-replace operations on the "localrc" file that NVHPC uses to detect information about the system. In particular, it points the sysroot flag at the EESSI eprefix variable and appends two variable definitions that say where to look for system libraries.

The content of the hook is extracted from: https://github.com/ComputeCanada/easybuild-computecanada-config/blob/main/2023/cc_hooks.py#L544-L547
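As a rough illustration of the kind of edit described above, here is a minimal sketch in Python. It is not the code from this PR or from the Compute Canada hook: the function name `patch_localrc`, the `--sysroot=` spelling it matches, the appended variable names `DEFLIBDIR` and `DEFSTDOBJDIR`, the `/lib` subdirectory, and the sample input are all assumptions made for the example.

```python
import re


def patch_localrc(text: str, eprefix: str) -> str:
    """Return localrc text with the sysroot pointed at eprefix and two
    library-location definitions appended (hypothetical sketch)."""
    # Rewrite the value of any existing --sysroot=<path> flag.
    # A callable replacement avoids backslash interpretation in eprefix.
    patched = re.sub(r"--sysroot=[^\s;\"']*",
                     lambda _match: f"--sysroot={eprefix}",
                     text)
    if not patched.endswith("\n"):
        patched += "\n"
    # Append two definitions saying where system libraries live.
    # These variable names and the /lib suffix are assumptions.
    patched += f"set DEFLIBDIR={eprefix}/lib;\n"
    patched += f"set DEFSTDOBJDIR={eprefix}/lib;\n"
    return patched


if __name__ == "__main__":
    # Made-up localrc fragment and prefix, for demonstration only.
    sample = "set SOMEFLAGS=--sysroot=/;\n"
    print(patch_localrc(sample, "/path/to/eprefix"))
```

In an actual EasyBuild hooks file, logic like this would presumably be called from a `pre_configure_hook(self, *args, **kwargs)` function after checking `self.name`, with the real localrc path and prefix supplied by the build environment.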

adammccartney avatar Apr 24 '25 06:04 adammccartney

Instance eessi-bot-deucalion is configured to build for:

  • architectures: aarch64/a64fx
  • repositories: eessi.io-2023.06-software

Instance eessi-bot-mc-azure is configured to build for:

  • architectures: x86_64/amd/zen4
  • repositories: eessi.io-2023.06-compat, eessi.io-2023.06-software

eessi-bot[bot] avatar Apr 24 '25 06:04 eessi-bot[bot]

Instance eessi-bot-vsc-ugent is configured to build for:

  • architectures: x86_64/amd/zen3
  • repositories: eessi-hpc.org-2023.06-software, eessi.io-2023.06-compat, eessi-hpc.org-2023.06-compat, eessi.io-2023.06-software

gpu-bot-ugent[bot] avatar Apr 24 '25 06:04 gpu-bot-ugent[bot]

Instance rt-Grace-jr is configured to build for:

  • architectures: aarch64/nvidia/grace
  • repositories: eessi.io-2023.06-software

Instance eessi-bot-surf is configured to build for:

  • architectures: x86_64/amd/zen4, x86_64/amd/zen2
  • repositories: eessi-hpc.org-2023.06-software, eessi.io-2023.06-software, eessi.io-2023.06-compat, eessi-hpc.org-2023.06-compat

eessi-bot-surf[bot] avatar Apr 24 '25 06:04 eessi-bot-surf[bot]

Instance eessi-bot-mc-aws is configured to build for:

  • architectures: x86_64/generic, x86_64/intel/haswell, x86_64/intel/sapphirerapids, x86_64/intel/skylake_avx512, x86_64/intel/cascadelake, x86_64/intel/icelake, x86_64/amd/zen2, x86_64/amd/zen3, aarch64/generic, aarch64/neoverse_n1, aarch64/neoverse_v1
  • repositories: eessi.io-2023.06-compat, eessi.io-2023.06-software

eessi-bot[bot] avatar Apr 24 '25 08:04 eessi-bot[bot]

Hi @adammccartney ,

Since the PR does not include a generated build with its corresponding tests, could you kindly share the steps you used to test this, so that we can also test and reproduce it?

Thank you!

hvelab avatar Apr 29 '25 07:04 hvelab

Sure, would be happy to. Would you mind giving a few points of guidance? Let me know what would be useful to see. I haven't had the time to look into ReFrame at all yet, so there are no tests (yet) apart from the sanity checks in EasyBuild. The build last week was done in the eessi-container from the "software layer" repo, which was slightly adapted to suit our own build environment. The build command looks like the following:

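#!/bin/bash

# Resolve the project root: four directory levels above this script.
project_root="$(realpath "$(dirname "$(dirname "$(dirname "$(dirname "${BASH_SOURCE[0]}")")")")")"

# Build NVHPC (resolving dependencies with -r) using the site-specific
# EasyBuild configuration file and hooks file.
eb "${project_root}/easyconfigs/2025/NVHPC-25.1-CUDA-12.6.0.eb" \
    -r --cuda-compute-capabilities=9.0 \
    --configfiles="${project_root}/easybuild-asc-config/2025/config.cfg" \
    --hooks="${project_root}/easybuild-asc-config/2025/eb_hooks.py"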

As you can see, we are referencing an explicit config for EasyBuild, and I think the easyconfig for NVHPC is slightly adapted to include the "accept-eula" variable or whatever. I'll backport this today to a "vanilla" eessi-extend environment that can be used to install stuff on host-injections. I guess it would be useful to have a command that can be run in the standard container started by eessi_container.sh?

adammccartney avatar Apr 29 '25 08:04 adammccartney

So, interestingly, the sanity check now fails if I try to build this directly on a compute node (x86_64/amd/zen4 is the architecture, by the way). The initial build was done in the eessi container, as I mentioned, set up to use the EESSI_PROJECT_INSTALL variable pointing at a writeable /cvmfs/software.asc.ac.at directory. The build now fails when I try to use EESSI_SITE_INSTALL. Maybe something is leaking in via the ld cache on the host, as was previously observed. Makes me wonder how usable the compiler is if we load it from the custom cvmfs repo...
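One way to check the suspicion about host libraries leaking in would be to inspect `ldd` output for a built binary and flag anything resolved outside the expected prefixes. The following is only a sketch of that idea, not something used in this PR: the helper name `libs_outside_prefix`, the allowed prefix, and the sample paths are hypothetical.

```python
def libs_outside_prefix(ldd_output: str,
                        allowed_prefixes: tuple[str, ...]) -> list[tuple[str, str]]:
    """Return (library, resolved path) pairs from ldd output whose path
    does not start with one of the allowed prefixes."""
    leaked = []
    for line in ldd_output.splitlines():
        if "=>" not in line:
            continue  # e.g. the vDSO or the dynamic loader entry
        name, _, target = line.partition("=>")
        fields = target.split()
        # Skip "not found" entries and entries with no filesystem path.
        if not fields or not fields[0].startswith("/"):
            continue
        path = fields[0]
        if not path.startswith(allowed_prefixes):
            leaked.append((name.strip(), path))
    return leaked


if __name__ == "__main__":
    # Made-up ldd output; in practice this would come from running
    # ldd on one of the installed compiler binaries.
    sample = (
        "\tlinux-vdso.so.1 (0x00007ffd00000000)\n"
        "\tlibm.so.6 => /cvmfs/example/lib64/libm.so.6 (0x00007f0000000000)\n"
        "\tlibc.so.6 => /lib64/libc.so.6 (0x00007f0000200000)\n"
    )
    for lib, path in libs_outside_prefix(sample, ("/cvmfs/",)):
        print(f"{lib} resolved from host path {path}")
```

Any library reported here would be a candidate for having been picked up from the host rather than from the cvmfs installation.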

adammccartney avatar Apr 29 '25 12:04 adammccartney

bot: build inst:eessi-bot-mc-azure arch:x86_64/amd/zen4 repo:eessi.io-2023.06-software accelerator:nvidia/cc90

adammccartney avatar Apr 29 '25 16:04 adammccartney

Updates by the bot instance eessi-bot-mc-aws
  • account adammccartney has NO permission to send commands to the bot

eessi-bot[bot] avatar Apr 29 '25 16:04 eessi-bot[bot]

Updates by the bot instance eessi-bot-mc-azure
  • account adammccartney has NO permission to send commands to the bot

eessi-bot[bot] avatar Apr 29 '25 16:04 eessi-bot[bot]

Updates by the bot instance eessi-bot-surf
  • account adammccartney has NO permission to send commands to the bot

eessi-bot-surf[bot] avatar Apr 29 '25 16:04 eessi-bot-surf[bot]

Updates by the bot instance eessi-bot-vsc-ugent
  • account adammccartney has NO permission to send commands to the bot

gpu-bot-ugent[bot] avatar Apr 29 '25 16:04 gpu-bot-ugent[bot]

Okay, so I got this working with some careful attention to what the linker was up to. See the commit message https://github.com/EESSI/software-layer/pull/1043/commits/a2fe8bec8e70561e34e92273b696632fdb30f5d5 for a bit more info.

adammccartney avatar May 05 '25 14:05 adammccartney

@adammccartney Thank you for your contribution. We have recently split up the software-layer repository. The changes made in this PR should target the new repository, https://github.com/EESSI/software-layer-scripts, which is why we will close this PR.

We are also currently reworking how we handle NVHPC upstream in EasyBuild. This is why we recommend holding off on implementing this until we have finished that work in EasyBuild.

laraPPr avatar Jul 01 '25 10:07 laraPPr