Allow Nvidia driver script to set LD_PRELOAD

Open ocaisa opened this issue 1 year ago • 7 comments

ocaisa avatar Sep 27 '24 11:09 ocaisa

Instance eessi-bot-mc-aws is configured to build for:

  • architectures: x86_64/generic, x86_64/intel/haswell, x86_64/intel/skylake_avx512, x86_64/amd/zen2, x86_64/amd/zen3, aarch64/generic, aarch64/neoverse_n1, aarch64/neoverse_v1
  • repositories: eessi.io-2023.06-compat, eessi-hpc.org-2023.06-software, eessi-hpc.org-2023.06-compat, eessi.io-2023.06-software

eessi-bot[bot] avatar Sep 27 '24 11:09 eessi-bot[bot]

Instance boegel-bot-deucalion is configured to build for:

  • architectures: aarch64/a64fx
  • repositories: eessi.io-2023.06-software

Instance eessi-bot-mc-azure is configured to build for:

  • architectures: x86_64/amd/zen4
  • repositories: eessi-hpc.org-2023.06-software, eessi-hpc.org-2023.06-compat, eessi.io-2023.06-software, eessi.io-2023.06-compat

eessi-bot[bot] avatar Sep 27 '24 11:09 eessi-bot[bot]

Example output:

[rocky@ip-172-31-27-81 software-layer]$  ./scripts/gpu_support/nvidia/link_nvidia_host_libraries.sh --ld-preload --no-download
Found NVIDIA GPU driver version 545.23.08
Found host CUDA version 12.3
Using default list of libraries
Matched 48 CUDA Libraries

When attempting to use LD_PRELOAD we exclude anything related to graphics
libXext.so.6 is NOT in the provided  preload list, filtering /lib64/libGL.so.1.
libXext.so.6 is NOT in the provided  preload list, filtering /lib64/libGL.so.
libXext.so.6 is NOT in the provided  preload list, filtering /lib64/libGLX_nvidia.so.0.
libXext.so.6 is NOT in the provided  preload list, filtering /lib64/libGLX.so.0.
libXext.so.6 is NOT in the provided  preload list, filtering /lib64/libGLX.so.
libwayland-server.so.0 is NOT in the provided  preload list, filtering /lib64/libnvidia-egl-wayland.so.1.
libXext.so.6 is NOT in the provided  preload list, filtering /lib64/libnvidia-fbc.so.1.
libXext.so.6 is NOT in the provided  preload list, filtering /lib64/libnvidia-fbc.so.
libXNVCtrl.so.0 is NOT in the provided  preload list, filtering /lib64/libnvidia-gtk3.so.545.23.08.

The recommended way to use LD_PRELOAD is to only use it when you need to:

export EESSI_GPU_LD_PRELOAD="/lib64/libcuda.so.1:/lib64/libcuda.so:/lib64/libcudadebugger.so.1:/lib64/libnvcuvid.so.1:/lib64/libnvcuvid.so:/lib64/libnvidia-cfg.so.1:/lib64/libnvidia-cfg.so:/lib64/libnvidia-eglcore.so.545.23.08:/lib64/libnvidia-encode.so.1:/lib64/libnvidia-encode.so:/lib64/libnvidia-glcore.so.545.23.08:/lib64/libnvidia-glsi.so.545.23.08:/lib64/libnvidia-glvkspirv.so.545.23.08:/lib64/libnvidia-gpucomp.so.545.23.08:/lib64/libnvidia-ml.so.1:/lib64/libnvidia-ml.so:/lib64/libnvidia-nvvm.so.4:/lib64/libnvidia-nvvm.so:/lib64/libnvidia-opencl.so.1:/lib64/libnvidia-opticalflow.so.1:/lib64/libnvidia-ptxjitcompiler.so.1:/lib64/libnvidia-ptxjitcompiler.so:/lib64/libnvidia-rtcore.so.545.23.08:/lib64/libnvidia-tls.so.545.23.08:/lib64/libnvoptix.so.1:/lib64/libOpenCL.so.1"
export EESSI_OVERRIDE_GPU_CHECK="1"

Then you can set LD_PRELOAD only when you want to run a GPU application, e.g.,
    LD_PRELOAD="$EESSI_GPU_LD_PRELOAD" device_query
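
The per-command pattern above can be wrapped in a small shell function so that LD_PRELOAD never leaks into the rest of the session. This is a minimal sketch: the function name `run_with_gpu_preload` is hypothetical and not part of the script; only `EESSI_GPU_LD_PRELOAD` comes from the output above.

```shell
# Hypothetical helper (not part of link_nvidia_host_libraries.sh).
# Runs one command with LD_PRELOAD taken from EESSI_GPU_LD_PRELOAD,
# leaving the calling shell's environment untouched.
run_with_gpu_preload() {
    if [ -z "${EESSI_GPU_LD_PRELOAD:-}" ]; then
        echo "EESSI_GPU_LD_PRELOAD is not set; run the linking script with --ld-preload first" >&2
        return 1
    fi
    # 'env' applies the variable to the child process only.
    env LD_PRELOAD="$EESSI_GPU_LD_PRELOAD" "$@"
}
```

Usage would then be `run_with_gpu_preload device_query`, which is equivalent to the one-liner above but fails early when the variable has not been exported.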

ocaisa avatar Sep 27 '24 16:09 ocaisa

@ocaisa There are duplicate entries here: libcuda.so is a symlink to libcuda.so.1, so only one of them is needed.
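
One way to drop such duplicates is to resolve every entry to its real path and keep each target only once. A minimal sketch, assuming a `readlink` that supports `-f` (as GNU coreutils does); the function name `dedupe_preload_list` is hypothetical and the script itself may handle this differently.

```shell
# Hypothetical helper: collapse a colon-separated library list so that
# symlinks pointing at the same file (e.g. libcuda.so -> libcuda.so.1)
# appear only once, as their resolved target.
# The body runs in a subshell, so IFS and 'set -f' do not leak out.
dedupe_preload_list() (
    set -f                                # no globbing while splitting
    IFS=':'
    result=""
    for entry in $1; do
        [ -n "$entry" ] || continue       # ignore empty fields
        # Resolve symlinks; fall back to the entry itself if that fails.
        real=$(readlink -f -- "$entry" 2>/dev/null) || real=$entry
        case ":$result:" in
            *":$real:"*) ;;               # target already listed, skip
            *) result="${result:+$result:}$real" ;;
        esac
    done
    printf '%s\n' "$result"
)
```

For example, `EESSI_GPU_LD_PRELOAD="$(dedupe_preload_list "$EESSI_GPU_LD_PRELOAD")"` would rewrite the variable in place.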

boegel avatar Oct 09 '24 11:10 boegel

This is resulting in about 400MB of preload:

{EESSI 2023.06} [rocky@ip-172-31-20-85 software-layer]$ IFS=':'; for path in $EESSI_GPU_LD_PRELOAD; do ls -lh $path; done; unset IFS
-rwxr-xr-x 1 root root 29M Nov  6  2023 /usr/lib64/libcuda.so.545.23.08
-rwxr-xr-x 1 root root 11M Nov  6  2023 /usr/lib64/libcudadebugger.so.545.23.08
-rwxr-xr-x 1 root root 9.6M Nov  6  2023 /usr/lib64/libnvcuvid.so.545.23.08
-rwxr-xr-x 1 root root 269K Nov  6  2023 /usr/lib64/libnvidia-cfg.so.545.23.08
-rwxr-xr-x 1 root root 566K Nov  6  2023 /usr/lib64/libnvidia-glsi.so.545.23.08
-rwxr-xr-x 1 root root 8.7M Nov  6  2023 /usr/lib64/libnvidia-glvkspirv.so.545.23.08
-rwxr-xr-x 1 root root 42M Nov  7  2023 /usr/lib64/libnvidia-gpucomp.so.545.23.08
-rwxr-xr-x 1 root root 1.9M Nov  6  2023 /usr/lib64/libnvidia-ml.so.545.23.08
-rwxr-xr-x 1 root root 83M Nov  7  2023 /usr/lib64/libnvidia-nvvm.so.545.23.08
-rwxr-xr-x 1 root root 24M Nov  6  2023 /usr/lib64/libnvidia-opencl.so.545.23.08
-rwxr-xr-x 1 root root 26M Nov  6  2023 /usr/lib64/libnvidia-ptxjitcompiler.so.545.23.08
-rwxr-xr-x 1 root root 103M Nov  7  2023 /usr/lib64/libnvidia-rtcore.so.545.23.08
-rwxr-xr-x 1 root root 19K Nov  6  2023 /usr/lib64/libnvidia-tls.so.545.23.08
-rwxr-xr-x 1 root root 58M Nov  7  2023 /usr/lib64/libnvoptix.so.545.23.08
-rwxr-xr-x 1 root root 131K Apr 12  2021 /usr/lib64/libOpenCL.so.1.0.0
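
To put a number on the total rather than adding the sizes by eye, the same loop can sum the file sizes. A minimal sketch; `preload_total_bytes` is a hypothetical helper, and it reports on-disk file sizes, which are only a rough proxy for the memory cost of preloading.

```shell
# Hypothetical helper: sum the on-disk size (in bytes) of every library
# in a colon-separated preload list, following symlinks.
# The body runs in a subshell, so IFS and 'set -f' do not leak out.
preload_total_bytes() (
    set -f                                # no globbing while splitting
    IFS=':'
    total=0
    for path in $1; do
        [ -f "$path" ] || continue        # skip empty or missing entries
        size=$(wc -c < "$path")           # reads through symlinks
        total=$((total + size))
    done
    printf '%s\n' "$total"
)
```

It can be called as `preload_total_bytes "$EESSI_GPU_LD_PRELOAD"`.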

ocaisa avatar Oct 09 '24 18:10 ocaisa

@boegel I've played with this a lot today and I'm happy with the functionality now:

{EESSI 2023.06} [rocky@ip-172-31-20-85 software-layer]$ ./scripts/gpu_support/nvidia/link_nvidia_host_libraries.sh --no-download --ld-preload
Found host CUDA version 7.5
Found NVIDIA GPU driver version 545.23.08
Using default list of libraries
Matched 48 CUDA Libraries

When attempting to use LD_PRELOAD we exclude anything related to graphics
Match found for libcuda.so for CUDA compat libraries
Match found for libcudadebugger.so for CUDA compat libraries
libGLdispatch.so.0 is NOT in the provided preload list, filtering /lib64/libEGL.so.1
libGLdispatch.so.0 is NOT in the provided preload list, filtering /lib64/libEGL.so
libGLdispatch.so.0 is NOT in the provided preload list, filtering /lib64/libGLESv1_CM.so.1
libGLdispatch.so.0 is NOT in the provided preload list, filtering /lib64/libGLESv1_CM.so
libGLdispatch.so.0 is NOT in the provided preload list, filtering /lib64/libGLESv2.so.2
libGLdispatch.so.0 is NOT in the provided preload list, filtering /lib64/libGLESv2.so
libGLX.so.0 is NOT in the provided preload list, filtering /lib64/libGL.so.1
libGLX.so.0 is NOT in the provided preload list, filtering /lib64/libGL.so
libXext.so.6 is NOT in the provided preload list, filtering /lib64/libGLX_nvidia.so.0
libXext.so.6 is NOT in the provided preload list, filtering /lib64/libGLX.so.0
libXext.so.6 is NOT in the provided preload list, filtering /lib64/libGLX.so
libwayland-server.so.0 is NOT in the provided preload list, filtering /lib64/libnvidia-egl-wayland.so.1
libnvcuvid.so.1 is NOT in the provided preload list, filtering /lib64/libnvidia-encode.so.1
libnvcuvid.so.1 is NOT in the provided preload list, filtering /lib64/libnvidia-encode.so
libGL.so.1 is NOT in the provided preload list, filtering /lib64/libnvidia-fbc.so.1
libGL.so.1 is NOT in the provided preload list, filtering /lib64/libnvidia-fbc.so
libXNVCtrl.so.0 is NOT in the provided preload list, filtering /lib64/libnvidia-gtk3.so.545.23.08
Match found for libnvidia-nvvm.so for CUDA compat libraries
libnvcuvid.so.1 is NOT in the provided preload list, filtering /lib64/libnvidia-opticalflow.so.1
Match found for libnvidia-ptxjitcompiler.so for CUDA compat libraries
libGLdispatch.so.0 is NOT in the provided preload list, filtering /lib64/libOpenGL.so.0
libGLdispatch.so.0 is NOT in the provided preload list, filtering /lib64/libOpenGL.so

The recommended way to use LD_PRELOAD is to only use it when you need to.

A minimal preload which should work in most cases:
export EESSI_GPU_COMPAT_LD_PRELOAD="/usr/lib64/libcuda.so.545.23.08:/usr/lib64/libcudadebugger.so.545.23.08:/usr/lib64/libnvidia-nvvm.so.545.23.08:/usr/lib64/libnvidia-ptxjitcompiler.so.545.23.08"

A corner-case full preload (which is hard on memory) for exceptional use:
export EESSI_GPU_LD_PRELOAD="/usr/lib64/libcuda.so.545.23.08:/usr/lib64/libcudadebugger.so.545.23.08:/usr/lib64/libEGL_nvidia.so.545.23.08:/usr/lib64/libGLdispatch.so.0.0.0:/usr/lib64/libGLESv1_CM_nvidia.so.545.23.08:/usr/lib64/libGLESv2_nvidia.so.545.23.08:/usr/lib64/libnvcuvid.so.545.23.08:/usr/lib64/libnvidia-cfg.so.545.23.08:/usr/lib64/libnvidia-eglcore.so.545.23.08:/usr/lib64/libnvidia-glcore.so.545.23.08:/usr/lib64/libnvidia-glsi.so.545.23.08:/usr/lib64/libnvidia-glvkspirv.so.545.23.08:/usr/lib64/libnvidia-gpucomp.so.545.23.08:/usr/lib64/libnvidia-ml.so.545.23.08:/usr/lib64/libnvidia-nvvm.so.545.23.08:/usr/lib64/libnvidia-opencl.so.545.23.08:/usr/lib64/libnvidia-ptxjitcompiler.so.545.23.08:/usr/lib64/libnvidia-rtcore.so.545.23.08:/usr/lib64/libnvidia-tls.so.545.23.08:/usr/lib64/libnvoptix.so.545.23.08:/usr/lib64/libOpenCL.so.1.0.0"
export EESSI_OVERRIDE_GPU_CHECK="1"

Then you can set LD_PRELOAD only when you want to run a GPU application, e.g.,
    LD_PRELOAD="$EESSI_GPU_COMPAT_LD_PRELOAD" device_query
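
The relationship between the two variables above can be illustrated by filtering the full list down to the four libraries reported as "Match found for ... for CUDA compat libraries". This is only a sketch: `select_compat_libs` is a hypothetical helper, the name list is copied from the output above, and the script itself may build the minimal list differently.

```shell
# Hypothetical helper: keep only the entries of a colon-separated preload
# list whose file name starts with one of the CUDA compat library names
# reported in the output above.
# The body runs in a subshell, so IFS and 'set -f' do not leak out.
select_compat_libs() (
    set -f                                # no globbing while splitting
    IFS=':'
    result=""
    for path in $1; do
        name=${path##*/}                  # strip the directory part
        case "$name" in
            libcuda.so*|libcudadebugger.so*|libnvidia-nvvm.so*|libnvidia-ptxjitcompiler.so*)
                result="${result:+$result:}$path" ;;
        esac
    done
    printf '%s\n' "$result"
)
```

With the full list exported, `select_compat_libs "$EESSI_GPU_LD_PRELOAD"` should print the same four paths as `EESSI_GPU_COMPAT_LD_PRELOAD`.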

ocaisa avatar Oct 10 '24 10:10 ocaisa

bot: build repo:eessi.io-2023.06-software arch:x86_64/generic

ocaisa avatar Oct 17 '24 12:10 ocaisa

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/generic from ocaisa

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/generic
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/generic resulted in:

    • submitted job 23806, for details & status see https://github.com/EESSI/software-layer/pull/754#issuecomment-2419419685
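
The "expanded format" shown above replaces the short argument names with their long forms. A minimal sketch of that mapping, based only on the two abbreviations visible in this thread (`repo` and `arch`); `expand_bot_command` is a hypothetical name and this is not the bot's actual code.

```shell
# Hypothetical helper: expand the short argument names of a bot command.
# Only the two abbreviations seen in this thread are handled.
# The body runs in a subshell, so 'set -f' does not leak out.
expand_bot_command() (
    set -f                                # no globbing while splitting
    out=""
    for word in $1; do                    # split on whitespace
        case "$word" in
            repo:*) word="repository:${word#repo:}" ;;
            arch:*) word="architecture:${word#arch:}" ;;
        esac
        out="${out:+$out }$word"
    done
    printf '%s\n' "$out"
)
```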

eessi-bot[bot] avatar Oct 17 '24 12:10 eessi-bot[bot]

Updates by the bot instance boegel-bot-deucalion (click for details)
  • account ocaisa has NO permission to send commands to the bot

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/generic from ocaisa

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/generic
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/generic resulted in:

    • no jobs were submitted

eessi-bot[bot] avatar Oct 17 '24 12:10 eessi-bot[bot]

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-generic for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.10/pr_754/23806

date job status comment
Oct 17 12:33:27 UTC 2024 submitted job id 23806 awaits release by job manager
Oct 17 12:33:30 UTC 2024 released job awaits launch by Slurm scheduler
Oct 17 12:34:35 UTC 2024 running job 23806 is running
Oct 17 12:40:51 UTC 2024 finished
:grin: SUCCESS (click triangle for details)
Details
:white_check_mark: job output file slurm-23806.out
:white_check_mark: no message matching ERROR:
:white_check_mark: no message matching FAILED:
:white_check_mark: no message matching required modules missing:
:white_check_mark: found message(s) matching No missing installations
:white_check_mark: found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-generic-1729168457.tar.gz
size: 0 MiB (4682 bytes)
entries: 1
modules under 2023.06/software/linux/x86_64/generic/modules/all
no module files in tarball
software under 2023.06/software/linux/x86_64/generic/software
no software packages in tarball
other under 2023.06/software/linux/x86_64/generic
2023.06/scripts/gpu_support/nvidia/link_nvidia_host_libraries.sh
Oct 17 12:40:51 UTC 2024 test result
:grin: SUCCESS (click triangle for details)
ReFrame Summary
[ OK ] ( 1/10) EESSI_LAMMPS_lj %scale=1_node %device_type=cpu %module_name=LAMMPS/29Aug2024-foss-2023b-kokkos /aeb2d9df @BotBuildTests:x86-64-generic-node+default
P: perf: 484.052 timesteps/s (r:0, l:None, u:None)
[ OK ] ( 2/10) EESSI_LAMMPS_lj %scale=1_node %device_type=cpu %module_name=LAMMPS/2Aug2023_update2-foss-2023a-kokkos /04ff9ece @BotBuildTests:x86-64-generic-node+default
P: perf: 507.606 timesteps/s (r:0, l:None, u:None)
[ OK ] ( 3/10) EESSI_OSU_Micro_Benchmarks_coll %benchmark_info=mpi.collective.osu_allreduce %scale=1_node %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %device_type=cpu /31ac6ab9 @BotBuildTests:x86-64-generic-node+default
P: latency: 5.5 us (r:0, l:None, u:None)
[ OK ] ( 4/10) EESSI_OSU_Micro_Benchmarks_coll %benchmark_info=mpi.collective.osu_allreduce %scale=1_node %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /f3be40a2 @BotBuildTests:x86-64-generic-node+default
P: latency: 5.3 us (r:0, l:None, u:None)
[ OK ] ( 5/10) EESSI_OSU_Micro_Benchmarks_coll %benchmark_info=mpi.collective.osu_alltoall %scale=1_node %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %device_type=cpu /10e66fba @BotBuildTests:x86-64-generic-node+default
P: latency: 7.98 us (r:0, l:None, u:None)
[ OK ] ( 6/10) EESSI_OSU_Micro_Benchmarks_coll %benchmark_info=mpi.collective.osu_alltoall %scale=1_node %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /5be57ae7 @BotBuildTests:x86-64-generic-node+default
P: latency: 7.91 us (r:0, l:None, u:None)
[ OK ] ( 7/10) EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_latency %scale=1_node %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %device_type=cpu /c8c9aff5 @BotBuildTests:x86-64-generic-node+default
P: latency: 0.62 us (r:0, l:None, u:None)
[ OK ] ( 8/10) EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_latency %scale=1_node %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /9795e491 @BotBuildTests:x86-64-generic-node+default
P: latency: 0.64 us (r:0, l:None, u:None)
[ OK ] ( 9/10) EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_bw %scale=1_node %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %device_type=cpu /48da21c5 @BotBuildTests:x86-64-generic-node+default
P: bandwidth: 10600.45 MB/s (r:0, l:None, u:None)
[ OK ] (10/10) EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_bw %scale=1_node %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /1b8c1ca2 @BotBuildTests:x86-64-generic-node+default
P: bandwidth: 10212.21 MB/s (r:0, l:None, u:None)
[ PASSED ] Ran 10/10 test case(s) from 10 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
:white_check_mark: job output file slurm-23806.out
:white_check_mark: no message matching ERROR:
:white_check_mark: no message matching [\s*FAILED\s*].*Ran .* test case
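
The build-step checks listed under Details amount to searching the job output for strings that must or must not appear. A minimal sketch of that idea, assuming a plain-text log file; `check_job_log` is a hypothetical helper, only the fixed-string patterns are taken from the report above, and the bot's real implementation may differ.

```shell
# Hypothetical helper: scan a job output file for the fixed strings named
# in the bot report. Exits 0 only if all checks pass.
# The body runs in a subshell, so 'exit' only ends the function.
check_job_log() (
    log=$1
    status=0
    # Strings that must NOT appear in a successful build log.
    for bad in 'ERROR:' 'FAILED:' 'required modules missing:'; do
        if grep -qF -- "$bad" "$log"; then
            echo "found message matching $bad"
            status=1
        else
            echo "no message matching $bad"
        fi
    done
    # Strings that MUST appear.
    for good in 'No missing installations' '.tar.gz created!'; do
        if grep -qF -- "$good" "$log"; then
            echo "found message(s) matching $good"
        else
            echo "no message matching $good"
            status=1
        fi
    done
    exit "$status"
)
```

A call such as `check_job_log slurm-23806.out` would print one line per check and return a non-zero status if any of them fails.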

eessi-bot[bot] avatar Oct 17 '24 12:10 eessi-bot[bot]

Also tested the script within eessi_container:

Found host CUDA version 9.0
Found NVIDIA GPU driver version 535.129.03
Using downloaded list of libraries
Matched 41 CUDA Libraries
The host GPU driver libraries (v535.129.03) have already been linked! (based on /cvmfs/software.eessi.io/host_injections/nvidia/aarch64/host/driver_version.txt)
Successfully created symlink between /cvmfs/software.eessi.io/host_injections/nvidia/aarch64/latest and lib in /cvmfs/software.eessi.io/host_injections/2023.06/compat/linux/aarch64
Host NVIDIA GPU drivers linked successfully for EESSI
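
The "have already been linked" message suggests that the script compares the detected driver version with the one recorded in `driver_version.txt` and skips the linking step when they match. A minimal sketch of such a check; `drivers_already_linked` is a hypothetical helper, and the file format (the version on the first line) is an assumption.

```shell
# Hypothetical helper: succeed if the version recorded in the given file
# equals the detected driver version.
# Assumes the file holds the version string on its first line.
# The body runs in a subshell, so 'exit' only ends the function.
drivers_already_linked() (
    version_file=$1
    detected_version=$2
    # No record yet: nothing has been linked.
    [ -f "$version_file" ] || exit 1
    recorded_version=$(head -n 1 "$version_file")
    [ "$recorded_version" = "$detected_version" ]
)
```

For instance, `drivers_already_linked /cvmfs/software.eessi.io/host_injections/nvidia/aarch64/host/driver_version.txt 535.129.03` would succeed on the system shown above if the file contains that version.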

TopRichard avatar Nov 05 '24 10:11 TopRichard

@TopRichard This will need to be re-tested now to make sure the changes haven't had an unintended impact

ocaisa avatar Nov 07 '24 09:11 ocaisa

bot: build repo:eessi.io-2023.06-software arch:x86_64/generic

ocaisa avatar Jan 16 '25 13:01 ocaisa

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/generic from ocaisa

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/generic
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/generic resulted in:

    • submitted job 40795, for details & status see https://github.com/EESSI/software-layer/pull/754#issuecomment-2595764166

eessi-bot[bot] avatar Jan 16 '25 13:01 eessi-bot[bot]

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/generic from ocaisa

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/generic
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/generic resulted in:

    • no jobs were submitted

eessi-bot[bot] avatar Jan 16 '25 13:01 eessi-bot[bot]

Updates by the bot instance eessi-bot-vsc-ugent (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/generic from ocaisa

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/generic
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/generic resulted in:

    • no jobs were submitted

gpu-bot-ugent[bot] avatar Jan 16 '25 13:01 gpu-bot-ugent[bot]

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-generic for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.01/pr_754/40795

date job status comment
Jan 16 13:57:47 UTC 2025 submitted job id 40795 awaits release by job manager
Jan 16 13:57:55 UTC 2025 released job awaits launch by Slurm scheduler
Jan 16 14:02:58 UTC 2025 running job 40795 is running
Jan 16 14:10:06 UTC 2025 finished
:grin: SUCCESS (click triangle for details)
Details
:white_check_mark: job output file slurm-40795.out
:white_check_mark: no message matching FATAL:
:white_check_mark: no message matching ERROR:
:white_check_mark: no message matching FAILED:
:white_check_mark: no message matching required modules missing:
:white_check_mark: found message(s) matching No missing installations
:white_check_mark: found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-generic-1737036231.tar.gz
size: 0 MiB (4715 bytes)
entries: 1
modules under 2023.06/software/linux/x86_64/generic/modules/all
no module files in tarball
software under 2023.06/software/linux/x86_64/generic/software
no software packages in tarball
other under 2023.06/software/linux/x86_64/generic
2023.06/scripts/gpu_support/nvidia/link_nvidia_host_libraries.sh
Jan 16 14:10:06 UTC 2025 test result
:grin: SUCCESS (click triangle for details)
ReFrame Summary
[ OK ] ( 1/10) EESSI_LAMMPS_lj %device_type=cpu %module_name=LAMMPS/29Aug2024-foss-2023b-kokkos %scale=1_node /aeb2d9df @BotBuildTests:x86-64-generic-node+default
P: perf: 452.106 timesteps/s (r:0, l:None, u:None)
[ OK ] ( 2/10) EESSI_LAMMPS_lj %device_type=cpu %module_name=LAMMPS/2Aug2023_update2-foss-2023a-kokkos %scale=1_node /04ff9ece @BotBuildTests:x86-64-generic-node+default
P: perf: 464.798 timesteps/s (r:0, l:None, u:None)
[ OK ] ( 3/10) EESSI_OSU_Micro_Benchmarks_coll %benchmark_info=mpi.collective.osu_allreduce %scale=1_node %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %device_type=cpu /31ac6ab9 @BotBuildTests:x86-64-generic-node+default
P: latency: 5.01 us (r:0, l:None, u:None)
[ OK ] ( 4/10) EESSI_OSU_Micro_Benchmarks_coll %benchmark_info=mpi.collective.osu_allreduce %scale=1_node %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /f3be40a2 @BotBuildTests:x86-64-generic-node+default
P: latency: 5.14 us (r:0, l:None, u:None)
[ OK ] ( 5/10) EESSI_OSU_Micro_Benchmarks_coll %benchmark_info=mpi.collective.osu_alltoall %scale=1_node %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %device_type=cpu /10e66fba @BotBuildTests:x86-64-generic-node+default
P: latency: 7.81 us (r:0, l:None, u:None)
[ OK ] ( 6/10) EESSI_OSU_Micro_Benchmarks_coll %benchmark_info=mpi.collective.osu_alltoall %scale=1_node %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /5be57ae7 @BotBuildTests:x86-64-generic-node+default
P: latency: 7.72 us (r:0, l:None, u:None)
[ OK ] ( 7/10) EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_latency %scale=1_node %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %device_type=cpu /c8c9aff5 @BotBuildTests:x86-64-generic-node+default
P: latency: 0.62 us (r:0, l:None, u:None)
[ OK ] ( 8/10) EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_latency %scale=1_node %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /9795e491 @BotBuildTests:x86-64-generic-node+default
P: latency: 0.64 us (r:0, l:None, u:None)
[ OK ] ( 9/10) EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_bw %scale=1_node %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %device_type=cpu /48da21c5 @BotBuildTests:x86-64-generic-node+default
P: bandwidth: 10317.54 MB/s (r:0, l:None, u:None)
[ OK ] (10/10) EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_bw %scale=1_node %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /1b8c1ca2 @BotBuildTests:x86-64-generic-node+default
P: bandwidth: 10306.94 MB/s (r:0, l:None, u:None)
[ PASSED ] Ran 10/10 test case(s) from 10 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
:white_check_mark: job output file slurm-40795.out
:white_check_mark: no message matching ERROR:
:white_check_mark: no message matching [\s*FAILED\s*].*Ran .* test case
Jan 17 10:59:44 UTC 2025 uploaded transfer of eessi-2023.06-software-linux-x86_64-generic-1737036231.tar.gz to S3 bucket succeeded

eessi-bot[bot] avatar Jan 16 '25 13:01 eessi-bot[bot]

> @TopRichard This will need to be re-tested now to make sure the changes haven't had an unintended impact

re-testing:

Apptainer> /cvmfs/software.eessi.io/versions/2023.06/scripts/gpu_support/nvidia/link_nvidia_host_libraries.sh
Found host CUDA version 9.0
Found NVIDIA GPU driver version 535.129.03
Using downloaded list of libraries
Matched 41 CUDA Libraries
Successfully created symlink between latest and host in /cvmfs/software.eessi.io/host_injections/nvidia/aarch64
Successfully created symlink between /cvmfs/software.eessi.io/host_injections/nvidia/aarch64/latest and lib in /cvmfs/software.eessi.io/host_injections/2023.06/compat/linux/aarch64
Host NVIDIA GPU drivers linked successfully for EESSI

TopRichard avatar Jan 17 '25 10:01 TopRichard

@bedroge This was deployed, so PR should be merged too?

boegel avatar Jan 17 '25 15:01 boegel

> @bedroge This was deployed, so PR should be merged too?

Yes, the tarball has been ingested.

bedroge avatar Jan 17 '25 15:01 bedroge

PR merged! Moved ['/project/def-users/SHARED/jobs/2024.10/pr_754/23806', '/project/def-users/SHARED/jobs/2025.01/pr_754/40795'] to /project/def-users/SHARED/trash_bin/EESSI/software-layer/2025.01.17

eessi-bot[bot] avatar Jan 17 '25 16:01 eessi-bot[bot]

PR merged! Moved [] to /project/def-users/SHARED/trash_bin/EESSI/software-layer/2025.01.17

eessi-bot[bot] avatar Jan 17 '25 16:01 eessi-bot[bot]