{2023.06}[foss/2023a] PyTorch v2.1.2 w/ CUDA 12.1.1

Open trz42 opened this issue 1 year ago • 95 comments

Builds

magma/2.7.2-foss-2023a-CUDA-12.1.1
PyTorch/2.1.2-foss-2023a-CUDA-12.1.1

Supersedes #718

trz42 avatar Nov 21 '24 20:11 trz42

Instance eessi-bot-mc-aws is configured to build for:

  • architectures: x86_64/generic, x86_64/intel/haswell, x86_64/intel/skylake_avx512, x86_64/amd/zen2, x86_64/amd/zen3, aarch64/generic, aarch64/neoverse_n1, aarch64/neoverse_v1
  • repositories: eessi.io-2023.06-compat, eessi-hpc.org-2023.06-software, eessi-hpc.org-2023.06-compat, eessi.io-2023.06-software

eessi-bot[bot] avatar Nov 21 '24 20:11 eessi-bot[bot]

Instance eessi-bot-riscv is configured to build for:

  • architectures: riscv64/generic
  • repositories: riscv.eessi.io-20240402

riscv-eessi-io-bot[bot] avatar Nov 21 '24 20:11 riscv-eessi-io-bot[bot]

Instance eessi-bot-riscv is configured to build for:

  • architectures: riscv64/generic
  • repositories: riscv.eessi.io-20240402

riscv-eessi-io-bot[bot] avatar Nov 21 '24 20:11 riscv-eessi-io-bot[bot]

Instance eessi-bot-mc-azure is configured to build for:

  • architectures: x86_64/amd/zen4
  • repositories: eessi.io-2023.06-compat, eessi.io-2023.06-software

eessi-bot[bot] avatar Nov 21 '24 20:11 eessi-bot[bot]

bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80

trz42 avatar Nov 21 '24 20:11 trz42

Updates by the bot instance eessi-bot-riscv (click for details)
  • account trz42 has NO permission to send commands to the bot

riscv-eessi-io-bot[bot] avatar Nov 21 '24 20:11 riscv-eessi-io-bot[bot]

Updates by the bot instance eessi-bot-riscv (click for details)
  • account trz42 has NO permission to send commands to the bot

riscv-eessi-io-bot[bot] avatar Nov 21 '24 20:11 riscv-eessi-io-bot[bot]

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80 from trz42

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80 resulted in:

    • submitted job 30162, for details & status see https://github.com/EESSI/software-layer/pull/825#issuecomment-2492210018

eessi-bot[bot] avatar Nov 21 '24 20:11 eessi-bot[bot]
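The bot reply above shows the short filter keys (`repo`, `arch`, `accel`) being expanded to `repository`, `architecture` and `accelerator`. A minimal sketch of that expansion follows; the function name and the abbreviation table are illustrative assumptions based only on the two command forms quoted in this thread, not the bot's actual code.

```python
# Hypothetical helper: mirrors the "expanded format" shown in the bot reply.
# The abbreviation table is inferred from this thread, not from the bot source.
ABBREVIATIONS = {
    "repo": "repository",
    "arch": "architecture",
    "accel": "accelerator",
}


def expand_bot_command(line):
    """Turn 'bot: build repo:X arch:Y' into 'build repository:X architecture:Y'."""
    prefix = "bot:"
    if not line.startswith(prefix):
        raise ValueError("not a bot command: %r" % line)
    tokens = line[len(prefix):].split()
    if not tokens:
        raise ValueError("bot command is empty")
    command, *filters = tokens
    expanded = [command]
    for item in filters:
        # Split on the first ':' only, so values may contain ':' themselves.
        key, sep, value = item.partition(":")
        if not sep:
            raise ValueError("filter without ':' separator: %r" % item)
        expanded.append("%s:%s" % (ABBREVIATIONS.get(key, key), value))
    return " ".join(expanded)
```

Unknown keys are passed through unchanged, which keeps the sketch tolerant of filters that do not appear in this thread.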

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80 from trz42

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80 resulted in:

    • no jobs were submitted

eessi-bot[bot] avatar Nov 21 '24 20:11 eessi-bot[bot]
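A plausible reading of "no jobs were submitted" is that the requested architecture is not among those the Azure instance is configured for (it lists only `x86_64/amd/zen4`). The sketch below encodes the two instance configurations announced earlier in this thread and checks which instance would accept a request; the matching rule itself is an assumption, and the bot may apply further conditions.

```python
# Instance configurations copied from the bot announcements in this thread.
# The matching logic below is an assumed simplification of the bot's behaviour.
INSTANCES = {
    "eessi-bot-mc-aws": {
        "architectures": {
            "x86_64/generic", "x86_64/intel/haswell",
            "x86_64/intel/skylake_avx512", "x86_64/amd/zen2",
            "x86_64/amd/zen3", "aarch64/generic",
            "aarch64/neoverse_n1", "aarch64/neoverse_v1",
        },
        "repositories": {
            "eessi.io-2023.06-compat", "eessi-hpc.org-2023.06-software",
            "eessi-hpc.org-2023.06-compat", "eessi.io-2023.06-software",
        },
    },
    "eessi-bot-mc-azure": {
        "architectures": {"x86_64/amd/zen4"},
        "repositories": {"eessi.io-2023.06-compat", "eessi.io-2023.06-software"},
    },
}


def instances_that_would_build(architecture, repository):
    """Return the names of instances configured for both filters."""
    return sorted(
        name
        for name, cfg in INSTANCES.items()
        if architecture in cfg["architectures"]
        and repository in cfg["repositories"]
    )
```

Under this rule a `zen2` request is picked up by `eessi-bot-mc-aws` only, which matches the two bot replies above.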

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-amd-zen2 and accelerator nvidia/cc80 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.11/pr_825/30162

date | job status | comment
Nov 21 20:27:08 UTC 2024 | submitted | job id 30162 awaits release by job manager
Nov 21 20:27:13 UTC 2024 | released | job awaits launch by Slurm scheduler
Nov 21 20:28:17 UTC 2024 | running | job 30162 is running
Nov 21 22:32:31 UTC 2024 | finished |
:cry: FAILURE (click triangle for details)
Details
:white_check_mark: job output file slurm-30162.out
:white_check_mark: no message matching FATAL:
:x: found message matching ERROR:
:x: found message matching FAILED:
:x: found message matching required modules missing:
:x: no message matching No missing installations
:white_check_mark: found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen2-1732226990.tar.gz
size: 302 MiB (316730812 bytes)
entries: 113
modules under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80/modules/all
magma/2.7.2-foss-2023a-CUDA-12.1.1.lua
software under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80/software
magma/2.7.2-foss-2023a-CUDA-12.1.1
other under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80
no other files in tarball
Nov 21 22:32:31 UTC 2024 test result
:grin: SUCCESS (click triangle for details)
ReFrame Summary
[ OK ] ( 1/10) EESSI_LAMMPS_lj %scale=1_node %device_type=cpu %module_name=LAMMPS/29Aug2024-foss-2023b-kokkos /aeb2d9df @BotBuildTests:x86-64-amd-zen2-node+default
P: perf: 433.113 timesteps/s (r:0, l:None, u:None)
[ OK ] ( 2/10) EESSI_LAMMPS_lj %scale=1_node %device_type=cpu %module_name=LAMMPS/2Aug2023_update2-foss-2023a-kokkos /04ff9ece @BotBuildTests:x86-64-amd-zen2-node+default
P: perf: 443.086 timesteps/s (r:0, l:None, u:None)
[ OK ] ( 3/10) EESSI_OSU_Micro_Benchmarks_coll %benchmark_info=mpi.collective.osu_allreduce %scale=1_node %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %device_type=cpu /31ac6ab9 @BotBuildTests:x86-64-amd-zen2-node+default
P: latency: 4.85 us (r:0, l:None, u:None)
[ OK ] ( 4/10) EESSI_OSU_Micro_Benchmarks_coll %benchmark_info=mpi.collective.osu_allreduce %scale=1_node %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /f3be40a2 @BotBuildTests:x86-64-amd-zen2-node+default
P: latency: 4.52 us (r:0, l:None, u:None)
[ OK ] ( 5/10) EESSI_OSU_Micro_Benchmarks_coll %benchmark_info=mpi.collective.osu_alltoall %scale=1_node %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %device_type=cpu /10e66fba @BotBuildTests:x86-64-amd-zen2-node+default
P: latency: 9.31 us (r:0, l:None, u:None)
[ OK ] ( 6/10) EESSI_OSU_Micro_Benchmarks_coll %benchmark_info=mpi.collective.osu_alltoall %scale=1_node %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /5be57ae7 @BotBuildTests:x86-64-amd-zen2-node+default
P: latency: 8.44 us (r:0, l:None, u:None)
[ OK ] ( 7/10) EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_latency %scale=1_node %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %device_type=cpu /c8c9aff5 @BotBuildTests:x86-64-amd-zen2-node+default
P: latency: 0.33 us (r:0, l:None, u:None)
[ OK ] ( 8/10) EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_latency %scale=1_node %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /9795e491 @BotBuildTests:x86-64-amd-zen2-node+default
P: latency: 0.31 us (r:0, l:None, u:None)
[ OK ] ( 9/10) EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_bw %scale=1_node %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %device_type=cpu /48da21c5 @BotBuildTests:x86-64-amd-zen2-node+default
P: bandwidth: 7756.73 MB/s (r:0, l:None, u:None)
[ OK ] (10/10) EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_bw %scale=1_node %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /1b8c1ca2 @BotBuildTests:x86-64-amd-zen2-node+default
P: bandwidth: 7744.15 MB/s (r:0, l:None, u:None)
[ PASSED ] Ran 10/10 test case(s) from 10 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
:white_check_mark: job output file slurm-30162.out
:x: found message matching ERROR:
:white_check_mark: no message matching [\s*FAILED\s*].*Ran .* test case
  • some /tmp/eb-m470fsqz/eb-r9zeygi0/tmpb1cr2ofk/rpath_wrappers/gxx_wrapper/g++ run failed with
    /cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/usr/bin/ld: warning: libcupti.so.12, needed by lib/libtorch_cpu.so, not found (try using -rpath or -rpath-link)
    /cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/usr/bin/ld: lib/libtorch_cpu.so: undefined reference to `[email protected]'
    /cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/usr/bin/ld: lib/libtorch_cpu.so: undefined reference to `[email protected]'
    ...
    /cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/usr/bin/ld: lib/libtorch_cpu.so: undefined reference to `[email protected]'
    collect2: error: ld returned 1 exit status
    
  • we should be able to fix this by adding the directory that contains libcupti to $LIBRARY_PATH in a pre_configure hook (see https://github.com/NorESSI/software-layer/pull/369)

eessi-bot[bot] avatar Nov 21 '24 20:11 eessi-bot[bot]
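The fix suggested in the report above is to put the directory containing libcupti on `$LIBRARY_PATH` in a pre_configure hook. A rough sketch of such a hook is shown here. It assumes that CUPTI sits under `extras/CUPTI/lib64` inside the CUDA installation and that the CUDA root is available as `$EBROOTCUDA`; the helper name is hypothetical, and the hook actually added in this PR may look different.

```python
import os

# Assumption: CUPTI is shipped under extras/CUPTI/lib64 of the CUDA install.
CUPTI_SUBDIR = os.path.join("extras", "CUPTI", "lib64")


def prepend_to_library_path(directory, env=os.environ):
    """Put `directory` first in LIBRARY_PATH, dropping empty or duplicate entries."""
    current = env.get("LIBRARY_PATH", "")
    parts = [p for p in current.split(":") if p and p != directory]
    env["LIBRARY_PATH"] = ":".join([directory] + parts)


def pre_configure_hook(self, *args, **kwargs):
    """Hypothetical hook: only acts on PyTorch builds when $EBROOTCUDA is set."""
    cuda_root = os.environ.get("EBROOTCUDA")
    if self.name == "PyTorch" and cuda_root:
        prepend_to_library_path(os.path.join(cuda_root, CUPTI_SUBDIR))
```

The hook only needs `self.name` from the object it receives, so it can be exercised with a simple stand-in object outside of a real build.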

Build again after applying fix to find libcupti...

bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80

trz42 avatar Nov 22 '24 08:11 trz42

Updates by the bot instance eessi-bot-riscv (click for details)
  • account trz42 has NO permission to send commands to the bot

riscv-eessi-io-bot[bot] avatar Nov 22 '24 08:11 riscv-eessi-io-bot[bot]

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80 from trz42

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80 resulted in:

    • submitted job 30339, for details & status see https://github.com/EESSI/software-layer/pull/825#issuecomment-2493133310

eessi-bot[bot] avatar Nov 22 '24 08:11 eessi-bot[bot]

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80 from trz42

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80 resulted in:

    • no jobs were submitted

eessi-bot[bot] avatar Nov 22 '24 08:11 eessi-bot[bot]

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-amd-zen2 and accelerator nvidia/cc80 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.11/pr_825/30339

date | job status | comment
Nov 22 08:13:18 UTC 2024 | submitted | job id 30339 awaits release by job manager
Nov 22 08:13:44 UTC 2024 | released | job awaits launch by Slurm scheduler
Nov 22 08:19:50 UTC 2024 | running | job 30339 is running
Nov 22 18:13:32 UTC 2024 | finished |
:grin: SUCCESS (click triangle for details)
Details
:white_check_mark: job output file slurm-30339.out
:white_check_mark: no message matching FATAL:
:white_check_mark: no message matching ERROR:
:white_check_mark: no message matching FAILED:
:white_check_mark: no message matching required modules missing:
:white_check_mark: found message(s) matching No missing installations
:white_check_mark: found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen2-1732297886.tar.gz
size: 509 MiB (534700273 bytes)
entries: 12854
modules under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80/modules/all
magma/2.7.2-foss-2023a-CUDA-12.1.1.lua
PyTorch/2.1.2-foss-2023a-CUDA-12.1.1.lua
software under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80/software
magma/2.7.2-foss-2023a-CUDA-12.1.1
PyTorch/2.1.2-foss-2023a-CUDA-12.1.1
other under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80
2023.06/init/easybuild/eb_hooks.py
Nov 22 18:13:32 UTC 2024 test result
:grin: SUCCESS (click triangle for details)
ReFrame Summary
[ OK ] ( 1/10) EESSI_LAMMPS_lj %scale=1_node %device_type=cpu %module_name=LAMMPS/29Aug2024-foss-2023b-kokkos /aeb2d9df @BotBuildTests:x86-64-amd-zen2-node+default
P: perf: 436.522 timesteps/s (r:0, l:None, u:None)
[ OK ] ( 2/10) EESSI_LAMMPS_lj %scale=1_node %device_type=cpu %module_name=LAMMPS/2Aug2023_update2-foss-2023a-kokkos /04ff9ece @BotBuildTests:x86-64-amd-zen2-node+default
P: perf: 446.803 timesteps/s (r:0, l:None, u:None)
[ OK ] ( 3/10) EESSI_OSU_Micro_Benchmarks_coll %benchmark_info=mpi.collective.osu_allreduce %scale=1_node %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %device_type=cpu /31ac6ab9 @BotBuildTests:x86-64-amd-zen2-node+default
P: latency: 4.57 us (r:0, l:None, u:None)
[ OK ] ( 4/10) EESSI_OSU_Micro_Benchmarks_coll %benchmark_info=mpi.collective.osu_allreduce %scale=1_node %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /f3be40a2 @BotBuildTests:x86-64-amd-zen2-node+default
P: latency: 4.69 us (r:0, l:None, u:None)
[ OK ] ( 5/10) EESSI_OSU_Micro_Benchmarks_coll %benchmark_info=mpi.collective.osu_alltoall %scale=1_node %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %device_type=cpu /10e66fba @BotBuildTests:x86-64-amd-zen2-node+default
P: latency: 8.67 us (r:0, l:None, u:None)
[ OK ] ( 6/10) EESSI_OSU_Micro_Benchmarks_coll %benchmark_info=mpi.collective.osu_alltoall %scale=1_node %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /5be57ae7 @BotBuildTests:x86-64-amd-zen2-node+default
P: latency: 9.33 us (r:0, l:None, u:None)
[ OK ] ( 7/10) EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_latency %scale=1_node %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %device_type=cpu /c8c9aff5 @BotBuildTests:x86-64-amd-zen2-node+default
P: latency: 0.28 us (r:0, l:None, u:None)
[ OK ] ( 8/10) EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_latency %scale=1_node %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /9795e491 @BotBuildTests:x86-64-amd-zen2-node+default
P: latency: 0.3 us (r:0, l:None, u:None)
[ OK ] ( 9/10) EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_bw %scale=1_node %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %device_type=cpu /48da21c5 @BotBuildTests:x86-64-amd-zen2-node+default
P: bandwidth: 7868.44 MB/s (r:0, l:None, u:None)
[ OK ] (10/10) EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_bw %scale=1_node %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /1b8c1ca2 @BotBuildTests:x86-64-amd-zen2-node+default
P: bandwidth: 7743.76 MB/s (r:0, l:None, u:None)
[ PASSED ] Ran 10/10 test case(s) from 10 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
:white_check_mark: job output file slurm-30339.out
:white_check_mark: no message matching ERROR:
:white_check_mark: no message matching [\s*FAILED\s*].*Ran .* test case

eessi-bot[bot] avatar Nov 22 '24 08:11 eessi-bot[bot]
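The "Details" lists in the job reports read like a series of pattern checks on the Slurm output file: some messages must be absent, others must be present. The following sketch reproduces that idea with the patterns quoted in the reports. How the bot really combines the checks into SUCCESS or FAILURE is not shown in this thread, so the "all checks must pass" rule is an assumption.

```python
import re

# Patterns taken from the build reports in this thread.
MUST_BE_ABSENT = [r"FATAL:", r"ERROR:", r"FAILED:", r"required modules missing:"]
MUST_BE_PRESENT = [r"No missing installations", r"\.tar\.gz created!"]


def check_build_log(text):
    """Return (overall_ok, per-pattern results) for a job output file's text."""
    results = {}
    for pattern in MUST_BE_ABSENT:
        results[pattern] = re.search(pattern, text) is None
    for pattern in MUST_BE_PRESENT:
        results[pattern] = re.search(pattern, text) is not None
    # Assumed rule: the build counts as successful only if every check passes.
    return all(results.values()), results
```

With this rule, a log that contains `ERROR:` fails even when a tarball was created, which is the situation reported for job 30162.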

Also build for zen3

bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3 accel:nvidia/cc80

trz42 avatar Nov 22 '24 18:11 trz42

Updates by the bot instance eessi-bot-riscv (click for details)
  • account trz42 has NO permission to send commands to the bot

riscv-eessi-io-bot[bot] avatar Nov 22 '24 18:11 riscv-eessi-io-bot[bot]

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3 accel:nvidia/cc80 from trz42

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80 resulted in:

    • submitted job 30343, for details & status see https://github.com/EESSI/software-layer/pull/825#issuecomment-2494484088

eessi-bot[bot] avatar Nov 22 '24 18:11 eessi-bot[bot]

Updates by the bot instance eessi-bot-riscv (click for details)
  • account trz42 has NO permission to send commands to the bot

riscv-eessi-io-bot[bot] avatar Nov 22 '24 18:11 riscv-eessi-io-bot[bot]

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3 accel:nvidia/cc80 from trz42

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80 resulted in:

    • no jobs were submitted

eessi-bot[bot] avatar Nov 22 '24 18:11 eessi-bot[bot]

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-amd-zen3 and accelerator nvidia/cc80 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.11/pr_825/30343

date | job status | comment
Nov 22 18:21:24 UTC 2024 | submitted | job id 30343 awaits release by job manager
Nov 22 18:21:35 UTC 2024 | released | job awaits launch by Slurm scheduler
Nov 22 18:27:37 UTC 2024 | running | job 30343 is running
Nov 23 02:24:10 UTC 2024 | finished |
:grin: SUCCESS (click triangle for details)
Details
:white_check_mark: job output file slurm-30343.out
:white_check_mark: no message matching FATAL:
:white_check_mark: no message matching ERROR:
:white_check_mark: no message matching FAILED:
:white_check_mark: no message matching required modules missing:
:white_check_mark: found message(s) matching No missing installations
:white_check_mark: found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen3-1732327591.tar.gz
size: 509 MiB (534708669 bytes)
entries: 12854
modules under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/modules/all
magma/2.7.2-foss-2023a-CUDA-12.1.1.lua
PyTorch/2.1.2-foss-2023a-CUDA-12.1.1.lua
software under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/software
magma/2.7.2-foss-2023a-CUDA-12.1.1
PyTorch/2.1.2-foss-2023a-CUDA-12.1.1
other under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80
2023.06/init/easybuild/eb_hooks.py
Nov 23 02:24:10 UTC 2024 test result
:grin: SUCCESS (click triangle for details)
ReFrame Summary
[ OK ] ( 1/10) EESSI_LAMMPS_lj %scale=1_node %device_type=cpu %module_name=LAMMPS/29Aug2024-foss-2023b-kokkos /aeb2d9df @BotBuildTests:x86-64-amd-zen3-node+default
P: perf: 522.848 timesteps/s (r:0, l:None, u:None)
[ OK ] ( 2/10) EESSI_LAMMPS_lj %scale=1_node %device_type=cpu %module_name=LAMMPS/2Aug2023_update2-foss-2023a-kokkos /04ff9ece @BotBuildTests:x86-64-amd-zen3-node+default
P: perf: 532.462 timesteps/s (r:0, l:None, u:None)
[ OK ] ( 3/10) EESSI_OSU_Micro_Benchmarks_coll %benchmark_info=mpi.collective.osu_allreduce %scale=1_node %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %device_type=cpu /31ac6ab9 @BotBuildTests:x86-64-amd-zen3-node+default
P: latency: 2.42 us (r:0, l:None, u:None)
[ OK ] ( 4/10) EESSI_OSU_Micro_Benchmarks_coll %benchmark_info=mpi.collective.osu_allreduce %scale=1_node %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /f3be40a2 @BotBuildTests:x86-64-amd-zen3-node+default
P: latency: 2.32 us (r:0, l:None, u:None)
[ OK ] ( 5/10) EESSI_OSU_Micro_Benchmarks_coll %benchmark_info=mpi.collective.osu_alltoall %scale=1_node %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %device_type=cpu /10e66fba @BotBuildTests:x86-64-amd-zen3-node+default
P: latency: 5.5 us (r:0, l:None, u:None)
[ OK ] ( 6/10) EESSI_OSU_Micro_Benchmarks_coll %benchmark_info=mpi.collective.osu_alltoall %scale=1_node %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /5be57ae7 @BotBuildTests:x86-64-amd-zen3-node+default
P: latency: 5.43 us (r:0, l:None, u:None)
[ OK ] ( 7/10) EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_latency %scale=1_node %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %device_type=cpu /c8c9aff5 @BotBuildTests:x86-64-amd-zen3-node+default
P: latency: 0.24 us (r:0, l:None, u:None)
[ OK ] ( 8/10) EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_latency %scale=1_node %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /9795e491 @BotBuildTests:x86-64-amd-zen3-node+default
P: latency: 0.22 us (r:0, l:None, u:None)
[ OK ] ( 9/10) EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_bw %scale=1_node %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %device_type=cpu /48da21c5 @BotBuildTests:x86-64-amd-zen3-node+default
P: bandwidth: 14283.69 MB/s (r:0, l:None, u:None)
[ OK ] (10/10) EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_bw %scale=1_node %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /1b8c1ca2 @BotBuildTests:x86-64-amd-zen3-node+default
P: bandwidth: 14294.01 MB/s (r:0, l:None, u:None)
[ PASSED ] Ran 10/10 test case(s) from 10 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
:white_check_mark: job output file slurm-30343.out
:white_check_mark: no message matching ERROR:
:white_check_mark: no message matching [\s*FAILED\s*].*Ran .* test case

eessi-bot[bot] avatar Nov 22 '24 18:11 eessi-bot[bot]
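The artefact listings above follow a regular naming scheme: accelerator builds land under `<version>/software/linux/<cpu target>/accel/<accelerator>`, and the tarball name joins the CPU target with dashes and appends a numeric timestamp. The helpers below only restate that observed pattern; the function names are made up for illustration and the real build scripts may derive these strings differently.

```python
def accel_prefix(version, cpu_target, accel_target, os_type="linux"):
    """Directory under which modules/ and software/ appear in the reports."""
    return f"{version}/software/{os_type}/{cpu_target}/accel/{accel_target}"


def tarball_name(version, cpu_target, timestamp, os_type="linux"):
    """Tarball file name as listed under 'Artefacts' in the reports."""
    flat_target = cpu_target.replace("/", "-")
    return f"eessi-{version}-software-{os_type}-{flat_target}-{timestamp}.tar.gz"


# Values from the zen3 build (job 30343) in this thread.
print(accel_prefix("2023.06", "x86_64/amd/zen3", "nvidia/cc80"))
print(tarball_name("2023.06", "x86_64/amd/zen3", 1732327591))
```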

Try a different approach: rebuild the CUDA module so that it prepends the directory containing the libcupti library to LIBRARY_PATH, and drop the hook used in the previous builds...

bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80

trz42 avatar Nov 23 '24 10:11 trz42
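One way a module can be made to prepend a directory to `LIBRARY_PATH` is EasyBuild's `modextrapaths` easyconfig parameter, which maps environment variables to paths relative to the installation directory. Whether this PR uses that parameter is not stated in the thread, and the `extras/CUPTI/lib64` location is an assumption about the CUDA layout. The sketch shows the fragment and, via a hypothetical helper, the absolute path it would resolve to.

```python
import os

# Easyconfig-style fragment (easyconfigs use Python syntax).
# Assumption: libcupti lives in extras/CUPTI/lib64 of the CUDA installation.
modextrapaths = {"LIBRARY_PATH": "extras/CUPTI/lib64"}


def resolved_prepends(installdir, extra_paths):
    """Hypothetical helper: absolute paths the generated module would prepend."""
    return {
        var: os.path.join(installdir, relpath)
        for var, relpath in extra_paths.items()
    }
```

Compared with a hook, this keeps the setting in the module itself, so any build that loads the rebuilt CUDA module would see libcupti without further hook logic.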

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80 from trz42

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80 resulted in:

    • submitted job 30521, for details & status see https://github.com/EESSI/software-layer/pull/825#issuecomment-2495433367

eessi-bot[bot] avatar Nov 23 '24 10:11 eessi-bot[bot]

Updates by the bot instance eessi-bot-riscv (click for details)
  • account trz42 has NO permission to send commands to the bot

riscv-eessi-io-bot[bot] avatar Nov 23 '24 10:11 riscv-eessi-io-bot[bot]

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80 from trz42

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80 resulted in:

    • no jobs were submitted

eessi-bot[bot] avatar Nov 23 '24 10:11 eessi-bot[bot]

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-amd-zen2 and accelerator nvidia/cc80 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.11/pr_825/30521

date | job status | comment
Nov 23 10:37:14 UTC 2024 | submitted | job id 30521 awaits release by job manager
Nov 23 10:37:46 UTC 2024 | released | job awaits launch by Slurm scheduler
Nov 23 10:42:48 UTC 2024 | running | job 30521 is running
Nov 23 12:52:40 UTC 2024 | finished |
:cry: FAILURE (click triangle for details)
Details
:white_check_mark: job output file slurm-30521.out
:white_check_mark: no message matching FATAL:
:x: found message matching ERROR:
:x: found message matching FAILED:
:x: found message matching required modules missing:
:white_check_mark: found message(s) matching No missing installations
:white_check_mark: found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen2-1732364988.tar.gz
size: 302 MiB (316748222 bytes)
entries: 114
modules under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80/modules/all
magma/2.7.2-foss-2023a-CUDA-12.1.1.lua
software under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80/software
magma/2.7.2-foss-2023a-CUDA-12.1.1
other under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80
2023.06/init/easybuild/eb_hooks.py
Nov 23 12:52:40 UTC 2024 test result
:grin: SUCCESS (click triangle for details)
ReFrame Summary
[ OK ] ( 1/10) EESSI_LAMMPS_lj %scale=1_node %device_type=cpu %module_name=LAMMPS/29Aug2024-foss-2023b-kokkos /aeb2d9df @BotBuildTests:x86-64-amd-zen2-node+default
P: perf: 436.344 timesteps/s (r:0, l:None, u:None)
[ OK ] ( 2/10) EESSI_LAMMPS_lj %scale=1_node %device_type=cpu %module_name=LAMMPS/2Aug2023_update2-foss-2023a-kokkos /04ff9ece @BotBuildTests:x86-64-amd-zen2-node+default
P: perf: 448.257 timesteps/s (r:0, l:None, u:None)
[ OK ] ( 3/10) EESSI_OSU_Micro_Benchmarks_coll %benchmark_info=mpi.collective.osu_allreduce %scale=1_node %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %device_type=cpu /31ac6ab9 @BotBuildTests:x86-64-amd-zen2-node+default
P: latency: 4.46 us (r:0, l:None, u:None)
[ OK ] ( 4/10) EESSI_OSU_Micro_Benchmarks_coll %benchmark_info=mpi.collective.osu_allreduce %scale=1_node %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /f3be40a2 @BotBuildTests:x86-64-amd-zen2-node+default
P: latency: 4.5 us (r:0, l:None, u:None)
[ OK ] ( 5/10) EESSI_OSU_Micro_Benchmarks_coll %benchmark_info=mpi.collective.osu_alltoall %scale=1_node %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %device_type=cpu /10e66fba @BotBuildTests:x86-64-amd-zen2-node+default
P: latency: 8.35 us (r:0, l:None, u:None)
[ OK ] ( 6/10) EESSI_OSU_Micro_Benchmarks_coll %benchmark_info=mpi.collective.osu_alltoall %scale=1_node %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /5be57ae7 @BotBuildTests:x86-64-amd-zen2-node+default
P: latency: 8.52 us (r:0, l:None, u:None)
[ OK ] ( 7/10) EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_latency %scale=1_node %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %device_type=cpu /c8c9aff5 @BotBuildTests:x86-64-amd-zen2-node+default
P: latency: 0.33 us (r:0, l:None, u:None)
[ OK ] ( 8/10) EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_latency %scale=1_node %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /9795e491 @BotBuildTests:x86-64-amd-zen2-node+default
P: latency: 0.35 us (r:0, l:None, u:None)
[ OK ] ( 9/10) EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_bw %scale=1_node %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %device_type=cpu /48da21c5 @BotBuildTests:x86-64-amd-zen2-node+default
P: bandwidth: 7919.73 MB/s (r:0, l:None, u:None)
[ OK ] (10/10) EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_bw %scale=1_node %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /1b8c1ca2 @BotBuildTests:x86-64-amd-zen2-node+default
P: bandwidth: 7918.9 MB/s (r:0, l:None, u:None)
[ PASSED ] Ran 10/10 test case(s) from 10 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
:white_check_mark: job output file slurm-30521.out
:x: found message matching ERROR:
:white_check_mark: no message matching [\s*FAILED\s*].*Ran .* test case

eessi-bot[bot] avatar Nov 23 '24 10:11 eessi-bot[bot]
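To compare the ReFrame figures across jobs (for example the zen2 and zen3 bandwidth numbers), the `P:` lines in the summaries can be parsed mechanically. This is a small sketch with a regular expression fitted to the lines quoted in this thread. It is not part of ReFrame or the bot, and other ReFrame output formats may not match it.

```python
import re

# Matches lines such as: "P: latency: 4.85 us (r:0, l:None, u:None)"
PERF_LINE = re.compile(r"^P:\s*(?P<name>\w+):\s*(?P<value>[0-9.]+)\s+(?P<unit>\S+)")


def parse_perf(line):
    """Return (metric name, value, unit) for a 'P:' line, or None otherwise."""
    match = PERF_LINE.match(line.strip())
    if match is None:
        return None
    return match.group("name"), float(match.group("value")), match.group("unit")
```

Lines that do not start with `P:`, such as the `[ OK ]` test headers, yield `None` and can be skipped by the caller.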

Use force to rebuild CUDA...

bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80

trz42 avatar Nov 23 '24 11:11 trz42

Updates by the bot instance eessi-bot-riscv (click for details)
  • account trz42 has NO permission to send commands to the bot

riscv-eessi-io-bot[bot] avatar Nov 23 '24 11:11 riscv-eessi-io-bot[bot]

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80 from trz42

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80 resulted in:

    • submitted job 30522, for details & status see https://github.com/EESSI/software-layer/pull/825#issuecomment-2495441487

eessi-bot[bot] avatar Nov 23 '24 11:11 eessi-bot[bot]

Updates by the bot instance eessi-bot-riscv (click for details)
  • account trz42 has NO permission to send commands to the bot

riscv-eessi-io-bot[bot] avatar Nov 23 '24 11:11 riscv-eessi-io-bot[bot]