Put a Lmod-relevant wrapper in place for archdetect accelerator detection

Open ocaisa opened this issue 1 year ago • 8 comments

ocaisa avatar Oct 11 '24 16:10 ocaisa

Instance eessi-bot-mc-aws is configured to build for:

  • architectures: x86_64/generic, x86_64/intel/haswell, x86_64/intel/skylake_avx512, x86_64/amd/zen2, x86_64/amd/zen3, aarch64/generic, aarch64/neoverse_n1, aarch64/neoverse_v1
  • repositories: eessi.io-2023.06-compat, eessi-hpc.org-2023.06-software, eessi-hpc.org-2023.06-compat, eessi.io-2023.06-software

eessi-bot[bot] avatar Oct 11 '24 16:10 eessi-bot[bot]

Instance boegel-bot-deucalion is configured to build for:

  • architectures: aarch64/a64fx
  • repositories: eessi.io-2023.06-software

Instance eessi-bot-mc-azure is configured to build for:

  • architectures: x86_64/amd/zen4
  • repositories: eessi-hpc.org-2023.06-software, eessi-hpc.org-2023.06-compat, eessi.io-2023.06-software, eessi.io-2023.06-compat

eessi-bot[bot] avatar Oct 11 '24 16:10 eessi-bot[bot]

bot: build repo:eessi.io-2023.06-software arch:x86_64/generic

ocaisa avatar Oct 11 '24 16:10 ocaisa

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/generic from ocaisa

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/generic
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/generic resulted in:

    • submitted job 22672, for details & status see https://github.com/EESSI/software-layer/pull/783#issuecomment-2407777392

eessi-bot[bot] avatar Oct 11 '24 16:10 eessi-bot[bot]

Updates by the bot instance boegel-bot-deucalion (click for details)
  • account ocaisa has NO permission to send commands to the bot

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/generic from ocaisa

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/generic
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/generic resulted in:

    • no jobs were submitted
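Both instances report the same expansion of the short command form. As a rough illustration of that mapping only (this is not the bot's actual code; the function name `expand_bot_command` is hypothetical), the abbreviated keys can be rewritten like this:

```shell
# Hypothetical helper, not taken from the bot: expand the short keys
# "repo:" and "arch:" into the long form shown in the bot's reply.
expand_bot_command() {
    out=""
    # Intentionally unquoted so the command is split on whitespace
    for word in $1; do
        case "$word" in
            repo:*) word="repository:${word#repo:}" ;;
            arch:*) word="architecture:${word#arch:}" ;;
        esac
        # Re-join the words with single spaces
        out="${out:+$out }$word"
    done
    printf '%s\n' "$out"
}

expanded=$(expand_bot_command "build repo:eessi.io-2023.06-software arch:x86_64/generic")
echo "$expanded"
```

Only the two keys that appear in this thread are handled; any other abbreviations the bot may accept are left out.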

eessi-bot[bot] avatar Oct 11 '24 16:10 eessi-bot[bot]

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-generic for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.10/pr_783/22672

date | job status | comment
Oct 11 16:44:21 UTC 2024 | submitted | job id 22672 awaits release by job manager
Oct 11 16:45:07 UTC 2024 | released | job awaits launch by Slurm scheduler
Oct 11 16:51:10 UTC 2024 | running | job 22672 is running
Oct 11 16:58:16 UTC 2024 | finished | :grin: SUCCESS (click triangle for details)
Details
:white_check_mark: job output file slurm-22672.out
:white_check_mark: no message matching ERROR:
:white_check_mark: no message matching FAILED:
:white_check_mark: no message matching required modules missing:
:white_check_mark: found message(s) matching No missing installations
:white_check_mark: found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-generic-1728665496.tar.gz
size: 0 MiB (286 bytes)
entries: 1
modules under 2023.06/software/linux/x86_64/generic/modules/all
no module files in tarball
software under 2023.06/software/linux/x86_64/generic/software
no software packages in tarball
other under 2023.06/software/linux/x86_64/generic
2023.06/init/lmod_eessi_archdetect_wrapper_accel.sh
Oct 11 16:58:16 UTC 2024 | test result | :grin: SUCCESS (click triangle for details)
ReFrame Summary
[ OK ] ( 1/10) EESSI_LAMMPS_lj %scale=1_node %device_type=cpu %module_name=LAMMPS/29Aug2024-foss-2023b-kokkos /aeb2d9df @BotBuildTests:x86-64-generic-node+default
P: perf: 488.052 timesteps/s (r:0, l:None, u:None)
[ OK ] ( 2/10) EESSI_LAMMPS_lj %scale=1_node %device_type=cpu %module_name=LAMMPS/2Aug2023_update2-foss-2023a-kokkos /04ff9ece @BotBuildTests:x86-64-generic-node+default
P: perf: 507.982 timesteps/s (r:0, l:None, u:None)
[ OK ] ( 3/10) EESSI_OSU_Micro_Benchmarks_coll %benchmark_info=mpi.collective.osu_allreduce %scale=1_node %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %device_type=cpu /31ac6ab9 @BotBuildTests:x86-64-generic-node+default
P: latency: 5.14 us (r:0, l:None, u:None)
[ OK ] ( 4/10) EESSI_OSU_Micro_Benchmarks_coll %benchmark_info=mpi.collective.osu_allreduce %scale=1_node %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /f3be40a2 @BotBuildTests:x86-64-generic-node+default
P: latency: 5.23 us (r:0, l:None, u:None)
[ OK ] ( 5/10) EESSI_OSU_Micro_Benchmarks_coll %benchmark_info=mpi.collective.osu_alltoall %scale=1_node %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %device_type=cpu /10e66fba @BotBuildTests:x86-64-generic-node+default
P: latency: 7.89 us (r:0, l:None, u:None)
[ OK ] ( 6/10) EESSI_OSU_Micro_Benchmarks_coll %benchmark_info=mpi.collective.osu_alltoall %scale=1_node %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /5be57ae7 @BotBuildTests:x86-64-generic-node+default
P: latency: 8.87 us (r:0, l:None, u:None)
[ OK ] ( 7/10) EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_latency %scale=1_node %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %device_type=cpu /c8c9aff5 @BotBuildTests:x86-64-generic-node+default
P: latency: 0.71 us (r:0, l:None, u:None)
[ OK ] ( 8/10) EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_latency %scale=1_node %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /9795e491 @BotBuildTests:x86-64-generic-node+default
P: latency: 0.7 us (r:0, l:None, u:None)
[ OK ] ( 9/10) EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_bw %scale=1_node %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %device_type=cpu /48da21c5 @BotBuildTests:x86-64-generic-node+default
P: bandwidth: 11028.58 MB/s (r:0, l:None, u:None)
[ OK ] (10/10) EESSI_OSU_Micro_Benchmarks_pt2pt %benchmark_info=mpi.pt2pt.osu_bw %scale=1_node %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %device_type=cpu /1b8c1ca2 @BotBuildTests:x86-64-generic-node+default
P: bandwidth: 11072.29 MB/s (r:0, l:None, u:None)
[ PASSED ] Ran 10/10 test case(s) from 10 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
:white_check_mark: job output file slurm-22672.out
:white_check_mark: no message matching ERROR:
:white_check_mark: no message matching [\s*FAILED\s*].*Ran .* test case
Oct 15 07:33:05 UTC 2024 | uploaded | transfer of eessi-2023.06-software-linux-x86_64-generic-1728665496.tar.gz to S3 bucket succeeded

eessi-bot[bot] avatar Oct 11 '24 16:10 eessi-bot[bot]

Label bot:deploy has been set by user trz42, but this person does not have permission to trigger deployments

This is part of #781, where I have a chicken-and-egg situation: I need the script in place so that the module can be tested in CI. The wrapper is required so that I take only the absolute minimum from the called script (in my case I just want the value of the result, with no error codes or anything else).

ocaisa avatar Oct 15 '24 07:10 ocaisa

Script has become available via CernVM-FS:

$ cat /cvmfs/software.eessi.io/versions/2023.06/init/lmod_eessi_archdetect_wrapper_accel.sh
# This can be leveraged by the source_sh() feature of Lmod
export EESSI_ACCEL_SUBDIR=$($(dirname $(readlink -f $BASH_SOURCE))/eessi_archdetect.sh accelpath)
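To see what the wrapper passes on, here is a minimal sketch that can be run outside CernVM-FS. It assumes `bash` is available and uses a stub in place of the real `eessi_archdetect.sh`; the stub's printed value and its non-zero exit code are made up for illustration. The wrapper line itself has the same shape as the one above.

```shell
# Sketch only: a stub stands in for the real eessi_archdetect.sh.
# The value it prints and the exit code it returns are invented.
demo_dir=$(mktemp -d)

cat > "$demo_dir/eessi_archdetect.sh" <<'EOF'
#!/usr/bin/env bash
# Hypothetical stub: print a path for "accelpath", then exit non-zero
if [ "$1" = "accelpath" ]; then
    echo "accel/nvidia/cc80"
    exit 3
fi
EOF
chmod +x "$demo_dir/eessi_archdetect.sh"

# Wrapper with the same shape as the one shown above
cat > "$demo_dir/wrapper.sh" <<'EOF'
export EESSI_ACCEL_SUBDIR=$($(dirname $(readlink -f $BASH_SOURCE))/eessi_archdetect.sh accelpath)
EOF

# Source the wrapper in a fresh bash and read back the one variable it sets.
# The stub's exit code does not propagate: "export" itself succeeds.
result=$(bash -c '. "$1"; printf "%s" "$EESSI_ACCEL_SUBDIR"' _ "$demo_dir/wrapper.sh")
echo "EESSI_ACCEL_SUBDIR=$result"

rm -rf "$demo_dir"
```

The only thing that survives sourcing is the exported variable, which fits the goal of taking just the result value from the called script.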

trz42 avatar Oct 15 '24 08:10 trz42

PR merged! Moved ['/project/def-users/SHARED/jobs/2024.10/pr_783/22672'] to /project/def-users/SHARED/trash_bin/EESSI/software-layer/2024.10.15

eessi-bot[bot] avatar Oct 15 '24 08:10 eessi-bot[bot]

PR merged! Moved [] to /home/kehoste/project_dir/bot/trash-bin #$HOME/trash_bin/EESSI/software-layer/2024.10.15

PR merged! Moved [] to /project/def-users/SHARED/trash_bin/EESSI/software-layer/2024.10.15

eessi-bot[bot] avatar Oct 15 '24 08:10 eessi-bot[bot]