software-layer
{2023.06}[system] cuDNN/8.9.2.26-CUDA-12.1.1
requires:
- #720
Attempt to add cuDNN, which is a dependency of other packages such as TensorFlow and PyTorch.
Major additions/changes:
- `scripts/gpu_support/nvidia/install_cuda_and_libraries.sh` with `scripts/gpu_support/nvidia/eessi-2023.06-cuda-and-libraries.yml`
  - script to install `CUDA` and `cuDNN` packages under `.../host_injections`
- `EESSI-install-software.sh`
  - use `scripts/gpu_support/nvidia/install_cuda_and_libraries.sh` with `scripts/gpu_support/nvidia/eessi-2023.06-cuda-and-libraries.yml` to install `CUDA`, `cuDNN` under `.../host_injections`
- `eb_hooks.py`
  - put code that iterates over all files replacing non-distributable ones with symlinks into `host_injections` with a common function (`replace_non_distributable_files_with_symlinks`)
  - additional `post_sanitycheck_hook` which replaces files with symlinks into corresponding paths under `.../host_injections` for all files that cannot be redistributed
  - dropping dependency on `cuDNN` to a build dependency (see `inject_gpu_property`)
- `create_lmodsitepackage.py`
  - consolidate `eessi_{cuda,cudnn}_enabled_load_hook` functions in a single one (`eessi_cuda_and_libraries_enabled_load_hook`)
  - the remaining hook is prepared to easily add new modules, e.g., cuTENSOR
- `install_scripts.sh`
  - add files to copy to CVMFS (see `nvidia_files`)
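To make the `eb_hooks.py` item above more concrete, here is a minimal Python sketch of the general idea: walk an installation tree and swap every file that may not be redistributed for a symlink to the same relative path under a host-side directory. This is not the code from this PR. The function name `replace_with_symlinks`, the `is_distributable` callback, and the rule that only `LICENSE` may be shipped are all assumptions made for illustration.

```python
import os
import tempfile


def replace_with_symlinks(install_dir, host_injections_dir, is_distributable):
    """Replace each non-distributable file under install_dir with a symlink
    to the same relative path under host_injections_dir.

    Returns the relative paths that were replaced, sorted.
    """
    replaced = []
    for root, _dirs, files in os.walk(install_dir):
        for name in files:
            path = os.path.join(root, name)
            # Leave existing symlinks and redistributable files untouched.
            if os.path.islink(path) or is_distributable(path):
                continue
            rel = os.path.relpath(path, install_dir)
            target = os.path.join(host_injections_dir, rel)
            os.remove(path)
            # The link may dangle until the host-side installation exists.
            os.symlink(target, path)
            replaced.append(rel)
    return sorted(replaced)


def demo():
    """Build a tiny fake installation and run the replacement on it."""
    with tempfile.TemporaryDirectory() as tmp:
        install_dir = os.path.join(tmp, "software", "cuDNN")
        host_dir = os.path.join(tmp, "host_injections", "cuDNN")
        os.makedirs(os.path.join(install_dir, "lib"))
        for rel in ("LICENSE", os.path.join("lib", "libcudnn.so")):
            with open(os.path.join(install_dir, rel), "w") as handle:
                handle.write("placeholder\n")
        # Hypothetical rule: only the licence text may be shipped.
        replaced = replace_with_symlinks(
            install_dir, host_dir, lambda p: os.path.basename(p) == "LICENSE"
        )
        link = os.path.join(install_dir, "lib", "libcudnn.so")
        expected_target = os.path.join(host_dir, "lib", "libcudnn.so")
        return replaced, os.path.islink(link), os.readlink(link) == expected_target
```

Calling `demo()` reports which files were replaced and whether the library now points into the host-side tree. In a real hook the decision about what is redistributable would presumably come from the vendor's licence terms rather than from a file name check.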
Instance `eessi-bot-mc-aws` is configured to build:
- arch `x86_64/generic` for repo `eessi-hpc.org-2023.06-compat`
- arch `x86_64/generic` for repo `eessi-hpc.org-2023.06-software`
- arch `x86_64/generic` for repo `eessi.io-2023.06-compat`
- arch `x86_64/generic` for repo `eessi.io-2023.06-software`
- arch `x86_64/intel/haswell` for repo `eessi-hpc.org-2023.06-compat`
- arch `x86_64/intel/haswell` for repo `eessi-hpc.org-2023.06-software`
- arch `x86_64/intel/haswell` for repo `eessi.io-2023.06-compat`
- arch `x86_64/intel/haswell` for repo `eessi.io-2023.06-software`
- arch `x86_64/intel/skylake_avx512` for repo `eessi-hpc.org-2023.06-compat`
- arch `x86_64/intel/skylake_avx512` for repo `eessi-hpc.org-2023.06-software`
- arch `x86_64/intel/skylake_avx512` for repo `eessi.io-2023.06-compat`
- arch `x86_64/intel/skylake_avx512` for repo `eessi.io-2023.06-software`
- arch `x86_64/amd/zen2` for repo `eessi-hpc.org-2023.06-compat`
- arch `x86_64/amd/zen2` for repo `eessi-hpc.org-2023.06-software`
- arch `x86_64/amd/zen2` for repo `eessi.io-2023.06-compat`
- arch `x86_64/amd/zen2` for repo `eessi.io-2023.06-software`
- arch `x86_64/amd/zen3` for repo `eessi-hpc.org-2023.06-compat`
- arch `x86_64/amd/zen3` for repo `eessi-hpc.org-2023.06-software`
- arch `x86_64/amd/zen3` for repo `eessi.io-2023.06-compat`
- arch `x86_64/amd/zen3` for repo `eessi.io-2023.06-software`
- arch `aarch64/generic` for repo `eessi-hpc.org-2023.06-compat`
- arch `aarch64/generic` for repo `eessi-hpc.org-2023.06-software`
- arch `aarch64/generic` for repo `eessi.io-2023.06-compat`
- arch `aarch64/generic` for repo `eessi.io-2023.06-software`
- arch `aarch64/neoverse_n1` for repo `eessi-hpc.org-2023.06-compat`
- arch `aarch64/neoverse_n1` for repo `eessi-hpc.org-2023.06-software`
- arch `aarch64/neoverse_n1` for repo `eessi.io-2023.06-compat`
- arch `aarch64/neoverse_n1` for repo `eessi.io-2023.06-software`
- arch `aarch64/neoverse_v1` for repo `eessi-hpc.org-2023.06-compat`
- arch `aarch64/neoverse_v1` for repo `eessi-hpc.org-2023.06-software`
- arch `aarch64/neoverse_v1` for repo `eessi.io-2023.06-compat`
- arch `aarch64/neoverse_v1` for repo `eessi.io-2023.06-software`
Instance `eessi-bot-mc-azure` is configured to build:
- arch `x86_64/amd/zen4` for repo `eessi-hpc.org-2023.06-compat`
- arch `x86_64/amd/zen4` for repo `eessi-hpc.org-2023.06-software`
- arch `x86_64/amd/zen4` for repo `eessi.io-2023.06-compat`
- arch `x86_64/amd/zen4` for repo `eessi.io-2023.06-software`
bot: build inst:aws repo:eessi.io-2023.06-software arch:x86_64/amd/zen2
Updates by the bot instance `eessi-bot-mc-aws`
- received bot command `build inst:aws repo:eessi.io-2023.06-software arch:x86_64/amd/zen2` from `trz42`
  - expanded format: `build instance:aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2`
- handling command `build instance:aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2` resulted in:
  - submitted job `10940`, for details & status see https://github.com/EESSI/software-layer/pull/581#issuecomment-2117129261

Updates by the bot instance `eessi-bot-mc-azure`
- received bot command `build inst:aws repo:eessi.io-2023.06-software arch:x86_64/amd/zen2` from `trz42`
  - expanded format: `build instance:aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2`
- handling command `build instance:aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2` resulted in:
  - no jobs were submitted
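The bot reports both the abbreviated command and its expanded format. As a rough illustration of that mapping, the sketch below expands the three abbreviations seen in this thread (`inst`, `repo`, `arch`). It is an assumption about how such an expansion could be written, not the bot's actual implementation, and the names `ABBREVIATIONS` and `expand_command` are hypothetical.

```python
# Abbreviations observed in this thread; the bot's real table may differ.
ABBREVIATIONS = {"inst": "instance", "repo": "repository", "arch": "architecture"}


def expand_command(command):
    """Expand abbreviated 'key:value' filters in a bot command string."""
    parts = command.split()
    if not parts:
        return ""
    action, filters = parts[0], parts[1:]
    expanded = [action]
    for item in filters:
        # Split only on the first colon so values may contain ':' themselves.
        key, sep, value = item.partition(":")
        expanded.append(f"{ABBREVIATIONS.get(key, key)}{sep}{value}")
    return " ".join(expanded)


short = "build inst:aws repo:eessi.io-2023.06-software arch:x86_64/amd/zen2"
print(expand_command(short))
# prints: build instance:aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2
```

Unknown keys are passed through unchanged, so a command that already uses the long form is left as it is.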
New job on instance eessi-bot-mc-aws for architecture x86_64-amd-zen2 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.05/pr_581/10940
| date | job status | comment |
|---|---|---|
| May 17 09:26:27 UTC 2024 | submitted | job id 10940 awaits release by job manager |
| May 17 09:27:22 UTC 2024 | released | job awaits launch by Slurm scheduler |
| May 17 09:32:24 UTC 2024 | running | job 10940 is running |
| May 17 09:40:32 UTC 2024 | finished | :cry: FAILURE |
| May 17 09:40:32 UTC 2024 | test result | :grin: SUCCESS |
Retry after fixing args to cuDNN install script...
bot: build inst:aws repo:eessi.io-2023.06-software arch:x86_64/amd/zen2
Updates by the bot instance `eessi-bot-mc-aws`
- received bot command `build inst:aws repo:eessi.io-2023.06-software arch:x86_64/amd/zen2` from `trz42`
  - expanded format: `build instance:aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2`
- handling command `build instance:aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2` resulted in:
  - submitted job `10941`, for details & status see https://github.com/EESSI/software-layer/pull/581#issuecomment-2117292658

Updates by the bot instance `eessi-bot-mc-azure`
- received bot command `build inst:aws repo:eessi.io-2023.06-software arch:x86_64/amd/zen2` from `trz42`
  - expanded format: `build instance:aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2`
- handling command `build instance:aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2` resulted in:
  - no jobs were submitted
New job on instance eessi-bot-mc-aws for architecture x86_64-amd-zen2 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.05/pr_581/10941
| date | job status | comment |
|---|---|---|
| May 17 10:45:01 UTC 2024 | submitted | job id 10941 awaits release by job manager |
| May 17 10:45:40 UTC 2024 | released | job awaits launch by Slurm scheduler |
| May 17 10:49:42 UTC 2024 | running | job 10941 is running |
| May 17 10:59:52 UTC 2024 | finished | :grin: SUCCESS |
| May 17 10:59:52 UTC 2024 | test result | :grin: SUCCESS |
@trz42 The installation looks suspiciously large at 700MB, are you sure your hook is cleaning out the files it should?
> @trz42 The installation looks suspiciously large at 700MB, are you sure your hook is cleaning out the files it should?

Full package is 1.4 GB.
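One way to check a question like this is to compare how many bytes in the installation tree sit in regular files and how many entries are symlinks. The sketch below is only an illustration of that check: `summarize_tree` is a hypothetical helper and the demo tree is made up. Symlinks are counted but contribute nothing to the byte total, so files that were successfully replaced by links should not inflate the reported size.

```python
import os
import tempfile


def summarize_tree(top):
    """Return (bytes in regular files, number of regular files, number of symlinks)."""
    total_bytes = regular = links = 0
    for root, dirs, files in os.walk(top):
        # Symlinks to directories are listed in dirs, so inspect both lists.
        for name in dirs + files:
            path = os.path.join(root, name)
            if os.path.islink(path):
                links += 1
            elif os.path.isfile(path):
                regular += 1
                total_bytes += os.path.getsize(path)
    return total_bytes, regular, links


def demo():
    """Create one 10-byte regular file and one dangling symlink, then summarize."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "kept.txt"), "wb") as handle:
            handle.write(b"x" * 10)
        os.symlink("/nonexistent/libcudnn.so", os.path.join(tmp, "libcudnn.so"))
        return summarize_tree(tmp)


print(demo())  # prints: (10, 1, 1)
```

Running `summarize_tree` on the real installation directory before and after the hook would show whether the byte total drops as expected.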
Rebuild after changing hook function that handles dependencies and creates modluafooter entries...
bot: build inst:aws repo:eessi.io-2023.06-software arch:x86_64/amd/zen2
Updates by the bot instance `eessi-bot-mc-aws`
- received bot command `build inst:aws repo:eessi.io-2023.06-software arch:x86_64/amd/zen2` from `trz42`
  - expanded format: `build instance:aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2`
- handling command `build instance:aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2` resulted in:
  - submitted job `10942`, for details & status see https://github.com/EESSI/software-layer/pull/581#issuecomment-2117540885

Updates by the bot instance `eessi-bot-mc-azure`
- received bot command `build inst:aws repo:eessi.io-2023.06-software arch:x86_64/amd/zen2` from `trz42`
  - expanded format: `build instance:aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2`
- handling command `build instance:aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2` resulted in:
  - no jobs were submitted
New job on instance eessi-bot-mc-aws for architecture x86_64-amd-zen2 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.05/pr_581/10942
| date | job status | comment |
|---|---|---|
| May 17 12:54:38 UTC 2024 | submitted | job id 10942 awaits release by job manager |
| May 17 12:55:03 UTC 2024 | released | job awaits launch by Slurm scheduler |
| May 17 13:00:06 UTC 2024 | running | job 10942 is running |
| May 17 13:05:11 UTC 2024 | finished | :cry: FAILURE |
| May 17 13:05:11 UTC 2024 | test result | :grin: SUCCESS |
One more time...
bot: build inst:aws repo:eessi.io-2023.06-software arch:x86_64/amd/zen2
Updates by the bot instance `eessi-bot-mc-aws`
- received bot command `build inst:aws repo:eessi.io-2023.06-software arch:x86_64/amd/zen2` from `trz42`
  - expanded format: `build instance:aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2`
- handling command `build instance:aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2` resulted in:
  - submitted job `10943`, for details & status see https://github.com/EESSI/software-layer/pull/581#issuecomment-2117581012

Updates by the bot instance `eessi-bot-mc-azure`
- received bot command `build inst:aws repo:eessi.io-2023.06-software arch:x86_64/amd/zen2` from `trz42`
  - expanded format: `build instance:aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2`
- handling command `build instance:aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2` resulted in:
  - no jobs were submitted
New job on instance eessi-bot-mc-aws for architecture x86_64-amd-zen2 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.05/pr_581/10943
| date | job status | comment |
|---|---|---|
| May 17 13:14:32 UTC 2024 | submitted | job id 10943 awaits release by job manager |
| May 17 13:15:15 UTC 2024 | released | job awaits launch by Slurm scheduler |
| May 17 13:16:17 UTC 2024 | running | job 10943 is running |
| May 17 13:24:26 UTC 2024 | finished | :grin: SUCCESS |
| May 17 13:24:26 UTC 2024 | test result | :grin: SUCCESS |
@trz42 I will take your updated host_injections script for a test drive tomorrow, I think I have a few suggestions there and will open a PR to your branch
I also get the feeling that if we are going to move to easystack files (a good idea) then we should probably ship the ones we expect people to use
> @trz42 I will take your updated `host_injections` script for a test drive tomorrow, I think I have a few suggestions there and will open a PR to your branch

Just updated the script with some improvements/fixes after my own testing.
Run another build after several changes...
bot: build inst:aws repo:eessi.io-2023.06-software arch:x86_64/amd/zen2
Updates by the bot instance `eessi-bot-mc-aws`
- received bot command `build inst:aws repo:eessi.io-2023.06-software arch:x86_64/amd/zen2` from `trz42`
  - expanded format: `build instance:aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2`
- handling command `build instance:aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2` resulted in:
  - submitted job `11284`, for details & status see https://github.com/EESSI/software-layer/pull/581#issuecomment-2126650177

Updates by the bot instance `eessi-bot-mc-azure`
- received bot command `build inst:aws repo:eessi.io-2023.06-software arch:x86_64/amd/zen2` from `trz42`
  - expanded format: `build instance:aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2`
- handling command `build instance:aws repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2` resulted in:
  - no jobs were submitted
New job on instance eessi-bot-mc-aws for architecture x86_64-amd-zen2 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.05/pr_581/11284
| date | job status | comment |
|---|---|---|
| May 23 09:28:36 UTC 2024 | submitted | job id 11284 awaits release by job manager |
| May 23 09:29:06 UTC 2024 | released | job awaits launch by Slurm scheduler |
| May 23 09:30:09 UTC 2024 | running | job 11284 is running |
| May 23 09:42:29 UTC 2024 | finished | :grin: SUCCESS |
| May 23 09:42:29 UTC 2024 | test result | :grin: SUCCESS |
@trz42 Can we close this now?
This has been reimplemented in #772 and #798, so I am closing this PR (if I'm wrong, @trz42 can reopen it).
PR merged! Moved ['/project/def-users/SHARED/jobs/2024.05/pr_581/10940', '/project/def-users/SHARED/jobs/2024.05/pr_581/10941', '/project/def-users/SHARED/jobs/2024.05/pr_581/10942', '/project/def-users/SHARED/jobs/2024.05/pr_581/10943', '/project/def-users/SHARED/jobs/2024.05/pr_581/11284'] to /project/def-users/SHARED/trash_bin/EESSI/software-layer/2024.11.07