software-layer
{2023.06}[foss/2023a] TensorFlow v2.15.1 w/ CUDA 12.1.1
Instance eessi-bot-mc-aws is configured to build for:
- architectures: x86_64/generic, x86_64/intel/haswell, x86_64/intel/skylake_avx512, x86_64/amd/zen2, x86_64/amd/zen3, aarch64/generic, aarch64/neoverse_n1, aarch64/neoverse_v1
- repositories: eessi-hpc.org-2023.06-compat, eessi-hpc.org-2023.06-software, eessi.io-2023.06-software, eessi.io-2023.06-compat

Instance eessi-bot-mc-azure is configured to build for:
- architectures: x86_64/amd/zen4
- repositories: eessi-hpc.org-2023.06-software, eessi-hpc.org-2023.06-compat, eessi.io-2023.06-software, eessi.io-2023.06-compat

Instance boegel-bot-deucalion is configured to build for:
- architectures: aarch64/a64fx
- repositories: eessi.io-2023.06-software
bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80
Updates by the bot instance eessi-bot-mc-aws
- received bot command `build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80` from casparvl
  - expanded format: `build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80`
- handling command `build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80` resulted in:
  - submitted job 20153, for details & status see https://github.com/EESSI/software-layer/pull/717#issuecomment-2378742619
Updates by the bot instance eessi-bot-mc-azure
- received bot command `build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80` from casparvl
  - expanded format: `build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80`
- handling command `build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80` resulted in:
  - no jobs were submitted
Updates by the bot instance boegel-bot-deucalion
- account casparvl has NO permission to send commands to the bot
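
The bot replies above expand the short filter keys (`repo`, `arch`, `accel`) into their long forms, and only the instance whose configured targets include the requested architecture and repository submits a job. The following is a minimal Python sketch of that behaviour, not the bot's actual implementation: the function names and the `INSTANCES` table are hypothetical, the targets are copied from the configuration listed in this thread, and permission checks and accelerator handling are deliberately left out.

```python
# Hypothetical sketch; this is NOT the EESSI bot's real code.
SHORT_TO_LONG = {"repo": "repository", "arch": "architecture", "accel": "accelerator"}

# Build targets as reported by the bot instances in this thread.
INSTANCES = {
    "eessi-bot-mc-aws": {
        "architectures": {
            "x86_64/generic", "x86_64/intel/haswell", "x86_64/intel/skylake_avx512",
            "x86_64/amd/zen2", "x86_64/amd/zen3",
            "aarch64/generic", "aarch64/neoverse_n1", "aarch64/neoverse_v1",
        },
        "repositories": {
            "eessi-hpc.org-2023.06-compat", "eessi-hpc.org-2023.06-software",
            "eessi.io-2023.06-software", "eessi.io-2023.06-compat",
        },
    },
    "eessi-bot-mc-azure": {
        "architectures": {"x86_64/amd/zen4"},
        "repositories": {
            "eessi-hpc.org-2023.06-software", "eessi-hpc.org-2023.06-compat",
            "eessi.io-2023.06-software", "eessi.io-2023.06-compat",
        },
    },
    "boegel-bot-deucalion": {
        "architectures": {"aarch64/a64fx"},
        "repositories": {"eessi.io-2023.06-software"},
    },
}


def parse_build_command(command):
    """Turn 'build key:value ...' into a dict keyed by the expanded filter names."""
    if command.startswith("bot:"):
        command = command[len("bot:"):]
    action, *filters = command.split()
    if action != "build":
        raise ValueError(f"unsupported command: {action!r}")
    parsed = {}
    for item in filters:
        key, _, value = item.partition(":")
        parsed[SHORT_TO_LONG.get(key, key)] = value
    return parsed


def expand(command):
    """Rebuild the command string using the long filter names."""
    parsed = parse_build_command(command)
    return "build " + " ".join(f"{key}:{value}" for key, value in parsed.items())


def matching_instances(command):
    """Names of instances configured for the requested architecture and repository."""
    parsed = parse_build_command(command)
    return [
        name
        for name, cfg in INSTANCES.items()
        if parsed.get("architecture") in cfg["architectures"]
        and parsed.get("repository") in cfg["repositories"]
    ]


if __name__ == "__main__":
    cmd = "bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80"
    print(expand(cmd))
    print(matching_instances(cmd))
```

Under these assumptions, only eessi-bot-mc-aws matches the `x86_64/amd/zen2` request, which is consistent with eessi-bot-mc-azure (configured for `x86_64/amd/zen4` only) reporting that no jobs were submitted.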
New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-amd-zen2 and accelerator nvidia/cc80 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.09/pr_717/20153
| date | job status | comment |
|---|---|---|
| Sep 27 08:31:27 UTC 2024 | submitted | job id 20153 awaits release by job manager |
| Sep 27 08:31:37 UTC 2024 | released | job awaits launch by Slurm scheduler |
| Sep 27 08:36:39 UTC 2024 | running | job 20153 is running |
| Sep 27 16:44:00 UTC 2024 | finished | :cry: FAILURE (click triangle for details) |
| Sep 27 16:44:00 UTC 2024 | test result | :grin: SUCCESS (click triangle for details) |
Let's see how this goes. Note that we need a proper cuDNN deployment that strips the necessary files first... So we will need to rebuild in any case.
I've marked this as a draft; we definitely don't want to deploy with a full cuDNN installation.
The build succeeded, but many tests failed due to:
ImportError: libnccl.so.2: cannot open shared object file: No such file or directory
This is already available in the CPU-only stack, so I'm not sure why it didn't pick up the library from that module.
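
One way to narrow this down is to check, in the same environment where the tests run, whether the dynamic loader can open `libnccl.so.2` at all. The snippet below is a small diagnostic sketch using only the Python standard library; the helper names are made up for illustration. It only inspects `LD_LIBRARY_PATH`, so a library that is reachable through an RPATH or the loader cache will not show up in the directory listing even if loading succeeds.

```python
import ctypes
import os


def can_load(library_name):
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        return False
    return True


def dirs_containing(library_name, search_path=None):
    """List directories on a search path (default: LD_LIBRARY_PATH) holding the file."""
    if search_path is None:
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    return [
        directory
        for directory in search_path.split(os.pathsep)
        if directory and os.path.isfile(os.path.join(directory, library_name))
    ]


if __name__ == "__main__":
    name = "libnccl.so.2"
    print(f"{name} loadable: {can_load(name)}")
    print(f"LD_LIBRARY_PATH directories containing it: {dirs_containing(name)}")
```

Running this once with the NCCL module loaded and once without it should show whether the module exposes the library to the loader in the first place.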
Just opened https://github.com/easybuilders/easybuild-easyblocks/pull/3497 which may fix the libnccl.so.2 error.
@casparvl Can you retarget this PR?