software-layer
Test that non-CUDA builds are not added to the accelerator path with jax
5 out of 86 required modules missing:
* absl-py/2.1.0-GCCcore-12.3.0 (absl-py-2.1.0-GCCcore-12.3.0.eb)
* pytest/7.4.2-GCCcore-12.3.0 (pytest-7.4.2-GCCcore-12.3.0.eb)
* pytest-xdist/3.3.1-GCCcore-12.3.0 (pytest-xdist-3.3.1-GCCcore-12.3.0.eb)
* ml_dtypes/0.3.2-gfbf-2023a (ml_dtypes-0.3.2-gfbf-2023a.eb)
* jax/0.4.25-gfbf-2023a-CUDA-12.1.1 (jax-0.4.25-gfbf-2023a-CUDA-12.1.1.eb)
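As a rough illustration of what this PR is testing, the sketch below splits the missing modules into CUDA and non-CUDA builds. It assumes that a CUDA build can be recognised by `-CUDA-` in its version string, which holds for the five modules listed here but is not taken from the EESSI tooling; `is_cuda_build` is a hypothetical helper.

```python
# Module names copied from the missing-modules list above.
missing_modules = [
    "absl-py/2.1.0-GCCcore-12.3.0",
    "pytest/7.4.2-GCCcore-12.3.0",
    "pytest-xdist/3.3.1-GCCcore-12.3.0",
    "ml_dtypes/0.3.2-gfbf-2023a",
    "jax/0.4.25-gfbf-2023a-CUDA-12.1.1",
]


def is_cuda_build(module: str) -> bool:
    """Assumption: a CUDA build carries '-CUDA-' in its version string."""
    _, _, version = module.partition("/")
    return "-CUDA-" in version


# Only these would be expected under the accelerator path.
cuda_builds = [m for m in missing_modules if is_cuda_build(m)]
# These would be expected under the plain CPU path.
cpu_only_builds = [m for m in missing_modules if not is_cuda_build(m)]

print("CUDA builds:", cuda_builds)
print("non-CUDA builds:", cpu_only_builds)
```

Under that assumption, only jax belongs in the accelerator path; the other four modules are the ones this test is meant to keep out of it.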
Instance eessi-bot-mc-aws is configured to build for:
- architectures: x86_64/generic, x86_64/intel/haswell, x86_64/intel/sapphire_rapids, x86_64/intel/skylake_avx512, x86_64/amd/zen2, x86_64/amd/zen3, aarch64/generic, aarch64/neoverse_n1, aarch64/neoverse_v1
- repositories: eessi.io-2023.06-software, eessi.io-2023.06-compat
Instance eessi-bot-mc-azure is configured to build for:
- architectures: x86_64/amd/zen4
- repositories: eessi.io-2023.06-software, eessi.io-2023.06-compat
Instance eessi-bot-casparvl is configured to build for:
- architectures: x86_64/amd/zen4, x86_64/amd/zen2
- repositories: eessi.io-2023.06-software, eessi-hpc.org-2023.06-compat, eessi-hpc.org-2023.06-software, eessi.io-2023.06-compat
Instance eessi-bot-vsc-ugent is configured to build for:
- architectures: x86_64/amd/zen3
- repositories: eessi-hpc.org-2023.06-compat, eessi.io-2023.06-software, eessi-hpc.org-2023.06-software, eessi.io-2023.06-compat
Instance trz42-GH200-jr is configured to build for:
- architectures: aarch64/nvidia/grace
- repositories: eessi.io-2023.06-software
bot: build instance:eessi-bot-vsc-ugent repo:eessi.io-2023.06-software accel:nvidia/cc80
Updates by the bot instance eessi-bot-mc-aws
- received bot command `build instance:eessi-bot-vsc-ugent repo:eessi.io-2023.06-software accel:nvidia/cc80` from laraPPr
  - expanded format: `build instance:eessi-bot-vsc-ugent repository:eessi.io-2023.06-software accelerator:nvidia/cc80`
- handling command `build instance:eessi-bot-vsc-ugent repository:eessi.io-2023.06-software accelerator:nvidia/cc80` resulted in:
  - no jobs were submitted
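The "expanded format" reported by the bot replaces abbreviated filter keys with their long names. A minimal sketch of that expansion follows; the mapping contains only the abbreviations visible in this thread, and `expand_command` is a hypothetical helper, not the bot's actual code.

```python
# Abbreviations observed in the bot output in this thread; the real bot
# may accept others.
EXPANSIONS = {
    "repo": "repository",
    "arch": "architecture",
    "accel": "accelerator",
}


def expand_command(command: str) -> str:
    """Expand abbreviated filter keys in a 'build key:value ...' command."""
    action, *filters = command.split()
    expanded = []
    for item in filters:
        key, _, value = item.partition(":")
        # Keys without a known abbreviation (e.g. 'instance') pass through.
        expanded.append(f"{EXPANSIONS.get(key, key)}:{value}")
    return " ".join([action, *expanded])


print(expand_command(
    "build instance:eessi-bot-vsc-ugent "
    "repo:eessi.io-2023.06-software accel:nvidia/cc80"
))
```

For the command sent here, this reproduces the expanded string that the bot reports.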
Updates by the bot instance eessi-bot-mc-azure
- received bot command `build instance:eessi-bot-vsc-ugent repo:eessi.io-2023.06-software accel:nvidia/cc80` from laraPPr
  - expanded format: `build instance:eessi-bot-vsc-ugent repository:eessi.io-2023.06-software accelerator:nvidia/cc80`
- handling command `build instance:eessi-bot-vsc-ugent repository:eessi.io-2023.06-software accelerator:nvidia/cc80` resulted in:
  - no jobs were submitted
Updates by the bot instance eessi-bot-casparvl
- account laraPPr has NO permission to send commands to the bot
Updates by the bot instance eessi-bot-vsc-ugent
- received bot command `build instance:eessi-bot-vsc-ugent repo:eessi.io-2023.06-software accel:nvidia/cc80` from laraPPr
  - expanded format: `build instance:eessi-bot-vsc-ugent repository:eessi.io-2023.06-software accelerator:nvidia/cc80`
- handling command `build instance:eessi-bot-vsc-ugent repository:eessi.io-2023.06-software accelerator:nvidia/cc80` resulted in:
  - submitted job 15445297, for details & status see https://github.com/EESSI/software-layer/pull/917#issuecomment-2656598059
Updates by the bot instance trz42-GH200-jr
- account laraPPr has NO permission to send commands to the bot
New job on instance eessi-bot-vsc-ugent for CPU micro-architecture x86_64-amd-zen3 and accelerator nvidia/cc80 for repository eessi.io-2023.06-software in job dir /scratch/gent/vo/002/gvo00211/SHARED/jobs/2025.02/pr_917/15445297
| date | job status | comment |
|---|---|---|
| Feb 13 13:25:54 UTC 2025 | submitted | job id 15445297 awaits release by job manager |
| Feb 13 13:26:58 UTC 2025 | released | job awaits launch by Slurm scheduler |
| Feb 13 13:29:02 UTC 2025 | running | job 15445297 is running |
| Feb 13 15:05:06 UTC 2025 | finished | :cry: FAILURE |
| Feb 13 15:05:06 UTC 2025 | test result | :grin: SUCCESS |
@trz42 @ocaisa this is not doing what we expect, because it seems to be installing pytest-xdist in the accelerator path:
/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen3/software/Python/3.11.3-GCCcore-12.3.0/bin/python -m pip install --prefix=/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/software/pytest-xdist/3.3.1-GCCcore-12.3.0 --verbose --no-deps --ignore-installed --no-index --no-build-isolation .
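The problem can be spotted mechanically from the `--prefix` in that command. The sketch below is only an illustration and rests on two assumptions: accelerator installs live under an `/accel/` directory, and CUDA builds carry `CUDA` in their version directory name. Both helper functions are hypothetical, and the Python interpreter path is shortened to `python` for readability.

```python
import shlex


def install_prefix(pip_command: str) -> str:
    """Return the value of --prefix=... from a pip command line."""
    for token in shlex.split(pip_command):
        if token.startswith("--prefix="):
            return token.split("=", 1)[1]
    raise ValueError("no --prefix option found")


def wrongly_in_accel_path(prefix: str) -> bool:
    """True if a non-CUDA module is installed under an accel/ directory.

    Assumptions: accelerator installs live under '/accel/', and CUDA
    builds have 'CUDA' in the last path component (the module version).
    """
    version_dir = prefix.rstrip("/").rsplit("/", 1)[-1]
    return "/accel/" in prefix and "CUDA" not in version_dir


# Interpreter path shortened to 'python'; the rest is the command above.
command = (
    "python -m pip install "
    "--prefix=/cvmfs/software.eessi.io/versions/2023.06/software/linux/"
    "x86_64/amd/zen3/accel/nvidia/cc80/software/pytest-xdist/"
    "3.3.1-GCCcore-12.3.0 "
    "--verbose --no-deps --ignore-installed --no-index --no-build-isolation ."
)

prefix = install_prefix(command)
print(prefix)
print("wrongly in accelerator path:", wrongly_in_accel_path(prefix))
```

For the pytest-xdist command this check reports a misplaced install, since the prefix sits under `accel/nvidia/cc80` while the version `3.3.1-GCCcore-12.3.0` has no CUDA suffix.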
bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80
Updates by the bot instance eessi-bot-mc-aws
- received bot command `build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80` from laraPPr
  - expanded format: `build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80`
- handling command `build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80` resulted in:
  - submitted job 45927, for details & status see https://github.com/EESSI/software-layer/pull/917#issuecomment-2656643474
Updates by the bot instance eessi-bot-mc-azure
- received bot command `build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80` from laraPPr
  - expanded format: `build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80`
- handling command `build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80` resulted in:
  - no jobs were submitted
Updates by the bot instance eessi-bot-casparvl
- account laraPPr has NO permission to send commands to the bot
Updates by the bot instance eessi-bot-vsc-ugent
- received bot command `build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80` from laraPPr
  - expanded format: `build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80`
- handling command `build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80` resulted in:
  - no jobs were submitted
Updates by the bot instance trz42-GH200-jr
- account laraPPr has NO permission to send commands to the bot
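The outcomes of both build commands in this thread are consistent with a simple rule: an instance submits a job only when every filter in the command matches its configuration. The sketch below illustrates that rule for the three instances that accepted commands from laraPPr. The matching logic is an assumption reconstructed from the observed results, not the bot's implementation; the accelerator filter is ignored, and `targets` is a hypothetical helper.

```python
# Configuration copied from the bot's instance listing in this thread
# (only instances that accepted the commands are included).
INSTANCES = {
    "eessi-bot-mc-aws": {
        "architectures": [
            "x86_64/generic", "x86_64/intel/haswell",
            "x86_64/intel/sapphire_rapids", "x86_64/intel/skylake_avx512",
            "x86_64/amd/zen2", "x86_64/amd/zen3", "aarch64/generic",
            "aarch64/neoverse_n1", "aarch64/neoverse_v1",
        ],
        "repositories": ["eessi.io-2023.06-software", "eessi.io-2023.06-compat"],
    },
    "eessi-bot-mc-azure": {
        "architectures": ["x86_64/amd/zen4"],
        "repositories": ["eessi.io-2023.06-software", "eessi.io-2023.06-compat"],
    },
    "eessi-bot-vsc-ugent": {
        "architectures": ["x86_64/amd/zen3"],
        "repositories": [
            "eessi-hpc.org-2023.06-compat", "eessi.io-2023.06-software",
            "eessi-hpc.org-2023.06-software", "eessi.io-2023.06-compat",
        ],
    },
}


def targets(filters: dict) -> dict:
    """Return {instance: [architectures]} selected by a build command.

    Hypothetical rule: every given filter (instance, repository,
    architecture) must match; a missing filter matches everything.
    """
    selected = {}
    for name, cfg in INSTANCES.items():
        if filters.get("instance", name) != name:
            continue
        repo = filters.get("repository")
        if repo is not None and repo not in cfg["repositories"]:
            continue
        archs = [a for a in cfg["architectures"]
                 if filters.get("architecture", a) == a]
        if archs:
            selected[name] = archs
    return selected


print(targets({"repository": "eessi.io-2023.06-software",
               "architecture": "x86_64/amd/zen2"}))
```

With the zen2 filter, only eessi-bot-mc-aws is selected, which agrees with the single job (45927) reported above; with `instance:eessi-bot-vsc-ugent` and no architecture filter, only that instance's zen3 target is selected.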
New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-amd-zen2 and accelerator nvidia/cc80 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.02/pr_917/45927
| date | job status | comment |
|---|---|---|
| Feb 13 13:43:54 UTC 2025 | submitted | job id 45927 awaits release by job manager |
| Feb 13 13:44:41 UTC 2025 | released | job awaits launch by Slurm scheduler |
| Feb 13 13:53:28 UTC 2025 | running | job 45927 is running |
| Feb 13 14:18:15 UTC 2025 | finished | :cry: FAILURE |
| Feb 13 14:18:15 UTC 2025 | test result | :cry: FAILURE |
It failed building jax with this error:
FAILED: Installation ended unsuccessfully (build directory: /tmp/vsc48506/easybuild/build/jax/0.4.25/gfbf-2023a-CUDA-12.1.1): build failed (first 300 chars): Failed to determine installation prefix for binutils (took 39 mins 48 secs
and, as you can see in the artifacts, the non-CUDA builds were built in the accelerator path.
Closing so I can test a new action for filtering
Reopening to test the replacement for dorny
Filtering works and the check fails as expected