software-layer
Add accelerator detection to Lmod version of EESSI initialisation
Requires
- [ ] https://github.com/EESSI/software-layer/pull/783
Instance eessi-bot-mc-aws is configured to build for:
- architectures: `x86_64/generic`, `x86_64/intel/haswell`, `x86_64/intel/skylake_avx512`, `x86_64/amd/zen2`, `x86_64/amd/zen3`, `aarch64/generic`, `aarch64/neoverse_n1`, `aarch64/neoverse_v1`
- repositories: `eessi.io-2023.06-compat`, `eessi-hpc.org-2023.06-software`, `eessi-hpc.org-2023.06-compat`, `eessi.io-2023.06-software`
Instance eessi-bot-mc-azure is configured to build for:
- architectures: `x86_64/amd/zen4`
- repositories: `eessi-hpc.org-2023.06-software`, `eessi-hpc.org-2023.06-compat`, `eessi.io-2023.06-software`, `eessi.io-2023.06-compat`
@boegel You are probably best placed to review this. I've added a lot of new CI which caught a few problems (in particular unloading the module did not unset a range of environment variables). There is also a check which compares the environment available via the module to that given by the init script.
I've also added a debug mode which can be used to print additional information.
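The unload problem mentioned above (variables left behind after the module is unloaded) can be checked with a round trip: snapshot the environment, load, unload, and diff. This is only a minimal sketch and not the CI code from this PR; `report_env_leaks` is a hypothetical helper, and the load and unload steps are passed in as command strings so the same function works with or without Lmod.

```shell
# Hypothetical helper (not part of this PR): run a load step ($1) and an
# unload step ($2) in a subshell, then diff the environment against the
# starting state. Exit status 0 means nothing was left behind.
report_env_leaks() {
    (
        snap=$(mktemp -d) || exit 2
        env | LC_ALL=C sort > "$snap/before"
        eval "$1"
        eval "$2"
        env | LC_ALL=C sort > "$snap/after"
        # Lines starting with ">" are present only after the round trip.
        diff "$snap/before" "$snap/after"
        status=$?
        rm -rf "$snap"
        exit "$status"
    )
}
```

With Lmod available this could be called as `report_env_leaks 'module load EESSI' 'module unload EESSI'`. Lmod's own bookkeeping variables may still show up in the diff, so a real check would likely need to filter those out.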
bot: build repo:eessi.io-2023.06-software arch:x86_64/generic
Updates by the bot instance eessi-bot-mc-aws
- received bot command `build repo:eessi.io-2023.06-software arch:x86_64/generic` from `ocaisa`
  - expanded format: `build repository:eessi.io-2023.06-software architecture:x86_64/generic`
- handling command `build repository:eessi.io-2023.06-software architecture:x86_64/generic` resulted in:
  - submitted job `23406`, for details & status see https://github.com/EESSI/software-layer/pull/781#issuecomment-2414965066
Updates by the bot instance boegel-bot-deucalion
- account `ocaisa` has NO permission to send commands to the bot
Updates by the bot instance eessi-bot-mc-azure
- received bot command `build repo:eessi.io-2023.06-software arch:x86_64/generic` from `ocaisa`
  - expanded format: `build repository:eessi.io-2023.06-software architecture:x86_64/generic`
- handling command `build repository:eessi.io-2023.06-software architecture:x86_64/generic` resulted in:
  - no jobs were submitted
New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-generic for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.10/pr_781/23406
| date | job status | comment |
|---|---|---|
| Oct 15 20:32:17 UTC 2024 | submitted | job id 23406 awaits release by job manager |
| Oct 15 20:32:41 UTC 2024 | released | job awaits launch by Slurm scheduler |
| Oct 15 20:37:56 UTC 2024 | running | job 23406 is running |
| Oct 15 20:48:29 UTC 2024 | finished | :grin: SUCCESS |
| Oct 15 20:48:29 UTC 2024 | test result | :grin: SUCCESS |
bot: build repo:eessi.io-2023.06-software arch:x86_64/generic
Updates by the bot instance eessi-bot-mc-aws
- received bot command `build repo:eessi.io-2023.06-software arch:x86_64/generic` from `ocaisa`
  - expanded format: `build repository:eessi.io-2023.06-software architecture:x86_64/generic`
- handling command `build repository:eessi.io-2023.06-software architecture:x86_64/generic` resulted in:
  - submitted job `23584`, for details & status see https://github.com/EESSI/software-layer/pull/781#issuecomment-2416055470
Updates by the bot instance eessi-bot-mc-azure
- received bot command `build repo:eessi.io-2023.06-software arch:x86_64/generic` from `ocaisa`
  - expanded format: `build repository:eessi.io-2023.06-software architecture:x86_64/generic`
- handling command `build repository:eessi.io-2023.06-software architecture:x86_64/generic` resulted in:
  - no jobs were submitted
Updates by the bot instance boegel-bot-deucalion
- account `ocaisa` has NO permission to send commands to the bot
New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-generic for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.10/pr_781/23584
| date | job status | comment |
|---|---|---|
| Oct 16 08:15:16 UTC 2024 | submitted | job id 23584 awaits release by job manager |
| Oct 16 08:15:44 UTC 2024 | released | job awaits launch by Slurm scheduler |
| Oct 16 08:22:47 UTC 2024 | running | job 23584 is running |
| Oct 16 08:28:53 UTC 2024 | finished | :grin: SUCCESS |
| Oct 16 08:28:53 UTC 2024 | test result | :grin: SUCCESS |
bot: build repo:eessi.io-2023.06-software arch:x86_64/generic
Updates by the bot instance boegel-bot-deucalion
- account `ocaisa` has NO permission to send commands to the bot
Updates by the bot instance eessi-bot-mc-aws
- received bot command `build repo:eessi.io-2023.06-software arch:x86_64/generic` from `ocaisa`
  - expanded format: `build repository:eessi.io-2023.06-software architecture:x86_64/generic`
- handling command `build repository:eessi.io-2023.06-software architecture:x86_64/generic` resulted in:
  - submitted job `23587`, for details & status see https://github.com/EESSI/software-layer/pull/781#issuecomment-2416129990
Updates by the bot instance eessi-bot-mc-azure
- received bot command `build repo:eessi.io-2023.06-software arch:x86_64/generic` from `ocaisa`
  - expanded format: `build repository:eessi.io-2023.06-software architecture:x86_64/generic`
- handling command `build repository:eessi.io-2023.06-software architecture:x86_64/generic` resulted in:
  - no jobs were submitted
New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-generic for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.10/pr_781/23587
| date | job status | comment |
|---|---|---|
| Oct 16 08:47:23 UTC 2024 | submitted | job id 23587 awaits release by job manager |
| Oct 16 08:48:04 UTC 2024 | released | job awaits launch by Slurm scheduler |
| Oct 16 08:49:14 UTC 2024 | running | job 23587 is running |
| Oct 16 08:55:54 UTC 2024 | finished | :grin: SUCCESS |
| Oct 16 08:55:54 UTC 2024 | test result | :grin: SUCCESS |
There's a lot of testing in CI right now, but local testing is pretty straightforward:
# Checkout the PR branch, have Lmod available, and cd in
export EESSI_SOFTWARE_SUBDIR_OVERRIDE=x86_64/amd/zen3
export EESSI_ACCELERATOR_TARGET_OVERRIDE=accel/nvidia/cc80
export EESSI_DEBUG_INIT=true
module unuse $MODULEPATH
module use $PWD/init/modules
module load EESSI
@ocaisa I'm happy to review, but I guess I should wait until CI is green (again)?
bot: build repo:eessi.io-2023.06-software arch:x86_64/generic
Updates by the bot instance eessi-bot-mc-aws
- received bot command `build repo:eessi.io-2023.06-software arch:x86_64/generic` from `ocaisa`
  - expanded format: `build repository:eessi.io-2023.06-software architecture:x86_64/generic`
- handling command `build repository:eessi.io-2023.06-software architecture:x86_64/generic` resulted in:
  - submitted job `23592`, for details & status see https://github.com/EESSI/software-layer/pull/781#issuecomment-2416552572
Updates by the bot instance boegel-bot-deucalion
- account `ocaisa` has NO permission to send commands to the bot
Updates by the bot instance eessi-bot-mc-azure
- received bot command `build repo:eessi.io-2023.06-software arch:x86_64/generic` from `ocaisa`
  - expanded format: `build repository:eessi.io-2023.06-software architecture:x86_64/generic`
- handling command `build repository:eessi.io-2023.06-software architecture:x86_64/generic` resulted in:
  - no jobs were submitted
New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-generic for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.10/pr_781/23592
| date | job status | comment |
|---|---|---|
| Oct 16 11:36:25 UTC 2024 | submitted | job id 23592 awaits release by job manager |
| Oct 16 11:37:20 UTC 2024 | released | job awaits launch by Slurm scheduler |
| Oct 16 11:43:22 UTC 2024 | running | job 23592 is running |
| Oct 16 11:53:34 UTC 2024 | finished | :grin: SUCCESS |
| Oct 16 11:53:34 UTC 2024 | test result | :grin: SUCCESS |
@casparvl @boegel Relevant tests are now passing again here (Lmod does things in the environment that make it hard to do a vanilla comparison).
This should be good to go; I've retriggered the build for deployment.
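To illustrate why a vanilla comparison is awkward, here is a rough sketch of comparing two environment dumps while ignoring Lmod's own bookkeeping variables. It is not the check used in this PR's CI. `filter_env` and `compare_env_dumps` are hypothetical names, the list of ignored prefixes is an assumption that would need tuning, and the dumps are assumed to be plain `env` output with one `NAME=value` per line (multi-line values would break it).

```shell
# Hypothetical helper: print a dump file ($1) sorted, with variables that
# Lmod maintains for itself removed. The prefix list is an assumption.
filter_env() {
    grep -Ev '^(LMOD_|__LMOD_|_ModuleTable|_LMFILES_=|LOADEDMODULES=)' "$1" | LC_ALL=C sort
}

# Hypothetical helper: succeed only if the two dumps agree once filtered;
# otherwise print the differing lines and return diff's exit status.
compare_env_dumps() {
    tmpdir=$(mktemp -d) || return 2
    filter_env "$1" > "$tmpdir/a"
    filter_env "$2" > "$tmpdir/b"
    diff "$tmpdir/a" "$tmpdir/b"
    status=$?
    rm -rf "$tmpdir"
    return "$status"
}
```

One way to use it would be to run `env > module.env` in a shell that has loaded the EESSI module, `env > init.env` in a separate shell that has sourced the init script, and then call `compare_env_dumps module.env init.env`.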
Switched to pushenv for LMOD_PACKAGE_PATH in https://github.com/EESSI/software-layer/pull/781/commits/04c25738137a041a0bda39ba80f7e291c2e72833
I'm reluctant to put LMOD_PACKAGE_PATH into CI, as it would need to be a sensible value if you are going to have it set when calling Lmod, and the only sensible one we have in CI is the one we ship with EESSI.
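For context on the `pushenv` switch: in an Lmod modulefile, `pushenv` sets a variable on load and restores whatever value was there before on unload, whereas `setenv` simply unsets it on unload. The shell sketch below only emulates that behaviour for a single level of nesting. It is not how Lmod implements it, and `push_env` / `pop_env` are hypothetical names.

```shell
# Hypothetical emulation of set-and-restore semantics (one level only).
# push_env NAME VALUE: remember the current state of NAME, then export VALUE.
push_env() {
    eval "if [ \"\${$1+set}\" = set ]; then __SAVED_$1=\"\${$1}\"; __HAD_$1=yes; else __HAD_$1=no; fi"
    export "$1=$2"
}

# pop_env NAME: restore the remembered value, or unset NAME if it had none.
# Note: a restored variable is always exported, even if it was not before.
pop_env() {
    eval "_pe_had=\${__HAD_$1:-no}"
    if [ "$_pe_had" = yes ]; then
        eval "export $1=\"\${__SAVED_$1}\""
    else
        unset "$1"
    fi
    unset "__SAVED_$1" "__HAD_$1" _pe_had
}
```

With this pattern, a site that already sets LMOD_PACKAGE_PATH would get its own value back after the unload step instead of losing the variable.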
bot: build repo:eessi.io-2023.06-software arch:x86_64/generic
Updates by the bot instance eessi-bot-mc-aws
- received bot command `build repo:eessi.io-2023.06-software arch:x86_64/generic` from `ocaisa`
  - expanded format: `build repository:eessi.io-2023.06-software architecture:x86_64/generic`
- handling command `build repository:eessi.io-2023.06-software architecture:x86_64/generic` resulted in:
  - submitted job `23603`, for details & status see https://github.com/EESSI/software-layer/pull/781#issuecomment-2417039837