{2023.06}[foss/2023a] waLBerla 6.1 w/ CUDA 12.1.1

Open Neves-P opened this issue 1 year ago • 83 comments

This PR adds the same version of waLBerla that was installed previously, now built with the updated foss/2023a toolchain and CUDA support.

Neves-P avatar Oct 09 '24 11:10 Neves-P

Instance eessi-bot-mc-aws is configured to build for:

  • architectures: x86_64/generic, x86_64/intel/haswell, x86_64/intel/skylake_avx512, x86_64/amd/zen2, x86_64/amd/zen3, aarch64/generic, aarch64/neoverse_n1, aarch64/neoverse_v1
  • repositories: eessi.io-2023.06-compat, eessi-hpc.org-2023.06-software, eessi-hpc.org-2023.06-compat, eessi.io-2023.06-software

eessi-bot[bot] avatar Oct 09 '24 11:10 eessi-bot[bot]

Instance eessi-bot-mc-azure is configured to build for:

  • architectures: x86_64/amd/zen4
  • repositories: eessi-hpc.org-2023.06-software, eessi-hpc.org-2023.06-compat, eessi.io-2023.06-software, eessi.io-2023.06-compat

eessi-bot[bot] avatar Oct 09 '24 11:10 eessi-bot[bot]

bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80

Neves-P avatar Oct 09 '24 11:10 Neves-P

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80 from Neves-P

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80 resulted in:

    • submitted job 22293, for details & status see https://github.com/EESSI/software-layer/pull/780#issuecomment-2402071774

eessi-bot[bot] avatar Oct 09 '24 11:10 eessi-bot[bot]

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80 from Neves-P

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80 resulted in:

    • no jobs were submitted

eessi-bot[bot] avatar Oct 09 '24 11:10 eessi-bot[bot]

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-amd-zen2 and accelerator nvidia/cc80 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.10/pr_780/22293

date | job status | comment
Oct 09 11:35:25 UTC 2024 | submitted | job id 22293 awaits release by job manager
Oct 09 11:36:06 UTC 2024 | released | job awaits launch by Slurm scheduler
Oct 09 11:44:08 UTC 2024 | running | job 22293 is running
Oct 09 12:22:59 UTC 2024 | finished
:grin: SUCCESS (click triangle for details)
Details
:white_check_mark: job output file slurm-22293.out
:white_check_mark: no message matching ERROR:
:white_check_mark: no message matching FAILED:
:white_check_mark: no message matching required modules missing:
:white_check_mark: found message(s) matching No missing installations
:white_check_mark: found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen2-1728475687.tar.gz
size: 37 MiB (39133902 bytes)
entries: 8478
modules under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80/modules/all
waLBerla/6.1-foss-2023a-CUDA-12.1.1.lua
software under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80/software
waLBerla/6.1-foss-2023a-CUDA-12.1.1
other under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80
2023.06/software/linux/x86_64/amd/zen2/.lmod/lmodrc.lua
2023.06/software/linux/x86_64/amd/zen2/.lmod/SitePackage.lua
Oct 09 12:22:59 UTC 2024 | test result
:grin: SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 10/10 test case
Details
:white_check_mark: job output file slurm-22293.out
:white_check_mark: no message matching ERROR:
:white_check_mark: no message matching \[\s*FAILED\s*\].*Ran .* test case

eessi-bot[bot] avatar Oct 09 '24 11:10 eessi-bot[bot]
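The build report above is a list of pattern checks against the Slurm output file. As a rough illustration of that idea, here is a minimal Python sketch. The patterns are copied from the check list, but the function name `check_build_log` and the way the checks are combined are assumptions for illustration, not the bot's actual implementation.

```python
import re

# Patterns taken from the check list in the report above.
# How the bot really evaluates them is an assumption of this sketch.
MUST_NOT_MATCH = [r"ERROR:", r"FAILED:", r"required modules missing:"]
MUST_MATCH = [r"No missing installations", r"\.tar\.gz created!"]


def check_build_log(text):
    """Return (ok, details); details maps each pattern to True if its check passed."""
    details = {}
    for pattern in MUST_NOT_MATCH:
        # These checks pass only when the pattern is absent from the log.
        details[pattern] = re.search(pattern, text) is None
    for pattern in MUST_MATCH:
        # These checks pass only when the pattern is present in the log.
        details[pattern] = re.search(pattern, text) is not None
    return all(details.values()), details


if __name__ == "__main__":
    # Made-up log text, only to exercise the function.
    sample = "No missing installations\n/tmp/example.tar.gz created!\n"
    ok, details = check_build_log(sample)
    print(ok, details)
```

A log that contains `ERROR:` or lacks the `.tar.gz created!` line would make the first return value `False`, mirroring a failed report.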

bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3 accel:nvidia/cc80

Neves-P avatar Oct 09 '24 12:10 Neves-P

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3 accel:nvidia/cc80 from Neves-P

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80 resulted in:

    • submitted job 22295, for details & status see https://github.com/EESSI/software-layer/pull/780#issuecomment-2402197284

eessi-bot[bot] avatar Oct 09 '24 12:10 eessi-bot[bot]

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3 accel:nvidia/cc80 from Neves-P

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80 resulted in:

    • no jobs were submitted

eessi-bot[bot] avatar Oct 09 '24 12:10 eessi-bot[bot]

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-amd-zen3 and accelerator nvidia/cc80 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.10/pr_780/22295

date | job status | comment
Oct 09 12:33:20 UTC 2024 | submitted | job id 22295 awaits release by job manager
Oct 09 12:34:14 UTC 2024 | released | job awaits launch by Slurm scheduler
Oct 09 12:35:16 UTC 2024 | running | job 22295 is running
Oct 09 13:05:06 UTC 2024 | finished
:grin: SUCCESS (click triangle for details)
Details
:white_check_mark: job output file slurm-22295.out
:white_check_mark: no message matching ERROR:
:white_check_mark: no message matching FAILED:
:white_check_mark: no message matching required modules missing:
:white_check_mark: found message(s) matching No missing installations
:white_check_mark: found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen3-1728478320.tar.gz
size: 37 MiB (39136783 bytes)
entries: 8481
modules under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/modules/all
waLBerla/6.1-foss-2023a-CUDA-12.1.1.lua
software under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/software
waLBerla/6.1-foss-2023a-CUDA-12.1.1
other under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80
2023.06/init/bash
2023.06/init/eessi_archdetect.sh
2023.06/init/eessi_environment_variables
2023.06/software/linux/x86_64/amd/zen3/.lmod/lmodrc.lua
2023.06/software/linux/x86_64/amd/zen3/.lmod/SitePackage.lua
Oct 09 13:05:06 UTC 2024 | test result
:grin: SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 10/10 test case
Details
:white_check_mark: job output file slurm-22295.out
:white_check_mark: no message matching ERROR:
:white_check_mark: no message matching \[\s*FAILED\s*\].*Ran .* test case

eessi-bot[bot] avatar Oct 09 '24 12:10 eessi-bot[bot]
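The artefact names in the two reports appear to encode the EESSI version, the OS, the CPU target, and a trailing number that looks like a Unix timestamp. The sketch below splits a name on that assumption. The layout is inferred only from the two file names in this thread, and `parse_artefact_name` is a hypothetical helper, not part of any EESSI tooling.

```python
import re
from datetime import datetime, timezone

# Assumed layout, inferred from the two artefact names in this thread:
# eessi-<version>-software-<os>-<cpu target, '-' separated>-<epoch seconds>.tar.gz
ARTEFACT_RE = re.compile(
    r"^eessi-(?P<version>\d{4}\.\d{2})-software-(?P<os>[a-z]+)"
    r"-(?P<cpu>.+)-(?P<epoch>\d+)\.tar\.gz$"
)


def parse_artefact_name(name):
    """Split an artefact file name into version, OS, CPU target and creation time (UTC)."""
    m = ARTEFACT_RE.match(name)
    if m is None:
        raise ValueError(f"unexpected artefact name: {name!r}")
    return {
        "version": m["version"],
        "os": m["os"],
        "cpu": m["cpu"],
        # Interpreting the trailing number as seconds since the Unix epoch is an assumption.
        "created": datetime.fromtimestamp(int(m["epoch"]), tz=timezone.utc),
    }


if __name__ == "__main__":
    info = parse_artefact_name(
        "eessi-2023.06-software-linux-x86_64-amd-zen3-1728478320.tar.gz"
    )
    print(info["cpu"], info["created"].isoformat())
```

Read this way, the zen3 tarball was created at 12:52:00 UTC on Oct 09 2024, which falls inside the window in which job 22295 was running.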

@Neves-P easyconfig PR is merged, do we need to check/verify anything here, or is it ready to deploy?

boegel avatar Oct 11 '24 18:10 boegel

@boegel , I think this is good to go. Looking into the tarball I see a directory 2023.06\software\linux\x86_64\amd\zen3\accel\nvidia\cc80\software\waLBerla\6.1-foss-2023a-CUDA-12.1.1\build\src\cuda\.

This contains a static library libcuda.a (along with other things, mostly .cmake files and similar). This directory is not present on the non-CUDA waLBerla already in software.eessi.io. I take this to mean that the installation with CUDA did work, but I'm not sure how to confirm this in practice...

Neves-P avatar Oct 14 '24 08:10 Neves-P
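The manual tarball inspection described above could be scripted roughly as follows. This is a minimal sketch using Python's standard `tarfile` module. The helper name `find_in_accel_tree` and its default arguments are assumptions chosen to match the paths mentioned in this thread. Finding the file is only indirect evidence that the CUDA parts were built; it does not show that the installation runs correctly on a GPU.

```python
import posixpath
import tarfile


def find_in_accel_tree(tar_path, filename="libcuda.a", accel="accel/nvidia/cc80"):
    """Return tarball member names under the accelerator subtree whose base name is `filename`."""
    with tarfile.open(tar_path, "r:*") as tar:  # "r:*" lets tarfile detect the compression
        return [
            name
            for name in tar.getnames()
            if f"/{accel}/" in name and posixpath.basename(name) == filename
        ]


if __name__ == "__main__":
    # Artefact name taken from the zen3 build report; the file must exist locally.
    hits = find_in_accel_tree(
        "eessi-2023.06-software-linux-x86_64-amd-zen3-1728478320.tar.gz"
    )
    print("\n".join(hits) or "no matching file found")
```

An empty result for a CUDA build, or a non-empty one for the non-CUDA installation, would be a reason to look more closely at the build log.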

bot: help

laraPPr avatar Feb 11 '25 11:02 laraPPr

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command help from laraPPr

    • expanded format: help
  • handling command help resulted in: How to send commands to bot instances

    • Commands must be sent with a new comment (edits of existing comments are ignored).
    • A comment may contain multiple commands, one per line.
    • Every command begins at the start of a line and has the syntax bot: COMMAND [ARGUMENTS]*
    • Currently supported COMMANDs are: help, build, show_config, status

    For more information, see https://www.eessi.io/docs/bot

eessi-bot[bot] avatar Feb 11 '25 11:02 eessi-bot[bot]
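The help text above describes the command syntax, and earlier replies show the bot expanding `repo`, `arch` and `accel` into `repository`, `architecture` and `accelerator`. A minimal Python sketch of a parser following those rules is given below. It is an illustration only: the function name `parse_bot_commands` is hypothetical, the abbreviation table contains just the three expansions visible in this thread, and the real bot may treat edge cases differently.

```python
# Expansions observed in this thread; the bot may support more (assumption).
ABBREVIATIONS = {"repo": "repository", "arch": "architecture", "accel": "accelerator"}
# Commands listed by the help output above.
SUPPORTED = {"help", "build", "show_config", "status"}


def parse_bot_commands(comment):
    """Return a list of (command, arguments) tuples found in a comment body."""
    commands = []
    for line in comment.splitlines():
        # Every command must begin at the start of a line.
        if not line.startswith("bot:"):
            continue
        parts = line[len("bot:"):].split()
        if not parts or parts[0] not in SUPPORTED:
            continue
        args = {}
        for token in parts[1:]:
            # Arguments have the form key:value; split on the first colon only.
            key, _, value = token.partition(":")
            args[ABBREVIATIONS.get(key, key)] = value
        commands.append((parts[0], args))
    return commands


if __name__ == "__main__":
    body = "bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80"
    print(parse_bot_commands(body))
```

Lines that do not start with `bot:` are ignored, so ordinary discussion text in the same comment does not interfere with the commands.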

Updates by the bot instance eessi-bot-riscv (click for details)
  • account laraPPr has NO permission to send commands to the bot

riscv-eessi-io-bot[bot] avatar Feb 11 '25 11:02 riscv-eessi-io-bot[bot]

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command help from laraPPr

    • expanded format: help
  • handling command help resulted in: How to send commands to bot instances

    • Commands must be sent with a new comment (edits of existing comments are ignored).
    • A comment may contain multiple commands, one per line.
    • Every command begins at the start of a line and has the syntax bot: COMMAND [ARGUMENTS]*
    • Currently supported COMMANDs are: help, build, show_config, status

    For more information, see https://www.eessi.io/docs/bot

eessi-bot[bot] avatar Feb 11 '25 11:02 eessi-bot[bot]

Updates by the bot instance eessi-bot-vsc-ugent (click for details)
  • received bot command help from laraPPr

    • expanded format: help
  • handling command help resulted in: How to send commands to bot instances

    • Commands must be sent with a new comment (edits of existing comments are ignored).
    • A comment may contain multiple commands, one per line.
    • Every command begins at the start of a line and has the syntax bot: COMMAND [ARGUMENTS]*
    • Currently supported COMMANDs are: help, build, show_config, status

    For more information, see https://www.eessi.io/docs/bot

gpu-bot-ugent[bot] avatar Feb 11 '25 11:02 gpu-bot-ugent[bot]

Updates by the bot instance trz42-GH200-jr (click for details)
  • account laraPPr has NO permission to send commands to the bot

eessi-bot-trz42[bot] avatar Feb 11 '25 11:02 eessi-bot-trz42[bot]

Updates by the bot instance eessi-bot-casparvl (click for details)
  • account laraPPr has NO permission to send commands to the bot

bot: build instance:eessi-bot-vsc-ugent repo:eessi.io-2023.06-software accel:nvidia/cc80

laraPPr avatar Feb 11 '25 11:02 laraPPr

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build instance:eessi-bot-vsc-ugent repo:eessi.io-2023.06-software accel:nvidia/cc80 from laraPPr

    • expanded format: build instance:eessi-bot-vsc-ugent repository:eessi.io-2023.06-software accelerator:nvidia/cc80
  • handling command build instance:eessi-bot-vsc-ugent repository:eessi.io-2023.06-software accelerator:nvidia/cc80 resulted in:

    • no jobs were submitted

eessi-bot[bot] avatar Feb 11 '25 11:02 eessi-bot[bot]

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build instance:eessi-bot-vsc-ugent repo:eessi.io-2023.06-software accel:nvidia/cc80 from laraPPr

    • expanded format: build instance:eessi-bot-vsc-ugent repository:eessi.io-2023.06-software accelerator:nvidia/cc80
  • handling command build instance:eessi-bot-vsc-ugent repository:eessi.io-2023.06-software accelerator:nvidia/cc80 resulted in:

    • no jobs were submitted

eessi-bot[bot] avatar Feb 11 '25 11:02 eessi-bot[bot]

Updates by the bot instance eessi-bot-riscv (click for details)
  • account laraPPr has NO permission to send commands to the bot

riscv-eessi-io-bot[bot] avatar Feb 11 '25 11:02 riscv-eessi-io-bot[bot]

Updates by the bot instance eessi-bot-vsc-ugent (click for details)
  • received bot command build instance:eessi-bot-vsc-ugent repo:eessi.io-2023.06-software accel:nvidia/cc80 from laraPPr
    • expanded format: build instance:eessi-bot-vsc-ugent repository:eessi.io-2023.06-software accelerator:nvidia/cc80

gpu-bot-ugent[bot] avatar Feb 11 '25 11:02 gpu-bot-ugent[bot]

Updates by the bot instance eessi-bot-casparvl (click for details)
  • account laraPPr has NO permission to send commands to the bot

Updates by the bot instance trz42-GH200-jr (click for details)
  • account laraPPr has NO permission to send commands to the bot

eessi-bot-trz42[bot] avatar Feb 11 '25 11:02 eessi-bot-trz42[bot]

Unable to download or merge changes between the source branch and the destination branch. Tip: This can usually be resolved by syncing your branch and resolving any merge conflicts.

gpu-bot-ugent[bot] avatar Feb 11 '25 11:02 gpu-bot-ugent[bot]

@Neves-P Can you update the branch and then retrigger the Gent bot?

laraPPr avatar Feb 11 '25 11:02 laraPPr

bot: build instance:eessi-bot-vsc-ugent repo:eessi.io-2023.06-software accel:nvidia/cc80

Neves-P avatar Feb 11 '25 11:02 Neves-P

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build instance:eessi-bot-vsc-ugent repo:eessi.io-2023.06-software accel:nvidia/cc80 from Neves-P

    • expanded format: build instance:eessi-bot-vsc-ugent repository:eessi.io-2023.06-software accelerator:nvidia/cc80
  • handling command build instance:eessi-bot-vsc-ugent repository:eessi.io-2023.06-software accelerator:nvidia/cc80 resulted in:

    • no jobs were submitted

eessi-bot[bot] avatar Feb 11 '25 11:02 eessi-bot[bot]