
{2023.06,zen4} foss/2022b

Open boegel opened this issue 1 year ago • 26 comments

boegel avatar May 07 '24 13:05 boegel

Instance eessi-bot-mc-aws is configured to build:

  • arch x86_64/generic for repo eessi-hpc.org-2023.06-compat
  • arch x86_64/generic for repo eessi-hpc.org-2023.06-software
  • arch x86_64/generic for repo eessi.io-2023.06-compat
  • arch x86_64/generic for repo eessi.io-2023.06-software
  • arch x86_64/intel/haswell for repo eessi-hpc.org-2023.06-compat
  • arch x86_64/intel/haswell for repo eessi-hpc.org-2023.06-software
  • arch x86_64/intel/haswell for repo eessi.io-2023.06-compat
  • arch x86_64/intel/haswell for repo eessi.io-2023.06-software
  • arch x86_64/intel/skylake_avx512 for repo eessi-hpc.org-2023.06-compat
  • arch x86_64/intel/skylake_avx512 for repo eessi-hpc.org-2023.06-software
  • arch x86_64/intel/skylake_avx512 for repo eessi.io-2023.06-compat
  • arch x86_64/intel/skylake_avx512 for repo eessi.io-2023.06-software
  • arch x86_64/amd/zen2 for repo eessi-hpc.org-2023.06-compat
  • arch x86_64/amd/zen2 for repo eessi-hpc.org-2023.06-software
  • arch x86_64/amd/zen2 for repo eessi.io-2023.06-compat
  • arch x86_64/amd/zen2 for repo eessi.io-2023.06-software
  • arch x86_64/amd/zen3 for repo eessi-hpc.org-2023.06-compat
  • arch x86_64/amd/zen3 for repo eessi-hpc.org-2023.06-software
  • arch x86_64/amd/zen3 for repo eessi.io-2023.06-compat
  • arch x86_64/amd/zen3 for repo eessi.io-2023.06-software
  • arch aarch64/generic for repo eessi-hpc.org-2023.06-compat
  • arch aarch64/generic for repo eessi-hpc.org-2023.06-software
  • arch aarch64/generic for repo eessi.io-2023.06-compat
  • arch aarch64/generic for repo eessi.io-2023.06-software
  • arch aarch64/neoverse_n1 for repo eessi-hpc.org-2023.06-compat
  • arch aarch64/neoverse_n1 for repo eessi-hpc.org-2023.06-software
  • arch aarch64/neoverse_n1 for repo eessi.io-2023.06-compat
  • arch aarch64/neoverse_n1 for repo eessi.io-2023.06-software
  • arch aarch64/neoverse_v1 for repo eessi-hpc.org-2023.06-compat
  • arch aarch64/neoverse_v1 for repo eessi-hpc.org-2023.06-software
  • arch aarch64/neoverse_v1 for repo eessi.io-2023.06-compat
  • arch aarch64/neoverse_v1 for repo eessi.io-2023.06-software

eessi-bot[bot] avatar May 07 '24 13:05 eessi-bot[bot]

Instance eessi-bot-mc-azure is configured to build:

  • arch x86_64/amd/zen4 for repo eessi-hpc.org-2023.06-compat
  • arch x86_64/amd/zen4 for repo eessi-hpc.org-2023.06-software
  • arch x86_64/amd/zen4 for repo eessi.io-2023.06-compat
  • arch x86_64/amd/zen4 for repo eessi.io-2023.06-software
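
The two listings above amount to a mapping from bot instance to (architecture, repository) build targets. A minimal sketch, assuming a plain dictionary stands in for the real bot configuration (the names `BUILD_TARGETS` and `instances_for` are hypothetical and not part of the bot), shows which instance can pick up a given target:

```python
from itertools import product

REPOS = [
    "eessi-hpc.org-2023.06-compat",
    "eessi-hpc.org-2023.06-software",
    "eessi.io-2023.06-compat",
    "eessi.io-2023.06-software",
]

AWS_ARCHS = [
    "x86_64/generic",
    "x86_64/intel/haswell",
    "x86_64/intel/skylake_avx512",
    "x86_64/amd/zen2",
    "x86_64/amd/zen3",
    "aarch64/generic",
    "aarch64/neoverse_n1",
    "aarch64/neoverse_v1",
]

# Hypothetical stand-in for the bot configuration:
# instance name -> set of (architecture, repository) pairs it builds for.
BUILD_TARGETS = {
    "eessi-bot-mc-aws": set(product(AWS_ARCHS, REPOS)),
    "eessi-bot-mc-azure": set(product(["x86_64/amd/zen4"], REPOS)),
}


def instances_for(arch, repo):
    """Return the sorted names of the instances configured for this target."""
    return sorted(
        name for name, targets in BUILD_TARGETS.items() if (arch, repo) in targets
    )
```

With this data, `instances_for("x86_64/amd/zen4", "eessi.io-2023.06-software")` yields only `eessi-bot-mc-azure`, so a zen4 build request would be expected to produce no jobs on the AWS instance.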

eessi-bot[bot] avatar May 07 '24 13:05 eessi-bot[bot]

bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen4

boegel avatar May 07 '24 13:05 boegel

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen4 from boegel

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen4
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen4 resulted in:

    • no jobs were submitted
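
The "expanded format" line suggests that the bot rewrites the short keys `repo` and `arch` to `repository` and `architecture` before handling the command. The following is a rough sketch of that step only, not the bot's actual parser; `parse_bot_command` and `expanded_format` are hypothetical names, and only the two abbreviations visible in this thread are modelled:

```python
# Abbreviations taken from the "expanded format" line in the bot's reply;
# any other abbreviation the real bot may accept is not modelled here.
ABBREVIATIONS = {"repo": "repository", "arch": "architecture"}


def parse_bot_command(comment):
    """Split 'bot: <action> key:value ...' into (action, {expanded_key: value})."""
    prefix = "bot:"
    text = comment.strip()
    if not text.startswith(prefix):
        raise ValueError("not a bot command")
    action, *args = text[len(prefix):].split()
    filters = {}
    for arg in args:
        # Split on the first colon only, so values may contain further colons.
        key, _, value = arg.partition(":")
        filters[ABBREVIATIONS.get(key, key)] = value
    return action, filters


def expanded_format(action, filters):
    """Render the command the way the bot echoes it back."""
    return " ".join([action] + [f"{key}:{value}" for key, value in filters.items()])
```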

eessi-bot[bot] avatar May 07 '24 13:05 eessi-bot[bot]

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen4 from boegel

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen4
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen4 resulted in:

    • submitted job 73, for details & status see https://github.com/EESSI/software-layer/pull/567#issuecomment-2098419369

eessi-bot[bot] avatar May 07 '24 13:05 eessi-bot[bot]

New job on instance eessi-bot-mc-azure for architecture x86_64-amd-zen4 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.05/pr_567/73
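
The job directory in the announcement appears to follow the pattern base directory, year and month, pull request number, job id, and the architecture is written with dashes instead of slashes. A small sketch of both conventions, inferred from this single message (the helper names are hypothetical and the real bot may derive these values differently):

```python
from datetime import date
from pathlib import PurePosixPath

# Base directory copied from the bot's message; the layout below is inferred
# from that one path and may not match the bot's actual logic.
JOBS_BASE = PurePosixPath("/project/def-users/SHARED/jobs")


def arch_label(arch):
    """'x86_64/amd/zen4' -> 'x86_64-amd-zen4', as used in the job announcement."""
    return arch.replace("/", "-")


def job_dir(day, pr_number, job_id):
    """Build <base>/<YYYY.MM>/pr_<PR number>/<job id>."""
    return JOBS_BASE / day.strftime("%Y.%m") / f"pr_{pr_number}" / str(job_id)
```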

date | job status | comment
May 07 13:31:58 UTC 2024 | submitted | job id 73 awaits release by job manager
May 07 13:32:10 UTC 2024 | released | job awaits launch by Slurm scheduler
May 07 13:49:49 UTC 2024 | running | job 73 is running
May 07 13:57:02 UTC 2024 | finished | :cry: FAILURE (click triangle for details)
Details
:white_check_mark: job output file slurm-73.out
:x: found message matching ERROR:
:x: found message matching FAILED:
:x: found message matching required modules missing:
:x: no message matching No missing installations
:white_check_mark: found message matching .tar.gz created!
Artefacts
No artefacts were created or found.
May 07 13:57:02 UTC 2024 | test result | :cry: FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
:white_check_mark: job output file slurm-73.out
:x: found message matching ERROR:
:white_check_mark: no message matching [\s*FAILED\s*].*Ran .* test case
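
The build verdict in these reports looks like the conjunction of five checks on the Slurm output file: three messages that must not appear and two that must. A minimal sketch of that logic, assuming plain substring matching (the bot may use regular expressions or additional checks); `check_build_log` is a hypothetical name:

```python
# Patterns are the message texts quoted in the bot's report; treating them as
# plain substrings rather than regular expressions is an assumption.
MUST_BE_ABSENT = ["ERROR:", "FAILED:", "required modules missing:"]
MUST_BE_PRESENT = ["No missing installations", ".tar.gz created!"]


def check_build_log(log_text):
    """Return (status, details); details maps each pattern to True if its check passed."""
    details = {}
    for pattern in MUST_BE_ABSENT:
        details[pattern] = pattern not in log_text
    for pattern in MUST_BE_PRESENT:
        details[pattern] = pattern in log_text
    status = "SUCCESS" if all(details.values()) else "FAILURE"
    return status, details
```

Under this reading, a log that contains `ERROR:` is reported as a failure even when a tarball was created, which matches the pattern of check marks shown in the reports.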

eessi-bot[bot] avatar May 07 '24 13:05 eessi-bot[bot]

I expect the OpenBLAS tests to fail here; see also the notes in https://gitlab.com/eessi/support/-/issues/37

boegel avatar May 07 '24 13:05 boegel

The GCC build failed with "g++: fatal error: Killed signal terminated program cc1plus" because not enough memory was available. The bot configuration needs to be tweaked on the build cluster in Azure.

boegel avatar May 07 '24 17:05 boegel

bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen4

boegel avatar May 07 '24 20:05 boegel

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen4 from boegel

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen4
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen4 resulted in:

    • no jobs were submitted

eessi-bot[bot] avatar May 07 '24 20:05 eessi-bot[bot]

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen4 from boegel

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen4
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen4 resulted in:

    • submitted job 84, for details & status see https://github.com/EESSI/software-layer/pull/567#issuecomment-2099263481

eessi-bot[bot] avatar May 07 '24 20:05 eessi-bot[bot]

New job on instance eessi-bot-mc-azure for architecture x86_64-amd-zen4 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.05/pr_567/84

date | job status | comment
May 07 20:34:50 UTC 2024 | submitted | job id 84 awaits release by job manager
May 07 20:35:50 UTC 2024 | released | job awaits launch by Slurm scheduler
May 07 20:36:55 UTC 2024 | running | job 84 is running
May 07 22:53:48 UTC 2024 | finished | :grin: SUCCESS (click triangle for details)
Details
:white_check_mark: job output file slurm-84.out
:white_check_mark: no message matching ERROR:
:white_check_mark: no message matching FAILED:
:white_check_mark: no message matching required modules missing:
:white_check_mark: found message(s) matching No missing installations
:white_check_mark: found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen4-1715122155.tar.gz
size: 1348 MiB (1414095513 bytes)
entries: 24582
modules under 2023.06/software/linux/x86_64/amd/zen4/modules/all
BLIS/0.9.0-GCC-12.3.0.lua
CMake/3.26.3-GCCcore-12.3.0.lua
FFTW.MPI/3.3.10-gompi-2023a.lua
FFTW/3.3.10-GCC-12.3.0.lua
FlexiBLAS/3.3.1-GCC-12.3.0.lua
GCC/12.3.0.lua
GCCcore/12.3.0.lua
OpenBLAS/0.3.23-GCC-12.3.0.lua
OpenMPI/4.1.5-GCC-12.3.0.lua
OpenSSL/1.1.lua
PMIx/4.2.4-GCCcore-12.3.0.lua
Perl/5.36.1-GCCcore-12.3.0.lua
Python/3.11.3-GCCcore-12.3.0.lua
SQLite/3.42.0-GCCcore-12.3.0.lua
ScaLAPACK/2.2.0-gompi-2023a-fb.lua
Tcl/8.6.13-GCCcore-12.3.0.lua
UCC/1.2.0-GCCcore-12.3.0.lua
UCX/1.14.1-GCCcore-12.3.0.lua
UnZip/6.0-GCCcore-12.3.0.lua
cURL/8.0.1-GCCcore-12.3.0.lua
foss/2023a.lua
gompi/2023a.lua
hwloc/2.9.1-GCCcore-12.3.0.lua
libarchive/3.6.2-GCCcore-12.3.0.lua
libevent/2.1.12-GCCcore-12.3.0.lua
libfabric/1.18.0-GCCcore-12.3.0.lua
libffi/3.4.4-GCCcore-12.3.0.lua
libpciaccess/0.17-GCCcore-12.3.0.lua
libxml2/2.11.4-GCCcore-12.3.0.lua
make/4.4.1-GCCcore-12.3.0.lua
numactl/2.0.16-GCCcore-12.3.0.lua
pkgconf/1.8.0.lua
pkgconf/1.9.5-GCCcore-12.3.0.lua
xorg-macros/1.20.0-GCCcore-12.3.0.lua
software under 2023.06/software/linux/x86_64/amd/zen4/software
BLIS/0.9.0-GCC-12.3.0
CMake/3.26.3-GCCcore-12.3.0
FFTW.MPI/3.3.10-gompi-2023a
FFTW/3.3.10-GCC-12.3.0
FlexiBLAS/3.3.1-GCC-12.3.0
GCC/12.3.0
GCCcore/12.3.0
OpenBLAS/0.3.23-GCC-12.3.0
OpenMPI/4.1.5-GCC-12.3.0
OpenSSL/1.1
PMIx/4.2.4-GCCcore-12.3.0
Perl/5.36.1-GCCcore-12.3.0
Python/3.11.3-GCCcore-12.3.0
SQLite/3.42.0-GCCcore-12.3.0
ScaLAPACK/2.2.0-gompi-2023a-fb
Tcl/8.6.13-GCCcore-12.3.0
UCC/1.2.0-GCCcore-12.3.0
UCX/1.14.1-GCCcore-12.3.0
UnZip/6.0-GCCcore-12.3.0
cURL/8.0.1-GCCcore-12.3.0
foss/2023a
gompi/2023a
hwloc/2.9.1-GCCcore-12.3.0
libarchive/3.6.2-GCCcore-12.3.0
libevent/2.1.12-GCCcore-12.3.0
libfabric/1.18.0-GCCcore-12.3.0
libffi/3.4.4-GCCcore-12.3.0
libpciaccess/0.17-GCCcore-12.3.0
libxml2/2.11.4-GCCcore-12.3.0
make/4.4.1-GCCcore-12.3.0
numactl/2.0.16-GCCcore-12.3.0
pkgconf/1.8.0
pkgconf/1.9.5-GCCcore-12.3.0
xorg-macros/1.20.0-GCCcore-12.3.0
other under 2023.06/software/linux/x86_64/amd/zen4
no other files in tarball
May 07 22:53:48 UTC 2024 | test result | :cry: FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
:white_check_mark: job output file slurm-84.out
:x: found message matching ERROR:
:white_check_mark: no message matching [\s*FAILED\s*].*Ran .* test case
May 08 05:31:48 UTC 2024 | uploaded | transfer of eessi-2023.06-software-linux-x86_64-amd-zen4-1715122155.tar.gz to S3 bucket succeeded
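
The artefact name seems to encode the EESSI version, layer, OS, architecture, and a trailing number that reads like a Unix timestamp. The sketch below parses a name on that assumption; the pattern is inferred only from the tarball names in this thread, and `parse_tarball_name` and `size_in_mib` are hypothetical helpers:

```python
import re
from datetime import datetime, timezone

# Pattern inferred from the tarball names in this thread; reading the trailing
# number as a Unix timestamp is an assumption.
TARBALL_RE = re.compile(
    r"^eessi-(?P<version>[\d.]+)-(?P<layer>\w+)-(?P<os>\w+)"
    r"-(?P<arch>.+)-(?P<epoch>\d+)\.tar\.gz$"
)


def parse_tarball_name(name):
    """Split an artefact name into its fields and decode the timestamp as UTC."""
    match = TARBALL_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected tarball name: {name}")
    fields = match.groupdict()
    fields["created"] = datetime.fromtimestamp(int(fields["epoch"]), tz=timezone.utc)
    return fields


def size_in_mib(size_bytes):
    """Whole MiB, rounded down, which matches the figures shown in the report."""
    return size_bytes // (1024 * 1024)
```

For the tarball above, the decoded time is 2024-05-07 22:49:15 UTC, a few minutes before the job is reported as finished, which is consistent with the timestamp reading.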

eessi-bot[bot] avatar May 07 '24 20:05 eessi-bot[bot]

bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen4

boegel avatar May 12 '24 09:05 boegel

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen4 from boegel

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen4
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen4 resulted in:

    • no jobs were submitted

eessi-bot[bot] avatar May 12 '24 10:05 eessi-bot[bot]

bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen4

boegel avatar May 12 '24 10:05 boegel

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen4 from boegel

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen4
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen4 resulted in:

    • no jobs were submitted

eessi-bot[bot] avatar May 12 '24 10:05 eessi-bot[bot]

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen4 from boegel

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen4
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen4 resulted in:

    • submitted job 87, for details & status see https://github.com/EESSI/software-layer/pull/567#issuecomment-2106193954

eessi-bot[bot] avatar May 12 '24 10:05 eessi-bot[bot]

New job on instance eessi-bot-mc-azure for architecture x86_64-amd-zen4 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.05/pr_567/87

date | job status | comment
May 12 10:08:39 UTC 2024 | submitted | job id 87 awaits release by job manager
May 12 10:09:05 UTC 2024 | released | job awaits launch by Slurm scheduler
May 12 10:22:24 UTC 2024 | running | job 87 is running
May 12 10:30:39 UTC 2024 | finished | :cry: FAILURE (click triangle for details)
Details
:white_check_mark: job output file slurm-87.out
:x: found message matching ERROR:
:white_check_mark: no message matching FAILED:
:white_check_mark: no message matching required modules missing:
:x: no message matching No missing installations
:white_check_mark: found message matching .tar.gz created!
Artefacts
No artefacts were created or found.
May 12 10:30:39 UTC 2024 | test result | :grin: SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 10/10 test case(s) from 10 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
:white_check_mark: job output file slurm-87.out
:x: found message matching ERROR:
:white_check_mark: no message matching [\s*FAILED\s*].*Ran .* test case
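
The ReFrame summary line carries the test counters in a fixed shape. A sketch of extracting them, written against the one summary line quoted in this thread (other ReFrame versions may word the line differently; `parse_reframe_summary` is a hypothetical helper):

```python
import re

# Regular expression written for the summary line quoted in this thread;
# it is not taken from ReFrame or from the bot.
SUMMARY_RE = re.compile(
    r"\[\s*(?P<verdict>PASSED|FAILED)\s*\]\s*"
    r"Ran (?P<ran>\d+)/(?P<total>\d+) test case\(s\) "
    r"from (?P<checks>\d+) check\(s\) "
    r"\((?P<failures>\d+) failure\(s\), (?P<skipped>\d+) skipped, (?P<aborted>\d+) aborted\)"
)


def parse_reframe_summary(line):
    """Return the verdict plus integer counters, or None if the line does not match."""
    match = SUMMARY_RE.search(line)
    if match is None:
        return None
    result = {key: int(value) for key, value in match.groupdict().items() if key != "verdict"}
    result["verdict"] = match.group("verdict")
    return result
```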

eessi-bot[bot] avatar May 12 '24 10:05 eessi-bot[bot]

Problem fixed by https://github.com/EESSI/software-layer/pull/573, so time to try again...

bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen4

boegel avatar May 16 '24 10:05 boegel

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen4 from boegel

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen4
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen4 resulted in:

    • no jobs were submitted

eessi-bot[bot] avatar May 16 '24 10:05 eessi-bot[bot]

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen4 from boegel

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen4
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen4 resulted in:

    • submitted job 99, for details & status see https://github.com/EESSI/software-layer/pull/567#issuecomment-2114884314

eessi-bot[bot] avatar May 16 '24 10:05 eessi-bot[bot]

New job on instance eessi-bot-mc-azure for architecture x86_64-amd-zen4 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.05/pr_567/99

date | job status | comment
May 16 10:51:07 UTC 2024 | submitted | job id 99 awaits release by job manager
May 16 10:51:18 UTC 2024 | released | job awaits launch by Slurm scheduler
May 16 10:55:27 UTC 2024 | running | job 99 is running
May 16 13:01:15 UTC 2024 | finished | :cry: FAILURE (click triangle for details)
Details
:white_check_mark: job output file slurm-99.out
:x: found message matching ERROR:
:x: found message matching FAILED:
:x: found message matching required modules missing:
:x: no message matching No missing installations
:white_check_mark: found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen4-1715863989.tar.gz
size: 1327 MiB (1392195174 bytes)
entries: 37336
modules under 2023.06/software/linux/x86_64/amd/zen4/modules/all
BLIS/0.9.0-GCC-12.2.0.lua
CMake/3.24.3-GCCcore-12.2.0.lua
DB/18.1.40-GCCcore-12.2.0.lua
FFTW/3.3.10-GCC-12.2.0.lua
GCC/12.2.0.lua
GCCcore/12.2.0.lua
OpenSSL/1.1.lua
Perl/5.36.0-GCCcore-12.2.0.lua
Python/3.10.8-GCCcore-12.2.0-bare.lua
SQLite/3.39.4-GCCcore-12.2.0.lua
Tcl/8.6.12-GCCcore-12.2.0.lua
UCX/1.13.1-GCCcore-12.2.0.lua
UnZip/6.0-GCCcore-12.2.0.lua
cURL/7.86.0-GCCcore-12.2.0.lua
expat/2.4.9-GCCcore-12.2.0.lua
groff/1.22.4-GCCcore-12.2.0.lua
libarchive/3.6.1-GCCcore-12.2.0.lua
libevent/2.1.12-GCCcore-12.2.0.lua
libfabric/1.16.1-GCCcore-12.2.0.lua
libffi/3.4.4-GCCcore-12.2.0.lua
libxml2/2.10.3-GCCcore-12.2.0.lua
make/4.3-GCCcore-12.2.0.lua
numactl/2.0.16-GCCcore-12.2.0.lua
pkgconf/1.8.0.lua
pkgconf/1.9.3-GCCcore-12.2.0.lua
software under 2023.06/software/linux/x86_64/amd/zen4/software
BLIS/0.9.0-GCC-12.2.0
CMake/3.24.3-GCCcore-12.2.0
DB/18.1.40-GCCcore-12.2.0
FFTW/3.3.10-GCC-12.2.0
GCC/12.2.0
GCCcore/12.2.0
OpenSSL/1.1
Perl/5.36.0-GCCcore-12.2.0
Python/3.10.8-GCCcore-12.2.0-bare
SQLite/3.39.4-GCCcore-12.2.0
Tcl/8.6.12-GCCcore-12.2.0
UCX/1.13.1-GCCcore-12.2.0
UnZip/6.0-GCCcore-12.2.0
cURL/7.86.0-GCCcore-12.2.0
expat/2.4.9-GCCcore-12.2.0
groff/1.22.4-GCCcore-12.2.0
libarchive/3.6.1-GCCcore-12.2.0
libevent/2.1.12-GCCcore-12.2.0
libfabric/1.16.1-GCCcore-12.2.0
libffi/3.4.4-GCCcore-12.2.0
libxml2/2.10.3-GCCcore-12.2.0
make/4.3-GCCcore-12.2.0
numactl/2.0.16-GCCcore-12.2.0
pkgconf/1.8.0
pkgconf/1.9.3-GCCcore-12.2.0
other under 2023.06/software/linux/x86_64/amd/zen4
2023.06/init/eessi_environment_variables
May 16 13:01:15 UTC 2024 | test result | :grin: SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 10/10 test case(s) from 10 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
:white_check_mark: job output file slurm-99.out
:x: found message matching ERROR:
:white_check_mark: no message matching [\s*FAILED\s*].*Ran .* test case
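
One way to see where the job 99 build stopped is to compare the software names in its tarball with those in the job 84 tarball. This is only a rough hint, since the two tarballs belong to different toolchain generations (GCC 12.2.0 versus GCC 12.3.0) and an absent name may simply reflect a different dependency set. The entries are copied from the listings in this thread; the helper names are hypothetical:

```python
def software_names(entries):
    """Reduce 'Name/version' entries to the set of software names."""
    return {entry.split("/", 1)[0] for entry in entries}


def only_in_first(first, second):
    """Sorted software names that occur in `first` but not in `second`."""
    return sorted(software_names(first) - software_names(second))


# Entries copied from the "software under ..." listings of jobs 84 and 99.
JOB_84 = """
BLIS/0.9.0-GCC-12.3.0 CMake/3.26.3-GCCcore-12.3.0 FFTW.MPI/3.3.10-gompi-2023a
FFTW/3.3.10-GCC-12.3.0 FlexiBLAS/3.3.1-GCC-12.3.0 GCC/12.3.0 GCCcore/12.3.0
OpenBLAS/0.3.23-GCC-12.3.0 OpenMPI/4.1.5-GCC-12.3.0 OpenSSL/1.1
PMIx/4.2.4-GCCcore-12.3.0 Perl/5.36.1-GCCcore-12.3.0 Python/3.11.3-GCCcore-12.3.0
SQLite/3.42.0-GCCcore-12.3.0 ScaLAPACK/2.2.0-gompi-2023a-fb Tcl/8.6.13-GCCcore-12.3.0
UCC/1.2.0-GCCcore-12.3.0 UCX/1.14.1-GCCcore-12.3.0 UnZip/6.0-GCCcore-12.3.0
cURL/8.0.1-GCCcore-12.3.0 foss/2023a gompi/2023a hwloc/2.9.1-GCCcore-12.3.0
libarchive/3.6.2-GCCcore-12.3.0 libevent/2.1.12-GCCcore-12.3.0
libfabric/1.18.0-GCCcore-12.3.0 libffi/3.4.4-GCCcore-12.3.0
libpciaccess/0.17-GCCcore-12.3.0 libxml2/2.11.4-GCCcore-12.3.0 make/4.4.1-GCCcore-12.3.0
numactl/2.0.16-GCCcore-12.3.0 pkgconf/1.8.0 pkgconf/1.9.5-GCCcore-12.3.0
xorg-macros/1.20.0-GCCcore-12.3.0
""".split()

JOB_99 = """
BLIS/0.9.0-GCC-12.2.0 CMake/3.24.3-GCCcore-12.2.0 DB/18.1.40-GCCcore-12.2.0
FFTW/3.3.10-GCC-12.2.0 GCC/12.2.0 GCCcore/12.2.0 OpenSSL/1.1 Perl/5.36.0-GCCcore-12.2.0
Python/3.10.8-GCCcore-12.2.0-bare SQLite/3.39.4-GCCcore-12.2.0 Tcl/8.6.12-GCCcore-12.2.0
UCX/1.13.1-GCCcore-12.2.0 UnZip/6.0-GCCcore-12.2.0 cURL/7.86.0-GCCcore-12.2.0
expat/2.4.9-GCCcore-12.2.0 groff/1.22.4-GCCcore-12.2.0 libarchive/3.6.1-GCCcore-12.2.0
libevent/2.1.12-GCCcore-12.2.0 libfabric/1.16.1-GCCcore-12.2.0 libffi/3.4.4-GCCcore-12.2.0
libxml2/2.10.3-GCCcore-12.2.0 make/4.3-GCCcore-12.2.0 numactl/2.0.16-GCCcore-12.2.0
pkgconf/1.8.0 pkgconf/1.9.3-GCCcore-12.2.0
""".split()
```

`only_in_first(JOB_84, JOB_99)` includes OpenBLAS, OpenMPI, gompi and foss, which is consistent with the job 99 build not getting past OpenBLAS, although the listing alone does not prove the cause.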

eessi-bot[bot] avatar May 16 '24 10:05 eessi-bot[bot]

@boegel Should we close this? I thought we'd decided not to support 2022b on zen4.

ocaisa avatar Aug 07 '24 11:08 ocaisa

I guess we could, unless we want to figure out how to fix the broken tests for older OpenBLAS versions, which is the main issue here. But shouldn't we then also come up with a way to generate fake module files that clearly state that the 2022b toolchain is not supported on zen4?
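
As an illustration of what such a placeholder could look like, the sketch below generates the text of a Lua module file that calls Lmod's `LmodError`, so that loading the module aborts with an explanation. The message wording, the helper name `stub_modulefile`, and the choice of `LmodError` over a non-fatal `LmodMessage` are all assumptions; nothing in this thread settles how the placeholders should behave:

```python
# Hypothetical wording; the actual message would be up to the maintainers.
MESSAGE = "The 2022b toolchain is not supported on zen4 in EESSI 2023.06."


def stub_modulefile(name, version, message=MESSAGE):
    """Return the text of a Lua module file that refuses to load with an explanation."""
    # Escape backslashes and double quotes so the message stays a valid Lua string.
    escaped = message.replace("\\", "\\\\").replace('"', '\\"')
    return (
        f'whatis("Description: placeholder for {name}/{version}")\n'
        f'LmodError("{name}/{version}: {escaped}")\n'
    )
```

Calling `stub_modulefile("foss", "2022b")` returns two lines of Lua that could be written to a file such as `foss/2022b.lua` in the zen4 module tree.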

boegel avatar Aug 08 '24 09:08 boegel

Is OpenBLAS the only problem? We could sidestep the issue by forcing a Zen3 build (with a modloadmsg warning of potentially poor performance?) or by accepting the failing tests.
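
The warning part of this idea could be sketched as an EasyBuild parse hook that sets the `modloadmsg` easyconfig parameter. This covers only the message, not how a Zen3 build would be forced. The choice of GCCcore 12.2.0 as the package to carry the warning, the use of the `EESSI_SOFTWARE_SUBDIR` environment variable to detect the target, and the wording are assumptions made for illustration:

```python
import os

# Hypothetical wording for the load-time warning.
WARNING = (
    "This installation was built for zen3 and is not optimised for zen4; "
    "performance may be poor."
)


def parse_hook(ec, *args, **kwargs):
    """EasyBuild parse hook: attach a load-time warning to GCCcore 12.2.0 on zen4."""
    # Detecting the target via EESSI_SOFTWARE_SUBDIR is an assumption of this sketch.
    on_zen4 = os.environ.get("EESSI_SOFTWARE_SUBDIR", "").endswith("amd/zen4")
    if on_zen4 and ec["name"] == "GCCcore" and ec["version"] == "12.2.0":
        ec["modloadmsg"] = WARNING
```

The hook only touches the one easyconfig it names, so every module built on top of that compiler would show the message indirectly when GCCcore is loaded as a dependency.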

ocaisa avatar Aug 08 '24 10:08 ocaisa

Another option is to symlink in the entire set of Zen3 modules for 2022b and add a hook for GCCcore that warns of unoptimised performance for this toolchain.
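
A minimal sketch of the symlinking step, assuming the Zen3 tree mirrors the `modules/all` layout shown for zen4 in the listings above, and assuming that 2022b modules can be recognised by `12.2.0` or `2022b` in the file name (this would miss modules without a toolchain suffix, such as `OpenSSL/1.1`). The function name and both directory arguments are hypothetical:

```python
import os
from pathlib import Path


def link_toolchain_modules(zen3_modules, zen4_modules, markers=("12.2.0", "2022b")):
    """Symlink zen3 module files whose name contains a marker into the zen4 tree.

    Returns the sorted relative paths that were linked; existing targets are left alone.
    """
    zen3_modules, zen4_modules = Path(zen3_modules), Path(zen4_modules)
    linked = []
    for source in sorted(zen3_modules.glob("*/*.lua")):
        if not any(marker in source.name for marker in markers):
            continue
        target = zen4_modules / source.relative_to(zen3_modules)
        if target.exists() or target.is_symlink():
            continue  # never overwrite a module that is already present
        target.parent.mkdir(parents=True, exist_ok=True)
        os.symlink(source, target)
        linked.append(str(source.relative_to(zen3_modules)))
    return linked
```

Because existing targets are skipped, running the function a second time links nothing, which makes it safe to repeat after a partial run.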

ocaisa avatar Aug 08 '24 10:08 ocaisa