
[WIP] DEBUG only {2023.06,2023a} PyTorch-bundle v2.1.2

Open trz42 opened this issue 1 year ago • 43 comments

The main purpose of this PR is to facilitate debugging various issues that occur when building PyTorch-bundle and to demonstrate approaches that could solve them. The fixes provided here are not expected to be final.

  • ~includes a fix for find_library provided by ctypes.util which prevented importing soundfile~
    • superseded by fixing it in the Python installations
  • includes a fix for aarch64/{generic,neoverse_n1,neoverse_v1} where importing sentencepiece led to the error libtcmalloc_minimal.so.4: cannot allocate memory in static TLS block
  • ~includes a fix for the extension torchvision where some library was not compiled with JPEG support, hence some tests failed~
    • was fixed by https://github.com/easybuilders/easybuild-easyblocks/pull/3322
    • we switch to EasyBuild/4.9.2 for building this PR because the updated easyblock for torchvision (PR 3322) was released with that version

Initially we will disable all fixes, build for selected architectures and document the errors. We will then enable the fixes one by one and document the results (some errors fixed, some new errors, ...).

Note, see the original PR for PyTorch-bundle (https://github.com/EESSI/software-layer/pull/585) for additional discussion about some of the issues listed above.

trz42 avatar Jun 12 '24 09:06 trz42

Instance eessi-bot-mc-aws is configured to build:

  • arch x86_64/generic for repo eessi-hpc.org-2023.06-compat
  • arch x86_64/generic for repo eessi-hpc.org-2023.06-software
  • arch x86_64/generic for repo eessi.io-2023.06-compat
  • arch x86_64/generic for repo eessi.io-2023.06-software
  • arch x86_64/intel/haswell for repo eessi-hpc.org-2023.06-compat
  • arch x86_64/intel/haswell for repo eessi-hpc.org-2023.06-software
  • arch x86_64/intel/haswell for repo eessi.io-2023.06-compat
  • arch x86_64/intel/haswell for repo eessi.io-2023.06-software
  • arch x86_64/intel/skylake_avx512 for repo eessi-hpc.org-2023.06-compat
  • arch x86_64/intel/skylake_avx512 for repo eessi-hpc.org-2023.06-software
  • arch x86_64/intel/skylake_avx512 for repo eessi.io-2023.06-compat
  • arch x86_64/intel/skylake_avx512 for repo eessi.io-2023.06-software
  • arch x86_64/amd/zen2 for repo eessi-hpc.org-2023.06-compat
  • arch x86_64/amd/zen2 for repo eessi-hpc.org-2023.06-software
  • arch x86_64/amd/zen2 for repo eessi.io-2023.06-compat
  • arch x86_64/amd/zen2 for repo eessi.io-2023.06-software
  • arch x86_64/amd/zen3 for repo eessi-hpc.org-2023.06-compat
  • arch x86_64/amd/zen3 for repo eessi-hpc.org-2023.06-software
  • arch x86_64/amd/zen3 for repo eessi.io-2023.06-compat
  • arch x86_64/amd/zen3 for repo eessi.io-2023.06-software
  • arch aarch64/generic for repo eessi-hpc.org-2023.06-compat
  • arch aarch64/generic for repo eessi-hpc.org-2023.06-software
  • arch aarch64/generic for repo eessi.io-2023.06-compat
  • arch aarch64/generic for repo eessi.io-2023.06-software
  • arch aarch64/neoverse_n1 for repo eessi-hpc.org-2023.06-compat
  • arch aarch64/neoverse_n1 for repo eessi-hpc.org-2023.06-software
  • arch aarch64/neoverse_n1 for repo eessi.io-2023.06-compat
  • arch aarch64/neoverse_n1 for repo eessi.io-2023.06-software
  • arch aarch64/neoverse_v1 for repo eessi-hpc.org-2023.06-compat
  • arch aarch64/neoverse_v1 for repo eessi-hpc.org-2023.06-software
  • arch aarch64/neoverse_v1 for repo eessi.io-2023.06-compat
  • arch aarch64/neoverse_v1 for repo eessi.io-2023.06-software

eessi-bot[bot] avatar Jun 12 '24 09:06 eessi-bot[bot]

Instance eessi-bot-mc-azure is configured to build:

  • arch x86_64/amd/zen4 for repo eessi-hpc.org-2023.06-compat
  • arch x86_64/amd/zen4 for repo eessi-hpc.org-2023.06-software
  • arch x86_64/amd/zen4 for repo eessi.io-2023.06-compat
  • arch x86_64/amd/zen4 for repo eessi.io-2023.06-software

eessi-bot[bot] avatar Jun 12 '24 09:06 eessi-bot[bot]

Initially we'll build only for zen2 and aarch64/generic...

bot: build arch:x86_64/amd/zen2 repo:eessi.io-2023.06-software
bot: build arch:aarch64/generic repo:eessi.io-2023.06-software

trz42 avatar Jun 12 '24 11:06 trz42

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build arch:x86_64/amd/zen2 repo:eessi.io-2023.06-software from trz42

    • expanded format: build architecture:x86_64/amd/zen2 repository:eessi.io-2023.06-software
  • received bot command build arch:aarch64/generic repo:eessi.io-2023.06-software from trz42

    • expanded format: build architecture:aarch64/generic repository:eessi.io-2023.06-software
  • handling command build architecture:x86_64/amd/zen2 repository:eessi.io-2023.06-software resulted in:

    • submitted job 12607, for details & status see https://github.com/EESSI/software-layer/pull/603#issuecomment-2162771800
  • handling command build architecture:aarch64/generic repository:eessi.io-2023.06-software resulted in:

    • submitted job 12608, for details & status see https://github.com/EESSI/software-layer/pull/603#issuecomment-2162771906
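
The bot replies above show each command in an abbreviated and an expanded form, where arch becomes architecture and repo becomes repository. A minimal sketch of that expansion, assuming only these two abbreviations exist; expand_bot_command is a hypothetical helper for illustration and not the bot's actual code:

```python
# Hypothetical helper, not taken from the bot's source code.
# Assumption: only the two abbreviations visible in this thread are handled.
ABBREVIATIONS = {"arch": "architecture", "repo": "repository"}


def expand_bot_command(line: str) -> str:
    """Turn 'bot: build arch:X repo:Y' into 'build architecture:X repository:Y'."""
    prefix = "bot:"
    if not line.startswith(prefix):
        raise ValueError("not a bot command: %r" % line)
    parts = line[len(prefix):].split()
    if not parts:
        raise ValueError("empty bot command")
    action, filters = parts[0], parts[1:]
    expanded = []
    for item in filters:
        # Split only on the first ':' so values may contain further colons.
        key, _, value = item.partition(":")
        expanded.append("%s:%s" % (ABBREVIATIONS.get(key, key), value))
    return " ".join([action] + expanded)
```

Unknown keys are passed through unchanged, so a command that already uses the long form is returned as is.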

eessi-bot[bot] avatar Jun 12 '24 11:06 eessi-bot[bot]

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build arch:x86_64/amd/zen2 repo:eessi.io-2023.06-software from trz42

    • expanded format: build architecture:x86_64/amd/zen2 repository:eessi.io-2023.06-software
  • received bot command build arch:aarch64/generic repo:eessi.io-2023.06-software from trz42

    • expanded format: build architecture:aarch64/generic repository:eessi.io-2023.06-software
  • handling command build architecture:x86_64/amd/zen2 repository:eessi.io-2023.06-software resulted in:

    • no jobs were submitted
  • handling command build architecture:aarch64/generic repository:eessi.io-2023.06-software resulted in:

    • no jobs were submitted

eessi-bot[bot] avatar Jun 12 '24 11:06 eessi-bot[bot]

New job on instance eessi-bot-mc-aws for architecture x86_64-amd-zen2 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.06/pr_603/12607

  • fails in the sanity check for librosa/0.10.1-foss-2023a when running python -c "import soundfile" with the log messages
== 2024-06-12 12:00:43,829 build_log.py:171 ERROR EasyBuild crashed with an error (at easybuild/tools/build_log.py:111 in caller_info): Sanity check failed: extensions sanity check failed for 1 extensions: soundfile
failing sanity check for 'soundfile' extension: command "python -c "import soundfile"" failed; output:
Traceback (most recent call last):
  File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2/software/librosa/0.10.1-foss-2023a/lib/python3.11/site-packages/soundfile.py", line 161, in <module>
    import _soundfile_data  # ImportError if this doesn't exist
    ^^^^^^^^^^^^^^^^^^^^^^
ModuleNotFoundError: No module named '_soundfile_data'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2/software/librosa/0.10.1-foss-2023a/lib/python3.11/site-packages/soundfile.py", line 171, in <module>
    _snd = _ffi.dlopen(_libname)
           ^^^^^^^^^^^^^^^^^^^^^
OSError: cannot load library 'libsndfile.so.1': libsndfile.so.1: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2/software/librosa/0.10.1-foss-2023a/lib/python3.11/site-packages/soundfile.py", line 192, in <module>
    _snd = _ffi.dlopen(_explicit_libname)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: cannot load library 'libsndfile.so': libsndfile.so: cannot open shared object file: No such file or directory,  (at easybuild/framework/easyblock.py:3669 in _sanity_check_step)
  • to work around this error we need a custom ctypes
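
To check whether the library lookup itself is the problem, it may help to run the lookup outside of soundfile. The following is a rough diagnostic sketch and not part of the fix: diagnose_library is a hypothetical helper, and ctypes.CDLL is used as a stand-in for the cffi dlopen call that appears in the traceback above.

```python
import ctypes
import ctypes.util


def diagnose_library(name, sonames, find=ctypes.util.find_library, load=ctypes.CDLL):
    """Report what find_library returns for `name` and which sonames can be loaded.

    `find` and `load` are injectable so the helper can be exercised without
    the real library being installed.
    """
    report = {"find_library": find(name), "loadable": {}}
    for soname in sonames:
        try:
            load(soname)
            report["loadable"][soname] = True
        except OSError:
            # Same failure class as in the traceback: the loader cannot find the file.
            report["loadable"][soname] = False
    return report
```

In an environment where the librosa dependencies are loaded, one would call diagnose_library("sndfile", ["libsndfile.so.1", "libsndfile.so"]) and compare the result with and without the custom ctypes.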
date job status comment
Jun 12 11:27:18 UTC 2024 submitted job id 12607 awaits release by job manager
Jun 12 11:28:21 UTC 2024 released job awaits launch by Slurm scheduler
Jun 12 11:35:26 UTC 2024 running job 12607 is running
Jun 12 12:08:26 UTC 2024 finished
:cry: FAILURE (click triangle for details)
Details
:white_check_mark: job output file slurm-12607.out
:x: found message matching ERROR:
:x: found message matching FAILED:
:x: found message matching required modules missing:
:x: no message matching No missing installations
:white_check_mark: found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen2-1718193717.tar.gz
size: 162 MiB (170635688 bytes)
entries: 6322
modules under 2023.06/software/linux/x86_64/amd/zen2/modules/all
imageio/2.33.1-gfbf-2023a.lua
LLVM/14.0.6-GCCcore-12.3.0-llvmlite.lua
NLTK/3.8.1-foss-2023a.lua
numba/0.58.1-foss-2023a.lua
parameterized/0.9.0-GCCcore-12.3.0.lua
Scalene/1.5.26-GCCcore-12.3.0.lua
scikit-image/0.22.0-foss-2023a.lua
tqdm/4.66.1-GCCcore-12.3.0.lua
software under 2023.06/software/linux/x86_64/amd/zen2/software
imageio/2.33.1-gfbf-2023a
LLVM/14.0.6-GCCcore-12.3.0-llvmlite
NLTK/3.8.1-foss-2023a
numba/0.58.1-foss-2023a
parameterized/0.9.0-GCCcore-12.3.0
Scalene/1.5.26-GCCcore-12.3.0
scikit-image/0.22.0-foss-2023a
tqdm/4.66.1-GCCcore-12.3.0
other under 2023.06/software/linux/x86_64/amd/zen2
2023.06/init/easybuild/eb_hooks.py
Jun 12 12:08:26 UTC 2024 test result
:cry: FAILURE (click triangle for details)
Reason
EESSI test suite produced failures.
ReFrame Summary
[ FAILED ] Ran 12/12 test case(s) from 12 check(s) (2 failure(s), 0 skipped, 0 aborted)
Details
:white_check_mark: job output file slurm-12607.out
:x: found message matching ERROR:
:x: found message matching [\s*FAILED\s*].*Ran .* test case
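
The build report above is based on searching the job output for a handful of messages. A minimal sketch of such a check, using the message strings shown in the report; how the bot actually combines them into SUCCESS or FAILURE is an assumption here, and check_build_log is a hypothetical name:

```python
import re

# Message patterns copied from the report above.
# Assumption: any failure pattern, or any missing required pattern, means FAILURE.
FAILURE_PATTERNS = ["ERROR:", "FAILED:", "required modules missing:"]
REQUIRED_PATTERNS = ["No missing installations", r"\.tar\.gz created!"]


def check_build_log(text):
    """Return which failure messages were found and which required ones are missing."""
    found = [p for p in FAILURE_PATTERNS if re.search(p, text)]
    missing = [p for p in REQUIRED_PATTERNS if not re.search(p, text)]
    return {"success": not found and not missing, "found": found, "missing": missing}
```

Applied to the content of a file such as slurm-12607.out, the "found" and "missing" lists would correspond to the :x: lines in the report.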

eessi-bot[bot] avatar Jun 12 '24 11:06 eessi-bot[bot]

New job on instance eessi-bot-mc-aws for architecture aarch64-generic for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.06/pr_603/12608

  • fails in the sanity check for librosa/0.10.1-foss-2023a when running python -c "import soundfile" with the log messages
== 2024-06-12 11:55:32,669 build_log.py:171 ERROR EasyBuild crashed with an error (at easybuild/tools/build_log.py:111 in caller_info): Sanity check failed: extensions sanity check failed for 1 extensions: soundfile
failing sanity check for 'soundfile' extension: command "python -c "import soundfile"" failed; output:
Traceback (most recent call last):
  File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/librosa/0.10.1-foss-2023a/lib/python3.11/site-packages/soundfile.py", line 161, in <module>
    import _soundfile_data  # ImportError if this doesn't exist
    ^^^^^^^^^^^^^^^^^^^^^^
ModuleNotFoundError: No module named '_soundfile_data'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/librosa/0.10.1-foss-2023a/lib/python3.11/site-packages/soundfile.py", line 171, in <module>
    _snd = _ffi.dlopen(_libname)
           ^^^^^^^^^^^^^^^^^^^^^
OSError: cannot load library 'libsndfile.so.1': libsndfile.so.1: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/librosa/0.10.1-foss-2023a/lib/python3.11/site-packages/soundfile.py", line 192, in <module>
    _snd = _ffi.dlopen(_explicit_libname)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: cannot load library 'libsndfile.so': libsndfile.so: cannot open shared object file: No such file or directory,  (at easybuild/framework/easyblock.py:3669 in _sanity_check_step)
  • to work around this error we need a custom ctypes
date job status comment
Jun 12 11:27:22 UTC 2024 submitted job id 12608 awaits release by job manager
Jun 12 11:28:19 UTC 2024 released job awaits launch by Slurm scheduler
Jun 12 11:34:23 UTC 2024 running job 12608 is running
Jun 12 12:04:20 UTC 2024 finished
:cry: FAILURE (click triangle for details)
Details
:white_check_mark: job output file slurm-12608.out
:x: found message matching ERROR:
:x: found message matching FAILED:
:x: found message matching required modules missing:
:x: no message matching No missing installations
:white_check_mark: found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-aarch64-generic-1718193401.tar.gz
size: 152 MiB (160274969 bytes)
entries: 6322
modules under 2023.06/software/linux/aarch64/generic/modules/all
imageio/2.33.1-gfbf-2023a.lua
LLVM/14.0.6-GCCcore-12.3.0-llvmlite.lua
NLTK/3.8.1-foss-2023a.lua
numba/0.58.1-foss-2023a.lua
parameterized/0.9.0-GCCcore-12.3.0.lua
Scalene/1.5.26-GCCcore-12.3.0.lua
scikit-image/0.22.0-foss-2023a.lua
tqdm/4.66.1-GCCcore-12.3.0.lua
software under 2023.06/software/linux/aarch64/generic/software
imageio/2.33.1-gfbf-2023a
LLVM/14.0.6-GCCcore-12.3.0-llvmlite
NLTK/3.8.1-foss-2023a
numba/0.58.1-foss-2023a
parameterized/0.9.0-GCCcore-12.3.0
Scalene/1.5.26-GCCcore-12.3.0
scikit-image/0.22.0-foss-2023a
tqdm/4.66.1-GCCcore-12.3.0
other under 2023.06/software/linux/aarch64/generic
2023.06/init/easybuild/eb_hooks.py
Jun 12 12:04:20 UTC 2024 test result
:cry: FAILURE (click triangle for details)
Reason
EESSI test suite produced failures.
ReFrame Summary
[ FAILED ] Ran 12/12 test case(s) from 12 check(s) (2 failure(s), 0 skipped, 0 aborted)
Details
:white_check_mark: job output file slurm-12608.out
:x: found message matching ERROR:
:x: found message matching [\s*FAILED\s*].*Ran .* test case

eessi-bot[bot] avatar Jun 12 '24 11:06 eessi-bot[bot]

The two jobs (12607 and 12608), which did not include any fixes, both failed in the sanity check for librosa. After enabling the fixes for that by

  • installing a custom ctypes library;
  • adding a parse_hook to use the custom ctypes library in the sanity check; and
  • adding a pre_module_hook that adds a setting to use this custom ctypes library when the module for librosa is loaded;

we repeat the builds for the same architectures zen2 and aarch64/generic...

bot: build arch:x86_64/amd/zen2 repo:eessi.io-2023.06-software
bot: build arch:aarch64/generic repo:eessi.io-2023.06-software
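
For readers unfamiliar with EasyBuild hooks, the two hooks mentioned above could look roughly like the sketch below. This is an illustration under assumptions and not the code in eb_hooks.py: plain dicts stand in for EasyBuild's easyconfig objects, the version 1.2 of custom_ctypes is taken from the module lists of the follow-up builds, and using builddependencies and modluafooter is one possible way to achieve the described effect.

```python
# Sketch only: dicts stand in for EasyBuild easyconfig objects.
# In EasyBuild, pre_module_hook receives the easyblock instance instead.
CUSTOM_CTYPES = ("custom_ctypes", "1.2")


def parse_hook(ec, *args, **kwargs):
    """Make custom_ctypes available while librosa is built and sanity checked."""
    if ec["name"] == "librosa":
        ec.setdefault("builddependencies", []).append(CUSTOM_CTYPES)


def pre_module_hook(ec, *args, **kwargs):
    """Let the generated librosa module pull in custom_ctypes when it is loaded."""
    if ec["name"] == "librosa":
        # Assumption: a Lua footer with depends_on is used to load the extra module.
        ec["modluafooter"] = 'depends_on("custom_ctypes")'
```

Both hooks leave all other packages untouched, so enabling them should only change the librosa installation.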

trz42 avatar Jun 15 '24 12:06 trz42

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build arch:x86_64/amd/zen2 repo:eessi.io-2023.06-software from trz42

    • expanded format: build architecture:x86_64/amd/zen2 repository:eessi.io-2023.06-software
  • received bot command build arch:aarch64/generic repo:eessi.io-2023.06-software from trz42

    • expanded format: build architecture:aarch64/generic repository:eessi.io-2023.06-software
  • handling command build architecture:x86_64/amd/zen2 repository:eessi.io-2023.06-software resulted in:

    • submitted job 12808, for details & status see https://github.com/EESSI/software-layer/pull/603#issuecomment-2169398033
  • handling command build architecture:aarch64/generic repository:eessi.io-2023.06-software resulted in:

    • submitted job 12809, for details & status see https://github.com/EESSI/software-layer/pull/603#issuecomment-2169398074

eessi-bot[bot] avatar Jun 15 '24 12:06 eessi-bot[bot]

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build arch:x86_64/amd/zen2 repo:eessi.io-2023.06-software from trz42

    • expanded format: build architecture:x86_64/amd/zen2 repository:eessi.io-2023.06-software
  • received bot command build arch:aarch64/generic repo:eessi.io-2023.06-software from trz42

    • expanded format: build architecture:aarch64/generic repository:eessi.io-2023.06-software
  • handling command build architecture:x86_64/amd/zen2 repository:eessi.io-2023.06-software resulted in:

    • no jobs were submitted
  • handling command build architecture:aarch64/generic repository:eessi.io-2023.06-software resulted in:

    • no jobs were submitted

eessi-bot[bot] avatar Jun 15 '24 12:06 eessi-bot[bot]

New job on instance eessi-bot-mc-aws for architecture x86_64-amd-zen2 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.06/pr_603/12808

  • failed with errors when testing the extension torchvision of PyTorch-bundle...
=================================== FAILURES ===================================
___ test_decode_jpeg[None-ImageReadMode.UNCHANGED-grace_hopper_517x606.jpg] ____
test/test_image.py:94: in test_decode_jpeg
    img_ljpeg = decode_image(data, mode=mode)
/tmp/eb-7t6okia0/eb-js7oqjgv/tmpjpww4km2/lib/python3.11/site-packages/torchvision/io/image.py:236: in decode_image
    output = torch.ops.image.decode_image(input, mode.value)
/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2/software/PyTorch/2.1.2-foss-2023a/lib/python3.11/site-packages/torch/_ops.py:692: in __call__
    return self._op(*args, **kwargs or {})
E   RuntimeError: decode_jpeg: torchvision not compiled with libjpeg support
  • to inspect the job's individual build step logs, run bot/inspect.sh --resume previous_tmp/build_step/eessi.io-2023.06-software-1718457554.tgz in the job's working directory /project/def-users/SHARED/jobs/2024.06/pr_603/12808 on the same type of node (e.g., in an interactive job submitted with srun --partition x86-64-amd-zen2-node --time=60 --pty bash); the log file for building the extension torchvision, /tmp/eb-7t6okia0/eb-js7oqjgv/easybuild-run_cmd-9b5lqisq.log, contains the following messages
  Compiling extensions with following flags:
    FORCE_CUDA: False
    FORCE_MPS: False
    DEBUG: False
    TORCHVISION_USE_PNG: True
    TORCHVISION_USE_JPEG: True
    TORCHVISION_USE_NVJPEG: True
    TORCHVISION_USE_FFMPEG: True
    TORCHVISION_USE_VIDEO_CODEC: True
    NVCC_FLAGS:
  Compiling with debug mode OFF
  Found PNG library
  Building torchvision with PNG image support
    libpng version: 1.6.39
    libpng include path: /cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2/software/libpng/1.6.39-GCCcore-12.3.0/include/libpng16
  Running build on conda-build: False
  Running build on conda: False
  Building torchvision without JPEG image support
  Building torchvision without NVJPEG image support
  • it looks like it doesn't find the jpeg library and hence builds without JPEG support
  • consequently, it later fails in the test step
  • the messages above, which show that torchvision is compiled without JPEG support, are produced by the setup.py in /tmp/bot/easybuild/build/PyTorchbundle/2.1.2/foss-2023a/torchvision/vision-0.16.2; it includes a function find_library with the following code
    def find_library(name, vision_include):
        this_dir = os.path.dirname(os.path.abspath(__file__))
        build_prefix = os.environ.get("BUILD_PREFIX", None)
        is_conda_build = build_prefix is not None
    
        library_found = False
        conda_installed = False
        lib_folder = None
        include_folder = None
        library_header = f"{name}.h"
    
        # Lookup in TORCHVISION_INCLUDE or in the package file
        package_path = [os.path.join(this_dir, "torchvision")]
        for folder in vision_include + package_path:
            candidate_path = os.path.join(folder, library_header)
            library_found = os.path.exists(candidate_path)
            if library_found:
                break
    
  • running the build script (setup.py) manually in an "inspect" session revealed that the second parameter to find_library was an empty list []
    • the suspicion is that TORCHVISION_INCLUDE was not set although it should have been if the easyblock for torchvision is used, see https://github.com/easybuilders/easybuild-easyblocks/blob/10e9a62d44d653e04f735962620a33bc22225477/easybuild/easyblocks/t/torchvision.py#L83-L85
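
The effect of an unset TORCHVISION_INCLUDE can be reproduced in isolation. The sketch below mirrors only the header lookup loop quoted above; it leaves out the package directory and any later fallbacks in setup.py. It assumes that the variable is split on the path separator and that the header searched for is jpeglib.h, and both helper names are hypothetical.

```python
import os
import tempfile


def include_dirs_from_env(env):
    """Split TORCHVISION_INCLUDE on the path separator; unset or empty gives []."""
    value = env.get("TORCHVISION_INCLUDE")
    return value.split(os.pathsep) if value else []


def header_found(name, include_dirs):
    """Return True if <name>.h exists in one of the given directories."""
    header = "%s.h" % name
    return any(os.path.exists(os.path.join(d, header)) for d in include_dirs)


with tempfile.TemporaryDirectory() as incdir:
    # Create an empty stand-in header so the lookup has something to find.
    open(os.path.join(incdir, "jpeglib.h"), "w").close()
    found_when_unset = header_found("jpeglib", include_dirs_from_env({}))
    found_when_set = header_found("jpeglib", include_dirs_from_env({"TORCHVISION_INCLUDE": incdir}))
```

With an empty list the loop never finds the header, which matches the observation that find_library was called with [] and the build then continued without JPEG support.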
date job status comment
Jun 15 12:04:28 UTC 2024 submitted job id 12808 awaits release by job manager
Jun 15 12:04:32 UTC 2024 released job awaits launch by Slurm scheduler
Jun 15 12:10:36 UTC 2024 running job 12808 is running
Jun 15 13:47:58 UTC 2024 finished
:cry: FAILURE (click triangle for details)
Details
:white_check_mark: job output file slurm-12808.out
:x: found message matching ERROR:
:x: found message matching FAILED:
:x: found message matching required modules missing:
:x: no message matching No missing installations
:white_check_mark: found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen2-1718457726.tar.gz
size: 282 MiB (296485955 bytes)
entries: 9314
modules under 2023.06/software/linux/x86_64/amd/zen2/modules/all
custom_ctypes/1.2.lua
gperftools/2.12-GCCcore-12.3.0.lua
imageio/2.33.1-gfbf-2023a.lua
libmad/0.15.1b-GCCcore-12.3.0.lua
librosa/0.10.1-foss-2023a.lua
LLVM/14.0.6-GCCcore-12.3.0-llvmlite.lua
NLTK/3.8.1-foss-2023a.lua
numba/0.58.1-foss-2023a.lua
parameterized/0.9.0-GCCcore-12.3.0.lua
Scalene/1.5.26-GCCcore-12.3.0.lua
scikit-image/0.22.0-foss-2023a.lua
SentencePiece/0.2.0-GCC-12.3.0.lua
SoX/14.4.2-GCCcore-12.3.0.lua
tensorboard/2.15.1-gfbf-2023a.lua
tqdm/4.66.1-GCCcore-12.3.0.lua
software under 2023.06/software/linux/x86_64/amd/zen2/software
custom_ctypes/1.2
gperftools/2.12-GCCcore-12.3.0
imageio/2.33.1-gfbf-2023a
libmad/0.15.1b-GCCcore-12.3.0
librosa/0.10.1-foss-2023a
LLVM/14.0.6-GCCcore-12.3.0-llvmlite
NLTK/3.8.1-foss-2023a
numba/0.58.1-foss-2023a
parameterized/0.9.0-GCCcore-12.3.0
Scalene/1.5.26-GCCcore-12.3.0
scikit-image/0.22.0-foss-2023a
SentencePiece/0.2.0-GCC-12.3.0
SoX/14.4.2-GCCcore-12.3.0
tensorboard/2.15.1-gfbf-2023a
tqdm/4.66.1-GCCcore-12.3.0
other under 2023.06/software/linux/x86_64/amd/zen2
2023.06/init/easybuild/eb_hooks.py
Jun 15 13:47:58 UTC 2024 test result
:grin: SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 12/12 test case(s) from 12 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
:white_check_mark: job output file slurm-12808.out
:x: found message matching ERROR:
:white_check_mark: no message matching [\s*FAILED\s*].*Ran .* test case

eessi-bot[bot] avatar Jun 15 '24 12:06 eessi-bot[bot]

New job on instance eessi-bot-mc-aws for architecture aarch64-generic for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.06/pr_603/12809

  • failed in the sanity check for SentencePiece/0.2.0-GCC-12.3.0 with the following log messages
== 2024-06-15 12:40:44,834 build_log.py:171 ERROR EasyBuild crashed with an error (at easybuild/tools/build_log.py:111 in caller_info): Sanity check failed: sanity check command python -c 'import sentencepiece' exited with code 1 (output: Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/SentencePiece/0.2.0-GCC-12.3.0/lib/python3.11/site-packages/sentencepiece/__init__.py", line 10, in <module>
    from . import _sentencepiece
ImportError: /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/gperftools/2.12-GCCcore-12.3.0/lib64/libtcmalloc_minimal.so.4: cannot allocate memory in static TLS block
) (at easybuild/framework/easyblock.py:3669 in _sanity_check_step)
date job status comment
Jun 15 12:04:32 UTC 2024 submitted job id 12809 awaits release by job manager
Jun 15 12:05:34 UTC 2024 released job awaits launch by Slurm scheduler
Jun 15 12:11:38 UTC 2024 running job 12809 is running
Jun 15 13:04:14 UTC 2024 finished
:cry: FAILURE (click triangle for details)
Details
:white_check_mark: job output file slurm-12809.out
:x: found message matching ERROR:
:x: found message matching FAILED:
:x: found message matching required modules missing:
:x: no message matching No missing installations
:white_check_mark: found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-aarch64-generic-1718455310.tar.gz
size: 258 MiB (270844195 bytes)
entries: 9169
modules under 2023.06/software/linux/aarch64/generic/modules/all
custom_ctypes/1.2.lua
gperftools/2.12-GCCcore-12.3.0.lua
imageio/2.33.1-gfbf-2023a.lua
librosa/0.10.1-foss-2023a.lua
LLVM/14.0.6-GCCcore-12.3.0-llvmlite.lua
NLTK/3.8.1-foss-2023a.lua
numba/0.58.1-foss-2023a.lua
parameterized/0.9.0-GCCcore-12.3.0.lua
Scalene/1.5.26-GCCcore-12.3.0.lua
scikit-image/0.22.0-foss-2023a.lua
tensorboard/2.15.1-gfbf-2023a.lua
tqdm/4.66.1-GCCcore-12.3.0.lua
software under 2023.06/software/linux/aarch64/generic/software
custom_ctypes/1.2
gperftools/2.12-GCCcore-12.3.0
imageio/2.33.1-gfbf-2023a
librosa/0.10.1-foss-2023a
LLVM/14.0.6-GCCcore-12.3.0-llvmlite
NLTK/3.8.1-foss-2023a
numba/0.58.1-foss-2023a
parameterized/0.9.0-GCCcore-12.3.0
Scalene/1.5.26-GCCcore-12.3.0
scikit-image/0.22.0-foss-2023a
tensorboard/2.15.1-gfbf-2023a
tqdm/4.66.1-GCCcore-12.3.0
other under 2023.06/software/linux/aarch64/generic
2023.06/init/easybuild/eb_hooks.py
Jun 15 13:04:14 UTC 2024 test result
:grin: SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 12/12 test case(s) from 12 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
:white_check_mark: job output file slurm-12809.out
:x: found message matching ERROR:
:white_check_mark: no message matching [\s*FAILED\s*].*Ran .* test case

eessi-bot[bot] avatar Jun 15 '24 12:06 eessi-bot[bot]

The two jobs (12808 // zen2 and 12809 // aarch64/generic) no longer failed for the earlier reason (the failed import of soundfile). However, they failed for different reasons (for details see above). We first fix the issue for aarch64/generic (because the build for that architecture failed earlier than the build for zen2). The fix disables the use of the TC_MALLOC library. Because the fix is made for aarch64/generic only, we also check whether builds for the other aarch64 targets are affected by the issue.

bot: build arch:aarch64/generic repo:eessi.io-2023.06-software
bot: build arch:aarch64/neoverse_n1 repo:eessi.io-2023.06-software
bot: build arch:aarch64/neoverse_v1 repo:eessi.io-2023.06-software
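
A fix that disables TC_MALLOC for one CPU target only could be expressed along the following lines. This is a sketch under assumptions and not the hook used in this PR: the function name is hypothetical, and the CMake option -DSPM_ENABLE_TCMALLOC=OFF is assumed to be the switch SentencePiece offers for this purpose and should be verified against its build files.

```python
# Sketch only: plain arguments stand in for the easyblock and the EESSI CPU target.
# Assumption: SentencePiece accepts -DSPM_ENABLE_TCMALLOC=OFF at configure time.
TCMALLOC_OFF = "-DSPM_ENABLE_TCMALLOC=OFF"


def disable_tcmalloc_configopts(name, configopts, cpu_target):
    """Append the tcmalloc switch for SentencePiece on aarch64/generic only."""
    if name == "SentencePiece" and cpu_target == "aarch64/generic":
        return (configopts + " " + TCMALLOC_OFF).strip()
    return configopts
```

Because the condition matches aarch64/generic only, builds for neoverse_n1 and neoverse_v1 keep their original configure options, which is why those targets are built as well to see whether they hit the same error.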

trz42 avatar Jun 15 '24 18:06 trz42

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build arch:aarch64/generic repo:eessi.io-2023.06-software from trz42

    • expanded format: build architecture:aarch64/generic repository:eessi.io-2023.06-software
  • received bot command build arch:aarch64/neoverse_n1 repo:eessi.io-2023.06-software from trz42

    • expanded format: build architecture:aarch64/neoverse_n1 repository:eessi.io-2023.06-software
  • received bot command build arch:aarch64/neoverse_v1 repo:eessi.io-2023.06-software from trz42

    • expanded format: build architecture:aarch64/neoverse_v1 repository:eessi.io-2023.06-software
  • handling command build architecture:aarch64/generic repository:eessi.io-2023.06-software resulted in:

    • submitted job 12813, for details & status see https://github.com/EESSI/software-layer/pull/603#issuecomment-2170460546
  • handling command build architecture:aarch64/neoverse_n1 repository:eessi.io-2023.06-software resulted in:

    • submitted job 12814, for details & status see https://github.com/EESSI/software-layer/pull/603#issuecomment-2170460599
  • handling command build architecture:aarch64/neoverse_v1 repository:eessi.io-2023.06-software resulted in:

    • submitted job 12815, for details & status see https://github.com/EESSI/software-layer/pull/603#issuecomment-2170460654

eessi-bot[bot] avatar Jun 15 '24 18:06 eessi-bot[bot]

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build arch:aarch64/generic repo:eessi.io-2023.06-software from trz42

    • expanded format: build architecture:aarch64/generic repository:eessi.io-2023.06-software
  • received bot command build arch:aarch64/neoverse_n1 repo:eessi.io-2023.06-software from trz42

    • expanded format: build architecture:aarch64/neoverse_n1 repository:eessi.io-2023.06-software
  • received bot command build arch:aarch64/neoverse_v1 repo:eessi.io-2023.06-software from trz42

    • expanded format: build architecture:aarch64/neoverse_v1 repository:eessi.io-2023.06-software
  • handling command build architecture:aarch64/generic repository:eessi.io-2023.06-software resulted in:

    • no jobs were submitted
  • handling command build architecture:aarch64/neoverse_n1 repository:eessi.io-2023.06-software resulted in:

    • no jobs were submitted
  • handling command build architecture:aarch64/neoverse_v1 repository:eessi.io-2023.06-software resulted in:

    • no jobs were submitted

eessi-bot[bot] avatar Jun 15 '24 18:06 eessi-bot[bot]

New job on instance eessi-bot-mc-aws for architecture aarch64-generic for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.06/pr_603/12813

  • fails with a new error for extension torchtext
== 2024-06-15 18:44:56,282 build_log.py:171 ERROR EasyBuild crashed with an error (at easybuild/tools/build_log.py:111 in caller_info): cmd "export PYTHONPATH=/tmp/eb-4o0di9ui/eb-qo9jlvzo/tmp0g004oib/lib/python3.11/site-packages:$PYTHONPATH &&  pytest test/torchtext_unittest -k "not test_vocab_from_raw_text_file"" and not test_get_tokenizer_moses"" and not test_get_tokenizer_spacy"" and not test_download_charngram_vectors" " exited with exit code -11 and output:
Fatal Python error: Segmentation fault

Current thread 0x000040002a9e5a00 (most recent call first):
  File "/tmp/bot/easybuild/build/PyTorchbundle/2.1.2/foss-2023a/torchtext/text-0.16.2/test/torchtext_unittest/test_transforms.py", line 1268 in TestMaskTransform
  File "/tmp/bot/easybuild/build/PyTorchbundle/2.1.2/foss-2023a/torchtext/text-0.16.2/test/torchtext_unittest/test_transforms.py", line 1255 in <module>
  File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/Python-bundle-PyPI/2023.06-GCCcore-12.3.0/lib/python3.11/site-packages/_pytest/assertion/rewrite.py", line
 178 in exec_module
  File "<frozen importlib._bootstrap>", line 690 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1149 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1178 in _find_and_load
  File "<frozen importlib._bootstrap>", line 1206 in _gcd_import
  File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/Python/3.11.3-GCCcore-12.3.0/lib/python3.11/importlib/__init__.py", line 126 in import_module
  File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/Python-bundle-PyPI/2023.06-GCCcore-12.3.0/lib/python3.11/site-packages/_pytest/pathlib.py", line 565 in import_path
  File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/Python-bundle-PyPI/2023.06-GCCcore-12.3.0/lib/python3.11/site-packages/_pytest/python.py", line 617 in _importtestmodule
  File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/Python-bundle-PyPI/2023.06-GCCcore-12.3.0/lib/python3.11/site-packages/_pytest/python.py", line 528 in _getobj
  File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/Python-bundle-PyPI/2023.06-GCCcore-12.3.0/lib/python3.11/site-packages/_pytest/python.py", line 310 in obj
  File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/Python-bundle-PyPI/2023.06-GCCcore-12.3.0/lib/python3.11/site-packages/_pytest/python.py", line 545 in _inject_setup_module_fixture
  File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/Python-bundle-PyPI/2023.06-GCCcore-12.3.0/lib/python3.11/site-packages/_pytest/python.py", line 531 in collect
  File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/Python-bundle-PyPI/2023.06-GCCcore-12.3.0/lib/python3.11/site-packages/_pytest/runner.py", line 372 in <lambda>
  File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/Python-bundle-PyPI/2023.06-GCCcore-12.3.0/lib/python3.11/site-packages/_pytest/runner.py", line 341 in from_call
  File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/Python-bundle-PyPI/2023.06-GCCcore-12.3.0/lib/python3.11/site-packages/_pytest/runner.py", line 372 in pytest_make_collect_report
  File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/hatchling/1.18.0-GCCcore-12.3.0/lib/python3.11/site-packages/pluggy/_callers.py", line 80 in _multicall
  File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/hatchling/1.18.0-GCCcore-12.3.0/lib/python3.11/site-packages/pluggy/_manager.py", line 112 in _hookexec
  File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/hatchling/1.18.0-GCCcore-12.3.0/lib/python3.11/site-packages/pluggy/_hooks.py", line 433 in __call__
  File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/Python-bundle-PyPI/2023.06-GCCcore-12.3.0/lib/python3.11/site-packages/_pytest/runner.py", line 547 in collect_one_node
  File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/Python-bundle-PyPI/2023.06-GCCcore-12.3.0/lib/python3.11/site-packages/_pytest/main.py", line 836 in genitems
  File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/Python-bundle-PyPI/2023.06-GCCcore-12.3.0/lib/python3.11/site-packages/_pytest/main.py", line 839 in genitems
  File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/Python-bundle-PyPI/2023.06-GCCcore-12.3.0/lib/python3.11/site-packages/_pytest/main.py", line 669 in perform_collect
  File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/Python-bundle-PyPI/2023.06-GCCcore-12.3.0/lib/python3.11/site-packages/_pytest/main.py", line 334 in pytest_collection
  File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/hatchling/1.18.0-GCCcore-12.3.0/lib/python3.11/site-packages/pluggy/_callers.py", line 80 in _multicall
  File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/hatchling/1.18.0-GCCcore-12.3.0/lib/python3.11/site-packages/pluggy/_manager.py", line 112 in _hookexec
  File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/hatchling/1.18.0-GCCcore-12.3.0/lib/python3.11/site-packages/pluggy/_hooks.py", line 433 in __call__
  File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/Python-bundle-PyPI/2023.06-GCCcore-12.3.0/lib/python3.11/site-packages/_pytest/main.py", line 323 in _main
  File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/Python-bundle-PyPI/2023.06-GCCcore-12.3.0/lib/python3.11/site-packages/_pytest/main.py", line 270 in wrap_session
  File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/Python-bundle-PyPI/2023.06-GCCcore-12.3.0/lib/python3.11/site-packages/_pytest/main.py", line 317 in pytest_cmdline_main
  File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/hatchling/1.18.0-GCCcore-12.3.0/lib/python3.11/site-packages/pluggy/_callers.py", line 80 in _multicall
  File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/hatchling/1.18.0-GCCcore-12.3.0/lib/python3.11/site-packages/pluggy/_manager.py", line 112 in _hookexec
  File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/hatchling/1.18.0-GCCcore-12.3.0/lib/python3.11/site-packages/pluggy/_hooks.py", line 433 in __call__
  File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/Python-bundle-PyPI/2023.06-GCCcore-12.3.0/lib/python3.11/site-packages/_pytest/config/__init__.py", line 166 in main
  File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/Python-bundle-PyPI/2023.06-GCCcore-12.3.0/lib/python3.11/site-packages/_pytest/config/__init__.py", line 189 in console_main
  File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/Python-bundle-PyPI/2023.06-GCCcore-12.3.0/bin/pytest", line 8 in <module>

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, torch._C, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, gmpy2.gmpy2, simplejson._speedups (total: 22)
  • we may have seen this error earlier when building for NESSI ... we didn't have a fix for it there, so this requires more investigation
date job status comment
Jun 15 18:07:39 UTC 2024 submitted job id 12813 awaits release by job manager
Jun 15 18:08:23 UTC 2024 released job awaits launch by Slurm scheduler
Jun 15 18:13:30 UTC 2024 running job 12813 is running
Jun 15 19:09:48 UTC 2024 finished
:cry: FAILURE (click triangle for details)
Details
:white_check_mark: job output file slurm-12813.out
:x: found message matching ERROR:
:x: found message matching FAILED:
:x: found message matching required modules missing:
:x: no message matching No missing installations
:white_check_mark: found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-aarch64-generic-1718477177.tar.gz
size: 271 MiB (284370882 bytes)
entries: 9314
modules under 2023.06/software/linux/aarch64/generic/modules/all
custom_ctypes/1.2.lua
gperftools/2.12-GCCcore-12.3.0.lua
imageio/2.33.1-gfbf-2023a.lua
libmad/0.15.1b-GCCcore-12.3.0.lua
librosa/0.10.1-foss-2023a.lua
LLVM/14.0.6-GCCcore-12.3.0-llvmlite.lua
NLTK/3.8.1-foss-2023a.lua
numba/0.58.1-foss-2023a.lua
parameterized/0.9.0-GCCcore-12.3.0.lua
Scalene/1.5.26-GCCcore-12.3.0.lua
scikit-image/0.22.0-foss-2023a.lua
SentencePiece/0.2.0-GCC-12.3.0.lua
SoX/14.4.2-GCCcore-12.3.0.lua
tensorboard/2.15.1-gfbf-2023a.lua
tqdm/4.66.1-GCCcore-12.3.0.lua
software under 2023.06/software/linux/aarch64/generic/software
custom_ctypes/1.2
gperftools/2.12-GCCcore-12.3.0
imageio/2.33.1-gfbf-2023a
libmad/0.15.1b-GCCcore-12.3.0
librosa/0.10.1-foss-2023a
LLVM/14.0.6-GCCcore-12.3.0-llvmlite
NLTK/3.8.1-foss-2023a
numba/0.58.1-foss-2023a
parameterized/0.9.0-GCCcore-12.3.0
Scalene/1.5.26-GCCcore-12.3.0
scikit-image/0.22.0-foss-2023a
SentencePiece/0.2.0-GCC-12.3.0
SoX/14.4.2-GCCcore-12.3.0
tensorboard/2.15.1-gfbf-2023a
tqdm/4.66.1-GCCcore-12.3.0
other under 2023.06/software/linux/aarch64/generic
2023.06/init/easybuild/eb_hooks.py
Jun 15 19:09:48 UTC 2024 test result
:grin: SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 12/12 test case(s) from 12 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
:white_check_mark: job output file slurm-12813.out
:x: found message matching ERROR:
:white_check_mark: no message matching [\s*FAILED\s*].*Ran .* test case
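The check pattern is displayed above as `[\s*FAILED\s*].*Ran .* test case`. Read literally, the unescaped brackets form a character class, which would also match the PASSED summary line. Since the bot reports "no message matching", the actual pattern presumably escapes the brackets and the backslashes were lost when the comment was rendered; that is an assumption, not something the log confirms. A minimal Python sketch comparing both readings (the FAILED summary line is made up for illustration):

```python
import re

# Summary line copied from the ReFrame output above.
passed_summary = (
    "[ PASSED ] Ran 12/12 test case(s) from 12 check(s) "
    "(0 failure(s), 0 skipped, 0 aborted)"
)
# Hypothetical failing summary, invented for this example.
failed_summary = (
    "[ FAILED ] Ran 12/12 test case(s) from 12 check(s) "
    "(1 failure(s), 0 skipped, 0 aborted)"
)

# Assumed real pattern: literal brackets around FAILED.
escaped = re.compile(r"\[\s*FAILED\s*\].*Ran .* test case")
# Pattern exactly as displayed: the brackets act as a character class.
as_displayed = re.compile(r"[\s*FAILED\s*].*Ran .* test case")

escaped_hits_passed = escaped.search(passed_summary) is not None
escaped_hits_failed = escaped.search(failed_summary) is not None
displayed_hits_passed = as_displayed.search(passed_summary) is not None

print(escaped_hits_passed, escaped_hits_failed, displayed_hits_passed)
```

With the escaped form, only the FAILED line matches, which is consistent with the check result shown above.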

eessi-bot[bot] avatar Jun 15 '24 18:06 eessi-bot[bot]

New job on instance eessi-bot-mc-aws for architecture aarch64-neoverse_n1 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.06/pr_603/12814

  • failed with the same error as aarch64/generic
== 2024-06-15 18:42:59,199 build_log.py:171 ERROR EasyBuild crashed with an error (at easybuild/tools/build_log.py:111 in caller_info): Sanity check failed: sanity check command python -c 'import sentencepiece' exited with code 1 (output: Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/neoverse_n1/software/SentencePiece/0.2.0-GCC-12.3.0/lib/python3.11/site-packages/sentencepiece/__init__.py", line 10, in <module>
    from . import _sentencepiece
ImportError: /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/neoverse_n1/software/gperftools/2.12-GCCcore-12.3.0/lib64/libtcmalloc_minimal.so.4: cannot allocate memory in static TLS block
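The static TLS error above is raised by the dynamic loader when a library that needs static thread-local storage is loaded late (here, as a dependency of the sentencepiece extension module). A common way to check whether load order is the cause is to preload the library so that its TLS is reserved at process start. The sketch below is a diagnostic aid only, not the fix applied in this PR; `preload_env` and `import_with_preload` are hypothetical helpers, and the library path must be taken from the log.

```python
import os
import subprocess
import sys


def preload_env(lib_path, base_env=None):
    """Return a copy of the environment with lib_path prepended to LD_PRELOAD."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD", "")
    env["LD_PRELOAD"] = f"{lib_path}:{existing}" if existing else lib_path
    return env


def import_with_preload(module, lib_path):
    """Run 'import <module>' in a fresh interpreter with lib_path preloaded."""
    return subprocess.run(
        [sys.executable, "-c", f"import {module}"],
        env=preload_env(lib_path),
        capture_output=True,
        text=True,
    )
```

For example, `import_with_preload("sentencepiece", "<path to libtcmalloc_minimal.so.4 from the log>")` returns a completed process whose `returncode` and `stderr` show whether the import still fails when the library is loaded first.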
date job status comment
Jun 15 18:07:43 UTC 2024 submitted job id 12814 awaits release by job manager
Jun 15 18:08:25 UTC 2024 released job awaits launch by Slurm scheduler
Jun 15 18:14:32 UTC 2024 running job 12814 is running
Jun 15 19:06:45 UTC 2024 finished
:cry: FAILURE (click triangle for details)
Details
:white_check_mark: job output file slurm-12814.out
:x: found message matching ERROR:
:x: found message matching FAILED:
:x: found message matching required modules missing:
:x: no message matching No missing installations
:white_check_mark: found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-aarch64-neoverse_n1-1718477049.tar.gz
size: 258 MiB (271203990 bytes)
entries: 9169
modules under 2023.06/software/linux/aarch64/neoverse_n1/modules/all
custom_ctypes/1.2.lua
gperftools/2.12-GCCcore-12.3.0.lua
imageio/2.33.1-gfbf-2023a.lua
librosa/0.10.1-foss-2023a.lua
LLVM/14.0.6-GCCcore-12.3.0-llvmlite.lua
NLTK/3.8.1-foss-2023a.lua
numba/0.58.1-foss-2023a.lua
parameterized/0.9.0-GCCcore-12.3.0.lua
Scalene/1.5.26-GCCcore-12.3.0.lua
scikit-image/0.22.0-foss-2023a.lua
tensorboard/2.15.1-gfbf-2023a.lua
tqdm/4.66.1-GCCcore-12.3.0.lua
software under 2023.06/software/linux/aarch64/neoverse_n1/software
custom_ctypes/1.2
gperftools/2.12-GCCcore-12.3.0
imageio/2.33.1-gfbf-2023a
librosa/0.10.1-foss-2023a
LLVM/14.0.6-GCCcore-12.3.0-llvmlite
NLTK/3.8.1-foss-2023a
numba/0.58.1-foss-2023a
parameterized/0.9.0-GCCcore-12.3.0
Scalene/1.5.26-GCCcore-12.3.0
scikit-image/0.22.0-foss-2023a
tensorboard/2.15.1-gfbf-2023a
tqdm/4.66.1-GCCcore-12.3.0
other under 2023.06/software/linux/aarch64/neoverse_n1
2023.06/init/easybuild/eb_hooks.py
Jun 15 19:06:45 UTC 2024 test result
:grin: SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 12/12 test case(s) from 12 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
:white_check_mark: job output file slurm-12814.out
:x: found message matching ERROR:
:white_check_mark: no message matching [\s*FAILED\s*].*Ran .* test case

eessi-bot[bot] avatar Jun 15 '24 18:06 eessi-bot[bot]

New job on instance eessi-bot-mc-aws for architecture aarch64-neoverse_v1 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.06/pr_603/12815

  • failed with the same error as on aarch64/generic
== 2024-06-15 18:36:00,141 build_log.py:171 ERROR EasyBuild crashed with an error (at easybuild/tools/build_log.py:111 in caller_info): Sanity check failed: sanity check command python -c 'import sentencepiece' exited with code 1 (output: Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/neoverse_v1/software/SentencePiece/0.2.0-GCC-12.3.0/lib/python3.11/site-packages/sentencepiece/__init__.py", line 10, in <module>
    from . import _sentencepiece
ImportError: /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/neoverse_v1/software/gperftools/2.12-GCCcore-12.3.0/lib64/libtcmalloc_minimal.so.4: cannot allocate memory in static TLS block
date job status comment
Jun 15 18:07:47 UTC 2024 submitted job id 12815 awaits release by job manager
Jun 15 18:08:27 UTC 2024 released job awaits launch by Slurm scheduler
Jun 15 18:14:34 UTC 2024 running job 12815 is running
Jun 15 18:52:16 UTC 2024 finished
:cry: FAILURE (click triangle for details)
Details
:white_check_mark: job output file slurm-12815.out
:x: found message matching ERROR:
:x: found message matching FAILED:
:x: found message matching required modules missing:
:x: no message matching No missing installations
:white_check_mark: found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-aarch64-neoverse_v1-1718476614.tar.gz
size: 258 MiB (270948899 bytes)
entries: 9169
modules under 2023.06/software/linux/aarch64/neoverse_v1/modules/all
custom_ctypes/1.2.lua
gperftools/2.12-GCCcore-12.3.0.lua
imageio/2.33.1-gfbf-2023a.lua
librosa/0.10.1-foss-2023a.lua
LLVM/14.0.6-GCCcore-12.3.0-llvmlite.lua
NLTK/3.8.1-foss-2023a.lua
numba/0.58.1-foss-2023a.lua
parameterized/0.9.0-GCCcore-12.3.0.lua
Scalene/1.5.26-GCCcore-12.3.0.lua
scikit-image/0.22.0-foss-2023a.lua
tensorboard/2.15.1-gfbf-2023a.lua
tqdm/4.66.1-GCCcore-12.3.0.lua
software under 2023.06/software/linux/aarch64/neoverse_v1/software
custom_ctypes/1.2
gperftools/2.12-GCCcore-12.3.0
imageio/2.33.1-gfbf-2023a
librosa/0.10.1-foss-2023a
LLVM/14.0.6-GCCcore-12.3.0-llvmlite
NLTK/3.8.1-foss-2023a
numba/0.58.1-foss-2023a
parameterized/0.9.0-GCCcore-12.3.0
Scalene/1.5.26-GCCcore-12.3.0
scikit-image/0.22.0-foss-2023a
tensorboard/2.15.1-gfbf-2023a
tqdm/4.66.1-GCCcore-12.3.0
other under 2023.06/software/linux/aarch64/neoverse_v1
2023.06/init/easybuild/eb_hooks.py
Jun 15 18:52:16 UTC 2024 test result
:grin: SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 12/12 test case(s) from 12 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
:white_check_mark: job output file slurm-12815.out
:x: found message matching ERROR:
:white_check_mark: no message matching [\s*FAILED\s*].*Ran .* test case

eessi-bot[bot] avatar Jun 15 '24 18:06 eessi-bot[bot]

Rebuilding for aarch64/neoverse_n1 and aarch64/neoverse_v1 now that the fix for SentencePiece has been extended to these architectures...

bot: build arch:aarch64/neoverse_n1 repo:eessi.io-2023.06-software
bot: build arch:aarch64/neoverse_v1 repo:eessi.io-2023.06-software

trz42 avatar Jun 15 '24 19:06 trz42

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build arch:aarch64/neoverse_n1 repo:eessi.io-2023.06-software from trz42

    • expanded format: build architecture:aarch64/neoverse_n1 repository:eessi.io-2023.06-software
  • received bot command build arch:aarch64/neoverse_v1 repo:eessi.io-2023.06-software from trz42

    • expanded format: build architecture:aarch64/neoverse_v1 repository:eessi.io-2023.06-software
  • handling command build architecture:aarch64/neoverse_n1 repository:eessi.io-2023.06-software resulted in:

    • submitted job 12816, for details & status see https://github.com/EESSI/software-layer/pull/603#issuecomment-2170583100
  • handling command build architecture:aarch64/neoverse_v1 repository:eessi.io-2023.06-software resulted in:

    • submitted job 12817, for details & status see https://github.com/EESSI/software-layer/pull/603#issuecomment-2170583352
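The "expanded format" reported above maps the short keys arch and repo to architecture and repository. A minimal sketch of that expansion follows; `expand_bot_command` and the `ALIASES` table are hypothetical names for illustration and are not taken from the bot's code.

```python
# Short keys seen in the comments above and their expanded forms.
ALIASES = {"arch": "architecture", "repo": "repository"}


def expand_bot_command(line: str) -> str:
    """Expand 'bot: build arch:X repo:Y' to 'build architecture:X repository:Y'."""
    text = line.strip()
    if text.startswith("bot:"):
        text = text[len("bot:"):].strip()
    action, *args = text.split()
    expanded = [action]
    for arg in args:
        # Split only at the first colon so values may contain colons.
        key, sep, value = arg.partition(":")
        expanded.append(f"{ALIASES.get(key, key)}{sep}{value}")
    return " ".join(expanded)


print(expand_bot_command(
    "bot: build arch:aarch64/neoverse_n1 repo:eessi.io-2023.06-software"
))
```

Keys that are not in the alias table are passed through unchanged, so already expanded commands stay as they are.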

eessi-bot[bot] avatar Jun 15 '24 19:06 eessi-bot[bot]

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build arch:aarch64/neoverse_n1 repo:eessi.io-2023.06-software from trz42

    • expanded format: build architecture:aarch64/neoverse_n1 repository:eessi.io-2023.06-software
  • received bot command build arch:aarch64/neoverse_v1 repo:eessi.io-2023.06-software from trz42

    • expanded format: build architecture:aarch64/neoverse_v1 repository:eessi.io-2023.06-software
  • handling command build architecture:aarch64/neoverse_n1 repository:eessi.io-2023.06-software resulted in:

    • no jobs were submitted
  • handling command build architecture:aarch64/neoverse_v1 repository:eessi.io-2023.06-software resulted in:

    • no jobs were submitted

eessi-bot[bot] avatar Jun 15 '24 19:06 eessi-bot[bot]

New job on instance eessi-bot-mc-aws for architecture aarch64-neoverse_n1 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.06/pr_603/12816

  • now fails with the same error as the build for aarch64/generic
== 2024-06-15 20:08:01,404 build_log.py:171 ERROR EasyBuild crashed with an error (at easybuild/tools/build_log.py:111 in caller_info): cmd "export PYTHONPATH=/tmp/eb-t17gza4h/eb-ul5a_hbb/tmpr1l71y06/lib/python3.11/site-packages:$PYTHONPATH &&  pytest test/torchtext_unittest -k "not test_vocab_from_raw_text_file"" and not test_get_tokenizer_moses"" and not test_get_tokenizer_spacy"" and not test_download_charngram_vectors" " exited with exit code -11 and output:
Fatal Python error: Segmentation fault

Current thread 0x000040003d3e5a80 (most recent call first):
  File "/tmp/bot/easybuild/build/PyTorchbundle/2.1.2/foss-2023a/torchtext/text-0.16.2/test/torchtext_unittest/test_transforms.py", line 1268 in TestMaskTransform
  File "/tmp/bot/easybuild/build/PyTorchbundle/2.1.2/foss-2023a/torchtext/text-0.16.2/test/torchtext_unittest/test_transforms.py", line 1255 in <module>
  File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/neoverse_n1/software/Python-bundle-PyPI/2023.06-GCCcore-12.3.0/lib/python3.11/site-packages/_pytest/assertion/rewrite.py", line 178 in exec_module
  File "<frozen importlib._bootstrap>", line 690 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1149 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1178 in _find_and_load
  File "<frozen importlib._bootstrap>", line 1206 in _gcd_import
...
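The failing command deselects four torchtext tests through pytest's -k option, which takes a boolean expression over test names. The sketch below shows how such an expression can be built from a list of names; `build_deselect_expr` is a hypothetical helper, and passing the command as an argument list is a choice made here to avoid shell quoting, not necessarily how EasyBuild assembles the command.

```python
def build_deselect_expr(test_names):
    """Build a pytest -k expression that deselects every listed test."""
    return " and ".join(f"not {name}" for name in test_names)


# Test names copied from the command in the log above.
excluded = [
    "test_vocab_from_raw_text_file",
    "test_get_tokenizer_moses",
    "test_get_tokenizer_spacy",
    "test_download_charngram_vectors",
]

expr = build_deselect_expr(excluded)

# Argument-list form: the expression stays a single argument without quoting.
cmd = ["pytest", "test/torchtext_unittest", "-k", expr]
print(cmd)
```

The resulting list can be handed to `subprocess.run(cmd)` from within the torchtext source directory.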
date job status comment
Jun 15 19:34:52 UTC 2024 submitted job id 12816 awaits release by job manager
Jun 15 19:35:52 UTC 2024 released job awaits launch by Slurm scheduler
Jun 15 19:36:56 UTC 2024 running job 12816 is running
Jun 15 20:35:33 UTC 2024 finished
:cry: FAILURE (click triangle for details)
Details
:white_check_mark: job output file slurm-12816.out
:x: found message matching ERROR:
:x: found message matching FAILED:
:x: found message matching required modules missing:
:x: no message matching No missing installations
:white_check_mark: found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-aarch64-neoverse_n1-1718482255.tar.gz
size: 271 MiB (284726536 bytes)
entries: 9314
modules under 2023.06/software/linux/aarch64/neoverse_n1/modules/all
custom_ctypes/1.2.lua
gperftools/2.12-GCCcore-12.3.0.lua
imageio/2.33.1-gfbf-2023a.lua
libmad/0.15.1b-GCCcore-12.3.0.lua
librosa/0.10.1-foss-2023a.lua
LLVM/14.0.6-GCCcore-12.3.0-llvmlite.lua
NLTK/3.8.1-foss-2023a.lua
numba/0.58.1-foss-2023a.lua
parameterized/0.9.0-GCCcore-12.3.0.lua
Scalene/1.5.26-GCCcore-12.3.0.lua
scikit-image/0.22.0-foss-2023a.lua
SentencePiece/0.2.0-GCC-12.3.0.lua
SoX/14.4.2-GCCcore-12.3.0.lua
tensorboard/2.15.1-gfbf-2023a.lua
tqdm/4.66.1-GCCcore-12.3.0.lua
software under 2023.06/software/linux/aarch64/neoverse_n1/software
custom_ctypes/1.2
gperftools/2.12-GCCcore-12.3.0
imageio/2.33.1-gfbf-2023a
libmad/0.15.1b-GCCcore-12.3.0
librosa/0.10.1-foss-2023a
LLVM/14.0.6-GCCcore-12.3.0-llvmlite
NLTK/3.8.1-foss-2023a
numba/0.58.1-foss-2023a
parameterized/0.9.0-GCCcore-12.3.0
Scalene/1.5.26-GCCcore-12.3.0
scikit-image/0.22.0-foss-2023a
SentencePiece/0.2.0-GCC-12.3.0
SoX/14.4.2-GCCcore-12.3.0
tensorboard/2.15.1-gfbf-2023a
tqdm/4.66.1-GCCcore-12.3.0
other under 2023.06/software/linux/aarch64/neoverse_n1
2023.06/init/easybuild/eb_hooks.py
Jun 15 20:35:33 UTC 2024 test result
:grin: SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 12/12 test case(s) from 12 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
:white_check_mark: job output file slurm-12816.out
:x: found message matching ERROR:
:white_check_mark: no message matching [\s*FAILED\s*].*Ran .* test case

eessi-bot[bot] avatar Jun 15 '24 19:06 eessi-bot[bot]

New job on instance eessi-bot-mc-aws for architecture aarch64-neoverse_v1 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.06/pr_603/12817

  • now fails with the same error as the build for aarch64/generic
== 2024-06-15 20:00:37,536 build_log.py:171 ERROR EasyBuild crashed with an error (at easybuild/tools/build_log.py:111 in caller_info): cmd "export PYTHONPATH=/tmp/eb-663ngo7q/eb-6zm49he7/tmph9cftg0x/lib/python3.11/site-packages:$PYTHONPATH &&  pytest test/torchtext_unittest -k "not test_vocab_from_raw_text_file"" and not test_get_tokenizer_moses"" and not test_get_tokenizer_spacy"" and not test_download_charngram_vectors" " exited with exit code -11 and output:
Fatal Python error: Segmentation fault

Current thread 0x000040003cc75a80 (most recent call first):
  File "/tmp/bot/easybuild/build/PyTorchbundle/2.1.2/foss-2023a/torchtext/text-0.16.2/test/torchtext_unittest/test_transforms.py", line 1268 in TestMaskTransform
  File "/tmp/bot/easybuild/build/PyTorchbundle/2.1.2/foss-2023a/torchtext/text-0.16.2/test/torchtext_unittest/test_transforms.py", line 1255 in <module>
  File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/neoverse_v1/software/Python-bundle-PyPI/2023.06-GCCcore-12.3.0/lib/python3.11/site-packages/_pytest/assertion/rewrite.py", line 178 in exec_module
  File "<frozen importlib._bootstrap>", line 690 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1149 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1178 in _find_and_load
  File "<frozen importlib._bootstrap>", line 1206 in _gcd_import
date job status comment
Jun 15 19:34:56 UTC 2024 submitted job id 12817 awaits release by job manager
Jun 15 19:35:54 UTC 2024 released job awaits launch by Slurm scheduler
Jun 15 19:36:58 UTC 2024 running job 12817 is running
Jun 15 20:18:15 UTC 2024 finished
:cry: FAILURE (click triangle for details)
Details
:white_check_mark: job output file slurm-12817.out
:x: found message matching ERROR:
:x: found message matching FAILED:
:x: found message matching required modules missing:
:x: no message matching No missing installations
:white_check_mark: found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-aarch64-neoverse_v1-1718481760.tar.gz
size: 271 MiB (284470404 bytes)
entries: 9314
modules under 2023.06/software/linux/aarch64/neoverse_v1/modules/all
custom_ctypes/1.2.lua
gperftools/2.12-GCCcore-12.3.0.lua
imageio/2.33.1-gfbf-2023a.lua
libmad/0.15.1b-GCCcore-12.3.0.lua
librosa/0.10.1-foss-2023a.lua
LLVM/14.0.6-GCCcore-12.3.0-llvmlite.lua
NLTK/3.8.1-foss-2023a.lua
numba/0.58.1-foss-2023a.lua
parameterized/0.9.0-GCCcore-12.3.0.lua
Scalene/1.5.26-GCCcore-12.3.0.lua
scikit-image/0.22.0-foss-2023a.lua
SentencePiece/0.2.0-GCC-12.3.0.lua
SoX/14.4.2-GCCcore-12.3.0.lua
tensorboard/2.15.1-gfbf-2023a.lua
tqdm/4.66.1-GCCcore-12.3.0.lua
software under 2023.06/software/linux/aarch64/neoverse_v1/software
custom_ctypes/1.2
gperftools/2.12-GCCcore-12.3.0
imageio/2.33.1-gfbf-2023a
libmad/0.15.1b-GCCcore-12.3.0
librosa/0.10.1-foss-2023a
LLVM/14.0.6-GCCcore-12.3.0-llvmlite
NLTK/3.8.1-foss-2023a
numba/0.58.1-foss-2023a
parameterized/0.9.0-GCCcore-12.3.0
Scalene/1.5.26-GCCcore-12.3.0
scikit-image/0.22.0-foss-2023a
SentencePiece/0.2.0-GCC-12.3.0
SoX/14.4.2-GCCcore-12.3.0
tensorboard/2.15.1-gfbf-2023a
tqdm/4.66.1-GCCcore-12.3.0
other under 2023.06/software/linux/aarch64/neoverse_v1
2023.06/init/easybuild/eb_hooks.py
Jun 15 20:18:15 UTC 2024 test result
:grin: SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 12/12 test case(s) from 12 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
:white_check_mark: job output file slurm-12817.out
:x: found message matching ERROR:
:white_check_mark: no message matching [\s*FAILED\s*].*Ran .* test case

eessi-bot[bot] avatar Jun 15 '24 19:06 eessi-bot[bot]

Rebuilding for zen2 to verify whether the new easyblock for torchvision fixes the issue that libjpeg couldn't be found...

bot: build arch:x86_64/amd/zen2 repo:eessi.io-2023.06-software

trz42 avatar Jun 29 '24 20:06 trz42

Updates by the bot instance boegel-bot-deucalion (click for details)
  • account trz42 has NO permission to send commands to the bot

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build arch:x86_64/amd/zen2 repo:eessi.io-2023.06-software from trz42

    • expanded format: build architecture:x86_64/amd/zen2 repository:eessi.io-2023.06-software
  • handling command build architecture:x86_64/amd/zen2 repository:eessi.io-2023.06-software resulted in:

    • submitted job 13549, for details & status see https://github.com/EESSI/software-layer/pull/603#issuecomment-2198332951

eessi-bot[bot] avatar Jun 29 '24 20:06 eessi-bot[bot]

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build arch:x86_64/amd/zen2 repo:eessi.io-2023.06-software from trz42

    • expanded format: build architecture:x86_64/amd/zen2 repository:eessi.io-2023.06-software
  • handling command build architecture:x86_64/amd/zen2 repository:eessi.io-2023.06-software resulted in:

    • no jobs were submitted

eessi-bot[bot] avatar Jun 29 '24 20:06 eessi-bot[bot]

New job on instance eessi-bot-mc-aws for architecture x86_64-amd-zen2 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.06/pr_603/13549

  • the installation of PyTorch-bundle succeeded, so the updated easyblock for torchvision works! :tada:
  • however, the build failed when checking for missing installations with
1 out of 138 required modules missing:

* grpcio/1.57.0-GCCcore-12.3.0 (grpcio-1.57.0-GCCcore-12.3.0.eb)
  • that should be easy to fix, see https://github.com/NorESSI/software-layer/pull/408
date job status comment
Jun 29 20:55:20 UTC 2024 submitted job id 13549 awaits release by job manager
Jun 29 20:55:26 UTC 2024 released job awaits launch by Slurm scheduler
Jun 29 21:00:28 UTC 2024 running job 13549 is running
Jun 29 23:04:35 UTC 2024 finished
:cry: FAILURE (click triangle for details)
Details
:white_check_mark: job output file slurm-13549.out
:x: found message matching ERROR:
:white_check_mark: no message matching FAILED:
:x: found message matching required modules missing:
:white_check_mark: found message(s) matching No missing installations
:white_check_mark: found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen2-1719701425.tar.gz
size: 293 MiB (307397497 bytes)
entries: 10800
modules under 2023.06/software/linux/x86_64/amd/zen2/modules/all
custom_ctypes/1.2.lua
gperftools/2.12-GCCcore-12.3.0.lua
imageio/2.33.1-gfbf-2023a.lua
libmad/0.15.1b-GCCcore-12.3.0.lua
librosa/0.10.1-foss-2023a.lua
LLVM/14.0.6-GCCcore-12.3.0-llvmlite.lua
NLTK/3.8.1-foss-2023a.lua
numba/0.58.1-foss-2023a.lua
parameterized/0.9.0-GCCcore-12.3.0.lua
PyTorch-bundle/2.1.2-foss-2023a.lua
Scalene/1.5.26-GCCcore-12.3.0.lua
scikit-image/0.22.0-foss-2023a.lua
SentencePiece/0.2.0-GCC-12.3.0.lua
SoX/14.4.2-GCCcore-12.3.0.lua
tensorboard/2.15.1-gfbf-2023a.lua
tqdm/4.66.1-GCCcore-12.3.0.lua
software under 2023.06/software/linux/x86_64/amd/zen2/software
custom_ctypes/1.2
gperftools/2.12-GCCcore-12.3.0
imageio/2.33.1-gfbf-2023a
libmad/0.15.1b-GCCcore-12.3.0
librosa/0.10.1-foss-2023a
LLVM/14.0.6-GCCcore-12.3.0-llvmlite
NLTK/3.8.1-foss-2023a
numba/0.58.1-foss-2023a
parameterized/0.9.0-GCCcore-12.3.0
PyTorch-bundle/2.1.2-foss-2023a
Scalene/1.5.26-GCCcore-12.3.0
scikit-image/0.22.0-foss-2023a
SentencePiece/0.2.0-GCC-12.3.0
SoX/14.4.2-GCCcore-12.3.0
tensorboard/2.15.1-gfbf-2023a
tqdm/4.66.1-GCCcore-12.3.0
other under 2023.06/software/linux/x86_64/amd/zen2
2023.06/init/easybuild/eb_hooks.py
Jun 29 23:04:35 UTC 2024 test result
:grin: SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 14/14 test case(s) from 14 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
:white_check_mark: job output file slurm-13549.out
:x: found message matching ERROR:
:white_check_mark: no message matching [\s*FAILED\s*].*Ran .* test case

eessi-bot[bot] avatar Jun 29 '24 20:06 eessi-bot[bot]

Rebuilding for zen2 to verify whether the new easyblock for torchvision fixes the issue that libjpeg couldn't be found...

Maybe related to:

  • https://github.com/easybuilders/easybuild-easyblocks/pull/3322

boegel avatar Jul 05 '24 06:07 boegel

Rebuilding after #655 got merged to verify whether the import of soundfile in librosa's sanity check succeeds...

bot: build arch:x86_64/amd/zen2 repo:eessi.io-2023.06-software
bot: build arch:aarch64/generic repo:eessi.io-2023.06-software

trz42 avatar Aug 01 '24 07:08 trz42