easybuild-easyblocks
enhance `LLVM` easyblock for compilation of clang/flang + other llvm-projects
This EB is a modified version of the current Clang one (whether to keep them separate with a simpler EB for Clang only, or to rename Clang to LLVMcore or vice versa, is up for debate).
It enables building most of the main LLVM projects. What is still missing/has not been tested:
- `libc`
- `libclc`
- `llvm-libgcc`
- `pstl`
The current EB improves on the Clang one by:
- Implementing the new version of the flags for LLVM >= 18
- Implementing running the test suite for all LLVM projects (with an EC-specified allowed number of failures)
- Implementing both a bootstrapped and a non-bootstrapped version of the toolchain
- Implementing an LLVM-only build (bootstrapped only) where all dependencies on GCC are removed at the final stage
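The "allowed number of failures" check can be sketched as parsing the summary that `lit` prints at the end of `check-all` and comparing it to a limit from the easyconfig. This is a minimal sketch, not the PR's code: it assumes the summary contains a line like `  Failed           :    74`, and `max_failures` is a hypothetical parameter name.

```python
import re

# Assumption: lit ends a run with a summary containing a line such as
# "  Failed           :    74"; the exact layout may vary between lit versions.
FAILED_SUMMARY_RE = re.compile(r'^\s*Failed\s*:\s*(\d+)', re.MULTILINE)


def count_failed_tests(lit_output):
    """Return the number of failed tests in a lit summary (0 if no 'Failed' line is found)."""
    match = FAILED_SUMMARY_RE.search(lit_output)
    return int(match.group(1)) if match else 0


def check_test_failures(lit_output, max_failures):
    """Raise if more tests failed than allowed; 'max_failures' is a hypothetical EC value."""
    failed = count_failed_tests(lit_output)
    if failed > max_failures:
        raise RuntimeError(f"{failed} tests failed, but at most {max_failures} are allowed")
    return failed


if __name__ == '__main__':
    summary = "Total Discovered Tests: 119128\n  Failed           :    74\n"
    print(check_test_failures(summary, max_failures=100))  # prints 74
```

The regex deliberately requires a colon right after `Failed`, so the `Failed Tests (N):` header that precedes the list of failing tests is not counted twice.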
Currently/WIP:
- The `full_llvm` build finishes successfully with 74 out of 119128 tests failing:
  - 51 failures in `bolt`
  - 1 failure in `clang tools`
  - 18 failures in `mlir`
  - 3 failures in `lldb`
  - 1 failure in `ompd`
  - Same set of failures without the MLIR ones and only 1 `lldb` failure with a static build (55 out of 119090)
- The `bootstrapped` non-`full_llvm` build finishes successfully with 46 out of 119897 tests failing:
  - 2 failures in `bolt`
  - 1 failure in `clang tools`
  - 18 failures in `mlir`
  - 21 failures in `MemProfiler-x86_64-linux`
  - 3 failures in `lldb`
  - 1 failure in `ompd`
- The non-`bootstrapped` version builds successfully but does not build the test suite (running `make -j 1 check-all` inside the build dir worked, leading to 52 failures + 2 timeouts out of 119900) (NOTE: the number of MemProfiler and timeout failures is not consistent):
  - 2 failures in `bolt`
  - 1 failure in `clang tools`
  - 18 failures in `mlir`
  - 25 failures in `MemProfiler-x86_64-linux` (this number is not consistent across builds)
  - 5 failures in `lldb`
  - 1 failure in `ompd`
  - Note: the benefit of the non-bootstrapped build is arguable, as it took almost the same time as a 3-stage bootstrapped one. The last stage with all the required projects is always the longest, and it seems that `clang` is faster at building (at least within the scope of the LLVM project)
  - Test if the problem is with `make -j XXX` (!= 1)
    - UPDATE: Added a step to try compiling the tests with `-j 1` if the parallel build fails and the installation runs successfully
  - Investigate further failures once the build runs successfully with EB only
- [X] Improving sanity check depending on enabled options
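The `-j 1` fallback for the test step can be sketched as "try in parallel, then retry serially". This is a hypothetical helper, not the PR's implementation; `cmd_template` and the `{jobs}` placeholder are illustrative assumptions.

```python
import subprocess


def run_tests_with_fallback(cmd_template, parallel, run=subprocess.run):
    """Run a test command with 'parallel' jobs and retry once with a single job on failure.

    'cmd_template' is a list of arguments where '{jobs}' is replaced by the job count,
    e.g. ['make', '-j{jobs}', 'check-all']. Returns the job count of the successful run.
    """
    job_counts = [1] if parallel == 1 else [parallel, 1]
    for jobs in job_counts:
        cmd = [part.format(jobs=jobs) for part in cmd_template]
        # Simplification: any non-zero exit status is treated as a failed run,
        # although 'check-all' also exits non-zero when individual tests fail.
        if run(cmd).returncode == 0:
            return jobs
    raise RuntimeError(f"test command failed even when running serially: {cmd}")
```

A call such as `run_tests_with_fallback(['make', '-j{jobs}', 'check-all'], parallel=8)` would first run `make -j8 check-all` and only fall back to `make -j1 check-all` if that fails. The `run` parameter exists so the retry logic can be exercised without invoking `make`.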
Notes:
- Building a simple C++ OpenMP program using `iostream` seems to work with both `libc++` and `libstdc++` (tested with the `full_llvm` build and the `bootstrapped` non-full one)
- For both builds, a hello world OpenMP program compiles with both `libomp` and `libgomp`, but the one compiled with `libgomp` does not seem to recognize `OMP_NUM_THREADS` and runs with 1 thread. Tested with C (`clang`), C++ (`clang++`) and Fortran (`flang-new`)
Possible TODOs:
- [X] Test GPU offloading by building and running on a system with a GPU
  - For now this works only with `clang` (tested on the VEGA front end with an A100)
  - See this discourse post for `flang-new`
- [X] Test RPATH wrappers
I haven't looked at this in detail, but I'm not a big fan of "forking" an existing easyblock, and then updating it. There seems to be a lot of overlap still with the custom easyblock for Clang.
I do understand that the other option (updating the existing custom easyblock for Clang to introduce updates/enhancements) is painful too though, partially since there's a lot of history there.
@Crivella How difficult would it be to integrate the functionality you need in the existing easyblock for Clang (or LLVM, I'm not even sure which one is most relevant here, but I guess it's the Clang one).
@boegel I will try to work on it as soon as I finish all the current tests (there is still something to iron out on non-shared builds, and I need to understand whether the test failures should be addressed or can be ignored). I originally went this route because I was working both on a standalone toolchain for building the single components separately and on this one (and also for ease of testing features). I will have to check how compatible the way the bootstrap was done in the previous easyblock is with this one, but it should be doable.
I do not know if it would still make sense to keep the Clang naming scheme, as this would end up also including Flang and other optimization tools that are usable with other frontends as well.
@boegel Isn't it a lot to ask to merge these easyblocks? The reality is that we already have two other forks for LLVM and AOMP, so the precedent already exists for taking a new/alternative path.
@boegel
I've been looking back at the 2 EBs side by side, and while merging them is not impossible, I think the most feasible approach would be the QE route of calling a different EB based on the version of the software.
While a lot of pieces have been taken from the Clang EB, here I've taken a fundamentally different approach for a few options and for the bootstrap build, which would require deprecating options from the old EB if a new version is used.
Concerning the bootstrap build, in this EB not all tools are built at all stages of the build (only the minimally required ones), and different options are passed at different stages, which would require rewriting half of the Clang EB to merge.
There is also the fact that in the Clang EB, flang is just an optional part of the build that has to be manually specified through the `llvm_projects` extra option.
The idea behind this EB is not to build just Clang but all the tooling required to have an LLVM toolchain in place, which requires both clang and flang to always be present.
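The "QE route" means a single entry point that hands over to a different implementation depending on the software version. A minimal sketch of that pattern uses `__new__`; all class names here are hypothetical stand-ins, not the real easyblocks, and the 18.1.6 cut-off is assumed for illustration (it is the minimum LLVM version this PR checks for).

```python
def parse_version(version):
    """Turn '18.1.6' into (18, 1, 6); suffixes such as '-rc1' are not handled in this sketch."""
    return tuple(int(part) for part in version.split('.'))


class LegacyClangBuild:
    """Stand-in for the existing Clang-based easyblock (hypothetical name)."""
    def __init__(self, version):
        self.version = version


class LLVMToolchainBuild:
    """Stand-in for the new LLVM easyblock (hypothetical name)."""
    def __init__(self, version):
        self.version = version


class LLVMBuild:
    """Entry point that returns an instance of a different class depending on the version."""
    def __new__(cls, version):
        # Returning an object that is not an LLVMBuild means LLVMBuild.__init__ is skipped,
        # so the selected class is fully responsible for initialization.
        if parse_version(version) >= (18, 1, 6):
            return LLVMToolchainBuild(version)
        return LegacyClangBuild(version)
```

The upside of this design is that neither implementation has to carry the other's options. The downside is that two code paths must be maintained side by side.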
One thing I don't actually like is the name, clang.py, because it is not just clang, it is LLVM with front end compilers. That is part of the complication, because in reality it should be llvm.py, and clang.py should stop being used. QE-style forking within the relevant easyblocks seems like the only available option.
As an update, I have now also tested with all tools enabled and `build_targets = ['all']`.
The compilation + test suite was successful with only a slight increase in errors (with respect to the previous bootstrap + `full_llvm` build with the default build target):
- 77 errors out of 119238 tests
  - 50 due to `bolt`
  - 6 due to `clang` + PowerPC architecture
  - 1 due to `clang-tools`
  - 18 due to `mlir`
  - 1 due to `lldb`
  - 1 due to `ompd`

Note that previously 34341 tests (28.84%) were flagged as unsupported, while now only 4588 (3.85%) are, which should be due to the experimental archs.
UPDATE
Testing with openmp offloading on VEGA frontend node:
- AMD EPYC 7452 (Zen2 Rome)
- NVIDIA A100
With `build_targets = ['X86', 'NVPTX']` and `cuda_compute_capabilities = ['8.0']`, I encountered extra errors related to `libomptarget`:
- 270 due to x86 (out of 275)
- 11 due to nvptx (out of 302)

As a sanity check, I also recompiled on my WS allowing `libomptarget` to be compiled for x86 (before, I had it enabled only for nvptx and amd), and I get a similar number of test errors (268).
Besides that, I tried to compile a simple OpenMP application, and GPU offloading seems to be working correctly (verified that the GPU is being used with `nvtop`).
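For reference, EasyBuild expresses `cuda_compute_capabilities` in dotted form (`'8.0'`), while clang names NVPTX targets `sm_XX` (e.g. for `--offload-arch`). The conversion can be sketched as follows. This is an assumption about one possible mapping; how the PR's easyblock actually passes these values to CMake is not shown here.

```python
def cuda_cc_to_offload_arch(compute_capabilities):
    """Convert EasyBuild-style compute capabilities ('8.0') to NVPTX arch names ('sm_80')."""
    return ['sm_' + cc.replace('.', '') for cc in compute_capabilities]


# e.g. the value used for the A100 on the VEGA front-end node
print(cuda_cc_to_offload_arch(['8.0']))  # ['sm_80']
```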
I've tried to build LLVM using your easyblock and easyconfig PRs on a fresh devel installation. I had some issues with Python, where the tests reported that Python itself was not found even though it was installed. After disabling the Python parts, I had no trouble building both LLVMcore variants.
I haven't looked too deeply into which and how many tests failed, though. I built everything on my local workstation (Intel Core i7-12700, AMD Radeon RX7700XT).
@Thyre
Thanks for the report, I will look into it.
Was the error related to the `python_bindings = True` setting in the EC files?
Yes, exactly. When setting it to False, everything worked fine.
The steps to reproduce were basically:
- Clone the EB repos (with your PRs) (instructions taken from here)
- Run the following command: `eb --robot LLVMcore-18.1.7.eb LLVMcore-18.1.7-GCCcore-13.3.0.eb --parallel 8`
> I had some issues with Python, where the tests reported that Python itself was not found even though it was installed.
I have an idea of what might be happening. Was the error during the `test_step` or the `sanity_check_step`?
If it is the latter, the likely reason is that I had another working Python available on my WS during the build, which let the sanity check pass even though Python was listed in the build dependencies instead of the normal ones.
Let me know if this is the case; in the meantime I will fix the EC files.
UPDATE:
After some discussion in the 2024/07/17 EasyBuild conference call, it was decided to replace the `EB_LLVM` easyblock with the new one from this PR.
To this end, the following has been added:
- A `minimal` build option to compile LLVM only.
- A bypass of the version checks `LLVM>=18.1.6` and `GCCcore>=13.3.0` when `minimal` is enabled, to allow building older versions (all the CMake variables are the same, and a few extra ones will just be ignored). Tested manually back to version 14.0.3.
- Modified the EC PR to add the new version of LLVM and to modify the old ones to use the new easyblock starting from version 11.1.0 (the older ones make use of archived/deprecated toolchains) (https://docs.easybuild.io/policies/toolchains/)
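The version-check bypass described above can be sketched as a guard that is skipped entirely for `minimal` builds. The function and constant names are hypothetical; only the two minimum versions come from this PR.

```python
MIN_LLVM_VERSION = (18, 1, 6)
MIN_GCCCORE_VERSION = (13, 3, 0)


def parse_version(version):
    """Turn a plain dotted version such as '18.1.6' into a comparable tuple (18, 1, 6)."""
    return tuple(int(part) for part in version.split('.'))


def check_versions(llvm_version, gcccore_version, minimal=False):
    """Enforce the minimum versions unless a minimal (LLVM-only) build is requested."""
    if minimal:
        # Older versions are allowed: the CMake variables are the same and
        # the few extra ones are simply ignored.
        return
    if parse_version(llvm_version) < MIN_LLVM_VERSION:
        raise ValueError(f"LLVM {llvm_version} is too old for a full build; use minimal=True")
    if parse_version(gcccore_version) < MIN_GCCCORE_VERSION:
        raise ValueError(f"GCCcore {gcccore_version} is too old for a full build; use minimal=True")
```

With this shape, `check_versions('14.0.3', '11.3.0', minimal=True)` passes, while the same versions without `minimal` raise an error.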
The following tests have been manually performed for sanity check:
- Build of the following LLVM versions:
- [x] 14.0.3
- [X] 14.0.6-lite
- [X] 16.0.6
- Build of packages actively depending on LLVM:
- [X] LDC-1.30.0-GCCcore-11.3.0.eb <-- ('LLVM', '14.0.3')
- [X] LDC-1.36.0-GCCcore-12.3.0.eb <-- ('LLVM', '16.0.6')
- [X] umap-learn-0.5.3-foss-2022a.eb <-- ('LLVM', '14.0.3')
- [X] umap-learn-0.5.5-foss-2023a.eb <-- ('LLVM', '16.0.6')
- [X] numba-0.58.1-foss-2023a.eb <-- ('LLVM', '14.0.6', '-llvmlite')
- [X] Mesa-22.0.3-GCCcore-11.3.0.eb <-- ('LLVM', '14.0.3')
- [X] Mesa-23.1.4-GCCcore-12.3.0.eb <-- ('LLVM', '16.0.6')
Following a discussion on Slack about the deprecation warning for `GCC_INSTALL_PREFIX` being switched to an error in LLVM 19 (see also https://github.com/llvm/llvm-project/pull/85891):
- COMMITS:
  - https://github.com/easybuilders/easybuild-easyblocks/pull/3373/commits/bf310765b3affb958e71cbdaea58b933751f703f
  - https://github.com/easybuilders/easybuild-easyblocks/pull/3373/commits/13a7a9751653bd0e8a5db3e825817a831ea40e9d
- Added a version check to use the following for LLVM >= 19 instead of `GCC_INSTALL_PREFIX`:
  - For the build: `-DRUNTIMES_CMAKE_ARGS="-DCMAKE_C_FLAGS=--gcc-install-dir=$GCC_ROOT;-DCMAKE_CXX_FLAGS=--gcc-install-dir=$GCC_ROOT"`, where `$GCC_ROOT` should point to `$EBROOT.../lib/gcc/<triple>/<gcc_version>` (see https://github.com/llvm/llvm-project/pull/85891#issuecomment-2021370667)
  - Added generation of config files in the `post_install_step`, with the same name as the compilers and placed next to them, to automatically set `--gcc-install-dir` when the compiler is used (needed to get the right location for the `crt` files like `crtbeginS.o`) (see https://discourse.llvm.org/t/add-gcc-install-dir-deprecate-gcc-toolchain-and-remove-gcc-install-prefix/65091). This behavior can be overridden by manually setting the flag at compiler invocation or by using another config file on top of the generated one.
    - NOTE: The `flang-new` compiler looks for a `flang.cfg` instead of a `flang-new.cfg` configuration file
  - Added a sanity check function to invoke `compiler_name -v` and check whether the correct GCC installation is being detected
- LLVM < 19 will still use `GCC_INSTALL_PREFIX` (technically deprecated since 16 but still working)
  - NOTE: Tried using the same fix also for LLVM 18, but the options `--gcc-toolchain` and `--gcc-install-dir` have been added to flang only recently and will be included in LLVM 19 (see https://github.com/llvm/llvm-project/pull/87360)
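The config-file mechanism above relies on clang reading `<driver-name>.cfg` from the directory of the driver binary. A minimal sketch of generating these files follows. The helper names, the example paths and the default compiler list are hypothetical (this is not the PR's `create_compiler_config_file`). `flang` is used instead of `flang-new` because of the NOTE above.

```python
import os


def gcc_install_dir(gcc_root, triple, gcc_version):
    """Build the <gcc_root>/lib/gcc/<triple>/<gcc_version> path that --gcc-install-dir expects."""
    return os.path.join(gcc_root, 'lib', 'gcc', triple, gcc_version)


def write_compiler_config_files(bindir, install_dir, compilers=('clang', 'clang++', 'flang')):
    """Write one <compiler>.cfg next to each compiler, setting --gcc-install-dir.

    'flang' is used instead of 'flang-new' because flang-new looks for flang.cfg.
    Returns the list of written paths.
    """
    written = []
    for name in compilers:
        path = os.path.join(bindir, name + '.cfg')
        with open(path, 'w') as cfg:
            cfg.write(f'--gcc-install-dir={install_dir}\n')
        written.append(path)
    return written
```

Because the option lives in a config file rather than being compiled into the binary, users can still override it on the command line, as described above.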
Changed `runtimes_cmake_args` into a list to allow modifications from multiple parts of the code (it will be checked and added before each configure call).
Ensure that the correct Python is also detected during the RUNTIMES CMake call.
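Keeping `runtimes_cmake_args` as a list and collapsing it only at configure time could look like the sketch below. CMake uses `;` as its list separator, so each entry becomes one argument of the runtimes sub-build. Dropping empty and duplicate entries is an assumption about what "checked" means. The function name and the GCC path are hypothetical.

```python
def runtimes_cmake_option(runtimes_cmake_args):
    """Collapse a list of runtimes arguments into a single -DRUNTIMES_CMAKE_ARGS option.

    Empty and duplicate entries are dropped while preserving order.
    Returns None when there is nothing to pass.
    """
    args = list(dict.fromkeys(arg for arg in runtimes_cmake_args if arg))
    if not args:
        return None
    return '-DRUNTIMES_CMAKE_ARGS="%s"' % ';'.join(args)


# Hypothetical GCC install dir; different parts of the code append to the same list.
gcc_install_dir = '/path/to/GCCcore/13.3.0/lib/gcc/x86_64-pc-linux-gnu/13.3.0'
runtimes_cmake_args = []
runtimes_cmake_args.append(f'-DCMAKE_C_FLAGS=--gcc-install-dir={gcc_install_dir}')
runtimes_cmake_args.append(f'-DCMAKE_CXX_FLAGS=--gcc-install-dir={gcc_install_dir}')
print(runtimes_cmake_option(runtimes_cmake_args))
```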
This python discussion overlaps with the discussion in:
- #3463
As an update, everything should now work with LLVM 19.1.1 (manually tested with the EC from https://github.com/easybuilders/easybuild-easyconfigs/pull/21611).
The problem related to https://github.com/llvm/llvm-project/issues/111667 should be fixed by 7151ec2, which adds `--unwindlib=none` to `CMAKE_EXE_LINKER_FLAGS` during the runtimes compilation.
WIP:
- Enabling offloading increases the number of errors. Upon further inspection, this is due to errors of the kind:
Test report
******************** TEST 'libomptarget :: x86_64-pc-linux-gnu-LTO :: offloading/multiple_reductions_simple.c' FAILED ******************** Exit Code: 1 Command Output (stdout): -- # RUN: at line 1 /home/crivella/.local/easybuild/build/LLVM/19.1.1/GCCcore-13.3.0/llvm.obj.3/./bin/clang -I /home/crivella/.local/easybuild/build/LLVM/19.1.1/GCCcore-13.3.0/llvm-project-19.1.1.src/offload/test -I -L /home/crivella/.local/easybuild/build/LLVM/19.1.1/GCCcore-13.3.0/llvm.obj.3/runtimes/runtimes-bins/offload -L /home/crivella/.local/easybuild/build/LLVM/19.1.1/GCCcore-13.3.0/llvm.obj.3/./lib -nogpulib -Wl,-rpath,/home/crivella/.local/easybuild/build/LLVM/19.1.1/GCCcore-13.3.0/llvm.obj.3/runtimes/runtimes-bins/offload -Wl,-rpath, -Wl,-rpath,/home/crivella/.local/easybuild/build/LLVM/19.1.1/GCCcore-13.3.0/llvm.obj.3/./lib -foffload-lto -fopenmp-targets=x86_64-pc-linux-gnu /home/crivella/.local/easybuild/build/LLVM/19.1.1/GCCcore-13.3.0/llvm-project-19.1.1.src/offload/test/offloading/multiple_reductions_simple.c -o /home/crivella/.local/easybuild/build/LLVM/19.1.1/GCCcore-13.3.0/llvm.obj.3/runtimes/runtimes-bins/offload/test/x86_64-pc-linux-gnu-LTO/offloading/Output/multiple_reductions_simple.c.tmp /home/crivella/.local/easybuild/build/LLVM/19.1.1/GCCcore-13.3.0/llvm.obj.3/./lib/libomptarget.devicertl.a && /home/crivella/.local/easybuild/build/LLVM/19.1.1/GCCcore-13.3.0/llvm.obj.3/runtimes/runtimes-bins/offload/test/x86_64-pc-linux-gnu-LTO/offloading/Output/multiple_reductions_simple.c.tmp | /home/crivella/.local/easybuild/build/LLVM/19.1.1/GCCcore-13.3.0/llvm.obj.3/./bin/FileCheck /home/crivella/.local/easybuild/build/LLVM/19.1.1/GCCcore-13.3.0/llvm-project-19.1.1.src/offload/test/offloading/multiple_reductions_simple.c # executed command: /home/crivella/.local/easybuild/build/LLVM/19.1.1/GCCcore-13.3.0/llvm.obj.3/./bin/clang -I /home/crivella/.local/easybuild/build/LLVM/19.1.1/GCCcore-13.3.0/llvm-project-19.1.1.src/offload/test -I -L 
/home/crivella/.local/easybuild/build/LLVM/19.1.1/GCCcore-13.3.0/llvm.obj.3/runtimes/runtimes-bins/offload -L /home/crivella/.local/easybuild/build/LLVM/19.1.1/GCCcore-13.3.0/llvm.obj.3/./lib -nogpulib -Wl,-rpath,/home/crivella/.local/easybuild/build/LLVM/19.1.1/GCCcore-13.3.0/llvm.obj.3/runtimes/runtimes-bins/offload -Wl,-rpath, -Wl,-rpath,/home/crivella/.local/easybuild/build/LLVM/19.1.1/GCCcore-13.3.0/llvm.obj.3/./lib -foffload-lto -fopenmp-targets=x86_64-pc-linux-gnu /home/crivella/.local/easybuild/build/LLVM/19.1.1/GCCcore-13.3.0/llvm-project-19.1.1.src/offload/test/offloading/multiple_reductions_simple.c -o /home/crivella/.local/easybuild/build/LLVM/19.1.1/GCCcore-13.3.0/llvm.obj.3/runtimes/runtimes-bins/offload/test/x86_64-pc-linux-gnu-LTO/offloading/Output/multiple_reductions_simple.c.tmp /home/crivella/.local/easybuild/build/LLVM/19.1.1/GCCcore-13.3.0/llvm.obj.3/./lib/libomptarget.devicertl.a # .---command stderr------------ # | clang: error: '-fopenmp-targets' must be used in conjunction with a '-fopenmp' option compatible with offloading; e.g., '-fopenmp=libomp' or '-fopenmp=libiomp5' # `----------------------------- # error: command failed with exit status: 1

  Will investigate whether there is some option to fix this, or whether it is a bug and the CMake files need a patch.
- Once the aforementioned issue is fixed, the number of allowed failures could be pinned more precisely in the EasyConfig files.
Related? https://github.com/llvm/llvm-project/issues/90333
It seems to be related: the offload CMake does not seem to properly configure `OPENMP_TEST_OPENMP_FLAGS`. I tried to manually patch in `-fopenmp`, but this then results in `omp.h` not being found, so I guess I will also have to patch the location of the header during the test.
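The clang error in the test report comes down to a simple rule: `-fopenmp-targets=...` is rejected unless an OpenMP flag such as `-fopenmp` or `-fopenmp=libomp` is also present. The hypothetical helper below sketches that condition on a flag list. It is not the patch used in this PR, and, as noted above, adding the flag alone is not enough because the `omp.h` include path must also be provided.

```python
def ensure_openmp_flag(flags, openmp_flag='-fopenmp=libomp'):
    """Prepend an OpenMP flag if offload targets are requested without one.

    clang rejects '-fopenmp-targets=...' unless '-fopenmp' (or '-fopenmp=<lib>') is also given.
    """
    has_targets = any(f.startswith('-fopenmp-targets=') for f in flags)
    has_openmp = any(f == '-fopenmp' or f.startswith('-fopenmp=') for f in flags)
    if has_targets and not has_openmp:
        return [openmp_flag] + list(flags)
    return list(flags)
```

Applied to the flags from the failing test (`-foffload-lto -fopenmp-targets=x86_64-pc-linux-gnu`), it would prepend `-fopenmp=libomp`.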
@Thyre In reference to:
- https://github.com/easybuilders/easybuild-easyblocks/pull/3480#issuecomment-2410589255
Modularized the functionalities related to detecting/setting/sanity-checking `gcc_prefix` into three functions (https://github.com/easybuilders/easybuild-easyblocks/pull/3373/commits/997a689f041ace9b9e79a1ee6b0ac34aa237a185):
- `get_gcc_prefix`
- `create_compiler_config_file`
- `sanity_check_gcc_prefix`
Tested with EC from:
- https://github.com/easybuilders/easybuild-easyconfigs/pull/21611
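The sanity check that runs `<compiler> -v` can be sketched by reading the `Selected GCC installation: ...` line that the clang driver prints on stderr. This is a hypothetical helper, not the PR's `sanity_check_gcc_prefix`. The `run` parameter only exists so the logic can be exercised without a real compiler.

```python
import os
import subprocess

SELECTED_PREFIX = 'Selected GCC installation: '


def selected_gcc_installation(verbose_output):
    """Extract the path from the 'Selected GCC installation: ...' line, or return None."""
    for line in verbose_output.splitlines():
        if line.startswith(SELECTED_PREFIX):
            return os.path.normpath(line[len(SELECTED_PREFIX):].strip())
    return None


def check_selected_gcc(compiler, expected_dir, run=subprocess.run):
    """Run '<compiler> -v' and verify that it picked up the expected GCC installation."""
    result = run([compiler, '-v'], capture_output=True, text=True)
    # The driver prints its verbose information (including GCC detection) on stderr.
    found = selected_gcc_installation(result.stderr)
    if found is None or found != os.path.normpath(expected_dir):
        raise RuntimeError(f"{compiler} selected GCC installation {found!r}, expected {expected_dir!r}")
    return found
```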
@Micket The solution was slightly more involved, but at least on x86_64 I was able to make a patch file (it still needs testing on other archs and GPUs). I opened an issue on LLVM; hopefully it will get picked up and the problem can be fixed at the source in a more robust way:
- https://github.com/llvm/llvm-project/issues/112210
Great, thanks a lot! Once this PR gets merged, we should probably clean up the AOCC EasyBlock and maybe include these checks and/or settings in the other LLVM-based compilers.
As a (very early) heads-up: LLVM 20 will include the rename of flang-new to flang (https://github.com/llvm/llvm-project/pull/110023). A symlink to flang-new still exists.
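To absorb the rename, an easyblock could pick the Fortran driver names to check by LLVM version. This is a hypothetical sketch based only on the note above: from LLVM 20, `flang` is the driver and `flang-new` remains as a symlink. Earlier releases are assumed to ship only `flang-new`.

```python
def fortran_compiler_names(llvm_version):
    """Return the Fortran driver names expected in 'bin' for a given LLVM version."""
    major = int(llvm_version.split('.')[0])
    if major >= 20:
        return ['flang', 'flang-new']  # renamed driver plus the compatibility symlink
    return ['flang-new']
```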
Rebased onto 5.0.x