RFC: Implementing a proper Clang toolchain
Disclaimer: I'm by no means an LLVM or Clang expert. The information below is just a collection of bits and pieces found in various places as well as my personal thoughts on how EasyBuild support could be improved.
Target
A working LLVM-based toolchain -- at least for C/C++ -- with minimal redundancy. Here, "toolchain" is not meant in the EasyBuild sense (i.e., including an MPI, math libs, etc.), but merely refers to a compiler environment that end users can use to build their codes. (This doesn't rule out building an MPI without Fortran support using Clang, though.)
With proper Fortran support being on the horizon, however, it might become a full toolchain in the EasyBuild sense in the future. This should be taken into account in the design.
Status quo
LLVM / Clang / flang
LLVM provides a framework for code optimization and generation for many different target CPUs. The most prominent language frontend is Clang, which focuses on C-like languages (C, C++, Objective-C, OpenCL). Basically all commercial compiler vendors (Intel, PGI, Cray, IBM, Fujitsu, ARM) have since switched to Clang as the basis for their C/C++ compilers.
Fortran support started out based on the PGI Fortran compiler frontend, see the flang project on GitHub, now called "old/legacy/classic flang". It requires patched versions of LLVM and Clang, and seems stuck at LLVM 9. However, this mailing list post suggests that there might be an update for LLVM 11 ("LLVM11 with classic flang is on various vendor's roadmap for this autumn, so one of us will do it I'm sure.").
Besides, there is a "new flang" frontend (formerly called f18) written from scratch, now developed as an official LLVM project. However, it isn't fully functional yet and still depends on another compiler to do the actual work, see this mailing list post.
EasyBuild
EasyBuild currently includes various LLVM packages which are used as dependencies by, for example, Mesa, numba, and Rust. Recent versions are built on top of GCCcore, and only include the core LLVM libraries and tools.
In addition, there are various Clang easyconfigs. Again, recent versions are (usually) built on top of GCCcore. These can be used as a stand-alone compiler, but are also used as dependencies by various packages, such as pocl, TRIQS, and Longshot, and could be used by additional packages such as Score-P and Doxygen, because they also provide libraries for source-code parsing and processing. The Clang packages build their own copy of LLVM, and include other LLVM projects such as an OpenMP runtime library, the lld linker, the libc++ C++ Standard Library, and the polly polyhedral optimizer, though not all of those components are used by default with the current configuration.
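
To make the component selection above more concrete, here is a minimal sketch of how the CMake arguments for a combined LLVM build could be assembled. It is not taken from the existing EasyBuild easyblock: the helper name `llvm_cmake_args` and the install prefix are hypothetical, and it assumes the monorepo-style `LLVM_ENABLE_PROJECTS` option (newer LLVM releases expect runtimes such as `openmp` and `libcxx` in `LLVM_ENABLE_RUNTIMES` instead, so the project list would need adjusting per release).

```python
def llvm_cmake_args(projects, install_prefix, build_type="Release"):
    """Return CMake arguments for building LLVM together with the given projects.

    Hypothetical helper for illustration only; the project names must match
    what the targeted LLVM release accepts.
    """
    if not projects:
        raise ValueError("at least one LLVM project is required")
    return [
        "-DCMAKE_BUILD_TYPE=%s" % build_type,
        "-DCMAKE_INSTALL_PREFIX=%s" % install_prefix,
        # LLVM expects a semicolon-separated list of projects here.
        "-DLLVM_ENABLE_PROJECTS=%s" % ";".join(projects),
    ]


if __name__ == "__main__":
    # The prefix is a placeholder, not a real installation path.
    for arg in llvm_cmake_args(["clang", "lld", "openmp", "libcxx", "polly"], "/opt/llvm"):
        print(arg)
```

Keeping the project list as a plain parameter makes it easy to compare a "core LLVM only" build (an empty extra-project list is rejected here on purpose) with a full Clang build.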
There has been some work on packaging "legacy flang" (see https://github.com/easybuilders/easybuild-easyconfigs/pull/8335 and https://github.com/easybuilders/easybuild-easyblocks/pull/1729). However, the question is whether it is worth putting more effort into this, since things might change considerably with the "new flang".
Possible ways to organize things in EasyBuild
- Build full Clang (including lld, libraries, etc.) using an existing LLVM built with GCCcore as dependency
  - Pros:
    - Reduces redundancy
  - Cons:
    - Building LLVM projects out-of-tree is basically undocumented. Therefore, it is unclear how projects interrelate to each other and how to configure things correctly. However, some information could be extracted from the Fedora RPM specs (e.g., for Clang).
    - The LLVM OpenMP library by default installs symlinks for libgomp and libiomp5, i.e., the OpenMP runtimes of the GCC and Intel compilers, as it implements both APIs. Thus, the order in which modules are loaded determines which runtime is found by ld.so and affects the runtime behavior of codes using OpenMP.

      Creating these symlinks can be disabled via a CMake configuration option, but doing so may lead to simultaneously using two different OpenMP runtimes if some OpenMP code compiled with Clang is linked to a library built with GCCcore also using OpenMP.
    - Likewise, enabling libc++ by default for Clang is likely to make code incompatible with C++ libraries compiled with GCCcore using libstdc++.
  - Caveats:
    - According to the polly documentation, it should probably be built as part of LLVM rather than Clang.
- Introduce a new package named, e.g., LLVM-Clang built with GCCcore providing a full Clang (including lld, libraries, etc.) and use it as a dependency for all packages that currently depend on either LLVM or Clang. A Clang compiler package would then be a bundle of GCCcore, LLVM-Clang, and binutils.
  - Pros:
    - Reduces redundancy
    - Follows the documented way of building all LLVM projects in one go
  - Cons:
    - Inherits the OpenMP runtime and libc++ issues outlined above
    - The LLVM-Clang vs. Clang packaging would probably cause questions similar to the GCCcore vs. GCC separation.
- Build minimal Clang (excluding lld, libraries, OpenMP runtime) on top of GCCcore -- either using an existing LLVM or as part of a LLVM-Clang package as outlined above -- to provide the Clang libraries to packages that need it as a dependency. In addition, build a full LLVM/Clang (including everything) on the SYSTEM level as a separate toolchain.
  - Pros:
    - Clear separation
    - Follows the documented way of building all LLVM projects in one go
    - The full toolchain aspects to consider are documented
  - Cons:
    - Requires duplication of everything built with GCCcore, as it is a completely separate toolchain.
  - Caveats:
    - Building Clang usually requires a "modern" host compiler (found Clang >=3.5 or GCC >=5.1 to be documented as a requirement), i.e., on "Enterprise" Linux distros shipping ancient compilers one needs to first build another compiler for bootstrapping Clang on the SYSTEM level. (How does one properly do this? Use GCC as a builddep rather than toolchain???)
    - It is unclear whether gfortran could (temporarily) serve as a Fortran compiler in a full LLVM toolchain using, e.g., compiler-rt instead of libgcc_s. It's very likely that this won't work.
    - The Clang module under GCCcore serves a very limited purpose and should thus be avoided by end-users, unless they really know what they are doing. Not sure how to best prevent/document this. It is also unclear whether such a stripped-down Clang would be sufficient for all packages that currently depend on the existing Clang packages.
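
The OpenMP runtime concern listed under the first two options comes down to search-path ordering, which the following minimal sketch illustrates. It only mimics the `LD_LIBRARY_PATH` part of the dynamic loader's lookup (RPATH/RUNPATH, the ld.so cache, and system default directories are ignored). The function names, the module names, and the library file name `libgomp.so.1` are placeholders chosen for illustration, not taken from an actual installation.

```python
import os
import tempfile


def resolve_library(file_name, ld_library_path):
    """Return the first match for file_name on the given search path, or None.

    Simplified model: only the colon-separated search path is consulted.
    """
    for directory in ld_library_path.split(os.pathsep):
        candidate = os.path.join(directory, file_name)
        if directory and os.path.exists(candidate):
            return candidate
    return None


def demo():
    """Create two fake module prefixes that both ship the same library name."""
    root = tempfile.mkdtemp()
    lib_dirs = {}
    for module in ("GCCcore", "Clang"):  # hypothetical module names
        lib_dir = os.path.join(root, module, "lib")
        os.makedirs(lib_dir)
        open(os.path.join(lib_dir, "libgomp.so.1"), "w").close()
        lib_dirs[module] = lib_dir
    # Loading a module prepends its lib directory, so the module loaded last wins.
    clang_loaded_last = os.pathsep.join([lib_dirs["Clang"], lib_dirs["GCCcore"]])
    gcc_loaded_last = os.pathsep.join([lib_dirs["GCCcore"], lib_dirs["Clang"]])
    return (resolve_library("libgomp.so.1", clang_loaded_last),
            resolve_library("libgomp.so.1", gcc_loaded_last))


if __name__ == "__main__":
    for path in demo():
        print(path)
```

The same binary thus picks up a different runtime depending purely on module load order. The CMake option for disabling the symlinks is presumably `LIBOMP_INSTALL_ALIASES` of the LLVM OpenMP runtime, but this should be verified against the LLVM release in question.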