
RFC: Implementing a proper Clang toolchain

Open geimer opened this issue 4 years ago • 11 comments

Disclaimer: I'm by no means an LLVM or Clang expert. The information below is just a collection of bits and pieces found in various places as well as my personal thoughts on how EasyBuild support could be improved.


Target

A working LLVM-based toolchain -- at least for C/C++ -- with minimal redundancy. Here, "toolchain" is not meant in the EasyBuild sense (i.e., including an MPI, math libs, etc.), but merely refers to a compiler environment that end users can use to build their codes. (This doesn't rule out having an MPI without Fortran support built with Clang, though.)

With proper Fortran support being on the horizon, however, it might become a full toolchain in the EasyBuild sense in the future. This should be taken into account in the design.

Status quo

LLVM / Clang / flang

LLVM provides a framework for code optimization and generation for many different target CPUs. The most prominent language frontend is Clang, which focuses on C-like languages (C, C++, Objective-C, OpenCL). Basically all commercial compiler vendors (Intel, PGI, Cray, IBM, Fujitsu, ARM) have meanwhile switched to Clang as the basis for their C/C++ compilers.

Fortran support started out based on the PGI Fortran compiler frontend (see the flang project on GitHub), now called "old/legacy/classic flang". It requires patched versions of LLVM and Clang, though, and seems stuck at LLVM 9. However, this mailing list post suggests that there might be an update for LLVM 11 ("LLVM11 with classic flang is on various vendor's roadmap for this autumn, so one of us will do it I'm sure.").

Besides, there is a "new flang" frontend (formerly called f18) written from scratch, now developed as an official LLVM project. However, it isn't fully functional yet and still depends on another compiler to do the actual work, see this mailing list post.

EasyBuild

EasyBuild currently includes various LLVM packages which are used as dependencies by, for example, Mesa, numba, and Rust. Recent versions are built on top of GCCcore, and only include the core LLVM libraries and tools.

In addition, there are various Clang easyconfigs. Again, recent versions are (usually) built on top of GCCcore. These can be used as stand-alone compilers, but are also used as dependencies by various packages, such as pocl, TRIQS, and Longshot, and could be used by additional packages such as Score-P and Doxygen, because Clang also provides libraries for source-code parsing and processing. The Clang packages build their own copy of LLVM, and include other LLVM projects such as an OpenMP runtime library, the lld linker, the libc++ C++ Standard Library, and the polly polyhedral optimizer, though not all of those components are used by default with the current configuration.
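
Since several of the bundled components are installed but not used by default, end users have to opt in per compilation. The following is a minimal sketch of what that looks like, assuming the configure-time defaults have not been changed. The helper `clang_command` and its parameters are hypothetical and only serve to illustrate; the flags themselves (`-stdlib=libc++`, `-rtlib=compiler-rt`, `-fuse-ld=lld`, `-mllvm -polly`) are regular Clang options.

```python
def clang_command(source, use_libcxx=False, use_compiler_rt=False,
                  use_lld=False, use_polly=False):
    """Assemble a clang++ command line that opts in to bundled LLVM components.

    Hypothetical helper for illustration; it only builds the argument list.
    """
    cmd = ["clang++", "-O3"]
    if use_libcxx:
        # Use libc++ instead of the platform default (libstdc++ on Linux).
        cmd.append("-stdlib=libc++")
    if use_compiler_rt:
        # Use compiler-rt instead of libgcc as the runtime library.
        cmd.append("-rtlib=compiler-rt")
    if use_lld:
        # Link with lld instead of the system linker.
        cmd.append("-fuse-ld=lld")
    if use_polly:
        # Enable the polly polyhedral optimizer (needs a Clang built with polly).
        cmd += ["-mllvm", "-polly"]
    cmd.append(source)
    return cmd


if __name__ == "__main__":
    print(" ".join(clang_command("app.cpp")))
    print(" ".join(clang_command("app.cpp", use_libcxx=True, use_lld=True)))
```

Without any opt-in, the resulting command is a plain `clang++ -O3 app.cpp`, which on a typical Linux system links against libstdc++ using the system linker.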

There has been some work on packaging "legacy flang" (see https://github.com/easybuilders/easybuild-easyconfigs/pull/8335 and https://github.com/easybuilders/easybuild-easyblocks/pull/1729). The question, however, is whether it is worth putting more effort into this, since things might change considerably with the "new flang".

Possible ways to organize things in EasyBuild

  1. Build full Clang (including lld, libraries, etc.) using an existing LLVM built with GCCcore as a dependency

    • Pros:
      • Reduces redundancy
    • Cons:
      • Building LLVM projects out-of-tree is basically undocumented. It is therefore unclear how the projects relate to each other and how to configure things correctly. However, some information could be extracted from the Fedora RPM specs (e.g., for Clang).

      • The LLVM OpenMP library by default installs symlinks for libgomp and libiomp5, i.e., the OpenMP runtimes of the GCC and Intel compilers, as it implements both APIs. Thus, the order in which modules are loaded determines which runtime is found by ld.so and affects the runtime behavior of codes using OpenMP.

        Creating these symlinks can be disabled via a CMake configuration option, but doing so may lead to simultaneously using two different OpenMP runtimes if some OpenMP code compiled with Clang is linked to a library built with GCCcore also using OpenMP.

      • Likewise, enabling libc++ by default for Clang is likely to make code incompatible with C++ libraries compiled with GCCcore using libstdc++.

    • Caveats:
      • According to the polly documentation, it should probably be built as part of LLVM rather than Clang.
  2. Introduce a new package named, e.g., LLVM-Clang built with GCCcore providing a full Clang (including lld, libraries, etc.) and use it as a dependency for all packages that currently depend on either LLVM or Clang. A Clang compiler package would then be a bundle of GCCcore, LLVM-Clang, and binutils.

    • Pros:
      • Reduces redundancy
      • Follows the documented way of building all LLVM projects in one go
    • Cons:
      • Inherits the OpenMP runtime and libc++ issues outlined above
      • The LLVM-Clang vs. Clang packaging would probably cause questions similar to the GCCcore vs. GCC separation.
  3. Build minimal Clang (excluding lld, libraries, OpenMP runtime) on top of GCCcore -- either using an existing LLVM or as part of an LLVM-Clang package as outlined above -- to provide the Clang libraries to packages that need them as a dependency. In addition, build a full LLVM/Clang (including everything) on the SYSTEM level as a separate toolchain.

    • Pros:
      • Clear separation
      • Follows the documented way of building all LLVM projects in one go
      • The full toolchain aspects to consider are documented
    • Cons:
      • Requires duplication of everything built with GCCcore, as it is a completely separate toolchain.
    • Caveats:
      • Building Clang usually requires a "modern" host compiler (I found Clang >= 3.5 or GCC >= 5.1 documented as a requirement), i.e., on "Enterprise" Linux distros shipping ancient compilers one first needs to build another compiler for bootstrapping Clang on the SYSTEM level. (How does one properly do this? Use GCC as a build dependency rather than as the toolchain?)
      • It is unclear whether gfortran could (temporarily) serve as a Fortran compiler in a full LLVM toolchain using, e.g., compiler-rt instead of libgcc_s. It's very likely that this won't work.
      • The Clang module under GCCcore serves a very limited purpose and should thus be avoided by end-users, unless they really know what they are doing. Not sure how to best prevent/document this. It is also unclear whether such a stripped down Clang would be sufficient for all packages that currently depend on the existing Clang packages.
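
To make the "minimal" vs. "full" distinction in options 2 and 3 more concrete, here is a rough sketch of how the two builds might differ at the CMake level. It assumes the monorepo build driven by `LLVM_ENABLE_PROJECTS` as found in LLVM releases around the time of writing (newer releases move the runtime libraries to `LLVM_ENABLE_RUNTIMES`). The helper `cmake_opts` and the two project selections are hypothetical; only the CMake option names are taken from LLVM.

```python
def cmake_opts(projects, omp_aliases=False, cxx_stdlib=""):
    """Render CMake -D options for an LLVM monorepo build (illustrative helper)."""
    opts = [
        ("CMAKE_BUILD_TYPE", "Release"),
        ("LLVM_ENABLE_PROJECTS", ";".join(projects)),
        # An empty value keeps the platform default (libstdc++ on Linux),
        # which avoids the libc++ vs. libstdc++ incompatibility.
        ("CLANG_DEFAULT_CXX_STDLIB", cxx_stdlib),
    ]
    if "openmp" in projects:
        # OFF skips installing the libgomp/libiomp5 symlinks.
        opts.append(("LIBOMP_INSTALL_ALIASES", "ON" if omp_aliases else "OFF"))
    return ["-D%s=%s" % (key, value) for key, value in opts]


# Assumed project selections; what dependent packages really need is an open question.
MINIMAL = ["clang"]
FULL = ["clang", "lld", "openmp", "libcxx", "libcxxabi", "compiler-rt", "polly"]

if __name__ == "__main__":
    print("minimal:", " ".join(cmake_opts(MINIMAL)))
    print("full:   ", " ".join(cmake_opts(FULL)))
```

Whether `LIBOMP_INSTALL_ALIASES` should be off for the full build is exactly the trade-off described under option 1: no load-order surprises, but potentially two OpenMP runtimes in one process.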

geimer, Jul 27 '20 14:07