
How do we want to handle C++ runtimes in the Carbon toolchain?

Open chandlerc opened this issue 8 months ago • 0 comments

When the Carbon toolchain is compiling C++ code, either directly or for interop, we need to provide a complete set of C++ runtimes. This issue is to discuss the various challenges that poses and how we want to address them.

Relevant background material is the draft proposal around improvements to the Carbon toolchain's compilation model, as that also deals with how we expect to handle the Carbon runtimes. I'm separating out the C++ runtime questions to an issue because a) they're more complex than Carbon runtimes (as we have a lot of legacy here), and b) this doesn't directly impact the design of Carbon very much.


First, let's cover some background on the different components of the C and C++ runtimes needed for a Clang-based toolchain to build both C and C++ code. Please let me know if I have any mistakes here; I'm happy to update this section so we have a good baseline.

Freestanding runtimes

Some of the runtimes are "freestanding" or separate from any "host" system. These are distinct from the "hosted" parts of the runtimes.

These runtimes have a nice aspect: they have no dependencies outside of themselves and the compiler. They don't reference the host system in any interesting way. That means we can trivially compile them for any target platform supported by the compiler.

Many of these runtimes are header-only and provided by Clang's builtin headers that we just trivially install.

However, there are two components that require generated code:

  • The C runtime "begin" and "end" symbols, often crtbegin.o and crtend.o, that provide the most basic freestanding parts of the C runtime.
  • The compiler builtins library that provides special functions for operations not available in native hardware but required by the compiler itself.
    • The atomic runtimes for atomic operations on non-lockfree types, sometimes built as part of the compiler builtins, and other times as a standalone libatomic.

We get both of these from LLVM's compiler-rt project.

Hosted runtimes

Other C / C++ runtimes are "hosted" and to some extent build upon the system itself.

  • The C standard library or libc, which builds on the kernel headers and other system headers to enable syscalls, etc. We can get a partial implementation of this in llvm-libc, but currently this still requires building on top of some complete underlying libc such as glibc or musl.
  • libunwind which implements various aspects of frame unwinding. There is a good implementation of this in LLVM's libunwind. This depends on the system and libc.
  • C++ standard library, where we expect to use LLVM's libc++. This depends on the system, libc, and libunwind.
    • Technically there are two parts of the C++ standard library, but that's not relevant here.
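
To make the dependency chain among the hosted runtimes concrete, here is a minimal sketch that derives a build order from the dependencies listed above. The names `HOSTED_RUNTIME_DEPS` and `hosted_build_order` are hypothetical and only for illustration, and "system" is a stand-in for whatever the target's kernel and system headers provide. Nothing here reflects how the Carbon toolchain actually orders its builds.

```python
from graphlib import TopologicalSorter

# Each hosted runtime mapped to what it depends on, as described in the
# list above. "system" is a placeholder for the target's kernel and
# system headers, which we do not build ourselves.
HOSTED_RUNTIME_DEPS = {
    "libc": {"system"},
    "libunwind": {"system", "libc"},
    "libc++": {"system", "libc", "libunwind"},
}


def hosted_build_order(deps=HOSTED_RUNTIME_DEPS):
    """Return the runtimes so that each one comes after its dependencies."""
    order = TopologicalSorter(deps).static_order()
    # Drop nodes such as "system" that are provided rather than built.
    return [name for name in order if name in deps]


print(hosted_build_order())
```

Because the dependencies form a simple chain, there is only one valid order: libc, then libunwind, then libc++.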

Next, let's look at what use cases we might want to support:

  • Building C++ code using the installed LLVM and Clang toolchain symlinks as a "normal" Clang toolchain installation.
  • Building a pure C++ project using the Carbon compile and link commands suggested in the draft proposal.
  • Building a pure C++ project using the Carbon build command suggested in the draft proposal.
  • Building a mixture of C++ and Carbon with the compile and link commands, or using the build command.

The last one is the most important, as that's what we need for interop. However, if we support that, then I believe the second is something we will get anyway. We might not get the third (Carbon's build command with pure C++), but that seems fine.

The first one doesn't seem as important as mixtures of Carbon and C++, but seems like it might be useful. In particular, if we can ensure that C++ code built using the Carbon toolchain's symlinks is ABI compatible with C++ code mixed into a Carbon build, this would allow an easy way to move a C++ dependency's build to a supporting toolchain without having to adjust the actual build system. That seems pretty valuable if not a strict requirement.


Last, let's consider our options for how to approach building these runtimes.

  • (a) The most obvious approach is to build the runtimes just like a CMake build of Clang would and install them. Then using the clang or clang++ symlinks would "Just Work".
  • (b) A more complex approach would be to take the same design as we are imagining for the Carbon runtimes and install the runtime sources rather than pre-built artifacts. We could then embed into the Carbon toolchain knowledge of how to build these runtimes on demand rather than attempting to pre-build them.
  • (c) Some hybrid of (a) and (b) where we prebuild some runtimes, but not others. There are various permutations on this kind of hybrid.

Let's look at the tradeoffs for these.

In some ways (a) is very straightforward. We look at how LLVM's CMake build produces runtimes, and we replicate that with the just-built compiler in Bazel. I have a PR that does this for everything but libc++ and llvm-libc for at least a couple of platforms.

However, (a) also has a big drawback: there is no easy way to cross-compile the hosted runtimes for all the targets we will want to support. We would need to do something pretty elaborate, like using Bazel to download a Docker image (or something similar) of each relevant target to serve as a system root, and then build the hosted runtimes against that system root. And it isn't clear we can even do that for either macOS or Windows.

This is where (b) really shines: we can build the hosted runtimes on-demand for exactly the target environment desired. And if we have the logic to build the hosted runtimes, building the freestanding runtimes is trivial. We also get other advantages, for example avoiding the need to fully enumerate all the ABIs we can target, which tends to be a cross product of the architectures (64-bit x86 and Arm at a minimum), OSes (Linux, macOS, and Windows at a minimum), any ABI-impacting sanitizers (ASan, TSan, MSan at a minimum), and any ABI-impacting features. This can easily reach a large number -- we're at 2 * 3 * 3 or 18 at a minimum. =[ It also makes it trivial to roll out a new ABI-impacting feature, enable optimizations for a specific micro-architecture, etc.
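
As a quick illustration of how that cross product grows, here is a small sketch that enumerates the minimum set named above. The identifier spellings (`x86_64`, `aarch64`, `asan`, and so on) and the function name are assumptions chosen for readability, not names the toolchain uses.

```python
from itertools import product

# The minimum dimensions named in the text; the spellings are illustrative.
ARCHS = ["x86_64", "aarch64"]
OSES = ["linux", "macos", "windows"]
SANITIZERS = ["asan", "tsan", "msan"]


def abi_configurations(archs=ARCHS, oses=OSES, sanitizers=SANITIZERS):
    """Enumerate every (arch, os, sanitizer) combination to pre-build."""
    return list(product(archs, oses, sanitizers))


configs = abi_configurations()
print(len(configs))  # 2 * 3 * 3 = 18
```

This follows the arithmetic in the text as written. Every further dimension, such as an unsanitized variant or an additional ABI-impacting feature, multiplies the count again.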

But there are downsides to (b). First and unavoidably, this introduces significant complexity. We'll have to embed into the toolchain a mini build system to produce all of these runtimes. These runtimes are also large enough that it will put more immediate pressure on having a caching scheme that ensures we don't have to do this for every link.
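
One way to picture such a caching scheme is a key derived from everything that can change the built runtime. The sketch below is only an assumption about what such a key might contain (target triple, compile flags, and a digest of the runtime sources). The function name `runtime_cache_key` is hypothetical, and no caching design has been settled on in this issue.

```python
import hashlib
import json


def runtime_cache_key(target, flags, source_digest):
    """Derive a stable key for a runtime build from its inputs.

    Flag order is preserved on purpose: treating reordered flags as a
    different build may miss some cache hits, but it never reuses an
    artifact built with different effective settings.
    """
    payload = json.dumps(
        {"target": target, "flags": list(flags), "sources": source_digest},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


key = runtime_cache_key(
    "x86_64-unknown-linux-gnu", ["-fsanitize=address"], "example-digest"
)
print(key)
```

With a key like this, a link would only trigger a runtime build when no artifact exists under that key yet.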

Another complication with (b) is what to do when the clang or clang++ symlink is invoked directly instead of going through the new carbon link command. If we don't intercept those invocations and do this on-demand runtime build, then we wouldn't be able to fully match the ABIs between C++ code built with those symlinks and with the new carbon subcommands. But doing that interception is likely to carry a fair amount of complexity of its own:

  • We'll have to parse the Clang command line to figure out whether we're just compiling or performing a link.
  • Then we'll have to see if any of our on-demand built runtimes were selected by the command lines that do perform a link.
  • We'll also have to scrape the command line for compile flags that should be passed along to the runtime compiles.
  • And then inject our just-built runtimes into the correct part of the command line so that they get linked in the expected ways.
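
The first three steps can be sketched roughly as follows. This is a deliberately incomplete illustration rather than a proposed implementation: the flag sets cover only a few well-known Clang driver options, the choice of which flags count as ABI-impacting is an assumption, and all function names are hypothetical. A real implementation would presumably rely on Clang's own driver logic rather than string matching.

```python
# Clang driver flags that stop before the link step (not exhaustive).
NO_LINK_FLAGS = {"-c", "-S", "-E", "-fsyntax-only"}

# Flags that tell the driver not to link the default runtimes.
NO_RUNTIME_FLAGS = {"-nostdlib", "-nodefaultlibs"}

# Prefixes of flags assumed here to matter for the runtime builds.
# Two-argument spellings such as "-target <triple>" are not handled.
RUNTIME_FLAG_PREFIXES = ("--target=", "-fsanitize=", "-march=", "-stdlib=")


def performs_link(args):
    """Step 1: does this command line reach the link step?"""
    return not any(arg in NO_LINK_FLAGS for arg in args)


def links_default_runtimes(args):
    """Step 2: would the link pull in the runtimes we build on demand?"""
    if not performs_link(args):
        return False
    return not any(arg in NO_RUNTIME_FLAGS for arg in args)


def runtime_flags(args):
    """Step 3: collect the flags to forward to the runtime compiles."""
    return [arg for arg in args if arg.startswith(RUNTIME_FLAG_PREFIXES)]


cmd = ["clang++", "--target=aarch64-linux-gnu", "-fsanitize=address",
       "-O2", "main.cpp", "-o", "main"]
print(performs_link(cmd), links_default_runtimes(cmd), runtime_flags(cmd))
```

The fourth step, injecting the built runtimes at the right position in the link line, depends on how the driver orders its link inputs and is left out of this sketch.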

Last but not least, we could pursue a hybrid (c) solution. Specifically, we can easily pre-build the freestanding runtimes, and then only build the hosted runtimes on-demand. On one hand, this somewhat seems to have the worst of both worlds, as we need some of the complexity of building N permutations on the builtins, and we still need the complexity of building some runtimes on demand. On the other hand, the freestanding builtins carry some surprising complexity that does make pre-building them somewhat more appealing than building on demand. For example, building them on-demand would require on-demand handling of assembly files.


With all that said, here is the direction I'm leaning.

Fundamentally, I think we really want to keep the ABI compatibility between direct use of the clang++ command line and the main carbon compile and carbon link commands. So despite the complexity, I think we should tackle (b). And I don't think the simplifications of (c) are worth the cost of having two ways to do things.

This also has some interesting advantages for debugging, etc -- the sources are definitely available, and can even be edited if needed.

So this is the direction I'm leaning but wanted to post this issue to see if I'm just going in a really bad direction or not...

chandlerc, Mar 09 '25 02:03