toolchains_llvm icon indicating copy to clipboard operation
toolchains_llvm copied to clipboard

Leverage LLVM bazel support

Open sluongng opened this issue 3 years ago • 10 comments

Google folks have contributed Bazel support for LLVM in the llvm-project tree.

The full configuration could be found here https://github.com/llvm/llvm-project/tree/main/utils/bazel and sample usages could be found here https://github.com/llvm/llvm-project/blob/main/utils/bazel/examples/http_archive/WORKSPACE

It seems like most of what we need to configure LLVM toolchain are available as bazel targets

> cd llvm-project/utils/bazel/examples/http_archive

> bazel query '@llvm-project//...' 2>/dev/null | grep -E ':(clang|clang\+\+|llvm-nm|lld)$'
@llvm-project//llvm:llvm-nm
@llvm-project//lld:lld
@llvm-project//clang:clang++
@llvm-project//clang:clang

Here is the full list of binaries exposed

> bazel query 'kind("cc_binary", @llvm-project//...)'
@llvm-project//mlir:tools/libvulkan-runtime-wrappers.so
@llvm-project//mlir:mlir-vulkan-runner
@llvm-project//mlir:mlir-translate
@llvm-project//mlir:mlir-tblgen
@llvm-project//mlir:mlir-spirv-cpu-runner
@llvm-project//mlir:mlir-reduce
@llvm-project//mlir:mlir-pdll
@llvm-project//mlir:mlir-opt
@llvm-project//mlir:mlir-lsp-server
@llvm-project//mlir:mlir-linalg-ods-yaml-gen
@llvm-project//mlir:mlir-cpu-runner
@llvm-project//mlir:_mlirTransforms.so
@llvm-project//mlir:_mlirLinalgPasses.so
@llvm-project//mlir:_mlirExecutionEngine.so
@llvm-project//mlir:_mlirConversions.so
@llvm-project//mlir:_mlir.so
@llvm-project//llvm:yaml2obj
@llvm-project//llvm:yaml-bench
@llvm-project//llvm:verify-uselistorder
@llvm-project//llvm:split-file
@llvm-project//llvm:sanstats
@llvm-project//llvm:sancov
@llvm-project//llvm:opt
@llvm-project//llvm:obj2yaml
@llvm-project//llvm:not
@llvm-project//llvm:llvm-xray
@llvm-project//llvm:llvm-undname
@llvm-project//llvm:llvm-tli-checker
@llvm-project//llvm:llvm-tblgen
@llvm-project//llvm:llvm-tapi-diff
@llvm-project//llvm:llvm-symbolizer
@llvm-project//llvm:llvm-strings
@llvm-project//llvm:llvm-stress
@llvm-project//llvm:llvm-split
@llvm-project//llvm:llvm-size
@llvm-project//llvm:llvm-rtdyld
@llvm-project//llvm:llvm-reduce
@llvm-project//llvm:llvm-readobj
@llvm-project//llvm:llvm-rc
@llvm-project//llvm:llvm-profgen
@llvm-project//llvm:llvm-profdata
@llvm-project//llvm:llvm-pdbutil
@llvm-project//llvm:llvm-opt-report
@llvm-project//llvm:llvm-opt-fuzzer
@llvm-project//llvm:llvm-objdump
@llvm-project//llvm:llvm-objcopy
@llvm-project//llvm:llvm-nm
@llvm-project//llvm:llvm-mt
@llvm-project//llvm:llvm-modextract
@llvm-project//llvm:llvm-ml
@llvm-project//llvm:llvm-mca
@llvm-project//llvm:llvm-mc
@llvm-project//llvm:llvm-lto2
@llvm-project//llvm:llvm-lto
@llvm-project//llvm:llvm-lipo
@llvm-project//llvm:llvm-link
@llvm-project//llvm:llvm-libtool-darwin
@llvm-project//llvm:llvm-jitlink
@llvm-project//llvm:llvm-isel-fuzzer
@llvm-project//llvm:llvm-ifs
@llvm-project//llvm:llvm-extract
@llvm-project//llvm:llvm-exegesis
@llvm-project//llvm:llvm-dwp
@llvm-project//llvm:llvm-dwarfdump
@llvm-project//llvm:llvm-dis
@llvm-project//llvm:llvm-diff
@llvm-project//llvm:llvm-cxxmap
@llvm-project//llvm:llvm-cxxfilt
@llvm-project//llvm:llvm-cxxdump
@llvm-project//llvm:llvm-cvtres
@llvm-project//llvm:llvm-cov
@llvm-project//llvm:llvm-cfi-verify
@llvm-project//llvm:llvm-cat
@llvm-project//llvm:llvm-c-test
@llvm-project//llvm:llvm-bcanalyzer
@llvm-project//llvm:llvm-as
@llvm-project//llvm:llvm-ar
@llvm-project//llvm:lli-child-target
@llvm-project//llvm:lli
@llvm-project//llvm:llc
@llvm-project//llvm:dsymutil
@llvm-project//llvm:count
@llvm-project//llvm:bugpoint
@llvm-project//llvm:FileCheck
@llvm-project//lld:lld
@llvm-project//clang:diagtool
@llvm-project//clang:clang-tblgen
@llvm-project//clang:clang-scan-deps
@llvm-project//clang:clang-repl
@llvm-project//clang:clang-rename
@llvm-project//clang:clang-refactor
@llvm-project//clang:clang-offload-wrapper
@llvm-project//clang:clang-offload-bundler
@llvm-project//clang:clang-import-test
@llvm-project//clang:clang-format
@llvm-project//clang:clang-extdef-mapping
@llvm-project//clang:clang-diff
@llvm-project//clang:clang-check
@llvm-project//clang:clang
@llvm-project//clang:c-index-test
@llvm-project//clang:c-arcmt-test
@llvm-project//clang:libclang.so
@llvm-project//clang:libclang.dylib
@llvm-project//clang:libclang.dll
@llvm-project//clang:arcmt-test

sluongng avatar Dec 29 '21 14:12 sluongng

Hi! I'm aware of the Bazel overlay that's been merged into LLVM proper; unfortunately I don't think it's straight-forward to make use of it in this toolchain (though having this repository have an option to build clang and friends from source instead of using prebuilt toolchains is definitely desirable for host platform support reasons).

When trying to have these kind of "bootstrapped" toolchain setups (i.e. a cc_toolchain that depends on cc_binary targets that themself require a cc_toolchain) you quickly run into dependency cycle errors (you cannot build a cc_binary using a toolchain that depends on that same cc_binary, etc.). Toolchain transitions are probably the best way to get around this (here's an example for this exact use case) but these do not work for native rulesets like rules_cc. There are other ways to get around this but I don't know of any that don't have some pretty major caveats.

Until we have an official starlark-only rules_cc implementation I think there isn't a "good" way to do this sort of thing (however, it's unclear what the status of that effort is).

Creating a minimal starlark rules_cc clone that has just enough functionality to, for example, build LLVM and then "applying" that ruleset to only repos used to create the actual toolchain (using repo_mapping) does seem like a viable way to sidestep the native ruleset toolchain transition issue. However, it's a fair amount of work and I think it's properly out of scope for this project. It is something I want for different reasons, though – I'm happy to chat about it if you're interested.


Alternatively, is there a specific reason you want this kind of functionality? As mentioned I don't think it's straight-forward to modify this project to build the toolchain it registers from source but if you want that sort of thing for the purpose of supporting a particular platform or a particular LLVM version or some other similar use case, we could definitely look at just supporting that use case.

rrbutani avatar Dec 29 '21 18:12 rrbutani

@rrbutani you raised a really good point 🤔

I was trying to get around the need of us having to patch and re-package LLVM internally recently. The repackaging of LLVM into a tar.zx bundle that is consumable by grailbio/bazel-toolchain is using CMake and takes hours to complete. I was hoping that if we could just leverage bazel to build and cache the LLVM build, why not just do it in the same project.

Creating a llvm_cc_binary wrapper would be a PITA to maintain in the long run. Perhaps I will consider exploring the option to contribute to LLVM's utils/bazel to add a BUILD target that would bundle everything into a tar.zx file. So that we can still leverage our Bazel infra structure to package LLVM.

This will be beneficial to have considering that (1) we are currently monkey patching LLVM and (2) LLVM project does not provide a distribution for Darwin arm64(aarch64), a future requirement of ours.

sluongng avatar Dec 31 '21 12:12 sluongng

Creating a llvm_cc_binary wrapper would be a PITA to maintain in the long run.

As mentioned this is something I'm interested in trying to do (a minimal starlark-only rules_cc equivalent specifically for these kinds of use cases that is) but I agree that it's not worth creating and maintaining specifically for this. I think it is something I'll eventually try to put together, probably not in the near future though.


Perhaps I will consider exploring the option to contribute to LLVM's utils/bazel to add a BUILD target that would bundle everything into a tar.zx file. So that we can still leverage our Bazel infra structure to package LLVM.

This will be beneficial to have considering that (1) we are currently monkey patching LLVM and (2) LLVM project does not provide a distribution for Darwin arm64(aarch64), a future requirement of ours.

This sounds great; I'm interested in having something like this for similar reasons (we're also interested in arm64 Darwin support (#88) as well as supporting other host platforms and configurations for which there aren't really suitable official LLVM distributions). Having nice bazel targets that produce the tarballs isn't a necessity for us (for this project we'd just be looking to build things on upstream releases; every 6 months excluding patch releases, definitely in the realm of what can be done manually without artifact caching) but it'd definitely make things much nicer. I'm happy to help with this in any way that'd be useful.

rrbutani avatar Dec 31 '21 13:12 rrbutani

@sluongng I took another look at this today; I'm fairly confident that this (which made it into Bazel 5.0) makes it so that we can toolchain transition across @rules_cc rules now (I think this was possible in prior versions with --incompatible_override_toolchain_transition; bazelbuild/bazel#11584 has more context).

I'm going to try to put together a small example to confirm.

rrbutani avatar Jan 22 '22 22:01 rrbutani

Okay yeah, seems to work!

rrbutani avatar Jan 23 '22 00:01 rrbutani

We'd just need to offer a way for the repo rules in this package to take labels to binaries/filegroups instead of just repositories and archive URLs.

From there we can add the necessary plumbing to turn this into a top-level option if we want (handling the bootstrapping, etc.; maybe we can grow a source attr on llvm_toolchain with some helpers for the various sources we now support).


I'm probably not going to get to this until at least the end of the month or so; I want to resolve some of the other issues first. But definitely feel free to get started on this if you want.

rrbutani avatar Jan 23 '22 01:01 rrbutani

@rrbutani @sluongng rules_ll may be interesting for you. We use the bazel overlay and added some overlays for libcxx, compiler-rt, libcxxabi and libunwind to construct a mostly encapsulated toolchain.

We are missing libc for now though 😇

neqochan avatar Feb 09 '22 13:02 neqochan

@neqochan Thanks!

I think we'd probably want to take a slightly different approach for bazel-toolchain that makes use of rules_cc and toolchain transitions but rules_ll is very neat regardless.


I have not been actively working on this yet but ^ raises some very good questions. In particular, I hadn't considered what we'd do for libc (afaik LLVM libc is not yet ready for use; I think we'd need to fall back to using libc on the host both for producing the LLVM toolchain and in the binaries it produces) or how we'd bootstrap libc++ and friends (we'd want to build them with the clang/etc that are built and then produce a new toolchain that references them; doing this within the rules_cc system is definitely cumbersome at best).


It's out of scope for a first pass at this but eventually I'd like to offer some way to build actually hermetic toolchains this way (i.e. statically linked against musl and using musl or another bundled non-host libc for the binaries it creates) – both for immediate use in the workspace they're built in (the use case this issue describes) and for export (to be fetched and used the way we currently use the official LLVM releases – this'd enable us to support platforms that the official LLVM releases don't (like NixOS) and provide a hermetic offering). (it's also mildly concerning that the bootstrap chain for this kind of setup would be something like: [host] -> [host with musl] -> [clang] -> [clang with musl] -> libunwind/libcxx/libcxxabi/compiler-rt -> [final toolchain])

Regardless I think a good first step is making the changes to bazel-toolchain that allow us to construct a toolchain from clang/libcxx/compiler_rt/libc/etc labels and unifying that with the current flows that take a binary distribution repo, if possible.

rrbutani avatar Feb 09 '22 17:02 rrbutani

From an end use perspective it may actually be more desirable to have something like a "simple" [binary clang/llvm fetched] -> [libcxx etc] -> [final toolchain that uses fetched clang with host libc, but links custom libc, libcxx etc]. The build times for libcxx etc. are completely acceptable, but the build time for Clang itself may be off-putting for many users.

In bazel-toolchain it may make sense to use rules_foreign_cc to use the native CMake build system for libcxx etc instead of creating overlays.

neqochan avatar Feb 10 '22 15:02 neqochan

From an end use perspective it may actually be more desirable to have something like a "simple" [binary clang/llvm fetched] -> [libcxx etc] -> [final toolchain that uses fetched clang with host libc, but links custom libc, libcxx etc]. The build times for libcxx etc. are completely acceptable, but the build time for Clang itself may be off-putting for many users.

I think building compiler or any of the runtime components will be kind of a last resort for users; only for people using platforms we don't support and have binary distributions for.

I think it's still worth having a flow that builds from source to support use cases like the one described by the OP (patches to LLVM, active compiler development, etc.).

In bazel-toolchain it may make sense to use rules_foreign_cc to use the native CMake build system for libcxx etc instead of creating overlays.

This is definitely a good suggestion and something we should definitely look into if building libcxx proves to be onerous but I'd rather avoid this if possible; it's nice having the whole build graph actually be in Bazel proper.

rrbutani avatar Feb 10 '22 21:02 rrbutani

@sluongng See https://github.com/dzbarsky/static-clang. Unfortunately I think the upstream build rules are not fully cacheable, I keep getting misses on my GHA build runners that are using buildbuddy RBE. So I think building from source and relying on caching would be painful, better to use prebuilt tarball like the ones generated by that repo.

dzbarsky avatar Oct 04 '23 20:10 dzbarsky

@dzbarsky static-clang is a really cool repo! I gotta play around with that a bit :blush: I ran the LLVM builds on buildbuddy as well, and at least for the clang target I think that they are fully cacheable. A while ago there were a few external deps leaking into the build (zlib, terminfo), but I think I fixed those leaks in the LLVM overlay.

I ran into similar issues as you describe though. For some reason the GHA env seems to not work. Something I haven't tried for GHA, but that I'm using for other setups is using the stdenv from nix. Using that as the env for a GHA run might help (but probably requires a few manual tweaks). Maybe some .bazelrc flags could also help make things more robust?

aaronmondal avatar Oct 06 '23 01:10 aaronmondal

@aaronmondal Do you have an example of the nix setup? I was thinking I could also run the build from inside a docker container on GHA, that might isolate things enough to get it reproducible. I thought I got the bazelrc flags that are important for hermeticity but it looks like at least BAZEL_DO_NOT_DETECT_CPP_TOOLCHAIN=1 should be set on repo_env instead, maybe I missed something else...

dzbarsky avatar Oct 06 '23 01:10 dzbarsky

@dzbarsky The setup I use is rougly this:

  • Create a bazel wrapper that contains all the toolchains https://github.com/eomii/rules_ll/blob/main/bazel-wrapper/default.nix

  • Take that wrapper and build a container from it so that bazel-remote can use that container to generate RBE configs: https://github.com/eomii/rules_ll/blob/main/rbe (I'm just now noticing that the docs are outdated though as this stuff is now intended to run in k8s pipelines which uses slightly different container setups). Basically, create a container with the host toolchain of your choice, run the bazel-remote tool, and then make the resulting toolchain visible to your rules: https://github.com/eomii/rules_ll/blob/08abe7e7b8a0166e35ebce61fb722717b732d02b/MODULE.bazel#L10-L11

  • The wrapper itself is here: https://github.com/eomii/rules_ll/blob/main/bazel-wrapper/default.nix Note that you can ignore all the LL_* variables in that file. Those are used for other rules. Regular rules_cc just cares about the CC and BAZEL_* variables.

  • The bazelrc is here: https://github.com/eomii/rules_ll/blob/main/.bazelrc

  • I changed a bunch of this stuff over time and am not running against buildbuddy anymore, so things might require some tweaks (and in general I should really update that repo :sweat_smile:) The specific commit that I used to test against buildbuddy was this: https://github.com/eomii/rules_ll/commit/75825f027b36fe411508e16c1b39368a715faa21 (that commit worked, but if I remember correctly this still had some issues, so stable state is somewhere between that commit and https://github.com/eomii/rules_ll/commit/878ef84f6016b5325bd9b095550c8ea35b0c8669

All of this was tested on several machines, manually configured against buildbuddy, and we had (except for libcxx modules, which are not part of the standard LLVM Bazel overlay) perfect cache hit rate across all x86_64-linux machines.

aaronmondal avatar Oct 06 '23 02:10 aaronmondal

Oh one thing to note though: The nix environments are nicely reproducible, but that also means that they contain everything you need during the build. I believe the original Image I had for those tests was ~20GB, and apparently the images in the current setup are closer to ~80GB as they now include cuda and several different compiler setups.

This means that this approach isn't really feasible for hosted runners. I'd say this mosly just makes sense when you have a self-hosted setup where the image build pipelines are part of your internal CI.

aaronmondal avatar Oct 06 '23 02:10 aaronmondal

If y'all suspect leaks while building with BuildBuddy RBE, feel free to throw the invocation URL my way (here or in BazelBuild Slack or BuildBuddy Slack). I could take a look.

The neat part about using BuildBuddy RBE is that your UI should show you the input roots of actions that were executed remotely (in "Executions" tab, click on an action). For example: image.

So you could compare 2 actions from 2 different invocation to identify what is causing these leaks.

We also record all cache traffics, in and out, so that if you do 2 subsequent clean builds on the same commits, any write cache traffics on the 2nd build must be leaks(!). image

I should write these into a blog post 😅 but in the meantime, let me know if you want me to help take a look.

sluongng avatar Oct 06 '23 11:10 sluongng

@sluongng Yeah I just haven't had time to look into it yet. But I kicked off a few builds to collect the data we need, and am going to poke around now. If you do have some cycles and want to look, I wouldn't say no :)

I added a dummy --action_env var to bust the cache, then kicked off 3 builds, waiting for the previous one to complete

Runner build 1 (empty cache): https://app.buildbuddy.io/invocation/a5158393-8883-43de-9f35-069fe525e032# Runner build 2 (fully cached): https://app.buildbuddy.io/invocation/c5a81371-19f3-4af7-ad1a-d37775f68adf Runner build 3 (cache misses): https://app.buildbuddy.io/invocation/2e7704f5-ae71-466e-9078-47dddf8fef9e

dzbarsky avatar Oct 06 '23 15:10 dzbarsky

Ah, looks like an issue with the Zig toolchain https://github.com/uber/hermetic_cc_toolchain/issues/89#issuecomment-1750995767

dzbarsky avatar Oct 06 '23 16:10 dzbarsky

I am glad that you were able to put our UI to use 💪 If you are willing to share a screenshot(s) of what you compared, it would help other folks follow along as well.

And feel free to send any feedbacks my ways 🤗

sluongng avatar Oct 06 '23 16:10 sluongng

@sluongng Sure. I knew that zlib was one of the first actions topologically, so I took a look at a compilation across the builds. Here's what I saw:

image

I just traced down the input tree following the mistmatched digests until I found the zig_sdk/tools/x86_64-macos-none/c++ input.

Now here's the fun part; this is created by a repository rule, so it's not running in RBE if I understand correctly.

fakeuser@ef10b5617ca6:/static-clang$ ls -l bazel-static-clang/external/zig_sdk/tools/x86_64-macos-none/c++ 
lrwxrwxrwx 1 fakeuser fakeuser 111 Oct  3 21:17 bazel-static-clang/external/zig_sdk/tools/x86_64-macos-none/c++ -> /home/fakeuser/.cache/bazel/_bazel_fakeuser/984746b1675bc799f884f49edfc61398/external/zig_sdk/tools/zig-wrapper

The compilation happens here: https://github.com/uber/hermetic_cc_toolchain/blob/c1860b7d1d62a095020196090fb5d540f8a31388/toolchain/defs.bzl#L161-L208

Any ideas where to go from here?

dzbarsky avatar Oct 06 '23 16:10 dzbarsky

I would download the 2 binaries and try to diff them (or hex dump and diff).

You should be able to tell what's inconsistent between the 2.

Funny enough, I am also looking at that very code path today. I wonder if you are retaining /tmp between builds and the zig cache from previous build is affecting subsequent builds.

sluongng avatar Oct 06 '23 16:10 sluongng

Got to the bottom of it - https://github.com/uber/hermetic_cc_toolchain/pull/124

dzbarsky avatar Oct 07 '23 00:10 dzbarsky

There is not much for us to do here. People can already use Bazel packages as toolchain roots, as long as some convention around target names is followed.

siddharthab avatar Mar 14 '24 22:03 siddharthab