toolchains_llvm icon indicating copy to clipboard operation
toolchains_llvm copied to clipboard

Switch to the new C++ toolchains API

Open AustinSchuh opened this issue 1 year ago • 19 comments

The new BUILD file based C++ toolchains promise to be much easier to maintain.

https://github.com/bazelbuild/rules_cc/tree/main/examples/rule_based_toolchain

What do you all think about switching over to using those? I'm happy to help.

The example makes it look like we can reuse pieces from rules_cc.

cc_toolchain(                                                                                                                           
    name = "host_clang",                                                                                                           
    args = select({                                                                                                                              
        "@platforms//os:linux": [                                                                                                                            
            "//toolchain/args:linux_sysroot",                                                                             
        ],                                                                                                                                         
        "//conditions:default": [],    
    }) + [                                                                                                                                                       
        "//toolchain/args:no_canonical_prefixes",                                                                                                   
        "//toolchain/args:warnings",                                                                                  
    ],                                                                    
    enabled_features = ["@rules_cc//cc/toolchains/args:experimental_replace_legacy_action_config_features"],                                                                                                  
    known_features = ["@rules_cc//cc/toolchains/args:experimental_replace_legacy_action_config_features"],    
    tool_map = "//toolchain/tools:all_tools",                                                                       
)                                                                                                                                                            
                                                                                                                 
toolchain(                                                                                                                         
    name = "host_cc_toolchain",                                                                                                   
    toolchain = ":host_clang",                                                                                                                                    
    toolchain_type = "@bazel_tools//tools/cpp:toolchain_type",                                                                                
)  

AustinSchuh avatar Jan 17 '25 22:01 AustinSchuh

note that this repo mostly just doesn't use any of the old stuff for this either since it just shells out to the default toolchain setup code. https://github.com/bazel-contrib/toolchains_llvm/blob/bb3f7d0171129e8f168f51f3deebdbe010e2bf00/toolchain/cc_toolchain_config.bzl#L368

keith avatar Jan 17 '25 23:01 keith

Yup. I'm trying to upstream a bunch of fixes we've built up over the years that let us cross compile against sysroots. I'm quickly finding all the places where new flags need to be injected in very specific spots through unix_cc_toolchain_config which don't exist.

AustinSchuh avatar Jan 17 '25 23:01 AustinSchuh

I think this will enable use of rule-based sysroots. Currently toolchains_llvm seems to only set correct paths for http_archive or for in-source filegroup, but not for genrule outputs or other ways to produce sysroots. While new API in rules_cc looks more promising handling generated sysroots (Note: there's still a bit of hacking needed to have a rule that produces multiple files, just producing sysroot archive is much more simpler but unusable as sysroot directly)

dmivankov avatar Mar 24 '25 10:03 dmivankov

Keith and I (more Keith than me) sent some changes upstream to rules_cc to get things closer to working there, but most of those changes haven't landed yet. I think it is clear from playing with the new API that it isn't fully debugged and tested.

AustinSchuh avatar Mar 25 '25 04:03 AustinSchuh

FWIW I rolled out a toolchain internally using the new stuff with a custom sysroot and vendored clang and it's working well. But it does still require some patches on rules_cc

keith avatar Mar 25 '25 04:03 keith

@keith, are the patches needed just the ones you've sent out for review on rules_cc? Or are there more?

AustinSchuh avatar Mar 25 '25 04:03 AustinSchuh

All submitted, no private ones. But I do have a significant amount of internal setup that isn't clear how it should fit upstream yet. For example sanitizer flags etc.

keith avatar Mar 25 '25 04:03 keith

Would you be able to share current patch to toolchains_llvm that sets up new API (to give it a try ahead of rules_cc accepting the changes)?

dmivankov avatar Mar 25 '25 07:03 dmivankov

I'm not using toolchains_llvm at all. since I don't need the conditional setup of versions of llvm or the other things it does when im vendoring exactly what i need.

keith avatar Mar 25 '25 17:03 keith

That works too, I'm just not quite sure how much setup is needed to get to a "simple hermetic toolchain" setup using new APIs, bare-bones pointing tools to compiler binaries seems to miss many default or required flags and also differs noticeably from toolchains_llvm (it's nice that aquery can be used to discover differences but a proven starting point would be nice too 🙂 )

dmivankov avatar Mar 25 '25 19:03 dmivankov

If you want a proven starting point, https://github.com/RealtimeRoboticsGroup/aos/tree/main/third_party/bazel-toolchain is mine. It supports x86 debian, and a NVIDIA Orin yocto. It is based off an old version of toolchains_llvm. I'm slowly trying to get the patches upstreamed to stop maintaining a fork, but keep running into gaps in what is there in rules_cc.

AustinSchuh avatar Mar 25 '25 19:03 AustinSchuh

there is also https://github.com/bazelbuild/rules_cc/blob/main/examples/rule_based_toolchain/toolchains/ but it's pretty bare bones

keith avatar Mar 25 '25 19:03 keith

there's also https://github.com/google/pigweed/tree/main/pw_toolchain/cc but it's a bit interesting since much of it is targeting embedded things

keith avatar Mar 25 '25 19:03 keith

This repo was mostly written following the https://bazel.build/tutorials/ccp-toolchain-config. New rule based toolchains seem much cleaner. I'd be also happy to give it a try.

thesayyn avatar Apr 01 '25 15:04 thesayyn

I think this will enable use of rule-based sysroots

I experimented with rule based toolchains a bit, unfortunately it does not help much performance since sysroot is expected to be extract during loading phase, which is not so different than traditional cc toolchains. I filed https://github.com/bazelbuild/bazel-skylib/issues/566 to see what can be done there.

thesayyn avatar Apr 02 '25 18:04 thesayyn

I think this will enable use of rule-based sysroots

I experimented with rule based toolchains a bit, unfortunately it does not help much performance since sysroot is expected to be extract during loading phase, which is not so different than traditional cc toolchains. I filed bazelbuild/bazel-skylib#566 to see what can be done there.

I haven't reached a fully working setup yet, but so far it looks workable with approximately

load("@rules_cc//cc/toolchains/args:sysroot.bzl", "cc_sysroot")
load("@rules_cc//cc/toolchains:toolchain.bzl", "cc_toolchain")
load("@rules_cc//cc/toolchains:tool_map.bzl", "cc_tool_map")

genrule(
  name = "sysroot.extract",
  srcs = [targets_producing_tarballs],
  outs = [magically_generated_listing_of_sysroot],
  cmd = ...just untar...
)

filegroup(
  name = "sysroot",
  srcs = [magically_generated_listing_of_sysroot]
)

cc_sysroot(
    name = "linux_sysroot",
    data = [":sysroot"],
    sysroot = ":sysroot",
)

cc_tool_map(
  name = "toolmap",
  tools = {
    "@rules_cc//cc/toolchains/actions:c_compile": "@@toolchains_llvm++llvm+llvm_toolchain_linux_exec_llvm//:bin/clang",
    "@rules_cc//cc/toolchains/actions:cpp_compile": "@@toolchains_llvm++llvm+llvm_toolchain_linux_exec_llvm//:bin/clang++",
  }
)

cc_toolchain(
    name = "cctoolchain",
    args = [":linux_sysroot"],
    tool_map = ":toolmap",
)

Will post back once I have it fully working on some codebase, it lacks some essential flags in the shape above to make all includes resolve properly

dmivankov avatar Apr 11 '25 09:04 dmivankov

I have began the work to migrate toolchains_llvm to rules based cc toolchain, will make a PR early next week.

thesayyn avatar Apr 11 '25 18:04 thesayyn

haven't reached a fully working setup yet, but so far it looks workable with approximately

This still does not work as you need to know the list of files in each directory and declare them with declare_file which defeats the purpose of using a treeartifact since it'll be still slow, maybe not as slow if the sysroot got extracted during repository rule loading.

thesayyn avatar Apr 11 '25 18:04 thesayyn

I have made a good progress on this one. There are some blockers that i'd like to get out of the way before this becomes a reality. this repo also needs a bit heavy refactoring to allow use of new api.

I'll update this as i find new issues.

Current blockers/issues

  • Use bare paths everywhere which does not work with the new rules based toolchains. I have migrated some of it, but there are parts that can't be converted easily, eg paths that are available on the host system. See: https://github.com/bazelbuild/rules_cc/issues/277

  • Sysroot logic assumes that the label provided by the user will already contain all the files extracted and laid out on disk, we need a contract there based on new directory and subdirectory rules.

  • On MacOS, sysroot is provided by the host which can't be tracked by bazel unless we copy all of into via repository_rule. https://github.com/bazelbuild/rules_cc/issues/242#issuecomment-2828575745

thesayyn avatar Apr 24 '25 18:04 thesayyn