bazel
bazel copied to clipboard
incompatible_enable_cc_toolchain_resolution: Turn on toolchain resolution for cc rules
Flag: --incompatible_enable_cc_toolchain_resolution Available since: 0.23 Will be flipped in: 6.0.0 Tracking issue: #6516
Current status
C++ rules are to our best knowledge ready to be migrated to platforms. Fat APK binary of Android rules and Multiarch binary for Apple rules are still using the legacy C++ toolchain selection. We expect those to be fixed in Bazel 0.26.
Motivation Currently, cc rules access the cc_toolchain by a combination of the --crosstool_top, --cpu, and --compiler flags. This system is complex, lacks flexibility, and will be removed in favor of using the toolchain resolution system added to Bazel.
When this flag is enabled, the legacy system for accessing the cc toolchain will be disabled, and the cc rules will instead use the toolchain resolution system. Eventually, the legacy code will be removed.
Migration
If you're not using C++, or custom --crosstool_top
, --cpu
, or --compiler
,
you don't use select
on these options, you can stop reading now, there is
nothing to migrate for you.
Build configured by the current C++ toolchain
You will eventually have to migrate select
s and conditional logic in Starlark
rules to toolchain/platforms. There is no generally applicable migration tool.
But to make incremental migration possible, support for PLATFORM_MAPPING
file
was added to Bazel. See https://github.com/bazelbuild/bazel/issues/6426 for up
to date infomation and documentation.
C++ Toolchain owners
For users who are using --crosstool_top
to define custom cc_toolchain
targets, it will be necessary to add the corresponding toolchain targets (see
the very rudimentary example in Bazel's autoconfigured toolchain), and to
register those toolchains (Bazel's autoconfigured toolchain example).
For users who are cross-compiling by using --cpu, it will be necessary to add appropriate platform targets and to use the --platforms flag to ensure targets are built correctly. Until the --cpuflag can be fully deprecated or a migration path can be in place, it will be necessary to set both --cpu and --platforms.
Please use constraint_settings
and constraint_values
from
the canonical Bazel Platforms
Repository (don't hesitate to propose
missing targets!). If there is a need for C++ specific constraints, feel free to
upload a PR to the rules_cc
repository. It is extremely important that the whole ecosystem uses the same
constraints, we can only reuse libraries and toolchains when we speak the same
language.
Starlark rules owners
toolchain_type
For Starlark rules owners who depend on C++ toolchain it will be necessary to declare dependency on C++ toolchain type.
Before:
foo = rule(
implementation = _foo_impl,
attrs = {
"_cc_toolchain": attr.label(default = Label("@bazel_tools//tools/cpp:current_cc_toolchain")),
},
)
After:
foo = rule(
implementation = _foo_impl,
attrs = {
"_cc_toolchain": attr.label(default = Label("@bazel_tools//tools/cpp:current_cc_toolchain")),
},
toolchains = ["@bazel_tools//tools/cpp:toolchain_type"],
)
In case you're already using Bazel 0.27, save yourself a migration and depend
on rules_cc
:
toolchains = ["@rules_cc//cc:toolchain_type"],
find_cpp_toolchain
See the docs and use @rules_cc//cc:find_cc_toolchain.bzl
(if using Bazel >= 0.27) or @bazel_tools//tools/cpp:toolchain_utils.bzl
to locate current C++ toolchain (otherwise). Also see examples for general usage.
Note that this flag will supercede and replace the previous --enabled_toolchain_types
flag, which was introduced before the incompatible change policy was formulated, and which was only ever used for the cc rules, anyway.
People who are interested: @hlopko, @lberki, @nlopezgi.
Will native transitions like https://github.com/bazelbuild/bazel/blob/4b00ab1b97e5711c2fd65eb8150923effac0dae7/src/main/java/com/google/devtools/build/lib/rules/android/AndroidRuleClasses.java#L252 also need attention?
Yes, @aragos and I have been discussing a plan. We will probably not be able to flip this flag until we have a solution, so I am removing the "Will be flipped in" tag.
Please do not assign issues to more than one team
When testing with this flag on in rules_go at dd527c7d
with Bazel 0.26.0rc8 on macOS, I'm seeing this error:
$ bazel build --incompatible_enable_cc_toolchain_resolution tests/legacy/examples/cgo:sub
> > Loading:
Loading: 0 packages loaded
Analyzing: target //tests/legacy/examples/cgo:sub (0 packages loaded, 0 targets configured)
INFO: Analyzed target //tests/legacy/examples/cgo:sub (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
bazel: Entering directory `/private/var/tmp/_bazel_jayconrod/95575c789b578fe26b1744a32454af42/execroot/io_bazel_rules_go/'
[0 / 2] [Prepa] BazelWorkspaceStatusAction stable-status.txt
ERROR: /Users/jayconrod/go/src/github.com/bazelbuild/rules_go/tests/legacy/examples/cgo/BUILD.bazel:29:1: GoCompilePkg tests/legacy/examples/cgo/darwin_amd64_stripped/sub%/github.com/bazelbuild/rules_go/examples/cgo/sub.a failed (Exit 1) builder failed: error executing command bazel-out/host/bin/external/go_sdk/builder compilepkg -sdk external/go_sdk -installsuffix darwin_amd64 -src tests/legacy/examples/cgo/sub/floor.go -importpath ... (remaining 19 argument(s) skipped)
Use --sandbox_debug to see verbose messages from the sandbox
ld: framework not found UIKit
clang: error: linker command failed with exit code 1 (use -v to see invocation)
compilepkg: error running subcommand: exit status 1
bazel: Leaving directory `/private/var/tmp/_bazel_jayconrod/95575c789b578fe26b1744a32454af42/execroot/io_bazel_rules_go/'
Target //tests/legacy/examples/cgo:sub failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 0.488s, Critical Path: 0.29s
INFO: 0 processes.
FAILED: Build did NOT complete successfully
FAILED: Build did NOT complete successfully
It looks like -framework UIKit
is being added to the C++ link flags. This target doesn't set link flags on its own, so I think this is coming from cc_common
. Is that expected?
I'm investigating the UIKit issue.
Yay for surprises :) So the osx C++ toolchain doesn't yet work with platforms because of this TODO. @katre kindly volunteered to fix it :)
--action_env=BAZEL_DO_NOT_DETECT_CPP_TOOLCHAIN=1
cannot be used with --incompatible_enable_cc_toolchain_resolution
.
I was running some additional tests with --incompatible_enable_cc_toolchain_resolution
with RBE (although this does not have to do with RBE, I think), and I was unsure if --incompatible_enable_cc_toolchain_resolution
was working as I expected it to work (i.e. specifically that it was using the remote cc_toolchain that I was indicating instead of the local one) I tried to add BAZEL_DO_NOT_DETECT_CPP_TOOLCHAIN=1
but my build failed with the following error:
INFO: ToolchainResolution: Selected execution platform @rbe_default//config:platform, type @bazel_tools//tools/jdk:toolchain_type -> toolchain @bazel_tools//tools/jdk:dummy_toolchain
ERROR: /root/.cache/bazel/_bazel_root/7e958634aed2e0b9513fa7cce861a282/external/local_config_cc/BUILD:28:1: in cc_toolchain_suite rule @local_config_cc//:toolchain: cc_toolchain_suite '@local_config_cc//:toolchain' does not contain a toolchain for cpu 'k8'
Note that without the BAZEL_DO_NOT_DETECT_CPP_TOOLCHAIN
flag my builds are running fine, and even when I check with --toolchain_resolution_debug
it does seem to be the case that my remote builds are picking the toolchain that I want them to pick and not the local one. But perhaps this means that its time to remove support for BAZEL_DO_NOT_DETECT_CPP_TOOLCHAIN
and perhaps provide some other way to not register the local_cc_config
(i.e., I really want to be able to run builds with the local_cc_config
disabled altogether so that I can be super sure all actions can only run remotely with the toolchain config I explicitly set).
@laurentlb I think the migration-0.25 tag should be removed from this issue because of this issue that was cherry picked into 0.26 https://github.com/bazelbuild/bazel/issues/8330
@nlopezgi In the error message I see cc_toolchain_suite
, that shouldn't be used at all with --incompatible_enable_cc_toolchain_resolution
, is it possible that you were building the cc_toolchain_target
explicitly? The expected error message is:
ERROR: While resolving toolchains for target //:too: no matching toolchains found for types @bazel_tools//tools/cpp:toolchain_type
Can you describe a repro case if there's still a bug?
I just uploaded https://github.com/bazelbuild/bazel/pull/8459, which AFAIK removes the need for BAZEL_DO_NOT_DETECT_CPP_TOOLCHAIN
variable (but I kept it around for cases like you mention - where you want to be super sure local_config_cc
doesn't register any toolchain. But the original use case - preventing Bazel from autoconfiguring local C++ toolchain when there are remote toolchains fully configured, is fixed by https://github.com/bazelbuild/bazel/pull/8459 (toolchain targets will still be generated, but system will only be inspected after the C++ toolchain is already selected).
Hi Marcel,
You can repro this error using as base this GCB build script which tests a very simple cc target (//examples/remotebuildexecution/hello_world/cc:say_hello_test
, i.e., I am not building the cc_toolchain_target
)
The GCB yaml file shows all the flags needed to repro. You will need to change the --remote_instance_name
to point to some instance of RBE you have access to (or maybe do a local build inside the bazel container, it probably also can be repro'd there?).
The error reproduces when building with --action_env=BAZEL_DO_NOT_DETECT_CPP_TOOLCHAIN=1
(build log)
but does not happen when the --action_env=BAZEL_DO_NOT_DETECT_CPP_TOOLCHAIN=1
is ommitted (build log.
See also my comment in #8459 about how this will impact remote execution.
Hi @nlopezgi, the problem with BAZEL_DO_NOT_DETECT_CPP_TOOLCHAIN
was fixed by https://github.com/bazelbuild/bazel/pull/8459. Thanks for reporting!
To be explicit - I don't think we need to postpone flipping this flag because of BAZEL_DO_NOT_DETECT_CPP_TOOLCHAIN
bug - users can still migrate incrementally in 0.26 (with --incompatible_enable_cc_toolchain_resolution
disabled), and they will get the correct behavior in 0.27 (with --incompatible_enable_cc_toolchain_resolution
enabled).
We found a case with master / 0.27.0 where this doesn't work for iOS builds https://github.com/bazelbuild/bazel/issues/8716
Removing the bazel 1.0 label and extending the migration window.
@katre Hello :) Is this still happening? Seems like 2.x is no longer a migration window for this flag, is this expected?
I actually don't care (or know) so much about this flag itself, but I just noticed that we have a special case for this flag in the bazel_bootstrap_distfile_test, which causes the bootstrap to run twice: https://github.com/bazelbuild/bazel/blob/master/src/test/shell/bazel/bazel_bootstrap_distfile_test.sh#L76
This is too expensive for CI (it takes over 10 minutes on Windows). Is there a cheaper way to test this rather than running a full bootstrap (which builds Bazel twice, serially)?
I'll investigate.
So're in a tricky situation:
- The Configurability team absolutely wants to flip this flag.
- The Rules-C++ team member who was championing this is no longer on the team.
- We didn't flip this for Bazel 1.0 because of the follow-on problems with Android and iOS rules.
- No one is championing those either.
I'm working on a plan to get these addressed and flipped, but it won't be near-term. So to alleviate CI pain, is it possible to disable the test normally, and run it once per day or so, just to catch any regressions?
(CC @aiuto and @lberki in case I am mis-stating the case for flipping this flag)
As a note, we don't appear to be testing this with the migration test pipeline, is that intentional? See https://buildkite.com/bazel/bazelisk-plus-incompatible-flags/builds/393
We're close to making this possible technically (i.e. all the supporting C++ work is done). The big challenge is 3) above: how to not inadvertently break depending projects that aren't in C++ and don't themselves understand platforms.
We have an interim solution. But as John says we need someone to guide the process so projects that might be affected don't get caught offguard and don't know what to do. Ideally this just involves defining a couple of strategic platform mappings in a common place.
This test is basically the only reason now why our presubmits are 17 instead of 12 minutes and I don't see any way how to make it faster.
I read everything I could find about this flag and I still don't understand why we have to run a full Bazel bootstrap using compile.sh
as part of our tests to verify that this flag works.
Couldn't we alternatively build e.g. a helloworld.cc
with this flag enabled to ensure that it works?
@philwo : how does that test make our presubmits 5 minutes slower? My mental model is that we need to do at least one bootstrap test and if we do one, doing two in parallel takes the same amount of wall time.
@katre : in order to test this with incompatible flags, the migration-*
tag needs to be on the bug. I think we just simply forgot to add them.
@katre: remind me again, what's the problem with (3)? I thought platform mapping makes this a no-op for Android and iOS and the only thing that needs to be done after the flag flip is to change their select()
statements.
That said, we should totally have well-known platform definitions to replace the well-known --cpu
values.
@lberki We are hardware-limited on CI, especially on macOS. Anything we can do to cause the tests to use less hardware resources has massive impact. For example, with my "fix our test framework to not extract 500 JDKs per run" change, I moved us from being so insanely I/O bound that the only way to run our tests was in a RAM disk to being no longer I/O bound at all. Thus we were able to get rid of RAM disks (= switch from highmem to cheaper standard machines) and get rid of the local SSDs (save money and complexity, allows use of machines with faster CPUs that do not have SSDs available).
Another little two line fix then got our postsubmit on Linux down from 50 to 35 minutes on Linux. This high impact is all only possible, because our code and tests eat resources like there's no tomorrow.
This test is the top 1 on my list of "slowest tests on the CI".
A Bazel bootstrap easily occupies the entire machine's capacity (all cores, lots of memory, much I/O). While it does so, the processing power spent on this is not available for other tests. Thus, by running two bootstraps, you occupy the machine's entire capacity twice for the duration of the test. There's nothing gained by running them in parallel - a Bazel bootstrap already is "parallel", because it's Bazel (which is quite good at parallel execution ^^) building itself.
I think we should just remove this test_bootstrap_with_cc_rules_using_platforms
test. I talked with Marcel about it, and this test is not to test that the flag works, but it's to test that our code base can still be built with that flag on. That's nice, but that's exactly why we have the Bazelisk migration pipeline on CI, which runs in the night, when using a lot of capacity is fine.
WDYT? If you agree, I'll go ahead and remove the test, so that we just rely on the Bazelisk pipeline. If not, I'd disable it for presubmit, but note that postsubmits aren't free either - we only have so many Macs and while they're running a postsubmit test, they can't run a presubmit.
I was thinking we could run the two bootstraps in two different machines and take less wall time that way.
As long as bootstrapping Bazel with the flag set is tested by the incompatible flag pipeline, I'm fine with removing the test. Does it run the bootstrap with the incompatible flags set, though? I'm not sure what happens with the "inner" Bazel instances (the ones that are being tested, not the one that runs the tests)
We cannot just run the two bootstraps on different machines, because we simply do not have enough Mac machines. πΈπ
If this flag is marked as migration ready, then it will be included in the "Bazelisk + Incompatible flags" pipeline. I just verified what that actually does. In that case, it would run a bazel build --incompatible_enable_cc_toolchain_resolution //src:bazel
where bazel
is the latest release and the source code it tries to build is Bazel HEAD. Which means, it verifies that the latest Bazel release with that flag flipped can still build Bazel from scratch.
Unless there's some specific reason why we have to test this flag using ./compile.sh
, it seems like this is exactly what we want to test. WDYT?
SGTM. As long as the code path is being tested regularly to prevent rot, I am fine with this.
Sounds like a plan. AFAIU the bootstrapping process relies on Bazel itself to build C++ code and the only part that's special-cased is Java compilation, so it should be all fine.
The most recent status here is that the flag is ready to flip, except that Android and Apple rules don't handle platforms well, so there's no clear way to manage those transitions. Configurability is working with the Android rule owners to fix those rules, and then we'll look into Apple rules when that's in place.