Not able to cross-compile py_binary in Bazel 7 without a cc toolchain
🐞 bug report
Affected Rule
The issue is caused by the rule: py_binary
Is this a regression?
Yes, the previous version in which this bug was not present was: Bazel 6.3.2
Description
Cross-compiling py_binary from macOS arm64 to either macOS amd64 or Linux amd64 breaks when using Bazel 7. It worked in Bazel 6.
Not sure if this bug report belongs here or in Bazel. Let me know if I should move it to Bazel.
🔬 Minimal Reproduction
Run bazel build --incompatible_enable_cc_toolchain_resolution --platforms //:darwin_amd64 --enable_bzlmod //:hello in the following workspace with Bazel 7 and Bazel 6:
-- MODULE.bazel --
module(name = "cross_compliing")
bazel_dep(name = "rules_python", version = "0.31.0")
python = use_extension("@rules_python//python/extensions:python.bzl", "python")
python.toolchain(
    configure_coverage_tool = True,
    ignore_root_user_error = True,
    is_default = True,
    python_version = "3.9",
)
-- WORKSPACE --
-- BUILD.bazel --
load("@rules_python//python:defs.bzl", "py_binary")
py_binary(
    name = "hello",
    srcs = ["hello.py"],
)
platform(
    name = "darwin_amd64",
    constraint_values = [
        "@platforms//cpu:x86_64",
        "@platforms//os:macos",
    ],
    visibility = ["//visibility:public"],
)
platform(
    name = "linux_amd64",
    constraint_values = [
        "@platforms//cpu:x86_64",
        "@platforms//os:linux",
    ],
    visibility = ["//visibility:public"],
)
-- hello.py --
if __name__ == "__main__":
    print("hello world")
🔥 Exception or Error
ERROR: /private/var/tmp/_bazel_zplin/91d514326e938596854c0d93324b38ae/external/bazel_tools/tools/cpp/BUILD:58:19: in cc_toolchain_alias rule @@bazel_tools//tools/cpp:current_cc_toolchain:
Traceback (most recent call last):
File "/virtual_builtins_bzl/common/cc/cc_toolchain_alias.bzl", line 26, column 48, in _impl
File "/virtual_builtins_bzl/common/cc/cc_helper.bzl", line 219, column 17, in _find_cpp_toolchain
Error in fail: Unable to find a CC toolchain using toolchain resolution. Target: @@bazel_tools//tools/cpp:current_cc_toolchain, Platform: @@//:darwin_amd64, Exec platform: @@local_config_platform//:host
ERROR: /private/var/tmp/_bazel_zplin/91d514326e938596854c0d93324b38ae/external/bazel_tools/tools/cpp/BUILD:58:19: Analysis of target '@@bazel_tools//tools/cpp:current_cc_toolchain' failed
ERROR: Analysis of target '//:hello' failed; build aborted: Analysis failed
🌍 Your Environment
Operating System:
macOS Sonoma 14.4.1 on Apple M1 Max
Output of bazel version:
Build label: 7.1.1
Build target: @@//src/main/java/com/google/devtools/build/lib/bazel:BazelServer
Build time: Thu Mar 21 18:08:59 2024 (1711044539)
Build timestamp: 1711044539
Build timestamp as int: 1711044539
Rules_python version:
bazel_dep(name = "rules_python", version = "0.31.0")
Anything else relevant?
The issue can be worked around with --noincompatible_enable_cc_toolchain_resolution in Bazel 7, although it worked in Bazel 6 with and without cc_toolchain resolution.
This is the root cause for #1825
I previously thought this was due to the Bazel version upgrade and not to rules_python changes. Could somebody verify whether the same bug exists in 0.30? I think that is the version before we switched to the Starlark rules; if it is broken in the same way, then I don't think there is anything we can do here.
I agree this is due to the Bazel version upgrade. But isn't rules_python responsible for making sure that it works with the latest Bazel?
I tested rules_python 0.28 through 0.31 with the latest Bazel and the result is the same.
It seems the problem is that we reference @bazel_tools//tools/cpp:current_cc_toolchain. The reason is that we get the platform through a helper function, cc_helper.find_cpp_toolchain(ctx), which is documented as Bazel-internal.
Not sure if there is a better way to do this, but the flipping of the toolchain resolution in bazel 7 indeed broke python binary rule cross-compilation. @rickeylev, do you know if there is a workaround here?
At $dayjob we cross-build items and we have a cc toolchain, so I guess a known workaround for this would be to register a cc_toolchain (e.g. hermetic_cc_toolchain or something similar).
but the flipping of the toolchain resolution in bazel 7
I don't think this is related to the flipping. bazel build --incompatible_enable_cc_toolchain_resolution --platforms //:darwin_amd64 --enable_bzlmod //:hello works in Bazel 6
You are correct, the offending commit is https://github.com/bazelbuild/bazel/commit/18955851947002ea39854b5d2aa3e6fa81ef8bf3
I ran: bazelisk --bisect=6.0.0..HEAD build //:hello
@rickeylev looking at the list in https://github.com/bazelbuild/bazel/issues/15897 it does not seem that this incompatibility was expected.
Do you think I should move this bug to bazel?
I am not sure where the code that needs to be fixed lives, it could be in both repos since the starlark implementation of rules python lives in rules_python as well.
Could you please move it to Bazel? I think it may get the eyes of the right people sooner.
On 19 April 2024 23:24:23 GMT+09:00, Zhongpeng Lin @.***> wrote:
Do you think I should move this bug to bazel?
This all sounds a bit weird. Build output with --toolchain_resolution_debug would be helpful.
The Bazel 6 Java implementation of the rules also depended on the CC toolchain, so I would expect it to have the same resolution logic.
rules_python releases prior to 0.31.0 didn't enable the rules_python-based Starlark implementation, i.e. they use the Bazel-based Starlark implementation.
So yeah, this does sound like something relating to the Starlark implementation, be it the one in Bazel itself or rules_python.
cc_helper.find_cpp_toolchain used to get toolchain_id in write_build_data
IIRC, the write_build_data codepath isn't active in rules_python/Bazel. But, that fact is a bit moot -- the way toolchain resolution works is it resolves all the rule's toolchains, even if one isn't used.
I think we could do away with using cc_helper.find_cpp_toolchain. IIRC, its logic is pretty simple, so it should be OK to copy/paste or re-implement. I'm not sure if that'll help, though; if the issue is that there isn't a matching toolchain, then there isn't much we can do.
We could make the CC toolchain optional. That should be possible.
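A minimal sketch of what an optional CC toolchain could look like in a Starlark rule, assuming the rule can do without it. `example_rule` and `_example_impl` are hypothetical names, not the actual rules_python code; `config_common.toolchain_type(..., mandatory = False)` is the Bazel API for declaring an optional toolchain type.

```starlark
# Hypothetical .bzl file; names are illustrative, not taken from rules_python.
_CC_TOOLCHAIN_TYPE = "@bazel_tools//tools/cpp:toolchain_type"

def _example_impl(ctx):
    # For an optional toolchain type, the lookup yields None when no
    # toolchain matches the target platform, instead of failing analysis.
    cc_toolchain = ctx.toolchains[_CC_TOOLCHAIN_TYPE]
    if cc_toolchain == None:
        # Skip whatever would have needed the C++ toolchain.
        return []

    # ... use cc_toolchain here ...
    return []

example_rule = rule(
    implementation = _example_impl,
    toolchains = [
        # mandatory = False makes resolution of this toolchain type optional.
        config_common.toolchain_type(_CC_TOOLCHAIN_TYPE, mandatory = False),
    ],
)
```

An optional toolchain only covers the rule's own toolchain lookup; targets reached through the rule's attributes still resolve their own toolchains.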
Our test matrix includes bazel 6, 7, and rolling releases, but doesn't attempt any cross-building. I'm surprised it ever worked.
I don't have a working mac machine right now, and probably won't for another week.
I think I ran into this with the RBE tests and the analysis test for precompiling. I think this can be repro'd on linux by building for another platform and trying to run the analysis tests.
bazel test //tests/base_rules/py_binary/... --toolchain_resolution_debug=cpp:toolchain_type --platforms=//tests/support:windows_x86_64
I'm not sure where the toolchain lookup is happening, though. I started to comment out all the places where the cc toolchain is mentioned, but something is still trying to resolve the cc toolchain. Additionally, I noticed the cc toolchain is marked as optional.
Aha, i think i found it: the implicit _launcher attribute:
- attr _launcher ->
  - alias @bazel_tools//tools/launcher:launcher ->
    - windows: alias @bazel_tools//tools/launcher:launcher_windows ->
      - remote: cc_binary @bazel_tools//src/tools/launcher:launcher
      - default: file @bazel_tools//tools/launcher:launcher.exe
    - default: cc_binary @bazel_tools//src/tools/launcher:launcher
I think the fix here is to have the launcher attribute point to a select. It's only used for windows. Other platforms should just point to a no-op dependency. This should be a pretty easy fix.
My guess is, if we dig up the old Java code, we'll find this attribute was one of those old dynamically-computed-at-the-Java-level attributes and it got a null value for non-Windows platforms.
For posterity: --action_env=BAZEL_DO_NOT_DETECT_CPP_TOOLCHAIN=1 can also cause this. The RBE tests set this. It prevents the cc toolchain from being auto-detected and registered.
I'm pretty sure this was fixed by https://github.com/bazelbuild/rules_python/pull/1902; I fixed it as part of that PR since CI was failing there, too (it was also failing at main under the right circumstances). I moved the launcher to an alias with a select so that only windows has the launcher dependency.
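For illustration, the shape of that change might look like the following BUILD sketch. The target names `launcher_or_noop` and `noop_launcher` are hypothetical and this is not the code from the PR; it only shows an alias whose select keeps the `@bazel_tools` launcher out of the dependency graph on non-Windows platforms. It also assumes the consuming attribute accepts a non-executable placeholder.

```starlark
# Hypothetical BUILD.bazel fragment; target names are made up for this sketch.
alias(
    name = "launcher_or_noop",
    actual = select({
        # Only Windows targets depend on the real launcher (and thus on a CC toolchain).
        "@platforms//os:windows": "@bazel_tools//tools/launcher:launcher",
        # Every other platform points at a placeholder with no CC dependency.
        "//conditions:default": ":noop_launcher",
    }),
)

# Assumed placeholder: an empty filegroup that needs no toolchain.
filegroup(
    name = "noop_launcher",
    srcs = [],
)
```

The rule's implicit `_launcher` attribute would then default to this alias rather than to `@bazel_tools//tools/launcher:launcher` directly.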