bazel
bazel copied to clipboard
action_env variables (sometimes?) ignored in cc_library rules
Description of the problem / feature request:
I'm setting some env variables via action_env but they are sometimes ignored when building a library.
Feature requests: what underlying problem are you trying to solve with this feature?
In this specific case I'm trying to build TensorFlow 2.3.0 with Bazel 3.4.1 and am setting CPATH via action_env but the compilation fails to find some headers and by inspecting the output I see CPATH not being passed to all compiler invocations.
Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
Build TensorFlow according to instructions and set --action_env=CPATH=/foo and observe the output
Any other information, logs, or outputs that you want to share?
I observed the issue for tensorflow/python/tools/BUILD:226:10 C++ compilation of rule '//tensorflow/core/platform/default:mutex'. The definition for that is at https://github.com/tensorflow/tensorflow/blob/v2.3.0/tensorflow/core/platform/default/BUILD#L206:
cc_library(
name = "mutex",
srcs = [
"mutex.cc",
"mutex_data.h",
],
hdrs = ["//tensorflow/core/platform:mutex.h"],
tags = [
"manual",
"no_oss",
"nobuilder",
],
textual_hdrs = ["mutex.h"],
deps = [
"//tensorflow/core/platform",
"//tensorflow/core/platform:macros",
"//tensorflow/core/platform:thread_annotations",
"//tensorflow/core/platform:types",
"@nsync//:nsync_cpp",
],
)
Log output:
(cd /tmp/easybuild-tmp/eb-tHWBp_/tmphugHLy-bazel-build/execroot/org_tensorflow && \
exec env - \
LD_LIBRARY_PATH=<redacted> \
PATH=<redacted> \
PWD=/proc/self/cwd \
external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -MD -MF bazel-out/ppc-opt-exec-50AE0418/bin/tensorflow/core/platform/default/_objs/mutex/mutex.d '-frandom-seed=bazel-out/ppc-opt-exec-50AE0418/bin/tensorflow/core/platform/default/_objs/mutex/mutex.o' -D__CLANG_SUPPORT_DYN_ANNOTATION__ -iquote . -iquote bazel-out/ppc-opt-exec-50AE0418/bin -iquote external/com_google_absl -iquote bazel-out/ppc-opt-exec-50AE0418/bin/external/com_google_absl -iquote external/nsync -iquote bazel-out/ppc-opt-exec-50AE0418/bin/external/nsync -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -fPIE -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -Wall -fno-omit-frame-pointer -no-canonical-prefixes -fno-canonical-system-headers -DNDEBUG -g0 -O2 -ffunction-sections -fdata-sections -g0 -g0 '-std=c++14' -c tensorflow/core/platform/default/mutex.cc -o bazel-out/ppc-opt-exec-50AE0418/bin/tensorflow/core/platform/default/_objs/mutex/mutex.o)
Execution platform: @local_execution_config_platform//:platform
As you can see only the default env vars are set but none of my action_envs (I set more than CPATH)
As the cc_library rule invocation is as simple as can be and I see that with other such invocations too (e.g. https://github.com/tensorflow/tensorflow/issues/37861#issuecomment-686418236) I assume a bug in Bazel or a misunderstanding of the rules by the TF team or me. In that case I'd kindly ask for clarification.
I've also checked that with TF 2.1.0 and Bazel 0.29.1 this was not an issue but worked as it should for the same file (at least the rule invocation was not changed)
However for other compilations it seems to work. E.g. for SUBCOMMAND: # @com_google_absl//absl/strings:strings [action 'Compiling external/com_google_absl/absl/strings/internal/memutil.cc', configuration: cab848d308c51b34c791977a9ba0d73e541cabe37f3e6da7b163b34d8bf29b6b, execution platform: @local_execution_config_platform//:platform] I see all my action_env variables being passed and that is also added with cc_library: https://github.com/abseil/abseil-cpp/blob/df3ea785d8c30a9503321a3d35ee7d35808f190d/absl/strings/BUILD.bazel#L30
What could be the reason for that?
Potentially related to https://github.com/bazelbuild/bazel/issues/12049
In examining the output I've seen that SUBCOMMAND: # <name> [action 'Compiling <cc>', configuration: <hash>] shows differing values for "configuration". For e.g. "5145bac2d59ecd1d49c139f258a492145fb9a97392eff5f5e72d0b94901ca2a6" all action_env values are there but for "10b8e9fd33cf42203bb848e842763cc669d976ce3de802a9b8c5cf443259dc2c" only the minimal env is used.
Does this help? What is this "configuration" and how is it influenced?
Likely to be #4008.
I'm not sure it is the same because in my case it is not a genrule that fails but a cc_library. But let's see: Building TF with 1 process only it fails on '//tensorflow/cc:cc_op_gen_main' which is
cc_library(
name = "cc_op_gen_main",
srcs = [
"framework/cc_op_gen.cc",
"framework/cc_op_gen.h",
"framework/cc_op_gen_main.cc",
],
...
)
That is only used by tf_gen_op_wrapper_cc:
def tf_gen_op_wrapper_cc(
name,
out_ops_file,
pkg = "",
op_gen = clean_dep("//tensorflow/cc:cc_op_gen_main"),
deps = None,
include_internal_ops = 0,
# ApiDefs will be loaded in the order specified in this list.
api_def_srcs = []):
# Construct an op generator binary for these ops.
tool = out_ops_file + "_gen_cc"
if deps == None:
deps = [pkg + ":" + name + "_op_lib"]
tf_cc_binary(
name = tool,
copts = tf_copts(),
linkopts = if_not_windows(["-lm", "-Wl,-ldl"]) + lrt_if_needed(),
linkstatic = 1, # Faster to link this one-time-use binary dynamically
deps = [op_gen] + deps,
)
...
So there it is in deps not tool. Not sure how to go further up the call chain. Can I make Bazel print what it is currently following?
I think you can use -s?
That is --subcommands, yes. Already using it and lead me to seeing where the action_env is missing. Now I would like to know the rule invocation chain to see where it might be coming from and why the action_env is missing there.
I noticed that for SUBCOMMAND: # //tensorflow/core/framework:summary_proto_genproto [action 'ProtoCompile tensorflow/core/framework/summary.pb.cc', configuration: 8b4acdbe969a46c4d3fa5723b9b1930adc1b1ea5a72f6ca24e87d24e41e941cc, execution platform: @local_execution_config_platform//:platform] the env is also missing. According to #4008 you could add use_default_shell_env = True, but that is already there. So maybe not that issue?
Edit: My best trace so far is the change in "configuration" I mentioned in https://github.com/bazelbuild/bazel/issues/12059#issuecomment-689647000
Dependencies like llvm-project are compiled with action_env and the working configuration while the failing ones use another hash. What is that configuration? When/How does it change? How can I find that in the TF source?
Ok, it looks like you were on the right track with #4008. I bisected the TF sources between 2.2 and 2.3 to find the commit which introduced the breaking changed and found it to be this one: https://github.com/tensorflow/tensorflow/commit/f827c023906e7d30f0e5f2992b111ab34153310a The parent commit 3006330ea0c3e88651195ac7c7de654291377ebb works. However in that case it isn't "tools" but "exec_tools" where the former worked.
It would be awesome if there could be some quick fix so action_env variables are passed through to dependencies build through exec_tools dependency
Is still something we need to work on?
Yes. As you noticed in https://github.com/tensorflow/tensorflow/pull/43156#issuecomment-704367191 it isn't clear why and when tools or exec_tools are to be used and if exec_tools would be correct then there needs to be some way to pass environment variables through to it. Otherwise it will be impossible to build e.g. TF on some systems
It's possible we no longer need exec_tools, it was an intermediate step in removing Py2 stuff. We can start making PRs to remove that and see what breaks/works.
This seems to be still an issue with Bazel 4.2.2, noticed with TF 2.8.4 when using CUDA where sometimes the --action_envs get ignored and compilation fails to find headers. E.g.:
Execution platform: @local_execution_config_platform//:platform
In file included from ./tensorflow/core/platform/status.h:32,
from ./tensorflow/c/tf_status_helper.h:20,
from tensorflow/c/env.cc:20:
bazel-out/ppc-opt-exec-50AE0418/bin/tensorflow/core/protobuf/error_codes.pb.h:10:10: fatal error: google/protobuf/port_def.inc: No such file or directory
10 | #include <google/protobuf/port_def.inc>
This is really annoying especially as it is pretty much impossible to debug and/or isolate
Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 1+ years. It will be closed in the next 90 days unless any other activity occurs. If you think this issue is still relevant and should stay open, please post any comment here and the issue will no longer be marked as stale.
This issue has been automatically closed due to inactivity. If you're still interested in pursuing this, please post @bazelbuild/triage in a comment here and we'll take a look. Thanks!