Enable CC toolchain to use in-tree builtin includes
If the toolchain is composed in our build tree and we use an internal (relative) path for the builtin_sysroot that we pass to create_cc_toolchain_config_info, and we pass a value like %sysroot%/usr/include in cxx_builtin_include_directories then we will fail with missing dependency declarations.
Instead of only using the absolute paths for builtin includes, we can use all of them and do an additional matching check after we relativized the paths to the individual include items.
@lberki or @oquenchil any opinions on this?
Do I understand correctly: The key effect of this change is that all CcToolchain built_in_include_directories entries would be treated the same, regardless of whether they are absolute or relative paths?
Yes, I believe so. In my case, specifically, I'm using a toolchain that we're constructing as part of our build, so the built-in includes show up in bazel-out
Sorry for the naïve question, but do you have a concrete example or sample toolchain? It seems quite odd to me to have output directories like bazel-out be present in include directories -- having outputs like this appear as inputs but with paths constructed outside of bazel seems potentially problematic, but perhaps I'm missing something simple. (Is this a two stage build? How are you deriving these paths?)
I am trying to dig into the code in question to better understand why this requirement exists in the first place...
I'm also wondering if something other than %sysroot% as a prefix would work better (e.g. %workspace%)
More info on the include dir prefixes here: https://cs.opensource.google/bazel/bazel/+/master:src/main/starlark/builtins_bzl/common/cc/cc_toolchain_provider_helper.bzl;l=103-122;drc=f2318d2cbf38d249d620dc99a11570843306a6fa
Sorry for the naïve question, but do you have a concrete example or sample toolchain? It seems quite odd to me to have output directories like bazel-out be present in include directories -- having outputs like this appear as inputs but with paths constructed outside of bazel seems potentially problematic, but perhaps I'm missing something simple. (Is this a two stage build? How are you deriving these paths?)
tl;dr we're driving an internal hermetic toolchain that's derived from the contents of RPMs. It'll be hard to distill down into some thing trivial, but let me attempt to at least describe the details.
Longer version:
-
We have a module extension that consumes the RPMs and synthesizes an internal repository for them, generates filegroup(), cc_library(), cc_import(), etc rules using the RPM inputs
-
We have a separate module extension that can be used to generate the toolchain:
- we define a sysroot using the RPM-import repository generated rules as inputs in the form of:
ext.sysroot(
name="foo_sysroot",
srcs = [
... a bunch of filegroups that specified the files from the RPMs ...
],
)
which ultimately synthesizes a repository to capture the sysroot
- we define a toolchain where two of the things specified are (1) the sysroot, (2) the list of builtin include paths relative to the sysroot in the following form:
ext.toolchain(
name = "bar_toolchain",
sysroot = "@foo_sysroot//:sysroot",
builtin_includes = ["%sysroot%/usr/include"],
)
The builtin_includes value is just getting plumbed directly into cxx_builtin_include_directories for create_cc_toolchain_config (via some indirection) and the path to @foo_sysroot//:sysroot gets shovelled in builtin_sysroot (which is ultimately what populates %sysroot%)
The /usr/include part of the builtin_includes is derived from the fact that we know from our RPM inputs that we stuck the headers that we're interested in into usr/include because we literally just explode the RPM file(s) into a tree somewhere. The relative path part of things comes from @foo_sysroot//:sysroot.
I am trying to dig into the code in question to better understand why this requirement exists in the first place... I'm not clear on this either. The commit log didn't render me any particular insights and I'm still gaining familiarity with the bazel codebase.
I'm also wondering if something other than
%sysroot%as a prefix would work better (e.g.%workspace%)
The issue appears to be that it doesn't like relative paths which %workspace% is also going to give us.
cc @armandomontanez @matts1
Backing up a bit to the most likely culprit just to double check: have you passed -no-canonical-prefixes (gcc/clang) and -fno-canonical-system-headers (gcc only) as compile flags in your toolchain, also making sure the headers are properly added to all_files and compiler_files? Once you've done that, you usually only need to allowlist truly absolute paths (e.g. /usr/include).
Separately, I need to stare at this change longer to determine whether or not it's correct but at first glance I think it is.
Before landing this, I'd like to stamp out a minimal reproducer that we can check against (should be pretty easy with the new rule based toolchains). It just happens that in the vast majority of cases the root cause is actually forgetting to use those flags. Some compilers do not support or respect those -no-canonical-prefixes flags, so you do need an escape hatch like this to work with in-tree paths.
@kellyma2 Have you checked the flags?
Backing up a bit to the most likely culprit just to double check: have you passed
-no-canonical-prefixes(gcc/clang) and-fno-canonical-system-headers(gcc only) as compile flags in your toolchain, also making sure the headers are properly added toall_filesandcompiler_files? Once you've done that, you usually only need to allowlist truly absolute paths (e.g./usr/include).
The files are in all_files and compiler_files. A quick test with --copt fails - -no-canonical-prefixes causes gcc to not be able to find cc1plus and -fno-canonical-system-headers appears to do nothing helpful in this case. ie: it still complains about missing dependency declarations for all of the toolchain inputs. eg:
'bazel-out/k8-fastbuild/bin/external/_main~cc_toolchain~default_toolchain/sysroot/usr/include/stdc-predef.h'
'bazel-out/k8-fastbuild/bin/external/_main~cc_toolchain~default_toolchain/sysroot/usr/include/bits/wordsize.h'
'bazel-out/k8-fastbuild/bin/external/_main~cc_toolchain~default_toolchain/sysroot/usr/include/features.h'
This is pretty much unavoidable, I think, because I'm extracting the toolchain components as part of the build.
@kellyma2 Have you checked the flags?
Yes. Thanks for the followup.
Related: https://github.com/bazelbuild/bazel/issues/4605
I've spent a bit of time unpacking this, and this looks like an interesting case where:
-no-canonical-prefixesis NOT in use.-isystempath/to/toolchain/foo/includeis manually passed to the toolchain.path/to/toolchain/foo/includeis listed incxx_builtin_include_directories
So to unwind the stack:
- When
-no-canonical-prefixesis passed, all of this is a non-issue. -isystempath/to/toolchain/foo/includecauses the headers to be found via a relative path, which is picked up by parsing the.dmake dependency file.- This patch makes
cxx_builtin_include_directoriesmatch against those paths, except in a way that appears to specifically silence errors specifically emitted by unresolvablePathProblems.
The last bullet here is where I'm not confident this is what we want to do. unresolvablePathProblems is warning that the files that your toolchain is referencing aren't enumerated as files via the same paths they're included by, which suggests a sandbox leak. If all_files and compiler_files adds the files to the sandbox via the same relative path, in theory this error shouldn't ever pop up.
After some back and forth with @armandomontanez we determined a few things:
(1) we had pieces of our toolchain (still) leaking in from the outside world due to ... (2) ... not using the canonical-prefixes related arguments resulting in ... (3) includes coming from somewhere outside of the sysroot
Once we solved that and the things that broke as a result it's no longer necessary to use cxx_builtin_include_directories to talk about those includes coming in which renders this change redundant.