bazel
bazel copied to clipboard
Always pass -D__BAZEL__ to C compiler
As we know, C++ headers have quadratic stat() multiplied by all the -isystem flags Bazel passes to gcc. But if all include paths are vendored to the Bazel execroot, it's only quadratic, and compiling goes way faster. But rewriting include lines breaks people who want to compile the sources for a particular project outside of Bazel. It also creates headaches for people packaging libraries for Linux distros.
Maybe the solution is to write code like this:
#ifdef __BAZEL__
# include "external/zlib/zlib.h"
#else
# include <zlib.h>
#endif
If that is the optimal approach, then maybe it would make sense to make that a standard define, so -D__BAZEL__
doesn't need to be manually passed copts
.
Why then not take the more general approach and have a VEND_HDR_PREFIX
defined, which can be used like so:
#ifdef VEND_HDR_PREFIX
# include " VEND_HDR_PREFIX /zlib.h"
#else
# include <zlib.h>
#endif
This way you have at least a slim chance to upstream that.
IIRC, defines with two underscores and then uppercase are reserved for the standard (implementor).
Well there would be a different prefix for every third party library, because they're all in their own bazel repositories.
Does the performance matter in practice? I was under the impression that the look-for-every-include-under-every-include-path-entry algorithm isn't a concern for Bazel, because there aren't that many include path entries.
Should this not perhaps be rather solved at Compiler invocation level? I would rather have a map which you feed the compiler so it no longer searches for <zlib.h>
but is taught where it is. Especially since a lot of that information is already available in Bazel.
Does any compiler at the moment provide that functionality?
Not that I know of. Would be awesome, though 🤓
Does the performance matter in practice?
I ran strace on GCC invocations for TensorFlow and determined those stat() system calls were >50% of wall time. So yes, but with a grain of salt.
Part of the problem is that TensorFlow's build configuration is not optimal. It basically globs everything into one big cc_library. This means every single one of our dependencies (nearly all of which specify the includes
attribute) are part of the compilation for nearly every .cc file. If TensorFlow broke up that cc_library into more finely grained targets, it would make TensorFlow builds faster. But it would not make builds faster for downstream targets referencing TensorFlow as a single label.
Should this not perhaps be rather solved at Compiler invocation level? I would rather have a map which you feed the compiler [...]
That would be the optimal solution. I spent an hour digging through the GCC docs looking for exactly that a month ago. Couldn't find anything that would let me do that. If we know any GCC experts it would be a good idea to consult them though.
@jart For Tensorflow then, I would expect a persistent compiler (e.g. GCC) process to be another (better) option. This one could cache include file locations on previous stat
calls.
That being said - do not know whether there is a variant like that.
A quick Google found this however (at least interesting): http://clang.llvm.org/doxygen/classclang_1_1FileSystemStatCache.html
Perhaps cc @DanAlbert?
Not sure what I can offer here.
@DanAlbert Maybe you could enlighten us whether there is a persistent gcc/clang compiler, with advanced caching of includes, etc?
I think Bazel should not put the remote packages under external, but just under the name of the remote repository. So for zlib, the files would be under zlib/. That way, you can make it
#include <zlib/zlib.h>
or
#include "zlib/zlib.h"
Ideally, upstream would install the files under /usr/lib/include/zlib/ instead of mashing everything into a single directory, which seems very non-ideal if you ask me. If they don't want to do that (backwards compat?), maybe they can at least provide an optional mechanism?
I certainly haven't heard of anything like it. Aren't modules supposed to be the solution for this?
Yes but when are modules actually going to happen? And will they require existing codebases to be rewritten in order to reap the benifits? https://youtu.be/ND-TuW0KIgg
Yes but when are modules actually going to happen?
Depending on when the different directions of Microsoft and Clang are resolved. Soonest seems C++20. And when I look at the pace that XCode ships Clang variants it way well take until 2022 on Mac. That is actually the reason why I thought a persistent gcc compiler may exist. Because modules will not be a solution for cross platform for a LONG time.
And will they require existing codebases to be rewritten in order to reap the benifits?
To a certain extend yes. Seems more rewrite for Microsoft (<- am prefering) than Clang approach.
Aren't header maps solving this problem? We have an external contribution at https://github.com/bazelbuild/bazel/pull/3712. Could you take a look at it and comment if it will help your use case?
Another use case for implicit -D__BAZEL__
is code which, when built with Bazel, refers to auxiliary files from a Bazel-generated runfiles tree.
#ifdef __BAZEL__
# include "tools/cpp/runfiles/runfiles.h"
#endif
// ...
int main(int argc, char *argv[]) {
#ifdef __BAZEL__
std::string error;
std::unique_ptr<Runfiles> runfiles(Runfiles::Create(argv[0], &error));
if (runfiles == nullptr) { /* ... */ }
std::string path = runfiles->Rlocation("my_workspace/path/to/my/data.txt");
#else
std::string path("some/legacy/path/to/my/data.txt");
#endif // ifdef __BAZEL__
With Bazel 6, all Bazel cc_*
rules define BAZEL_CURRENT_REPOSITORY
to the name of the repository containing the target, which can be used for https://github.com/bazelbuild/bazel/issues/2160#issuecomment-487033719. Relying on it for https://github.com/bazelbuild/bazel/issues/2160#issue-192892490 would seem a bit hacky though.
Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 1+ years. It will be closed in the next 90 days unless any other activity occurs. If you think this issue is still relevant and should stay open, please post any comment here and the issue will no longer be marked as stale.
This issue has been automatically closed due to inactivity. If you're still interested in pursuing this, please post @bazelbuild/triage
in a comment here and we'll take a look. Thanks!