build fails with clang / GCC-8 on CentOS 8.3 - stddef.h not found

Open bondhugula opened this issue 4 years ago • 2 comments

This is with git c4262eb68366c705aec08fdc0b20b54d58dcdb19 as of Apr 23. The build fails with the error below although the prerequisites mentioned in the README are met: clang-11 and GCC-8. It looks like using clang-11 with libstdc++-8 isn't sufficient here. More details below.

bazel build //tools:tfrt_translate fails with:

INFO: Analyzed target //tools:tfrt_translate (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
ERROR: /home/uday/.cache/bazel/_bazel_uday/0a58b4eb202e02a9cdf9e8f6aa753460/external/zlib/BUILD.bazel:5:11: Compiling uncompr.c [for host] failed: (Exit 1): clang failed: error executing command /usr/lib64/ccache/clang -U_FORTIFY_SOURCE -fstack-protector -Wall -Wthread-safety -Wself-assign -fcolor-diagnostics -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections ... (remaining 23 argument(s) skipped)

Use --sandbox_debug to see verbose messages from the sandbox clang failed: error executing command /usr/lib64/ccache/clang -U_FORTIFY_SOURCE -fstack-protector -Wall -Wthread-safety -Wself-assign -fcolor-diagnostics -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections ... (remaining 23 argument(s) skipped)

Use --sandbox_debug to see verbose messages from the sandbox
In file included from external/zlib/uncompr.c:9:
In file included from external/zlib/zlib.h:34:
external/zlib/zconf.h:247:14: fatal error: 'stddef.h' file not found
#    include <stddef.h>
             ^~~~~~~~~~
clang++ -v |& grep "Selected GCC"
Selected GCC installation: /usr/bin/../lib/gcc/x86_64-redhat-linux/8

gcc --version
gcc (GCC) 8.3.1 20191121 (Red Hat 8.3.1-5)
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

rpm -qa | grep libstdc++
libstdc++-devel-8.3.1-5.1.el8.x86_64
libstdc++-8.3.1-5.1.el8.x86_64

CentOS Linux release 8.3.2011

bazel --version
bazel 4.0.0

I can reproduce the same issue after updating to clang 11.1.0 (the version mentioned in the tensorflow/runtime README) as well:

clang --version
clang version 11.1.0
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

bondhugula avatar Apr 24 '21 04:04 bondhugula

This is really an issue only when using a clang built and installed separately at /opt/clang-11.1.0/: it can't find the headers it needs. This version of clang was itself built using the above-mentioned gcc version and its libstdc++. When using the clang installed at the standard system path (clang 10.0.1 from the CentOS repos), the build is fine.

bondhugula avatar Apr 24 '21 08:04 bondhugula

On digging a bit deeper, here is what explains the problem.

  1. With the -s flag, an actual build command that fails is:
/usr/lib64/ccache/clang  -U_FORTIFY_SOURCE -fstack-protector -Wall -Wthread-safety -Wself-assign -fcolor-diagnostics -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections -fdata-sections '-std=c++11' -MD -MF bazel-out/k8-opt/bin/_objs/hostcontext/async_value.d '-frandom-seed=bazel-out/k8-opt/bin/_objs/hostcontext/async_value.o' -DLLVM_ENABLE_STATS -D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -DLLVM_BUILD_GLOBAL_ISEL -iquote. -iquotebazel-out/k8-opt/bin -iquoteexternal/llvm-project -iquotebazel-out/k8-opt/bin/external/llvm-project -iquoteexternal/zlib -iquotebazel-out/k8-opt/bin/external/zlib -isystem include -isystem bazel-out/k8-opt/bin/include -isystem external/llvm-project/llvm/include -isystem bazel-out/k8-opt/bin/external/llvm-project/llvm/include -isystem external/zlib -isystem bazel-out/k8-opt/bin/external/zlib -isystem third_party/llvm_derived/include -isystem bazel-out/k8-opt/bin/third_party/llvm_derived/include -Wno-unused-local-typedef -U_FORTIFY_SOURCE '-std=c++14' -no-canonical-prefixes -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -c lib/host_context/async_value.cc -o bazel-out/k8-opt/bin/_objs/hostcontext/async_value.o
In file included from lib/host_context/async_value.cc:19:
In file included from external/llvm-project/llvm/include/llvm/ADT/FunctionExtras.h:35:
In file included from external/llvm-project/llvm/include/llvm/ADT/PointerIntPair.h:16:
In file included from external/llvm-project/llvm/include/llvm/Support/Compiler.h:21:
In file included from /usr/bin/../lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8/new:40:
In file included from /usr/bin/../lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8/exception:143:
In file included from /usr/bin/../lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8/bits/exception_ptr.h:38:
/usr/bin/../lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8/bits/cxxabi_init_exception.h:38:10: fatal error: 'stddef.h' file not found
#include <stddef.h>
         ^~~~~~~~~~
1 error generated.

The clang invoked as /usr/lib64/ccache/clang is really the one at /opt/clang-11.1.0/bin. The error above shows that, when called this way, clang can't find its own stddef.h, which is actually located at /opt/clang-11.1.0/lib/clang/11.1.0/include/. As a check, all four of these succeed in finding the header file:

echo '#include <stddef.h>' | /usr/lib64/ccache/clang  -c  -xc -
echo '#include <stddef.h>' | /usr/bin/clang  -c  -xc -
echo '#include <stddef.h>' | clang  -c  -xc -
echo '#include <stddef.h>' | /opt/clang-11.1.0/bin/clang  -c  -xc -

/usr/bin/clang points to /opt/clang-11.1.0/bin/clang via "alternatives"

Using -v with clang shows that all of the above correctly look in /opt/clang-11.1.0/lib/clang/11.1.0/include and find the header. Using -v with bazel's compile command, however, reveals that clang instead looks in /usr/lib/clang-11.1.0/include with that set of compile flags! The culprit among the bazel compile flags is **-no-canonical-prefixes**, which makes the compiler's resource dir path relative instead of absolute, so clang does not find its own include dir in /opt/...! This is also why the build worked with the standard system clang installation in /usr/ but not with one outside of it. Furthermore, replacing /usr/lib64/ccache/clang with /opt/clang-11.1.0/bin/clang in the bazel command leads to correct compilation because even the relative path is fine in that case. Removing -no-canonical-prefixes from the compile flags also leads to correct compilation.
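The mechanism can be illustrated without clang at all. The following is a simulation only, under the assumption that the resource dir is derived as <dir of binary>/../lib/clang/<version> from whatever path the binary is taken to live at. It builds a throwaway directory tree with a fake install under opt/ and a symlink under usr/bin/ (standing in for the "alternatives" link), then derives the resource dir once with symlinks resolved and once from the literal path. The helper names resource_dir and has_stddef are made up for this sketch, and readlink -f is assumed to be available (it is in GNU coreutils).

```shell
#!/bin/sh
# Simulation only: mimics deriving a resource dir from the path a compiler
# was invoked by. It does not run clang.
set -eu

root=$(mktemp -d)

# Fake "real" install outside /usr, holding the builtin header.
mkdir -p "$root/opt/clang-11.1.0/bin" \
         "$root/opt/clang-11.1.0/lib/clang/11.1.0/include"
: > "$root/opt/clang-11.1.0/bin/clang"
: > "$root/opt/clang-11.1.0/lib/clang/11.1.0/include/stddef.h"

# Fake system path that is only a symlink to the real binary.
mkdir -p "$root/usr/bin"
ln -s "$root/opt/clang-11.1.0/bin/clang" "$root/usr/bin/clang"

# resource_dir EXE CANONICALIZE: print <prefix>/lib/clang/11.1.0,
# where <prefix> is the parent of the directory containing EXE.
resource_dir() {
    exe=$1
    if [ "$2" = yes ]; then
        exe=$(readlink -f "$exe")   # resolve symlinks before deriving
    fi
    bindir=$(dirname "$exe")
    echo "$(dirname "$bindir")/lib/clang/11.1.0"
}

# has_stddef DIR: report whether DIR/include/stddef.h exists.
has_stddef() {
    [ -f "$1/include/stddef.h" ] && echo found || echo missing
}

canonical=$(resource_dir "$root/usr/bin/clang" yes)
literal=$(resource_dir "$root/usr/bin/clang" no)

echo "canonical prefixes:    $canonical -> $(has_stddef "$canonical")"
echo "no canonical prefixes: $literal -> $(has_stddef "$literal")"
```

The canonical variant lands under opt/clang-11.1.0 and finds the header, while the literal variant lands under usr/lib/clang/11.1.0, where nothing was installed.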

It's not clear why bazel (or something else) thought it was good to add -no-canonical-prefixes to the compile flags. There is nothing in the tensorflow runtime codebase itself that appears to have caused its addition. This is a pretty nasty interaction caused by several design/user choices!
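To check whether a particular compiler binary is affected, one option is to ask it for its resource dir with and without the flag and see whether stddef.h is present there. This is a sketch, not something taken from the build: check_resource_dir is a made-up helper name, and it assumes the compiler accepts clang's -print-resource-dir option and that -no-canonical-prefixes influences what it prints.

```shell
# check_resource_dir CC: print the resource dir CC reports, with and without
# -no-canonical-prefixes, and whether include/stddef.h exists in each.
check_resource_dir() {
    cc=$1
    for flags in "" "-no-canonical-prefixes"; do
        # $flags is intentionally unquoted so the empty case adds no argument.
        dir=$("$cc" $flags -print-resource-dir) || return 1
        if [ -f "$dir/include/stddef.h" ]; then
            status=found
        else
            status=MISSING
        fi
        echo "${flags:-default}: $dir ($status)"
    done
}
```

For example, check_resource_dir /usr/bin/clang could be compared against check_resource_dir /opt/clang-11.1.0/bin/clang. If the analysis above holds, only the symlinked path should report a missing header when the flag is present.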

bondhugula avatar Apr 25 '21 14:04 bondhugula