Bazel compile failure at ./compile.sh

Open lanewinfield opened this issue 7 years ago • 13 comments

Describe the Issue

After installing all the prerequisites and creating a swap file on an external drive (another SD card in a USB adapter), the Bazel build fails at several different points.

Steps to Reproduce

1. Install all prerequisites and make swap file.

2. Run through Bazel preparations as per the GUIDE.

3. Run sudo ./compile.sh

Hardware/Software Info

Please provide the following information about your Raspberry Pi setup:

  • Raspberry Pi model: 3B
  • Operating System used: Raspbian GNU/Linux 8.0 (jessie)
  • Version of Python used: Python 2.7.9
  • SD card memory size: 16GB
  • Size of USB/other device used as swap (if building from source): 16GB
  • TensorFlow git commit hash (if building from source):

Relevant Console Output/Logs

First off, an output of free:

             total       used       free     shared    buffers     cached
Mem:        947732     268624     679108       6256      27632     142968
-/+ buffers/cache:      98024     849708
Swap:     31265272        408   31264864
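
To make the swap figure above easier to compare with the device size listed earlier, here is a minimal sketch that converts the Swap total to GiB. It assumes `free` reports in KiB (the procps default), and `swap_gib` is a hypothetical helper name, not something from the guide.

```shell
# Hypothetical helper: read `free` output on stdin and print the
# swap total in GiB, assuming the values are in KiB.
swap_gib() {
  awk '/^Swap:/ { printf "%.1f\n", $2 / 1048576 }'
}

# Usage on a live system:
#   free | swap_gib
```

Fed the output above, this should report roughly 29.8 GiB of swap.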

Example 1:

[44 / 408] Still waiting for 70 jobs to complete:
      Running (standalone):
        Compiling src/main/tools/linux-sandbox.cc, 19 s
        Compiling src/main/tools/linux-sandbox-pid1.cc, 18 s
        Compiling src/main/tools/build-runfiles.cc, 15 s
      Running (unknown):
        Expanding template src/java_tools/junitrunner/java/com/google/testing/\
coverage/JacocoCoverage, 14 s
ERROR: /home/pi/tf/bazel/third_party/protobuf/3.0.0/BUILD:211:1: C++ compilation of rule '//third_party/protobuf/3.0.0:protoc_lib' failed: gcc failed: error executing command 
  (cd /tmp/bazel_UfskJvmc/out/execroot/bazel && \
  exec env - \
    PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
  /usr/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -B/usr/bin -B/usr/bin -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections -fdata-sections -g0 '-std=c++0x' -MD -MF bazel-out/host/bin/third_party/protobuf/3.0.0/_objs/protoc_lib/third_party/protobuf/3.0.0/src/google/protobuf/compiler/objectivec/objectivec_primitive_field.d '-frandom-seed=bazel-out/host/bin/third_party/protobuf/3.0.0/_objs/protoc_lib/third_party/protobuf/3.0.0/src/google/protobuf/compiler/objectivec/objectivec_primitive_field.o' -iquote . -iquote bazel-out/host/genfiles -iquote external/bazel_tools -iquote bazel-out/host/genfiles/external/bazel_tools -isystem third_party/protobuf/3.0.0/src -isystem bazel-out/host/genfiles/third_party/protobuf/3.0.0/src -isystem external/bazel_tools/tools/cpp/gcc3 -DHAVE_PTHREAD -Wall -Wwrite-strings -Woverloaded-virtual -Wno-sign-compare '-Wno-error=unused-function' '-Wno-error=unused-variable' -fno-canonical-system-headers -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -c third_party/protobuf/3.0.0/src/google/protobuf/compiler/objectivec/objectivec_primitive_field.cc -o bazel-out/host/bin/third_party/protobuf/3.0.0/_objs/protoc_lib/third_party/protobuf/3.0.0/src/google/protobuf/compiler/objectivec/objectivec_primitive_field.o): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 1.
In file included from /usr/include/c++/4.8/vector:64:0,
                 from third_party/protobuf/3.0.0/src/google/protobuf/compiler/objectivec/objectivec_helpers.h:35,
                 from third_party/protobuf/3.0.0/src/google/protobuf/compiler/objectivec/objectivec_field.h:36,
                 from third_party/protobuf/3.0.0/src/google/protobuf/compiler/objectivec/objectivec_primitive_field.h:36,
                 from third_party/protobuf/3.0.0/src/google/protobuf/compiler/objectivec/objectivec_primitive_field.cc:34:
/usr/include/c++/4.8/bits/stl_vector.h:56:0: error: unterminated #ifndef
 #ifndef _STL_VECTOR_H
 ^
In file included from /usr/include/c++/4.8/vector:69:0,
                 from third_party/protobuf/3.0.0/src/google/protobuf/compiler/objectivec/objectivec_helpers.h:35,
                 from third_party/protobuf/3.0.0/src/google/protobuf/compiler/objectivec/objectivec_field.h:36,
                 from third_party/protobuf/3.0.0/src/google/protobuf/compiler/objectivec/objectivec_primitive_field.h:36,
                 from third_party/protobuf/3.0.0/src/google/protobuf/compiler/objectivec/objectivec_primitive_field.cc:34:
/usr/include/c++/4.8/bits/vector.tcc:295:53: error: no 'std::vector<_Tp, _Alloc>::iterator std::vector<_Tp, _Alloc>::emplace(std::vector<_Tp, _Alloc>::iterator, _Args&& ...)' member function declared in class 'std::vector<_Tp, _Alloc>'
       emplace(iterator __position, _Args&&... __args)
                                                     ^
Target //src:bazel failed to build
INFO: Elapsed time: 253.329s, Critical Path: 121.67s

ERROR: Could not build Bazel

Example 2:

[151 / 1,229] Still waiting for 130 jobs to complete:
      Running (standalone):
        Compiling src/main/tools/linux-sandbox.cc, 15 s
        Compiling src/main/cpp/util/strings.cc, 15 s
        Compiling src/main/cpp/util/numbers.cc, 14 s
      Running (unknown):
        Writing file src/java_tools/buildjar/java/com/google/devtools/build/ja\
va/turbine/javac/libjavac_turbine.jar-2.params, 12 s
        Writing file src/java_tools/buildjar/java/com/google/devtools/build/ja\
va/turbine/javac/libjavac_turbine_java_compiler.jar-2.params, 12 s
        Writing file src/java_tools/buildjar/java/com/google/devtools/build/ja\
Slow read: a 7283172-byte read from /home/pi/tf/bazel/third_party/hazelcast/hazelcast-3.6.4.jar took 12087ms.
Slow read: a 3288880-byte read from /home/pi/tf/bazel/third_party/netty/netty-all-4.1.0.CR6.jar took 5424ms.
INFO: From Compiling third_party/ijar/ijar.cc [for host]:
third_party/ijar/ijar.cc: In member function 'virtual bool devtools_ijar::JarStripperProcessor::Accept(const char*, devtools_ijar::u4)':
third_party/ijar/ijar.cc:66:23: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
   if (filename_len >= CLASS_EXTENSION_LENGTH) {
                       ^
ERROR: /home/pi/tf/bazel/third_party/grpc/BUILD:54:1: C++ compilation of rule '//third_party/grpc:grpc_unsecure' failed: gcc failed: error executing command 
  (cd /tmp/bazel_pi0DozYA/out/execroot/bazel && \
  exec env - \
    PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
  /usr/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -B/usr/bin -B/usr/bin -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections -fdata-sections -MD -MF bazel-out/local-opt/bin/third_party/grpc/_objs/grpc_unsecure/third_party/grpc/src/core/compression/algorithm.d -iquote . -iquote bazel-out/local-opt/genfiles -iquote external/bazel_tools -iquote bazel-out/local-opt/genfiles/external/bazel_tools -isystem third_party/grpc -isystem bazel-out/local-opt/genfiles/third_party/grpc -isystem third_party/grpc/include -isystem bazel-out/local-opt/genfiles/third_party/grpc/include -isystem third_party/zlib -isystem bazel-out/local-opt/genfiles/third_party/zlib -isystem external/bazel_tools/tools/cpp/gcc3 '-std=gnu99' -Wno-implicit-function-declaration -fno-canonical-system-headers -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -c third_party/grpc/src/core/compression/algorithm.c -o bazel-out/local-opt/bin/third_party/grpc/_objs/grpc_unsecure/third_party/grpc/src/core/compression/algorithm.o): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 1.
third_party/grpc/src/core/compression/algorithm.c:1:0: internal compiler error: Segmentation fault
 /*
 ^
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-4.8/README.Bugs> for instructions.
The bug is not reproducible, so it is likely a hardware or OS problem.
Target //src:bazel failed to build
INFO: Elapsed time: 395.369s, Critical Path: 264.77s

ERROR: Could not build Bazel

lanewinfield avatar Apr 20 '17 00:04 lanewinfield

Just tried again from a completely fresh install of Raspbian. Got this:

[230 / 1,192] Writing file src/main/java/com/google/devtools/build/lib/rules/o\
bjc/libobjc.jar-2.params
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (os_linux_zero.cpp:254), pid=1784, tid=1461711968
#  fatal error: caught unhandled signal 11
#
# JRE version: OpenJDK Runtime Environment (8.0_40-b04) (build 1.8.0_40-internal-b04)
# Java VM: OpenJDK Zero VM (25.40-b08 interpreted mode linux-arm )
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /home/pi/tf/bazel/hs_err_pid1784.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.sun.com/bugreport/crash.jsp
#
scripts/bootstrap/compile.sh: line 316:  1784 Aborted                 "${JAVA_HOME}/bin/java" -XX:+HeapDumpOnOutOfMemoryError -Xverify:none -Dfile.encoding=ISO-8859-1 -XX:HeapDumpPath=${OUTPUT_DIR} -Djava.util.logging.config.file=${OUTPUT_DIR}/javalog.properties ${JNI_FLAGS} -jar ${ARCHIVE_DIR}/libblaze.jar --batch --install_base=${ARCHIVE_DIR} --output_base=${OUTPUT_DIR}/out --install_md5= --workspace_directory=${PWD} --nofatal_event_bus_exceptions ${BAZEL_DIR_STARTUP_OPTIONS} ${BAZEL_BOOTSTRAP_STARTUP_OPTIONS:-} $command --ignore_unsupported_sandboxing --startup_time=329 --extract_data_time=523 --rc_source=/dev/null --isatty=1 --ignore_client_env --client_cwd=${PWD} "${@}"

ERROR: Could not build Bazel

lanewinfield avatar Apr 20 '17 02:04 lanewinfield

Thanks for opening this up @lanewinfield - I'll have to try and build Bazel from scratch again to see if I can replicate. I'm not exactly sure what might have happened, as I completed the guide using a fresh install of Raspbian the last time I updated things. It's possible that I flubbed a command in there.

Due to logistics/timing I'm going to wait until TensorFlow 1.1.0 has gone gold to work on this.

samjabrahams avatar Apr 21 '17 05:04 samjabrahams

@samjabrahams After doing a ton of troubleshooting, I might've discovered my issue—attempting to run all of this headless. I've had it compiling now for about 2 hours without errors (previously it had errored out in <1 hr). I'm going to let it run overnight and report back on what happens!

lanewinfield avatar Apr 21 '17 05:04 lanewinfield

@lanewinfield cool, thanks for following up. Interesting that it would make a difference due to running it headless. That itself is something worth investigating!

samjabrahams avatar Apr 21 '17 05:04 samjabrahams

@samjabrahams yeah, after running it all night, it seems to have just paused. Not errored out, but stuck at a point. Stopped it and restarted it, back to the same old errors. Looks like SSH didn't matter.

The funny thing is, I had this running on this system a couple weeks ago no problem. Then I accidentally bent my SD card putting the RPi into a case and lost everything—since the versions have changed, I can't build anymore! Dang.

lanewinfield avatar Apr 21 '17 17:04 lanewinfield

Alright, back to plan A. Once 1.1.0 is finalized, I'll build from scratch to double check the guide is accurate. Thanks for letting me know!

Sorry to hear about the SD card mishaps :/ Is it possible that the new SD card you bought is a lower speed class than your original?

samjabrahams avatar Apr 21 '17 17:04 samjabrahams

Great, thanks Sam.

Great thought—double checked, using the exact same card type, size, and speed as my old one (Class 10 32GB SanDisk)!

lanewinfield avatar Apr 21 '17 17:04 lanewinfield

Good to know! I'll report back here once I've gone through the guide from scratch.

samjabrahams avatar Apr 21 '17 18:04 samjabrahams

In the meantime, went through the GUIDE from Apr 5 and the 0.4.3 build worked, so I'll use that for now! 👍

lanewinfield avatar Apr 22 '17 14:04 lanewinfield

I just compiled Bazel from source using the guide, but without using a USB swap drive, without installing GCC 4.8, and without invoking compile.sh with sudo. (Why do you even instruct people to use sudo to compile? That seems like a bad thing.) Anyway, compilation went off without a hitch.

logological avatar Apr 27 '17 06:04 logological

Hi @logological - thanks for the data point. The sudo issue has been brought up on multiple occasions. Unfortunately, the last time I tried to switch out some of the commands, things went a bit haywire. Ideally, we'd come up with a way to remove sudo from anything involving a pip or bash command while maintaining functionality.

I've opened up an issue (#96) to discuss this further.

samjabrahams avatar Apr 29 '17 19:04 samjabrahams

@lanewinfield - I went through the entire guide using a fresh install of Raspbian without any issues. Trying to think of what might be causing differences. Do you know what specific sub-release of Raspbian Jessie you're using? Alternatively, if you used NOOBS to install Raspbian, do you know which version of NOOBS you used?

samjabrahams avatar Apr 30 '17 19:04 samjabrahams

Actually, let's try switching back to the Oracle JDK instead of OpenJDK.

sudo update-alternatives --config java

Select the option for Oracle JDK 8 (I just remembered that I made this switch for the most recent release).
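
As a rough way to confirm which JVM is active before and after switching, the sketch below classifies the text printed by `java -version`. It only looks for the "Zero VM" string that appears in the crash log earlier in this thread. `jvm_kind` is a hypothetical helper name, and the check is an assumption about what is worth looking for, not a confirmed diagnosis.

```shell
# Hypothetical helper: read `java -version` text on stdin and report
# whether it names the OpenJDK Zero VM seen in the crash log above.
jvm_kind() {
  if grep -q 'Zero VM'; then
    echo "zero"    # the VM the crash log reports as "interpreted mode"
  else
    echo "other"   # any other JVM, e.g. after selecting Oracle JDK 8
  fi
}

# Usage on a live system (java -version writes to stderr):
#   java -version 2>&1 | jvm_kind
```

If this still prints "zero" after running update-alternatives, the switch probably did not take effect for the current shell.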

samjabrahams avatar Apr 30 '17 19:04 samjabrahams