tensorflow_ros
Error when linking
Hi there,
Thank you for the awesome work! I have successfully built the package with Python 3.6 and TF 1.4 (after some minor changes). When I include it in a simple example, linking fails with the following error:
CMakeFiles/test_tensorflow.dir/src/test_tensorflow.cc.o: In function `tensorflow::core::RefCounted::~RefCounted()':
test_tensorflow.cc:(.text._ZN10tensorflow4core10RefCountedD2Ev[_ZN10tensorflow4core10RefCountedD5Ev]+0xbd): undefined reference to `tensorflow::internal::LogMessageFatal::LogMessageFatal(char const*, int)'
test_tensorflow.cc:(.text._ZN10tensorflow4core10RefCountedD2Ev[_ZN10tensorflow4core10RefCountedD5Ev]+0xde): undefined reference to `tensorflow::internal::LogMessageFatal::~LogMessageFatal()'
CMakeFiles/test_tensorflow.dir/src/test_tensorflow.cc.o: In function `std::string* tensorflow::internal::MakeCheckOpString<long, int>(long const&, int const&, char const*)':
test_tensorflow.cc:(.text._ZN10tensorflow8internal17MakeCheckOpStringIliEEPSsRKT_RKT0_PKc[_ZN10tensorflow8internal17MakeCheckOpStringIliEEPSsRKT_RKT0_PKc]+0x24): undefined reference to `tensorflow::internal::CheckOpMessageBuilder::CheckOpMessageBuilder(char const*)'
test_tensorflow.cc:(.text._ZN10tensorflow8internal17MakeCheckOpStringIliEEPSsRKT_RKT0_PKc[_ZN10tensorflow8internal17MakeCheckOpStringIliEEPSsRKT_RKT0_PKc]+0x4b): undefined reference to `tensorflow::internal::CheckOpMessageBuilder::ForVar2()'
test_tensorflow.cc:(.text._ZN10tensorflow8internal17MakeCheckOpStringIliEEPSsRKT_RKT0_PKc[_ZN10tensorflow8internal17MakeCheckOpStringIliEEPSsRKT_RKT0_PKc]+0x66): undefined reference to `tensorflow::internal::CheckOpMessageBuilder::NewString()'
test_tensorflow.cc:(.text._ZN10tensorflow8internal17MakeCheckOpStringIliEEPSsRKT_RKT0_PKc[_ZN10tensorflow8internal17MakeCheckOpStringIliEEPSsRKT_RKT0_PKc]+0x75): undefined reference to `tensorflow::internal::CheckOpMessageBuilder::~CheckOpMessageBuilder()'
test_tensorflow.cc:(.text._ZN10tensorflow8internal17MakeCheckOpStringIliEEPSsRKT_RKT0_PKc[_ZN10tensorflow8internal17MakeCheckOpStringIliEEPSsRKT_RKT0_PKc]+0x89): undefined reference to `tensorflow::internal::CheckOpMessageBuilder::~CheckOpMessageBuilder()'
collect2: error: ld returned 1 exit status
CMakeFiles/test_tensorflow.dir/build.make:88: recipe for target '/ws/devel/lib/example/test_tensorflow' failed
make[2]: *** [/ws/devel/lib/example/test_tensorflow] Error 1
CMakeFiles/Makefile2:249: recipe for target 'CMakeFiles/test_tensorflow.dir/all' failed
make[1]: *** [CMakeFiles/test_tensorflow.dir/all] Error 2
Makefile:126: recipe for target 'all' failed
make: *** [all] Error 2
Could this be due to libtensorflow_cc.so missing (as mentioned in https://github.com/tensorflow/tensorflow/issues/2412#issuecomment-374147507)? If so, does this mean that linking against the pip-installed TF is a dead end?
Cheers
Hi, just as a quick try: can you try it with TF 1.3, or do you need 1.4? I remember there were some problems with 1.4, they're changing the layout of the files all the time...
And please specify the operating system and ROS version you're trying on.
Thanks for the quick reply. I am using CentOS 7.4 and the latest version of Catkin (without ROS).
It turns out that the binaries are built in Release mode, so -DCMAKE_BUILD_TYPE=Release is required. I now face some other issues:
CMakeFiles/test_tensorflow.dir/src/test_tensorflow.cc.o: In function `main':
test_tensorflow.cc:(.text.startup+0x3e): undefined reference to `tensorflow::SessionOptions::SessionOptions()'
test_tensorflow.cc:(.text.startup+0x52): undefined reference to `tensorflow::NewSession(tensorflow::SessionOptions const&, tensorflow::Session**)'
test_tensorflow.cc:(.text.startup+0x69): undefined reference to `tensorflow::Status::ToString() const'
CMakeFiles/test_tensorflow.dir/src/test_tensorflow.cc.o: In function `tensorflow::SessionOptions::~SessionOptions()':
test_tensorflow.cc:(.text._ZN10tensorflow14SessionOptionsD2Ev[_ZN10tensorflow14SessionOptionsD5Ev]+0xd): undefined reference to `tensorflow::ConfigProto::~ConfigProto()'
collect2: error: ld returned 1 exit status
CMakeFiles/test_tensorflow.dir/build.make:88: recipe for target '/ws/devel/lib/example/test_tensorflow' failed
CMakeFiles/Makefile2:249: recipe for target 'CMakeFiles/test_tensorflow.dir/all' failed
make[2]: *** [/ws/devel/lib/example/test_tensorflow] Error 1
make[1]: *** [CMakeFiles/test_tensorflow.dir/all] Error 2
Makefile:126: recipe for target 'all' failed
make: *** [all] Error 2
It fails the same way with TF 1.3 and 1.7. https://github.com/tensorflow/tensorflow/issues/14632#issuecomment-345358750 suggests that these symbols are in libtensorflow_cc.so, which is not included in the latest pip binaries of the versions I tried. Maybe it was previously?
No, I think libtensorflow_cc.so should definitely not be needed, because the only way to get this library is to either download it from 3rd party sources, or to build it yourself. Google doesn't distribute it.
On my computer with TF 1.3, the _pywrap_tensorflow library definitely contains the symbols you report as undefined. Can you check on your system?
$ nm -CD lib_pywrap_tensorflow.so | grep NewSession
0000000000ff4c60 T TF_NewSession
0000000000ff2d00 T TF_NewSessionOptions
00000000029f25c0 T tensorflow::NewSession(tensorflow::SessionOptions const&)
00000000029f26b0 T tensorflow::NewSession(tensorflow::SessionOptions const&, tensorflow::Session**)
0000000000ec7660 W tensorflow::GrpcSessionFactory::NewSession(tensorflow::SessionOptions const&)
0000000002996e40 W tensorflow::DirectSessionFactory::NewSession(tensorflow::SessionOptions const&)
One more thing to check is whether you're building this package with the C++11 ABI enabled, because all the pip-installed libraries are built using the old ABI. If you point the above nm command at another library/executable file built in the same project, it uses the C++11 ABI iff grepping for __cxx11 yields some results.
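As a complement to grepping nm output, the ABI selected by the compiler can also be read from inside a translation unit. The following is a minimal sketch, assuming GCC with libstdc++, where the macro _GLIBCXX_USE_CXX11_ABI is 1 for the new ABI and 0 for the old one. The function names libstdcxx_cxx11_abi and describe_abi are made up for this illustration and are not part of tensorflow_ros.

```cpp
#include <string>  // any libstdc++ header pulls in its configuration macros

// Hypothetical helper: returns 1 for the new (C++11) libstdc++ ABI,
// 0 for the old ABI, and -1 when the macro is not defined at all
// (for example when a different standard library is in use).
int libstdcxx_cxx11_abi() {
#if defined(_GLIBCXX_USE_CXX11_ABI)
  return _GLIBCXX_USE_CXX11_ABI;
#else
  return -1;
#endif
}

// Hypothetical helper: human-readable form of the value above.
std::string describe_abi() {
  switch (libstdcxx_cxx11_abi()) {
    case 1:
      return "new C++11 ABI";
    case 0:
      return "old (pre-C++11) ABI";
    default:
      return "unknown (not libstdc++?)";
  }
}
```

Printing describe_abi() from a small executable built with the same flags as the rest of the project shows which ABI those flags select. With GCC, adding -D_GLIBCXX_USE_CXX11_ABI=0 to the compile flags normally switches to the old ABI, although some distributions configure their compiler to ignore it.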
I'm running into a (similar?) issue with TF 1.7 that I compiled myself (necessary to do for the NVIDIA drivers I'm running). Do we have to build TF with C++11 ABI enabled, tensorflow_ros package with C++11 ABI enabled, or both?
@moorage Highly depends on the system where you're running the package. Recently I was compiling ROS indigo on Debian Stretch, and found out that all the system libraries are using the new ABI, so I needed to recompile everything with the new ABI. So I would say it's best to compile everything with the same ABI your system packages use.
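To illustrate why mixing ABIs ends in undefined references: under libstdc++ the two ABIs give std::string different mangled names, so a function taking a std::string gets a different symbol name in each. The sketch below assumes GCC or Clang on Linux with RTTI enabled, where typeid(...).name() returns the mangled name. The helper names has_cxx11_tag and string_mangled_name are invented for this example.

```cpp
#include <cstring>
#include <string>
#include <typeinfo>

// Hypothetical helper: true when a mangled name carries the __cxx11 tag
// that libstdc++ adds to types belonging to the new ABI.
bool has_cxx11_tag(const char* mangled) {
  return std::strstr(mangled, "__cxx11") != nullptr;
}

// Hypothetical helper: mangled name of std::string as seen by this
// translation unit. With the old libstdc++ ABI this is the short form "Ss";
// with the new ABI it lives in the std::__cxx11 namespace.
const char* string_mangled_name() {
  return typeid(std::string).name();
}
```

Calling has_cxx11_tag(string_mangled_name()) in code built with the project's flags tells which of the two names the linker will look for. The PSs fragment in the mangled MakeCheckOpString symbol from the first error log is a pointer to the old-ABI std::string, which suggests that object file was compiled with the old ABI.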
Would "recompile everything" include ros as well (kinetic in our case)?
If you're on Ubuntu 16.04, ROS Kinetic from the official repos is compiled with the new ABI. So it should be sufficient to compile TF with the new ABI. But I'm not sure if the pip-wheel bazel target doesn't automatically select the old ABI, because (AFAIK) all pip libraries are still being compiled with the old ABI.
I'm now working on proper support for custom builds of TF using the libtensorflow_cc.so library. You can try waiting for it.
Anyway, I hope you've noticed there's a kinetic-devel branch in tensorflow_ros_test. This branch shows a workaround for the case where the TF and ROS ABIs differ.
That would apply for pip-installed TF on Xenial, for example, or custom compiled TF with the old ABI on Xenial. If you compiled TF yourself with the new ABI (which I'm not sure is possible with the pywrap_tensorflow library), you should go with the master branch.
@peci1 I do indeed have these symbols in _pywrap_tensorflow, and it seems that I'm using the old ABI.
Turns out that I had accidentally commented out
LIBRARIES ${TENSORFLOW_LIBRARIES} python2.7 # yes, we also need to link against python...
Changing python2.7 to python3 does the trick. Thanks!
I ended up creating a Catkin package to build Tensorflow from source using the official CMake build.
@Skydes nice!! Did you use tensorflow_ros and/or tensorflow_ros_test with that catkin package? If so, which branch(es)?
@moorage No, it doesn't use the pip wheel but builds from source instead.
@Skydes @moorage I made a lot of improvements to the code of this package, and I also improved the documentation. Now the package is more like an interface to TF installed in various ways (including your catkin package). You can give it a try to see whether it fixes your issues.
In the documentation, I also tried to describe what exactly the C++ ABI problems are, so it might help you understand whether you're trying the right thing.
Thanks a lot for the improvements! It seems much more usable now. I'm facing the C++ ABI problem, so I don't really need tensorflow_ros (for now at least) as the catkin package provides the interface I need.
FYI: I'm working on stripping the unnecessary GRPC/Python dependencies out of tensorflow_catkin, so its build time should be reduced.
Great. I had some problems building with tensorflow_catkin, which I solved with numerous hacks (mainly regarding the gomp library, grpc and jpeg). Next week I'll try to summarize what I needed to do to actually succeed in compiling on a pretty blank Xenial machine.
Anyway, the idea with the new version of tensorflow_ros is to make it possible to create ROS packages that leave their users free to choose how they "supply" TensorFlow. Some want to compile it, some are fine with the pip version, some already have a bazel build...