custom-op
undefined symbol error when building the official example with GPU support with Bazel
Hi, I am having problems building the official example with GPU support with Bazel inside the TensorFlow source folder. I use the tf_custom_op_library rule defined in //tensorflow:tensorflow.bzl; building with nvcc + g++ directly works fine. Any ideas how to solve the problem? Or should I just use the Bazel rules in this repo?
Here is my code:
I put this folder under /tensorflow/tensorflow
cd [path to]/tensorflow
bazel build tensorflow/custom_op:test_op.so
# remember to use the right .so path in test.py:
# comment out the load of ./test_op.so, uncomment the ./bazel-bin...... one
python3 ./tensorflow/custom_op/test.py
I got this error:
Traceback (most recent call last):
File "tensorflow/test/test.py", line 2, in <module>
test_op = tf.load_op_library('bazel-bin/tensorflow/test/test_op.so')
File "/home/wendyh/.local/lib/python3.5/site-packages/tensorflow/python/framework/load_library.py", line 60, in load_op_library
lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: bazel-bin/tensorflow/test/test_op.so: undefined symbol: _ZN10tensorflow14kernel_factory17OpKernelRegistrar12InitInternalEPKNS_9KernelDefEN4absl11string_viewESt10unique_ptrINS0_15OpKernelFactoryESt14default_deleteIS8_EE
If I build with bash make.sh instead and test it (using the right .so path), it outputs the desired result.
Env: TensorFlow master branch, Ubuntu 16.04, NVIDIA CUDA Toolkit 10.0
Thank you.
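(Aside for anyone landing here with the same error: the mangled name in the traceback can be decoded with c++filt, and even the raw Itanium-ABI nested name is readable once you know each component is length-prefixed. A rough Python sketch, only enough of a demangler to show which symbol is actually missing:)

```python
# Minimal sketch (not a full Itanium-ABI demangler): extract the
# class/function path from a mangled nested name of the form
# _ZN <len><name> <len><name> ... E
def demangle_nested_name(sym):
    assert sym.startswith("_ZN"), "only handles nested names"
    i, parts = 3, []
    while sym[i] != "E":            # 'E' terminates the nested-name
        j = i
        while sym[j].isdigit():     # each component is length-prefixed
            j += 1
        length = int(sym[i:j])
        parts.append(sym[j:j + length])
        i = j + length
    return "::".join(parts)

sym = ("_ZN10tensorflow14kernel_factory17OpKernelRegistrar12InitInternalE"
       "PKNS_9KernelDefEN4absl11string_viewESt10unique_ptrINS0_"
       "15OpKernelFactoryESt14default_deleteIS8_EE")
print(demangle_nested_name(sym))
# -> tensorflow::kernel_factory::OpKernelRegistrar::InitInternal
```

So the missing symbol is the kernel registration entry point, and its parameter list mentions absl::string_view, which usually means the .so was compiled against different TensorFlow headers/ABI than the runtime that is loading it.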
Sounds like you are building TF from source? Which gcc version are you using? Could you try adding the -D_GLIBCXX_USE_CXX11_ABI=0 flag?
Hi @yifeif, sorry for the late reply. Yes, I am building it from source (the folder is inside the TF source folder). My GCC version is 5.4.0. I added the flag but it is still not working. I am wondering whether tf_custom_op_library links the headers and dependent files properly.
bazel build --config=opt --config=cuda --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" tensorflow/custom_op:test_op.so
When I loaded it in Python, I got the same error.
Could you try linking against the shared libraries in TensorFlow's pip package instead? I have an example in progress at https://github.com/tensorflow/custom-op/pull/10 if you are interested in taking a look.
@wendy2003888 Try adding --linkopt=-rdynamic to the bazel build
@yifeif @lleewwiiss will try and reply by the end of today. Thank you
@lleewwiiss that did not work. @yifeif I tried using the Python lib and cc_library rules as in this repo, and that works, but I did not use it inside the TF source folder, so the tf_custom_op_library rule is still not working. I will try again on the weekend, and will check how to link the pip libraries in a TF source build file.
@wendy2003888 Can you try this build file with that flag? I don't currently have a setup where I can test it, but I believe you should be using tf_kernel_library.
load("//tensorflow:tensorflow.bzl", "tf_kernel_library")

package(default_visibility = ["//visibility:public"])

tf_kernel_library(
    name = "test_op.so",
    srcs = ["kernel_example.cc", "kernel_example.h"],
    gpu_srcs = ["kernel_example.cu.cc"],
    alwayslink = 1,
    visibility = ["//visibility:public"],
    deps = [
        "//tensorflow/core:framework",
        "//tensorflow/core:lib",
    ],
)
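The alwayslink = 1 setting is not cosmetic here: TensorFlow discovers ops and kernels through global registrar objects whose constructors run when the library is loaded, so a linker that drops "unreferenced" object files silently empties the registry. A toy Python analogy of that pattern (nothing here is TF's actual code; the names just mirror it):

```python
# Toy analogy of TF's registration pattern: a global registry populated
# by constructor side effects at load time. If the object holding
# _registrar were dropped by the linker, lookup would fail even though
# everything compiled fine -- that is what alwayslink=1 prevents.
KERNEL_REGISTRY = {}

class OpKernelRegistrar:
    def __init__(self, op_name, factory):
        KERNEL_REGISTRY[op_name] = factory   # side effect at construction

# Module import plays the role of dlopen() running static initializers:
_registrar = OpKernelRegistrar("MyExampleOp", lambda: "kernel instance")

def find_kernel(op_name):
    if op_name not in KERNEL_REGISTRY:
        raise LookupError(f"op not registered: {op_name}")
    return KERNEL_REGISTRY[op_name]()

print(find_kernel("MyExampleOp"))   # -> kernel instance
```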
@lleewwiiss I am using tf_kernel_library and tf_cc_test; they always work. I just want to find out why tf_custom_op_library is not working. That rule is in the official guide, and it is supposed to support custom op libraries.
Have you always been using gcc 5.4.0? Could you try switching to 4.8?
@yifeif I downgraded gcc and g++ to 4.8 on my personal desktop, and the undefined symbol error still occurs, so the error may not be tied to the gcc/g++ version.
Anyway, using the Python binary and nvcc directly can build the GPU version of the custom ops.
Finally, sorry for my very late reply; I have been traveling and moving.
Thank you. :)
I have a similar problem. I wrote a custom op and used a gcc command to build it, and it worked well. Then I tried to rewrite the build script with CMake, but when calling the op it gives the undefined symbol error. It seems the op registration is stripped out by the CMake build. The gcc script and CMakeLists are below; could you give some advice?
gcc script
TF_CFLAGS=( $(python -c 'import tensorflow as tf; print(" ".join(tf.sysconfig.get_compile_flags()))') )
TF_LFLAGS=( $(python -c 'import tensorflow as tf; print(" ".join(tf.sysconfig.get_link_flags()))') )
nvcc -std=c++11 -c -o resize_trilinear.cu.o resize_trilinear.cu.cc \
${TF_CFLAGS[@]} -I/usr/local \
-D GOOGLE_CUDA=1 \
-x cu -Xcompiler -fPIC -DNDEBUG --expt-relaxed-constexpr
g++ -std=c++11 -shared -o libresize_trilinear.so resize_trilinear.cc resize_trilinear.cu.o \
${TF_CFLAGS[@]} -I/usr/local/cuda/include \
${TF_LFLAGS[@]} -lcudart \
-D GOOGLE_CUDA=1 \
-fPIC -O2
CMakeLists.txt
cmake_minimum_required(VERSION 3.14)
project(resize_trilinear)
set(CMAKE_CXX_STANDARD 11)
#enable_language(CUDA)
find_package(CUDA REQUIRED)
set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS} \
-gencode arch=compute_61,code=sm_61 \
-D GOOGLE_CUDA=1 -x cu \
-Xcompiler -fPIC -DNDEBUG \
--expt-relaxed-constexpr")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} \
-std=c++11 \
-DGOOGLE_CUDA=1 \
-fPIC -O2")
set(TF_INC /home/zoud/program/anaconda3/envs/tf2/lib/python3.6/site-packages/tensorflow_core/include)
set(TF_LIB /home/zoud/program/anaconda3/envs/tf2/lib/python3.6/site-packages/tensorflow_core)
cuda_compile(RESIZE_TRILINEAR_CU_O resize_trilinear.cu.cc MODULE OPTIONS -I${TF_INC} -I/usr/local)
include_directories(${TF_INC} /usr/local/cuda/include)
link_directories(${TF_LIB} /usr/local/cuda/lib64)
#add_link_options(-Wl,--no-as-needed)
#add_link_options(-Wl,--allow-multiple-definition)
add_library(resize_trilinear SHARED
${RESIZE_TRILINEAR_CU_O}
resize_trilinear.h
resize_trilinear.cc)
target_link_libraries(resize_trilinear
libtensorflow_framework.so.2
libcudart.so)
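Not a verified fix, but one likely culprit given the symptoms: the registration code is being dropped at link time (the commented-out --no-as-needed hints at this), and linking the bare soname libtensorflow_framework.so.2 may not resolve the way ${TF_LFLAGS[@]} did in the gcc script. A sketch of the changes to try first, assuming CMake >= 3.13 and GNU ld; the exact paths are the ones already set above:

```cmake
# Hypothetical tweaks to the CMakeLists above (untested): keep
# "unreferenced" registration objects at link time, and link the TF
# framework library by full path rather than bare soname.
add_link_options(-Wl,--no-as-needed)

add_library(resize_trilinear SHARED
    ${RESIZE_TRILINEAR_CU_O}
    resize_trilinear.cc)

target_link_libraries(resize_trilinear
    ${TF_LIB}/libtensorflow_framework.so.2
    ${CUDA_LIBRARIES})   # set by find_package(CUDA)
```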