custom-op
undefined symbol error when building the official example with GPU support with Bazel
Hi, I am having problems building the official example with GPU support with Bazel inside the TensorFlow source folder. I use the tf_custom_op_library rule defined in //tensorflow:tensorflow.bzl; building with nvcc + g++ directly works fine. Any ideas how to solve the problem? Or should I just use the Bazel rules in this repo?
Here is my code:
I put this folder under /tensorflow/tensorflow
cd [path to]/tensorflow
bazel build tensorflow/custom_op:test_op.so
# remember to use the right .so path in test.py:
# comment out the load of ./test_op.so, uncomment the ./bazel-bin...... one
python3 ./tensorflow/custom_op/test.py
I got this error:
Traceback (most recent call last):
File "tensorflow/test/test.py", line 2, in <module>
test_op = tf.load_op_library('bazel-bin/tensorflow/test/test_op.so')
File "/home/wendyh/.local/lib/python3.5/site-packages/tensorflow/python/framework/load_library.py", line 60, in load_op_library
lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: bazel-bin/tensorflow/test/test_op.so: undefined symbol: _ZN10tensorflow14kernel_factory17OpKernelRegistrar12InitInternalEPKNS_9KernelDefEN4absl11string_viewESt10unique_ptrINS0_15OpKernelFactoryESt14default_deleteIS8_EE
If I build with bash make.sh instead and test it (using the right .so path), it outputs the desired result.
Env: TensorFlow master branch, Ubuntu 16.04, NVIDIA CUDA Toolkit 10.0
Thank you.
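(Aside for anyone landing here with the same error: the mangled name in the traceback can be decoded with c++filt, and even the raw Itanium-ABI nested name is readable once you know each component is length-prefixed. A rough Python sketch, only enough of a demangler to show which symbol is actually missing:)

```python
# Minimal sketch (not a full Itanium-ABI demangler): extract the
# class/function path from a mangled nested name of the form
# _ZN <len><name> <len><name> ... E
def demangle_nested_name(sym):
    assert sym.startswith("_ZN"), "only handles nested names"
    i, parts = 3, []
    while sym[i] != "E":            # 'E' terminates the nested-name
        j = i
        while sym[j].isdigit():     # each component is length-prefixed
            j += 1
        length = int(sym[i:j])
        parts.append(sym[j:j + length])
        i = j + length
    return "::".join(parts)

sym = ("_ZN10tensorflow14kernel_factory17OpKernelRegistrar12InitInternalE"
       "PKNS_9KernelDefEN4absl11string_viewESt10unique_ptrINS0_"
       "15OpKernelFactoryESt14default_deleteIS8_EE")
print(demangle_nested_name(sym))
# -> tensorflow::kernel_factory::OpKernelRegistrar::InitInternal
```

So the missing symbol is the kernel registration entry point, and its parameter list mentions absl::string_view, which usually means the .so was compiled against different TensorFlow headers/ABI than the runtime that is loading it.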
Sounds like you are building TF from source? Which gcc version are you using? Could you try adding the -D_GLIBCXX_USE_CXX11_ABI=0 flag?
Hi @yifeif, sorry for the late reply. Yes, I am building it from source (the folder is inside the TF source folder). My GCC version is 5.4.0. I added the flag but it is still not working. I am wondering whether tf_custom_op_library links the headers and dependent files properly.
bazel build --config=opt --config=cuda --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" tensorflow/custom_op:test_op.so
When I loaded it in Python, I got the same error.
Could you try linking against the shared libraries in TensorFlow's pip package instead? I have an example in progress at https://github.com/tensorflow/custom-op/pull/10 if you are interested in taking a look.
@wendy2003888 Try adding --linkopt=-rdynamic to the bazel build
@yifeif @lleewwiiss will try and reply by the end of today. Thank you
@lleewwiiss that did not work. @yifeif I tried using the Python lib and cc_library rules as in this repo, and that works, but I did not use it inside the TF source folder, so the tf_custom_op_library rule is still not working. I will try again on the weekend, and will check how to link the pip libraries in a TF source build file.
@wendy2003888 Can you try this build file with that flag? I don't currently have a setup where I can test it, but I believe you should be using tf_kernel_library.
load("//tensorflow:tensorflow.bzl", "tf_kernel_library")

package(default_visibility = ["//visibility:public"])

tf_kernel_library(
    name = "test_op.so",
    srcs = ["kernel_example.cc", "kernel_example.h"],
    gpu_srcs = ["kernel_example.cu.cc"],
    alwayslink = 1,
    visibility = ["//visibility:public"],
    deps = [
        "//tensorflow/core:framework",
        "//tensorflow/core:lib",
    ],
)
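The alwayslink = 1 setting is not cosmetic here: TensorFlow discovers ops and kernels through global registrar objects whose constructors run when the library is loaded, so a linker that drops "unreferenced" object files silently empties the registry. A toy Python analogy of that pattern (nothing here is TF's actual code; the names just mirror it):

```python
# Toy analogy of TF's registration pattern: a global registry populated
# by constructor side effects at load time. If the object holding
# _registrar were dropped by the linker, lookup would fail even though
# everything compiled fine -- that is what alwayslink=1 prevents.
KERNEL_REGISTRY = {}

class OpKernelRegistrar:
    def __init__(self, op_name, factory):
        KERNEL_REGISTRY[op_name] = factory   # side effect at construction

# Module import plays the role of dlopen() running static initializers:
_registrar = OpKernelRegistrar("MyExampleOp", lambda: "kernel instance")

def find_kernel(op_name):
    if op_name not in KERNEL_REGISTRY:
        raise LookupError(f"op not registered: {op_name}")
    return KERNEL_REGISTRY[op_name]()

print(find_kernel("MyExampleOp"))   # -> kernel instance
```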
@lleewwiiss I am using tf_kernel_library and tf_cc_test; they always work. I just want to find out why tf_custom_op_library is not working. That rule is in the official guide, and it is supposed to support custom op libraries.
Have you always been using gcc 5.4.0? Could you try switching to 4.8?
@yifeif I downgraded gcc and g++ to 4.8 on my personal desktop, and the undefined symbol error still occurs, so the error may not be tied to the gcc/g++ version.
Anyway, using the Python binary and nvcc directly can build the GPU version of the custom ops.
Finally, sorry for my very late reply; I have been traveling and moving.
Thank you. :)
I have a similar problem. I wrote a custom op and used a gcc command to build it, and it worked well. Then I tried to rewrite the build script with CMake, but when calling the op it gives the undefined symbol error. It seems the op registration is stripped out by the CMake build. The gcc script and CMakeLists are below; could you give some advice?
gcc script
TF_CFLAGS=( $(python -c 'import tensorflow as tf; print(" ".join(tf.sysconfig.get_compile_flags()))') )
TF_LFLAGS=( $(python -c 'import tensorflow as tf; print(" ".join(tf.sysconfig.get_link_flags()))') )
nvcc -std=c++11 -c -o resize_trilinear.cu.o resize_trilinear.cu.cc \
${TF_CFLAGS[@]} -I/usr/local \
-D GOOGLE_CUDA=1 \
-x cu -Xcompiler -fPIC -DNDEBUG --expt-relaxed-constexpr
g++ -std=c++11 -shared -o libresize_trilinear.so resize_trilinear.cc resize_trilinear.cu.o \
${TF_CFLAGS[@]} -I/usr/local/cuda/include \
${TF_LFLAGS[@]} -lcudart \
-D GOOGLE_CUDA=1 \
-fPIC -O2
CMakeLists.txt
cmake_minimum_required(VERSION 3.14)
project(resize_trilinear)
set(CMAKE_CXX_STANDARD 11)
#enable_language(CUDA)
find_package(CUDA REQUIRED)
set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS} \
-gencode arch=compute_61,code=sm_61 \
-D GOOGLE_CUDA=1 -x cu \
-Xcompiler -fPIC -DNDEBUG \
--expt-relaxed-constexpr")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} \
-std=c++11 \
-DGOOGLE_CUDA=1 \
-fPIC -O2")
set(TF_INC /home/zoud/program/anaconda3/envs/tf2/lib/python3.6/site-packages/tensorflow_core/include)
set(TF_LIB /home/zoud/program/anaconda3/envs/tf2/lib/python3.6/site-packages/tensorflow_core)
cuda_compile(RESIZE_TRILINEAR_CU_O resize_trilinear.cu.cc MODULE OPTIONS -I${TF_INC} -I/usr/local)
include_directories(${TF_INC} /usr/local/cuda/include)
link_directories(${TF_LIB} /usr/local/cuda/lib64)
#add_link_options(-Wl,--no-as-needed)
#add_link_options(-Wl,--allow-multiple-definition)
add_library(resize_trilinear SHARED
${RESIZE_TRILINEAR_CU_O}
resize_trilinear.h
resize_trilinear.cc)
target_link_libraries(resize_trilinear
libtensorflow_framework.so.2
libcudart.so)
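Not a verified fix, but one likely culprit given the symptoms: the registration code is being dropped at link time (the commented-out --no-as-needed hints at this), and linking the bare soname libtensorflow_framework.so.2 may not resolve the way ${TF_LFLAGS[@]} did in the gcc script. A sketch of the changes to try first, assuming CMake >= 3.13 and GNU ld; the exact paths are the ones already set above:

```cmake
# Hypothetical tweaks to the CMakeLists above (untested): keep
# "unreferenced" registration objects at link time, and link the TF
# framework library by full path rather than bare soname.
add_link_options(-Wl,--no-as-needed)

add_library(resize_trilinear SHARED
    ${RESIZE_TRILINEAR_CU_O}
    resize_trilinear.cc)

target_link_libraries(resize_trilinear
    ${TF_LIB}/libtensorflow_framework.so.2
    ${CUDA_LIBRARIES})   # set by find_package(CUDA)
```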