TF_Deformable_Net

A question when training

Open · xiaowenhe opened this issue 8 years ago · 10 comments

@Zardinality, thanks for your answer. But the error still occurs, even after I changed -arch=sm_37 (K80) in make.sh and setup.py and reran make.

xiaowenhe, Jul 31 '17
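As background for the `-arch` change mentioned above: nvcc's `-arch=sm_XY` value is the GPU's compute capability with the dot removed, and a Tesla K80 has compute capability 3.7, hence `sm_37`. The helper below is a hypothetical illustration, not part of TF_Deformable_Net's make.sh or setup.py:

```python
def nvcc_arch_flag(compute_capability):
    """Build an nvcc -arch flag from a (major, minor) compute capability."""
    major, minor = compute_capability
    if major < 1 or not 0 <= minor <= 9:
        raise ValueError(f"unexpected compute capability: {compute_capability}")
    # The digits are simply concatenated: 3.7 -> sm_37, 6.1 -> sm_61.
    return f"-arch=sm_{major}{minor}"


# Tesla K80 has compute capability 3.7, matching the sm_37 used above.
print(nvcc_arch_flag((3, 7)))  # -arch=sm_37
```

This flag only selects which GPU architecture nvcc generates device code for; it does not change how the host C++ code is compiled or linked.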

Another error when training:

kittivoc_train kittivoc_val kittivoc_trainval kittivoc_test kittivoc_train kittivoc_val kittivoc_trainval kittivoc_test kittivoc_train kittivoc_val kittivoc_trainval kittivoc_test nthu_71 nthu_370
Traceback (most recent call last):
  File "./faster_rcnn/train_net.py", line 30, in <module>
    from lib.networks.factory import get_network
  File "/root/tf_deformable_frcnn/lib/networks/__init__.py", line 8, in <module>
    from .VGGnet_train import VGGnet_train
  File "/root/tf_deformable_frcnn/lib/networks/VGGnet_train.py", line 2, in <module>
    from .network import Network
  File "/root/tf_deformable_frcnn/lib/networks/network.py", line 13, in <module>
    from ..deform_conv_layer import deform_conv_op as deform_conv_op
  File "/root/tf_deformable_frcnn/lib/deform_conv_layer/deform_conv_op.py", line 8, in <module>
    _deform_conv_module = tf.load_op_library(filename)
  File "/usr/local/lib/python/dist-packages/tensorflow/python/framework/load_library.py", line 64, in load_op_library
    None, None, error_msg, error_code)
tensorflow.python.framework.errors_impl.NotFoundError: /root/tf_deformable_frcnn/lib/deform_conv_layer/deform_conv.so: undefined symbol: _ZN10tensorflow8internal21CheckOpMessageBuilder9NewStringB5cxx11Ev

Any thoughts?

paulcx, Jul 31 '17
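The missing symbol can be read directly: demangled (for example with `c++filt`), `_ZN10tensorflow8internal21CheckOpMessageBuilder9NewStringB5cxx11Ev` is `tensorflow::internal::CheckOpMessageBuilder::NewString[abi:cxx11]()`. The `B5cxx11` fragment is GCC's encoding of the `[abi:cxx11]` tag, which suggests deform_conv.so was built against the new libstdc++ string ABI while the loaded TensorFlow library exports the untagged variant. A crude, hypothetical check for that tag (the helper name is illustrative only):

```python
# GCC encodes the [abi:cxx11] tag as "B5cxx11" inside an Itanium-mangled name.
CXX11_ABI_TAG = "B5cxx11"


def has_cxx11_abi_tag(mangled_symbol):
    """Crude check: does this mangled C++ symbol carry the [abi:cxx11] tag?

    A plain substring test; it does not demangle or fully parse the name.
    """
    return CXX11_ABI_TAG in mangled_symbol


# Symbol from the NotFoundError above (tagged, i.e. a new-ABI reference).
missing = "_ZN10tensorflow8internal21CheckOpMessageBuilder9NewStringB5cxx11Ev"
# The same function mangled without the tag, as an old-ABI build names it.
untagged = "_ZN10tensorflow8internal21CheckOpMessageBuilder9NewStringEv"

print(has_cxx11_abi_tag(missing))   # True
print(has_cxx11_abi_tag(untagged))  # False
```

If the missing symbol carries the tag, rebuilding the op with `-D_GLIBCXX_USE_CXX11_ABI=0` is the commonly suggested remedy.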

@xiaowenhe I am aware of your problem; please don't open two identical issues.

Zardinality, Jul 31 '17

@paulcx It might be related to the gcc version and certain compiler flags. I will add some lines to make.sh and the FAQ. If you want to fix it right away, check this issue.

Zardinality, Jul 31 '17
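The interaction between the gcc version and the ABI flag can be written as a small decision rule. This is a sketch under stated assumptions: `abi_define` is a hypothetical helper; it prefers the flags TensorFlow reports about its own build (newer TensorFlow releases expose them via `tf.sysconfig.get_compile_flags()`, which may not exist in the version used in this thread) and otherwise falls back to the rule of thumb given later in the thread, that g++ 5 or newer should compile with `-D_GLIBCXX_USE_CXX11_ABI=0`. That fallback assumes a TensorFlow binary built with an older GCC.

```python
ABI_PREFIX = "-D_GLIBCXX_USE_CXX11_ABI="


def abi_define(tf_compile_flags=None, gcc_major=None):
    """Pick the -D_GLIBCXX_USE_CXX11_ABI define for building a custom op.

    1. If TensorFlow's own compile flags include the define, reuse it.
    2. Otherwise, for g++ >= 5, assume TensorFlow used the old ABI and force 0.
    3. For older GCC, libstdc++ only has the old ABI, so no define is needed.
    """
    for flag in tf_compile_flags or []:
        if flag.startswith(ABI_PREFIX):
            return flag
    if gcc_major is not None and gcc_major >= 5:
        return ABI_PREFIX + "0"
    return None
```

For example, `abi_define(gcc_major=5)` returns `'-D_GLIBCXX_USE_CXX11_ABI=0'`, while a flag list that already contains the define is passed through unchanged.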

@Zardinality I have tried both solutions, but so far neither works; I get the same error with

g++ -std=c++11 -shared -D_GLIBCXX_USE_CXX11_ABI=0 -o roi_pooling.so roi_pooling_op.cc roi_pooling_op.cu.o -I $TF_INC -fPIC -D GOOGLE_CUDA -lcudart -L $CUDA_HOME/lib64

or

g++ -std=c++11 -shared -D_GLIBCXX_USE_CXX11_ABI=1 -o roi_pooling.so roi_pooling_op.cc roi_pooling_op.cu.o -I $TF_INC -fPIC -D GOOGLE_CUDA -lcudart -L $CUDA_HOME/lib64

Am I applying the solution correctly?

paulcx, Jul 31 '17
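To keep the two command variants above from drifting apart, the command can be generated with the ABI value as a single parameter. A sketch only: `roi_pooling_link_cmd` is a hypothetical helper whose arguments mirror the thread's g++ command. One assumption worth checking: TensorFlow's custom-op guide passes the same compile flags to nvcc as to g++, so if `roi_pooling_op.cu.o` was built without the matching define, changing only this g++ line may not be enough.

```python
import os


def roi_pooling_link_cmd(abi, tf_inc, cuda_home):
    """Return the thread's g++ command as an argument list.

    `abi` (0 or 1) becomes -D_GLIBCXX_USE_CXX11_ABI=<abi>; tf_inc and
    cuda_home stand in for the $TF_INC and $CUDA_HOME shell variables.
    """
    if abi not in (0, 1):
        raise ValueError("abi must be 0 or 1")
    return [
        "g++", "-std=c++11", "-shared",
        f"-D_GLIBCXX_USE_CXX11_ABI={abi}",
        "-o", "roi_pooling.so",
        "roi_pooling_op.cc", "roi_pooling_op.cu.o",
        "-I", tf_inc,
        "-fPIC", "-D", "GOOGLE_CUDA",
        "-lcudart", "-L", os.path.join(cuda_home, "lib64"),
    ]


cmd = roi_pooling_link_cmd(0, "/path/to/tf/include", "/usr/local/cuda")
print(" ".join(cmd))
```

Run from the directory containing the sources, `subprocess.run(cmd, check=True)` would execute it and raise if g++ fails.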

@paulcx Make sure you are using the recompiled version, or maybe try removing the related flag. Which version of g++ are you using?

Zardinality, Jul 31 '17

@Zardinality What do you mean by using the recompiled version? My g++ version is 5.4.0.

paulcx, Jul 31 '17

@paulcx I mean manually remove the previously generated files, such as .o and .so, and then recompile. Also, since you use g++ 5 (which I haven't had a chance to test), you should compile with -D_GLIBCXX_USE_CXX11_ABI=0.

Zardinality, Jul 31 '17
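A minimal sketch of the "remove the generated files, then recompile" step, assuming the artifacts sit directly in the op's directory (for example lib/deform_conv_layer from the traceback). `remove_build_artifacts` is a hypothetical helper; the rebuild itself is still done with the repository's make.sh:

```python
from pathlib import Path


def remove_build_artifacts(build_dir, suffixes=(".o", ".so")):
    """Delete previously generated object files and shared libraries.

    Only files directly inside build_dir are touched (no recursion).
    Note that "roi_pooling_op.cu.o" has suffix ".o", so it is removed too.
    Returns the sorted names of the removed files so they can be logged.
    """
    removed = []
    for path in Path(build_dir).iterdir():
        if path.is_file() and path.suffix in suffixes:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

Source files such as `.cc` and `.cu` are left in place, so running make.sh afterwards rebuilds everything from scratch with the current flags.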

@paulcx Hi, have you worked out where the problem is? I have updated the README to include a workaround suggested in another issue, which solves the same problem.

Zardinality, Aug 05 '17

@Zardinality Not yet. The solution does not work with g++ 5.4, at least.

paulcx, Aug 07 '17

@paulcx check out this; it solved a similar undefined-symbol problem for me.

selkerdawy, May 03 '18