pointnet2 Error when training

Hello!

I searched here to see whether anyone had run into the same error as I did, but couldn't find anything.

I am trying to run the semantic segmentation training code with my own dataset.

I was able to follow the README, install all the dependencies, and compile all the tf_ops shared objects (.so), but when I try to run the training, I get an "undefined symbol" error from tf_sampling_so.so (I've attached an image of the error):

[Image: IMG_20190312_145653, photo of the error message]

Has anyone seen this kind of problem and know how to solve it? I'm using CUDA 9.0 and TF 1.13.

Thanks!

Mar 12 '19 19:03 pauloffsf

I have faced a similar problem: tf_sampling_so.so: undefined symbol: _ZN10tensorflow8internal21CheckOpMessageBuilder9NewStringEv. I'm using CUDA 9.0 and TF 1.12.

Mar 14 '19 03:03 Kevinlongran

I think I may have used a mismatched TF version.

Mar 14 '19 03:03 Kevinlongran

Maybe you can find the solution by following https://github.com/charlesq34/pointnet2/issues/48

Mar 15 '19 05:03 zhangxing1995

Hello, I'm using TF-gpu 1.13.0 and CUDA 10.0 on Ubuntu 18.04, and I still have this problem when I run train.py. Is this problem caused by the TF and CUDA versions? PS: I couldn't use CUDA 9.0 on my Ubuntu version because of the g++ version (> 6.0).

May 10 '19 14:05 SalaheddineSTA

Hello, I am facing a similar issue with gcc > 6: tensorflow.python.framework.errors_impl.NotFoundError: /mnt/disks/user/project/pointnet2/tf_ops/sampling/tf_sampling_so.so: undefined symbol: _ZTVN10tensorflow14kernel_factory17OpKernelRegistrar18PtrOpKernelFactoryE. I tried the fix from #48 mentioned above, but I am getting the same error.

Jul 31 '19 13:07 kiranintellify

@pauloffsf @Kevinlongran @zhangxing1995 @SalaheddineSTA @kiranintellify have you solved the problem? I ran into the same problem as you. I use CUDA 9.0 and TF 1.12.

Nov 19 '19 09:11 MrCrazyCrab

I was able to solve it a while back, but I lost the tutorial I wrote with all the steps I took. It mostly came down to the way you compile the C/C++ code.

Nov 19 '19 10:11 pauloffsf

@pauloffsf I can compile the code successfully, but something goes wrong when I load the .so files. If it's OK, could you share your .so files with me?

Nov 20 '19 01:11 MrCrazyCrab

It is related to the compile options you use. I don't have the .so files anymore; they were stored with the tutorial I created. The computer's hard drive had a problem and the machine had to be formatted. After that, I was no longer able to work with the code.

I also used this https://github.com/pubgeo/dfc2019/tree/master/track4/pointnet2 to help me compile.

I am going to work again with this code in 2 weeks from now. If I come up with how to solve it again, I'll let you know.

Nov 20 '19 02:11 pauloffsf

@MrCrazyCrab sorry it took me so long to answer. Were you able to solve your problem?

I could solve it with this:

Besides passing the "-D_GLIBCXX_USE_CXX11_ABI=0" parameter to g++, I fixed my problem as follows:

You need to check whether -ltensorflow_framework was linked properly into your tf_ops *.so files. For that, use (for example):

$ ldd tf_grouping_so.so

Check whether libtensorflow_framework.so appears in the list. If it does not, it was not linked properly (case 1). If it appears but is marked as not found, the loader cannot locate it (case 2).
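For anyone who wants to script this check across all the tf_ops libraries, here is a minimal Python sketch of the same idea. It is not part of the original fix: the function names `framework_link_status` and `run_ldd` are made up for illustration, and it assumes a Linux system where `ldd` prints unresolved libraries as `name => not found`.

```python
import subprocess


def framework_link_status(ldd_output, lib_prefix="libtensorflow_framework.so"):
    """Classify how an op library links against the TensorFlow framework.

    Returns "missing" if no ldd line mentions the library (case 1),
    "not found" if it is listed but unresolved (case 2), and "ok" otherwise.
    """
    for line in ldd_output.splitlines():
        if lib_prefix in line:
            return "not found" if "not found" in line else "ok"
    return "missing"


def run_ldd(so_path):
    """Run ldd on a shared object and return its text output (Linux only)."""
    return subprocess.check_output(["ldd", so_path], universal_newlines=True)
```

A call such as `framework_link_status(run_ldd("tf_grouping_so.so"))` would then tell you which of the two cases applies to that file.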

This may happen if your library has a versioned name, something like *.so.x, where x is the version number. If this is the case, you need to create a symbolic link from *.so to *.so.x:

$ sudo ln -s libtensorflow_framework.so.x libtensorflow_framework.so

You then have to compile every tf_op again and check with ldd again.
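The symlink step can also be sketched in Python, which may be handy inside a conda or virtualenv directory where sudo is not needed. This is only an illustration: `ensure_unversioned_link` is a hypothetical helper, and picking the lexicographically last `*.so.x` candidate is an assumption that only matters if several versions sit in the same folder.

```python
import os
from pathlib import Path


def ensure_unversioned_link(lib_dir, name="libtensorflow_framework.so"):
    """Make sure lib_dir/name exists, linking it to a versioned sibling if needed.

    Returns the path of the unversioned file, or None when the directory
    contains neither `name` nor any `name.x` candidate.
    """
    lib_dir = Path(lib_dir)
    target = lib_dir / name
    # Leave an existing file or link alone (including a dangling link).
    if target.exists() or target.is_symlink():
        return target
    candidates = sorted(lib_dir.glob(name + ".*"))
    if not candidates:
        return None
    # Relative link, equivalent to running `ln -s name.x name` inside lib_dir.
    os.symlink(candidates[-1].name, str(target))
    return target
```

As with the manual `ln -s`, the ops still have to be recompiled afterwards so the linker can pick up the library.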

If it is in the list but shows as not found, you just need to add its path to the ld library path:

$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:(path to where libtensorflow_framework.so is)
$ sudo ldconfig

That's it.

You can check again with ldd to see that the library is in the list and is being found properly, and then run your train.py.

I was able to solve it in an Anaconda Python 3.7 environment, with g++ 7.5 and TensorFlow 2.1.
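As a side note, the ABI define and the framework link flag do not have to be typed by hand: TensorFlow versions that provide `tf.sysconfig.get_compile_flags()` and `tf.sysconfig.get_link_flags()` report the values the installed build expects. The sketch below shows one way to assemble a g++ command from them. It is an assumption-laden illustration, not the compile script from the repository: `build_op_compile_command` and `flags_from_tensorflow` are hypothetical names, the `-std=c++11` default may need changing for newer TensorFlow releases, and the CUDA object files and CUDA link flags that the pointnet2 ops also need are left out.

```python
def build_op_compile_command(source, output, compile_flags, link_flags,
                             cxx="g++", std="c++11"):
    """Assemble a g++ command line for one custom-op shared library.

    compile_flags and link_flags are lists of strings, for example the
    values returned by flags_from_tensorflow().
    """
    cmd = [cxx, "-std=" + std, "-shared", "-fPIC", source, "-o", output]
    cmd += list(compile_flags)   # include path and -D_GLIBCXX_USE_CXX11_ABI=...
    cmd += list(link_flags)      # -L<tf lib dir> and the framework library
    cmd.append("-O2")
    return cmd


def flags_from_tensorflow():
    """Ask the installed TensorFlow for its flags (requires tensorflow)."""
    import tensorflow as tf
    return tf.sysconfig.get_compile_flags(), tf.sysconfig.get_link_flags()
```

The resulting list can be passed to `subprocess.check_call`, and running ldd on the output file afterwards confirms whether the framework library was actually linked.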

Apr 02 '20 23:04 pauloffsf

@pauloffsf Thanks all the same, I have solved the problem. I did it by setting up the right virtual environment. However, I couldn't get it to work with TensorFlow 1.14 because of the *.so.x file, and your suggestion would be a solution to that.

Apr 03 '20 00:04 MrCrazyCrab

Hello, I ran into a problem when following your method: when I run ldd tf_distance_so.so, there is no libtensorflow_framework.so in the output list, and I have not found anything like libtensorflow_framework.so.x in the library folder. I hope you can take the time to see what the problem is. Thank you! My environment is as follows: Ubuntu 16.04; ubuntu-drivers 440.64; CUDA 10.0; cuDNN 7.5.0; TF 1.10.0; gcc/g++ 5.4. Thanks in advance!

Jun 20 '20 09:06 skq-5233

My makefile is as follows:

Jun 20 '20 09:06 skq-5233

Have you checked whether libtensorflow is in the folder /home/user/anaconda2/envs/planenet/lib/python2.7(...)/tensorflow? That's where it should be, and usually it is named *.so.x, where x is a version number.

Which TensorFlow version have you installed?

Jun 20 '20 10:06 pauloffsf

This is libtensorflow's path:

------------------ Original message ------------------ From: "pauloffsf"; Sent: Saturday, June 20, 2020, 6:43 PM; To: "charlesq34/pointnet2"; Cc: "Dandelion's Fled"; "Comment"; Subject: Re: [charlesq34/pointnet2] Error when training (#111)

Jun 20 '20 13:06 skq-5233
