pointnet2
Error when training
Hello!
I searched here to see whether anyone had hit the same error, but couldn't find anything.
I am trying to run the semantic segmentation training code with my own dataset.
I was able to follow the README, install all the dependencies, and compile all the tf_ops shared objects, but when I try to run the training I get an undefined symbol error from tf_sampling_so.so (I've attached an image with the error):
Has anyone seen this kind of problem and knows how to solve it? I'm using CUDA 9.0 and TF 1.13.
Thanks!
I have faced a similar problem: tf_sampling_so.so: undefined symbol: _ZN10tensorflow8internal21CheckOpMessageBuilder9NewStringEv. I'm using CUDA 9.0 and TF 1.12.
I thought that maybe I was using a mismatched TF version.
Maybe you can find the solution by following https://github.com/charlesq34/pointnet2/issues/48
Hello,
I'm using TF-gpu 1.13.0 and CUDA 10.0 on Ubuntu 18.04, and I still have this problem when I run train.py.
Is this problem caused by the TF and CUDA versions?
PS: I couldn't use CUDA 9.0 on my Ubuntu version because of the g++ version (> 6.0).
Hello, I am facing a similar issue: tensorflow.python.framework.errors_impl.NotFoundError: /mnt/disks/user/project/pointnet2/tf_ops/sampling/tf_sampling_so.so: undefined symbol: _ZTVN10tensorflow14kernel_factory17OpKernelRegistrar18PtrOpKernelFactoryE. I'm using gcc > 6. I tried the fix from #48 mentioned above, but I am getting the same error.
@pauloffsf @Kevinlongran @zhangxing1995 @SalaheddineSTA @kiranintellify have you solved the problem? I ran into the same problem as you. I use CUDA 9.0 and TF 1.12.
I was able to solve it a while back, but I lost my tutorial with all the steps I took. It was mostly about the way you compile the C/C++ code.
@pauloffsf I can compile the code successfully, but something goes wrong when I load the .so files. If it's OK, could you share your .so files with me?
It is related to the compile options you use. I don't have the .so files anymore; they were stored with the tutorial I created. The computer's hard drive failed and the machine had to be formatted. After that, I was no longer able to work with the code.
I also used this https://github.com/pubgeo/dfc2019/tree/master/track4/pointnet2 to help me compile.
I am going to work with this code again two weeks from now. If I work out how to solve it again, I'll let you know.
@MrCrazyCrab sorry it took me so long to answer. Were you able to solve your problem?
I could solve it with this:
Besides passing the "-D_GLIBCXX_USE_CXX11_ABI=0" parameter to g++, I fixed my problem as follows.
You need to check whether -ltensorflow_framework was linked properly into your tf_ops *.so files. For that, use (for example):
$ ldd tf_grouping_so.so
Check whether libtensorflow_framework.so is in the list. If it isn't, you haven't linked it properly (case 1). If it is in the list but shows as not found, see case 2.
- Case 1: This may happen if your library has a version suffix, something like *.so.x, where x is the version number. If so, you need to create a symbolic link from *.so to *.so.x:
$ sudo ln -s libtensorflow_framework.so.x libtensorflow_framework.so
You then have to compile every tf_op again and check with ldd again.
- Case 2: If it is in the list but shows as not found, you just need to add its path to the ld library path:
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:(path to where libtensorflow_framework.so is)
$ sudo ldconfig
That's it.
You can check again with ldd to see that the library is in the list and is being found, and then run your train.py.
I was able to solve it with Anaconda Python 3.7, g++ 7.5, and TensorFlow 2.1.
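For anyone who wants to script the ldd check described above, here is a minimal Python sketch. It is only an illustration, not part of the pointnet2 repository: the function names `classify_framework_link` and `run_ldd` are made up for this example, and it assumes ldd prints one dependency per line in its usual "name => path" or "name => not found" form.

```python
import subprocess


def classify_framework_link(ldd_output, lib_prefix="libtensorflow_framework.so"):
    """Classify how an op library links to TensorFlow, given the text printed by ldd.

    Returns:
      "missing"   - the library is not listed at all (it was not linked at build time)
      "not found" - the library is listed, but the loader cannot locate it
      "ok"        - the library is listed and resolved to a path
    """
    for line in ldd_output.splitlines():
        line = line.strip()
        # Matches both libtensorflow_framework.so and versioned names such as *.so.x
        if line.startswith(lib_prefix):
            return "not found" if "not found" in line else "ok"
    return "missing"


def run_ldd(so_path):
    """Run ldd on a compiled op (hypothetical helper) and return its output as text."""
    return subprocess.check_output(["ldd", so_path], universal_newlines=True)
```

A possible use would be `classify_framework_link(run_ldd("tf_grouping_so.so"))`: "missing" points to the symbolic link and recompile step, while "not found" points to the LD_LIBRARY_PATH step.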
@pauloffsf Thanks all the same, I have solved the problem. I did it by setting up the right virtual environment. However, I couldn't get it working with TensorFlow 1.14 because of the *.so.x file, and your suggestion would be a solution to that.
Hello, I ran into a problem when following your method: when I run ldd tf_distance_so.so, libtensorflow_framework.so is not in the output list, and I have not found anything like libtensorflow_framework.so.x in the library directory. I hope you can take the time to see what the problem is. Thank you! My environment is as follows:
Ubuntu 16.04; ubuntu-drivers 440.64; CUDA 10.0; cuDNN 7.5.0; TF 1.10.0; gcc/g++ 5.4
Thanks in advance!
My makefile is as follows:
Have you checked whether libtensorflow is in the folder \home\user\anaconda2\envs\planenet\lib\python2.7(...)\tensorflow? That's where it should be, and usually it is named *.so.x, where x is a version number.
What TensorFlow version have you installed?
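A quick way to answer the question above is to list the framework libraries in the TensorFlow package folder. The sketch below is only an illustration: `find_framework_libs` and `needs_symlink` are hypothetical helper names, and `lib_dir` is whatever folder you want to inspect. As far as I know, `tf.sysconfig.get_lib()` returns the TensorFlow install folder, but please treat that as an assumption to verify on your own version.

```python
import glob
import os


def find_framework_libs(lib_dir):
    """Return the sorted file names in lib_dir that match libtensorflow_framework.so*."""
    pattern = os.path.join(lib_dir, "libtensorflow_framework.so*")
    return sorted(os.path.basename(p) for p in glob.glob(pattern))


def needs_symlink(lib_dir):
    """True when only versioned files (*.so.x) exist, so -ltensorflow_framework
    has no plain libtensorflow_framework.so to resolve against."""
    names = find_framework_libs(lib_dir)
    return bool(names) and "libtensorflow_framework.so" not in names


# Example (assumes TensorFlow is importable in the active environment):
#   import tensorflow as tf
#   print(find_framework_libs(tf.sysconfig.get_lib()))
```

If `find_framework_libs` returns an empty list, the library is not in that folder at all. If `needs_symlink` returns True, the symbolic link step described earlier in this thread is the likely fix.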
This is libtensorflow's path: