flownet3d_pytorch
CUDA kernel failed : invalid device function
Hey,
Really good job with the implementation!
I am running into the following error when I try to use your implementation of the FlowNet3D:
CUDA kernel failed : invalid device function
Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)
It does not show the line where the error is raised.
Which CUDA version are you using? I am using 10.1
I followed the forward pass step by step in debug mode and found that the problem comes from this line:
https://github.com/hyangwinter/flownet3d_pytorch/blob/051ead641270e76008119ecc4937f898b9118897/lib/pointnet2_utils.py#L28
Any ideas of what could cause the problem?
Hi, my CUDA version is 10.0. It looks like the extension was not compiled properly. I found several pages that might explain this error: https://stackoverflow.com/questions/28451859/cuda-invalid-device-function-how-to-know-architecture-code https://stackoverflow.com/questions/17599189/what-is-the-purpose-of-using-multiple-arch-flags-in-nvidias-nvcc-compiler
But in fact, BuildExtension (from torch.utils.cpp_extension, used together with setuptools) determines the arch flags according to the model of the GPU. The GPU I use is an RTX 2080 Ti, so the output when I compile is as follows:
/usr/local/cuda/bin/nvcc -I/root/anaconda3/lib/python3.6/site-packages/torch/include -I/root/anaconda3/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -I/root/anaconda3/lib/python3.6/site-packages/torch/include/TH -I/root/anaconda3/lib/python3.6/site-packages/torch/include/THC -I/usr/local/cuda/include -I/root/anaconda3/include/python3.6m -c src/sampling_gpu.cu -o build/temp.linux-x86_64-3.6/src/sampling_gpu.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -O2 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=pointnet2_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_75,code=sm_75 -std=c++11
You can check whether the -gencode=arch value matches your GPU, or downgrade your CUDA version to 10.0.
Hope these are helpful to you, best wishes 😃
Thanks =)
I am not very experienced with CUDA. How can I check that? I did not find anything online that I could use.
This page shows the relationship between GPUs and their compute capability. For example, the compute capability of my RTX 2080 Ti is 7.5, so the values of arch and code are compute_75 and sm_75 respectively.
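For anyone who wants to do this check from Python, here is a minimal sketch of the comparison described above. The helper names (expected_gencode, compiled_sm_targets, build_matches_gpu) are made up for illustration and are not part of this repository. It only looks for an exact sm_XX match in the compile command and ignores PTX (code=compute_XX) targets, so treat it as a rough check rather than a definitive one.

```python
import re


def expected_gencode(major, minor):
    """Build the nvcc -gencode flag for a given compute capability.

    Hypothetical helper: 7.5 becomes compute_75 / sm_75, as described above.
    """
    cc = f"{major}{minor}"
    return f"-gencode=arch=compute_{cc},code=sm_{cc}"


def compiled_sm_targets(command):
    """Return the sm_XX numbers (as strings) found in an nvcc command line."""
    return re.findall(r"code=sm_(\d+)", command)


def build_matches_gpu(command, major, minor):
    """True if the command compiled native code for this compute capability."""
    return f"{major}{minor}" in compiled_sm_targets(command)


if __name__ == "__main__":
    # Paste the nvcc line printed while building the extension here.
    build_line = "nvcc -c src/sampling_gpu.cu -gencode=arch=compute_75,code=sm_75 -std=c++11"
    try:
        import torch  # optional: only needed to query the local GPU
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        major, minor = torch.cuda.get_device_capability(0)
        print("GPU compute capability:", f"{major}.{minor}")
        print("Expected flag:", expected_gencode(major, minor))
        print("Build matches GPU:", build_matches_gpu(build_line, major, minor))
    else:
        print("No CUDA-enabled PyTorch found; cannot query the GPU.")
```

If the last line prints False, the extension was built for a different architecture than the GPU it is running on, which is one way to end up with "invalid device function".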
I encountered the same problem. My GPU is an RTX 2080 Ti and the CUDA version is 10.1. When I run the install command for setup.py and then try to start training, the following error appears:
CUDA kernel failed : invalid device function
Segmentation fault (core dumped)
It seems that a mismatch between the NVCC version and the CUDA runtime version is the root cause. Please refer to this issue for more information: https://github.com/facebookresearch/detectron2/issues/62#issuecomment-542447753
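A quick way to see whether the two versions disagree is to compare the CUDA version PyTorch was built with against the release reported by nvcc. The sketch below assumes nvcc is on the PATH and that its --version output contains a "release X.Y" string; the function names are hypothetical and only for illustration.

```python
import re
import shutil
import subprocess


def nvcc_release(version_text):
    """Extract (major, minor) from `nvcc --version` output, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", version_text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def parse_cuda_version(version):
    """Turn a string such as '10.0' or '10.1.243' into (major, minor)."""
    parts = version.split(".")
    return (int(parts[0]), int(parts[1]))


def versions_match(nvcc_text, torch_cuda):
    """True if nvcc and the PyTorch build report the same major.minor version."""
    return nvcc_release(nvcc_text) == parse_cuda_version(torch_cuda)


if __name__ == "__main__":
    try:
        import torch  # optional: only needed for the live comparison
    except ImportError:
        torch = None
    nvcc = shutil.which("nvcc")
    if torch is None or torch.version.cuda is None or nvcc is None:
        print("Need a CUDA build of PyTorch and nvcc on PATH to compare.")
    else:
        out = subprocess.run([nvcc, "--version"], capture_output=True, text=True).stdout
        print("nvcc release:", nvcc_release(out))
        print("PyTorch built with CUDA:", torch.version.cuda)
        print("Versions match:", versions_match(out, torch.version.cuda))
```

If the versions differ (for example nvcc 10.1 against a PyTorch build for CUDA 10.0), rebuilding the extension with a matching toolkit is worth trying first.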