flownet3d_pytorch

CUDA kernel failed : invalid device function

Open VictorZuanazzi opened this issue 5 years ago • 6 comments

Hey,

Really good job with the implementation!

I am running into the following error when I try to use your implementation of FlowNet3D:

CUDA kernel failed : invalid device function

Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

It does not show the line where the error is raised.

Which CUDA version are you using? I am using 10.1.

VictorZuanazzi, Dec 09 '19 17:12

I followed the forward pass step by step in debug mode and found that the problem comes from this line:

https://github.com/hyangwinter/flownet3d_pytorch/blob/051ead641270e76008119ecc4937f898b9118897/lib/pointnet2_utils.py#L28

Any ideas of what could cause the problem?

VictorZuanazzi, Dec 09 '19 21:12

Hi, my CUDA version is 10.0. It looks like the extension was not compiled properly. I found several pages that discuss what might cause this error:

https://stackoverflow.com/questions/28451859/cuda-invalid-device-function-how-to-know-architecture-code
https://stackoverflow.com/questions/17599189/what-is-the-purpose-of-using-multiple-arch-flags-in-nvidias-nvcc-compiler

But in fact, BuildExtension (PyTorch's setuptools build command, from torch.utils.cpp_extension) determines the arch flags according to the model of the GPU. The GPU I use is an RTX 2080 Ti, so the output when I compile is as follows:

/usr/local/cuda/bin/nvcc -I/root/anaconda3/lib/python3.6/site-packages/torch/include -I/root/anaconda3/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -I/root/anaconda3/lib/python3.6/site-packages/torch/include/TH -I/root/anaconda3/lib/python3.6/site-packages/torch/include/THC -I/usr/local/cuda/include -I/root/anaconda3/include/python3.6m -c src/sampling_gpu.cu -o build/temp.linux-x86_64-3.6/src/sampling_gpu.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -O2 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=pointnet2_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_75,code=sm_75 -std=c++11

You can check whether -gencode=arch matches your GPU, or downgrade your CUDA version to 10.0. I hope this helps. Best wishes 😃

hyangwinter, Dec 10 '19 08:12

Thanks =)

I am not very experienced with CUDA myself. How can I check that? I did not find anything online that I could use.

VictorZuanazzi, Dec 11 '19 15:12

This page shows the relationship between GPUs and their compute capability. For example, the compute capability of my RTX 2080 Ti is 7.5, so the arch and code parameters are compute_75 and sm_75, respectively.
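If PyTorch is already installed, the compute capability can also be read from the GPU directly instead of looking it up. The following is only a minimal sketch: `gencode_flag` and `report_device_arch` are hypothetical helper names, not part of this repository, and the flag format simply mirrors the nvcc output shown above.

```python
def gencode_flag(capability):
    """Build the nvcc -gencode flag for a (major, minor) compute capability."""
    major, minor = capability
    cc = f"{major}{minor}"
    return f"-gencode=arch=compute_{cc},code=sm_{cc}"


def report_device_arch():
    """Print the flag expected for GPU 0, if PyTorch and a GPU are available."""
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
        return
    if not torch.cuda.is_available():
        print("No CUDA device is visible to PyTorch")
        return
    # get_device_capability returns a (major, minor) tuple, e.g. (7, 5)
    capability = torch.cuda.get_device_capability(0)
    print(torch.cuda.get_device_name(0), capability)
    print("expected nvcc flag:", gencode_flag(capability))


if __name__ == "__main__":
    report_device_arch()
```

If the flag printed here differs from the -gencode value in your build log, the extension was probably compiled for a different architecture. In that case, setting the TORCH_CUDA_ARCH_LIST environment variable (for example to "7.5") before rebuilding may help, since torch.utils.cpp_extension reads it when choosing arch flags.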

hyangwinter, Dec 12 '19 12:12

I encountered the same problem. My GPU is an RTX 2080 Ti and the CUDA version is 10.1. After I run the install command from setup.py and try to start training, the following error appears:

CUDA kernel failed : invalid device function
Segmentation fault (core dumped)

Zhang-z625, Jan 03 '20 03:01

It seems that a mismatch between the NVCC version and the CUDA runtime version is the root cause. Please refer to this issue for more information: https://github.com/facebookresearch/detectron2/issues/62#issuecomment-542447753
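A rough way to compare the two versions is sketched below. It assumes that the nvcc found on PATH is the one used to build the extension, which may not hold on every machine. The helper names (`parse_nvcc_release`, `nvcc_release`, `report_versions`) are hypothetical and not part of this repository.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(version_text):
    """Extract the release number (e.g. '10.1') from `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", version_text)
    return match.group(1) if match else None


def nvcc_release():
    """Return the release reported by the nvcc on PATH, or None if not found."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


def report_versions():
    """Print the CUDA version PyTorch was built with next to the nvcc release."""
    try:
        import torch
        built_with = torch.version.cuda  # None for CPU-only builds
    except ImportError:
        built_with = None
    print("PyTorch built with CUDA:", built_with)
    print("nvcc on PATH reports:", nvcc_release())


if __name__ == "__main__":
    report_versions()
```

If the two values differ (for example 10.0 and 10.1), rebuilding the extension with a matching toolkit, or installing a PyTorch build that matches the local toolkit, is worth trying.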

hyangwinter, Sep 02 '20 08:09