Deformable-Convolution-V2-PyTorch
error: Segmentation fault (core dumped), pytorch version: 1.0.0 cuda:8.0
I get the same problem (torch 1.0.0, CUDA 9.0).
#7 Please try this solution.
@xvjiarui thanks a lot for your help! I'll work on it
@xvjiarui Hi, another problem occurred when running the function check_gradient_dconv() in test.py. (The other check functions in test.py seem to run correctly.)
error in deformable_col2im_cuda: too many resources requested for launch
Traceback (most recent call last):
  File "test.py", line 624, in
    check_gradient_dconv()
  File "test.py", line 400, in check_gradient_dconv
    eps=1e-3, atol=1e-3, rtol=1e-2, raise_exception=True))
  File "/home/gzh/SoftWare/tf1.10/anaconda2/envs/python36/lib/python3.6/site-packages/torch/autograd/gradcheck.py", line 205, in gradcheck
    'numerical:%s\nanalytical:%s\n' % (i, j, n, a))
  File "/home/gzh/SoftWare/tf1.10/anaconda2/envs/python36/lib/python3.6/site-packages/torch/autograd/gradcheck.py", line 185, in fail_test
    raise RuntimeError(msg)
RuntimeError: Jacobian mismatch for output 0 with respect to input 0,
numerical:tensor([[ 0.0000, -0.0243, -0.1477, ..., 0.0000, 0.0000, 0.0000],
        [ 0.0000, -0.0236, -0.0276, ..., 0.0000, 0.0000, 0.0000],
        [ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
        ...,
        [ 0.0000, 0.0000, 0.0000, ..., 0.0000, -0.2121, 0.0206],
        [ 0.0000, 0.0000, 0.0000, ..., 0.0058, -0.2543, 0.0000],
        [ 0.0000, 0.0000, 0.0000, ..., 0.1695, 0.0015, 0.0480]], dtype=torch.float64)
analytical:tensor([[0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        ...,
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]], dtype=torch.float64)
The environment is
- Titan Black (6G memory)
- cuda8.0 cudnn6.0
- torch 1.0
- python 3.6
Any help?
I believe it is caused by different hardware. Try changing the value at https://github.com/chengdazhi/Deformable-Convolution-V2-PyTorch/blob/1b5851abd404dc71f02ae0110af3540b6877309e/src/cuda/deform_im2col_cuda.cuh#L17 and https://github.com/chengdazhi/Deformable-Convolution-V2-PyTorch/blob/1b5851abd404dc71f02ae0110af3540b6877309e/src/cuda/modulated_deform_im2col_cuda.cuh#L17 to a smaller number such as 512 or 256. This should help.
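For anyone who prefers to script this change, here is a minimal sketch. It assumes the linked lines declare the thread count as `const int CUDA_NUM_THREADS = <number>;` (the constant name comes from later in this thread; the exact declaration syntax is an assumption). The helper names `set_cuda_num_threads` and `patch_file` are made up for illustration and are not part of the repo.

```python
import re

# Assumption: the constant is declared as `const int CUDA_NUM_THREADS = <number>;`.
# Adjust the pattern if the declaration in your checkout looks different.
PATTERN = re.compile(r"(const\s+int\s+CUDA_NUM_THREADS\s*=\s*)\d+(\s*;)")


def set_cuda_num_threads(source: str, threads: int) -> str:
    """Return `source` with the CUDA_NUM_THREADS value replaced by `threads`."""
    new_source, count = PATTERN.subn(
        lambda m: f"{m.group(1)}{threads}{m.group(2)}", source
    )
    if count == 0:
        raise ValueError("CUDA_NUM_THREADS declaration not found")
    return new_source


def patch_file(path: str, threads: int = 256) -> None:
    """Rewrite one header in place with the smaller thread count."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(set_cuda_num_threads(text, threads))
```

Run from the repo root, this would be called as `patch_file("src/cuda/deform_im2col_cuda.cuh")` and `patch_file("src/cuda/modulated_deform_im2col_cuda.cuh")`. The extension has to be rebuilt afterwards for the new value to take effect.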
@xvjiarui It's okay! Thanks a lot~
Hello, I have the same issue. I tried 512, 256, ... even 1, and I still get the same error. I have a GeForce GTX 1080. Any idea why I still have this problem?
The segmentation fault can be solved by following https://github.com/chengdazhi/Deformable-Convolution-V2-PyTorch/issues/7. CUDA_NUM_THREADS is responsible for "too many resources requested for launch".
@Simpatech-app have you rebuilt the code?
@xvjiarui @gzhcv I have gcc version 5.4.0, and according to issue #7, gcc >= 4.9 should work well. However, I still have the same problem.
Do you mean running sh make.sh again? If so, I already did that and still get this error. Any idea how to fix it? After building the project, should I do anything else?
I have the same issue, "too many resources requested for launch", on a V100 SXM3.
Tips: change the code before you run make.
First, git clone the repo;
then make sure your gcc version is >= 4.9;
then change the code as @xvjiarui mentioned;
last, run bash ./make.sh.
If you get "Backward is not reentrant" after python test, refer to issue #16.
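The gcc requirement in these tips can be checked before building. The sketch below parses the output of `gcc -dumpversion` and compares it with 4.9, the minimum quoted in this thread. The function names are made up for illustration, and it assumes `gcc` on the PATH is the compiler the build will actually use.

```python
import re
import subprocess

MIN_GCC = (4, 9)  # minimum version quoted in this thread


def parse_gcc_version(text: str) -> tuple:
    """Extract (major, minor) from a version string such as '5.4.0' or '9'."""
    match = re.search(r"(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError("no version number found in %r" % text)
    return (int(match.group(1)), int(match.group(2) or 0))


def gcc_is_new_enough(version_text: str, minimum: tuple = MIN_GCC) -> bool:
    """True if the parsed version is at least `minimum`."""
    return parse_gcc_version(version_text) >= minimum


def installed_gcc_version() -> str:
    """Ask the gcc found on PATH for its version string."""
    result = subprocess.run(
        ["gcc", "-dumpversion"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()
```

For example, `gcc_is_new_enough(installed_gcc_version())` would be evaluated before running `bash ./make.sh`.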
I have the same issue with an RTX 2070 :( I made sure my gcc version is 5.4, and I tried CUDA_NUM_THREADS = 512, 256, and 1.
@cjnjuwhy Thank you very much, I solved the problem by following your tips!
Gentle reminder: make.sh creates a directory named build in your project. After you change 1024 to 256 as @xvjiarui suggested, you need to delete this build directory and run make.sh again; then it will work.
After that, you will hit another exception, "Backward is not reentrant". I just deleted those check functions.
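The clean rebuild described here can be sketched as follows. It assumes the stale directory is named `build` and sits at the repo root next to `make.sh`. Both helper names are made up for illustration.

```python
import os
import shutil
import subprocess


def remove_build_dir(repo_dir: str) -> bool:
    """Delete <repo_dir>/build if it exists; return True if something was removed."""
    build_dir = os.path.join(repo_dir, "build")  # assumed directory name
    if os.path.isdir(build_dir):
        shutil.rmtree(build_dir)
        return True
    return False


def clean_rebuild(repo_dir: str) -> None:
    """Drop the cached build output, then rerun the repo's make.sh."""
    remove_build_dir(repo_dir)
    # check=True raises CalledProcessError if the build script fails.
    subprocess.run(["bash", "./make.sh"], cwd=repo_dir, check=True)
```

Deleting the directory first matters because the old compiled objects may otherwise be reused, so an edited thread count would never reach the installed extension.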
Hello,world
Hi, I get the same problem but the #7 issue is missing now. Can anyone please explain again how to solve the "Segmentation fault (core dumped)" error?