Deformable-Convolution-V2-PyTorch

error: Segmentation fault (core dumped), pytorch version: 1.0.0 cuda:8.0

Open hfutzzw opened this issue 5 years ago • 17 comments

hfutzzw avatar Mar 13 '19 13:03 hfutzzw

I get the same problem. (torch 1.0.0, CUDA 9.0)

gzhcv avatar Mar 15 '19 08:03 gzhcv

Please try the solution in #7.

xvjiarui avatar Mar 15 '19 09:03 xvjiarui

@xvjiarui thanks a lot for your help! I'll work on it

gzhcv avatar Mar 15 '19 09:03 gzhcv

@xvjiarui Hi, another problem occurred when running check_gradient_dconv() in test.py (the other check functions in test.py seem to run correctly):

error in deformable_col2im_cuda: too many resources requested for launch
Traceback (most recent call last):
  File "test.py", line 624, in <module>
    check_gradient_dconv()
  File "test.py", line 400, in check_gradient_dconv
    eps=1e-3, atol=1e-3, rtol=1e-2, raise_exception=True))
  File "/home/gzh/SoftWare/tf1.10/anaconda2/envs/python36/lib/python3.6/site-packages/torch/autograd/gradcheck.py", line 205, in gradcheck
    'numerical:%s\nanalytical:%s\n' % (i, j, n, a))
  File "/home/gzh/SoftWare/tf1.10/anaconda2/envs/python36/lib/python3.6/site-packages/torch/autograd/gradcheck.py", line 185, in fail_test
    raise RuntimeError(msg)
RuntimeError: Jacobian mismatch for output 0 with respect to input 0,
numerical:tensor([[ 0.0000, -0.0243, -0.1477,  ...,  0.0000,  0.0000,  0.0000],
        [ 0.0000, -0.0236, -0.0276,  ...,  0.0000,  0.0000,  0.0000],
        [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
        ...,
        [ 0.0000,  0.0000,  0.0000,  ...,  0.0000, -0.2121,  0.0206],
        [ 0.0000,  0.0000,  0.0000,  ...,  0.0058, -0.2543,  0.0000],
        [ 0.0000,  0.0000,  0.0000,  ...,  0.1695,  0.0015,  0.0480]], dtype=torch.float64)
analytical:tensor([[0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        ...,
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.]], dtype=torch.float64)

The environment is:

  • Titan Black (6 GB memory)
  • CUDA 8.0, cuDNN 6.0
  • torch 1.0
  • python 3.6

Any help?

gzhcv avatar Mar 16 '19 05:03 gzhcv

I believe it is caused by different hardware. Try changing https://github.com/chengdazhi/Deformable-Convolution-V2-PyTorch/blob/1b5851abd404dc71f02ae0110af3540b6877309e/src/cuda/deform_im2col_cuda.cuh#L17 and https://github.com/chengdazhi/Deformable-Convolution-V2-PyTorch/blob/1b5851abd404dc71f02ae0110af3540b6877309e/src/cuda/modulated_deform_im2col_cuda.cuh#L17 to a smaller number, like 512 or 256. This should help.
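
For reference, the change amounts to lowering the per-block thread count defined near the top of both headers. A rough sketch of the edit (the exact surrounding code may differ between commits):

    // src/cuda/deform_im2col_cuda.cuh and src/cuda/modulated_deform_im2col_cuda.cuh
    // The default of 1024 threads per block can exceed the register budget of some GPUs;
    // a smaller value trades a few more blocks for a launch that actually fits.
    const int CUDA_NUM_THREADS = 512;  // was 1024; try 256 if 512 still fails

Remember to rebuild the extension after editing the headers.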

xvjiarui avatar Mar 16 '19 11:03 xvjiarui

@xvjiarui It works now! Thanks a lot~

gzhcv avatar Mar 16 '19 11:03 gzhcv

Hello, I have the same issue. I tried 512, 256, ... even 1, and I still receive the same error. I have a GeForce GTX 1080. Any idea why I still have this problem?

Simpatech-app avatar Apr 02 '19 07:04 Simpatech-app

Hello, I have the same issue. I tried 512, 256, ... even 1, and I still receive the same error. I have a GeForce GTX 1080. Any idea why I still have this problem?

The segmentation fault can be solved by https://github.com/chengdazhi/Deformable-Convolution-V2-PyTorch/issues/7. CUDA_NUM_THREADS is responsible for the "too many resources requested for launch" error.
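
For context, CUDA_NUM_THREADS sets the threads-per-block of every kernel launch in those files, so if the kernel's per-thread register usage multiplied by that count exceeds what one SM can provide, the launch fails. A minimal sketch of the usual pattern (illustrative, not copied verbatim from the repo; the kernel name below is a placeholder):

    const int CUDA_NUM_THREADS = 1024;  // threads per block

    inline int GET_BLOCKS(const int N) {
      // ceil(N / CUDA_NUM_THREADS) blocks cover N elements of work
      return (N + CUDA_NUM_THREADS - 1) / CUDA_NUM_THREADS;
    }

    // some_deform_kernel<<<GET_BLOCKS(num_kernels), CUDA_NUM_THREADS, 0, stream>>>(...);
    // If 1024 resident threads need more registers than the SM can allocate, the launch
    // returns cudaErrorLaunchOutOfResources: "too many resources requested for launch".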

xvjiarui avatar Apr 02 '19 07:04 xvjiarui

@Simpatech-app have you rebuilt the code?

gzhcv avatar Apr 02 '19 07:04 gzhcv

@xvjiarui @gzhcv I have gcc version 5.4.0, and according to issue #7, gcc >= 4.9 should work fine. However, I still have the same problem.

Simpatech-app avatar Apr 02 '19 08:04 Simpatech-app

@Simpatech-app have you rebuilt the code?

Do you mean running sh make.sh again? If so, I already did that and still get this error. Any idea how to fix it? After building the project, should I do anything else?

Simpatech-app avatar Apr 09 '19 15:04 Simpatech-app

Hello, I have the same issue. I tried 512, 256, ... even 1, and I still receive the same error. I have a GeForce GTX 1080. Any idea why I still have this problem?

The segmentation fault can be solved by #7. CUDA_NUM_THREADS is responsible for the "too many resources requested for launch" error.

I have the same "too many resources requested for launch" issue on a V100 SXM3.

ae86zhizhi avatar Apr 17 '19 08:04 ae86zhizhi

Tips: you need to change the code before building. First, git clone the repo; then make sure your gcc version is >= 4.9 and change the code as @xvjiarui mentioned; finally, run bash ./make.sh. If you get "Backward is not reentrant" after running the test, refer to issue #16.

cjnjuwhy avatar May 02 '19 04:05 cjnjuwhy

I have the same issue with an RTX 2070 :( I made sure my gcc version is 5.4 and I changed CUDA_NUM_THREADS to 512, 256, and even 1.

dtn97 avatar May 06 '19 08:05 dtn97

@cjnjuwhy Thank you very much, I solved the problem by following your tips!

aaronpetok avatar May 20 '19 04:05 aaronpetok

@Simpatech-app have you rebuilt the code?

Do you mean running sh make.sh again? If so, I already did that and still get this error. Any idea how to fix it? After building the project, should I do anything else?

make.sh creates a directory named build in your project. After you change 1024 to 256 as @xvjiarui suggested, you need to delete that build directory and run make.sh again; then it will work.

After that, you will hit another exception, "Backward is not reentrant"; I just deleted those check functions.

Hello, world

heartInsert avatar Jul 23 '19 10:07 heartInsert

Hi, I get the same problem but the #7 issue is missing now. Can anyone please explain again how to solve the "Segmentation fault (core dumped)" error?

RuijieJ avatar Aug 02 '19 06:08 RuijieJ