ssd.pytorch

device-side assert triggered

Open JingXiaolun opened this issue 6 years ago • 12 comments

When I run train.py, an annoying error occurs, as follows:

    /pytorch/aten/src/THC/THCTensorScatterGather.cu:97: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [128,0,0], thread: [353,0,0] Assertion indexValue >= 0 && indexValue < src.sizes[dim] failed.
    /pytorch/aten/src/THC/THCTensorScatterGather.cu:97: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [128,0,0], thread: [193,0,0] Assertion indexValue >= 0 && indexValue < src.sizes[dim] failed.
    /pytorch/aten/src/THC/THCTensorScatterGather.cu:97: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [128,0,0], thread: [197,0,0] Assertion indexValue >= 0 && indexValue < src.sizes[dim] failed.
    /pytorch/aten/src/THC/THCTensorScatterGather.cu:97: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [128,0,0], thread: [201,0,0] Assertion indexValue >= 0 && indexValue < src.sizes[dim] failed.
    /pytorch/aten/src/THC/THCTensorScatterGather.cu:97: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [128,0,0], thread: [205,0,0] Assertion indexValue >= 0 && indexValue < src.sizes[dim] failed.
    /pytorch/aten/src/THC/THCTensorScatterGather.cu:97: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [134,0,0], thread: [34,0,0] Assertion indexValue >= 0 && indexValue < src.sizes[dim] failed.
    /pytorch/aten/src/THC/THCTensorScatterGather.cu:97: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [134,0,0], thread: [38,0,0] Assertion indexValue >= 0 && indexValue < src.sizes[dim] failed.
    /pytorch/aten/src/THC/THCTensorScatterGather.cu:97: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [134,0,0], thread: [40,0,0] Assertion indexValue >= 0 && indexValue < src.sizes[dim] failed.
    /pytorch/aten/src/THC/THCTensorScatterGather.cu:97: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [134,0,0], thread: [41,0,0] Assertion indexValue >= 0 && indexValue < src.sizes[dim] failed.
    /pytorch/aten/src/THC/THCTensorScatterGather.cu:97: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [134,0,0], thread: [46,0,0] Assertion indexValue >= 0 && indexValue < src.sizes[dim] failed.
    /pytorch/aten/src/THC/THCTensorScatterGather.cu:97: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [128,0,0], thread: [45,0,0] Assertion indexValue >= 0 && indexValue < src.sizes[dim] failed.
    /pytorch/aten/src/THC/THCTensorScatterGather.cu:97: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [128,0,0], thread: [49,0,0] Assertion indexValue >= 0 && indexValue < src.sizes[dim] failed.
    /pytorch/aten/src/THC/THCTensorScatterGather.cu:97: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [134,0,0], thread: [155,0,0] Assertion indexValue >= 0 && indexValue < src.sizes[dim] failed.
    /pytorch/aten/src/THC/THCTensorScatterGather.cu:97: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [128,0,0], thread: [349,0,0] Assertion indexValue >= 0 && indexValue < src.sizes[dim] failed.
    /pytorch/aten/src/THC/THCTensorScatterGather.cu:97: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [133,0,0], thread: [439,0,0] Assertion indexValue >= 0 && indexValue < src.sizes[dim] failed.
    Traceback (most recent call last):
    iter 0 || Loss: 31.0600 ||   File "/media/csu/新加卷/AI Competition/Baidu Competition/intermediary_contest/ssd.pytorch/train.py", line 262, in <module>
        train()
      File "/media/csu/新加卷/AI Competition/Baidu Competition/intermediary_contest/ssd.pytorch/train.py", line 183, in train
        loss_l, loss_c = criterion(out, targets)
      File "/home/csu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
        result = self.forward(*input, **kwargs)
      File "/media/csu/新加卷/AI Competition/Baidu Competition/intermediary_contest/ssd.pytorch/layers/modules/multibox_loss.py", line 115, in forward
        loss_c[pos] = 0  # filter out pos boxes for now
    RuntimeError: device-side assert triggered

@Cadene, I have tried many solutions, but the problem still isn't solved. I hope someone can help me with it, thanks!

JingXiaolun avatar Sep 04 '18 01:09 JingXiaolun

It seems that something is wrong with the indices of the classes: Assertion indexValue >= 0 && indexValue < src.sizes[dim] failed.

You should debug with import ipdb;ipdb.set_trace() in ssd.pytorch/layers/modules/multibox_loss.py before line 115 to see which index is causing the error.
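
As an alternative to stepping through the debugger, the class indices can be checked against the number of classes before the gather runs. The following is a minimal sketch, not code from ssd.pytorch: `find_bad_indices` is a hypothetical helper, and it assumes the target indices have already been copied to the CPU and flattened into a plain list of integers (for a tensor, something like `.view(-1).tolist()`).

```python
def find_bad_indices(class_indices, num_classes):
    """Return (position, value) pairs for indices outside [0, num_classes)."""
    return [
        (pos, value)
        for pos, value in enumerate(class_indices)
        if not 0 <= value < num_classes
    ]


# Hypothetical targets for a 2-class model (0 = background, 1 = person).
targets = [0, 1, 0, 2, 1]
bad = find_bad_indices(targets, num_classes=2)
print(bad)  # [(3, 2)]
```

Because CUDA kernels run asynchronously, the Python line shown in the traceback is not necessarily the one that launched the failing kernel. Setting the environment variable `CUDA_LAUNCH_BLOCKING=1`, or running the same step on the CPU, usually gives a more precise location.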

Cadene avatar Sep 04 '18 03:09 Cadene

@Cadene, I have started debugging. To my surprise, the breakpoint I added in ssd.py was hit, as follows:

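    # apply vgg up to conv4_3 relu
    # (requires `import pdb` and `import torch` at the top of ssd.py)
    for k in range(23):
        x = self.vgg[k](x)
        # stop at the first layer whose output sums to NaN for the first sample
        if torch.isnan(x[0].sum()):
            print('k=', k)
            pdb.set_trace()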

I found that some elements of the tensor x are nan or -inf, which confuses me. Can you tell me what causes this and how I can solve it? Thanks a lot.

JingXiaolun avatar Sep 05 '18 09:09 JingXiaolun

Hi @1csu, I hit the same problem as you, 'device-side assert triggered'. Have you solved it? I would like to discuss it with you. Waiting for your reply, thank you so much!

yonger001 avatar Sep 28 '18 02:09 yonger001

@1csu I solved this problem by changing my class labels. My labels were bkg = 0, person = 1, which produced the same error as yours. I found that the match function in box_utils.py contains "conf = labels[best_truth_idx] + 1", which turned my person label into 2. When I set person = 0, it works.
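
The arithmetic behind this fix can be illustrated in a few lines. This is a simplified sketch rather than the actual `match` code: it assumes a model with `NUM_CLASSES = 2` (background plus person) and mimics only the `+ 1` shift quoted above. The helper names are hypothetical.

```python
NUM_CLASSES = 2  # assumed: index 0 = background, index 1 = person


def shifted_conf(labels):
    """Mimic `conf = labels[best_truth_idx] + 1`: shift every label up by one
    so that index 0 stays free for background."""
    return [label + 1 for label in labels]


def in_range(conf, num_classes):
    """True if every class index is valid for a num_classes-way classifier."""
    return all(0 <= c < num_classes for c in conf)


# Annotation already counts background as 0, so person = 1.
print(shifted_conf([1, 1]), in_range(shifted_conf([1, 1]), NUM_CLASSES))
# [2, 2] False

# Annotation is 0-based over foreground classes only, so person = 0.
print(shifted_conf([0, 0]), in_range(shifted_conf([0, 0]), NUM_CLASSES))
# [1, 1] True
```

An index of 2 in a 2-class output is exactly the kind of out-of-range value that trips the `indexValue < src.sizes[dim]` assertion.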

Weacera avatar Oct 25 '18 02:10 Weacera

I used the VOC dataset to run train.py and hit the same problem as you.

sheakyu avatar Nov 05 '18 07:11 sheakyu

I encountered the same error, and then found that my dataset labels did not start from 0. After changing the labels, it works.
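
If the annotation files use IDs that do not start at 0 or that have gaps, one option is to remap them to contiguous 0-based indices when loading the dataset. A minimal sketch, where `build_label_map` is a hypothetical helper and the raw IDs are made up for illustration:

```python
def build_label_map(raw_labels):
    """Map each distinct raw label ID to a contiguous index starting at 0."""
    return {raw: idx for idx, raw in enumerate(sorted(set(raw_labels)))}


raw = [1, 3, 3, 7, 1]  # made-up annotation IDs
label_map = build_label_map(raw)
remapped = [label_map[r] for r in raw]
print(label_map)  # {1: 0, 3: 1, 7: 2}
print(remapped)   # [0, 1, 1, 2, 0]
```

The map should be built once over the whole training set rather than per image; otherwise the same class could receive different indices in different images.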

Feywell avatar Nov 13 '18 09:11 Feywell

@Weacera Where did you make the change? Which file? Thank you very much for your reply.

CJJ-717 avatar Dec 14 '18 02:12 CJJ-717

@Weacera Where did you make the change? Which file? Thank you very much for your reply.

Change the match function in box_utils.py: if your labels already reserve index 0 for background, use conf = labels[best_truth_idx]; otherwise, if index 0 is a real class (e.g. 'aeroplane'), keep conf = labels[best_truth_idx] + 1.
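
The rule can be written as a small function with an early range check, so that a bad label fails with a readable Python error instead of inside a CUDA kernel. This is only a sketch under stated assumptions: `to_conf` and `check_conf` are hypothetical helpers operating on plain integers, not the tensor code in box_utils.py.

```python
def to_conf(label, labels_include_background):
    """Convert an annotation label to the class index used by the loss,
    where index 0 is always background."""
    return label if labels_include_background else label + 1


def check_conf(conf, num_classes):
    """Raise a readable error if the class index is out of range."""
    if not 0 <= conf < num_classes:
        raise ValueError(f"class index {conf} outside [0, {num_classes})")
    return conf


# VOC-style labels: 'aeroplane' = 0, 20 foreground classes, 21 outputs.
print(check_conf(to_conf(0, labels_include_background=False), 21))  # 1

# Labels that already reserve 0 for background: person = 1, 2 outputs.
print(check_conf(to_conf(1, labels_include_background=True), 2))    # 1
```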

Jon-drugstore avatar Dec 20 '18 05:12 Jon-drugstore

I have the same problem on Ubuntu, but the same code runs fine on Windows 10! I don't know why.

MarterLee avatar Mar 30 '19 13:03 MarterLee

I have the same problem on Ubuntu, but the same code runs fine on Windows 10! I don't know why.

Me too

anheidelonghu avatar Jun 14 '19 14:06 anheidelonghu

Make sure you have changed the number of classes in config.py.
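
A quick consistency check at startup can catch a stale class count. The sketch below assumes that `num_classes` counts the background class (so 20 VOC classes would correspond to 21); `CLASSES`, `cfg`, and `expected_num_classes` are hypothetical names for a custom one-class dataset, not the repository's actual config.

```python
def expected_num_classes(class_names):
    """Foreground classes plus one slot for background."""
    return len(class_names) + 1


# Hypothetical custom dataset and config entry.
CLASSES = ("person",)
cfg = {"num_classes": 2}

assert cfg["num_classes"] == expected_num_classes(CLASSES), (
    f"num_classes is {cfg['num_classes']}, "
    f"expected {expected_num_classes(CLASSES)}"
)
```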

MaddyThakker avatar Mar 07 '20 20:03 MaddyThakker

The +1 didn't help; I had to add background to the classes to solve this cross-entropy issue.

gitE0Z9 avatar Aug 11 '21 02:08 gitE0Z9