ssd.pytorch
device-side assert triggered
When I run train.py, an annoying issue happens, as follows.
/pytorch/aten/src/THC/THCTensorScatterGather.cu:97: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [128,0,0], thread: [353,0,0] Assertion indexValue >= 0 && indexValue < src.sizes[dim]
failed.
/pytorch/aten/src/THC/THCTensorScatterGather.cu:97: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [128,0,0], thread: [193,0,0] Assertion indexValue >= 0 && indexValue < src.sizes[dim]
failed.
/pytorch/aten/src/THC/THCTensorScatterGather.cu:97: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [128,0,0], thread: [197,0,0] Assertion indexValue >= 0 && indexValue < src.sizes[dim]
failed.
/pytorch/aten/src/THC/THCTensorScatterGather.cu:97: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [128,0,0], thread: [201,0,0] Assertion indexValue >= 0 && indexValue < src.sizes[dim]
failed.
/pytorch/aten/src/THC/THCTensorScatterGather.cu:97: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [128,0,0], thread: [205,0,0] Assertion indexValue >= 0 && indexValue < src.sizes[dim]
failed.
/pytorch/aten/src/THC/THCTensorScatterGather.cu:97: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [134,0,0], thread: [34,0,0] Assertion indexValue >= 0 && indexValue < src.sizes[dim]
failed.
/pytorch/aten/src/THC/THCTensorScatterGather.cu:97: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [134,0,0], thread: [38,0,0] Assertion indexValue >= 0 && indexValue < src.sizes[dim]
failed.
/pytorch/aten/src/THC/THCTensorScatterGather.cu:97: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [134,0,0], thread: [40,0,0] Assertion indexValue >= 0 && indexValue < src.sizes[dim]
failed.
/pytorch/aten/src/THC/THCTensorScatterGather.cu:97: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [134,0,0], thread: [41,0,0] Assertion indexValue >= 0 && indexValue < src.sizes[dim]
failed.
/pytorch/aten/src/THC/THCTensorScatterGather.cu:97: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [134,0,0], thread: [46,0,0] Assertion indexValue >= 0 && indexValue < src.sizes[dim]
failed.
/pytorch/aten/src/THC/THCTensorScatterGather.cu:97: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [128,0,0], thread: [45,0,0] Assertion indexValue >= 0 && indexValue < src.sizes[dim]
failed.
/pytorch/aten/src/THC/THCTensorScatterGather.cu:97: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [128,0,0], thread: [49,0,0] Assertion indexValue >= 0 && indexValue < src.sizes[dim]
failed.
/pytorch/aten/src/THC/THCTensorScatterGather.cu:97: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [134,0,0], thread: [155,0,0] Assertion indexValue >= 0 && indexValue < src.sizes[dim]
failed.
/pytorch/aten/src/THC/THCTensorScatterGather.cu:97: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [128,0,0], thread: [349,0,0] Assertion indexValue >= 0 && indexValue < src.sizes[dim]
failed.
/pytorch/aten/src/THC/THCTensorScatterGather.cu:97: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [133,0,0], thread: [439,0,0] Assertion indexValue >= 0 && indexValue < src.sizes[dim]
failed.
Traceback (most recent call last):
iter 0 || Loss: 31.0600 || File "/media/csu/新加卷/AI Competition/Baidu Competition/intermediary_contest/ssd.pytorch/train.py", line 262, in
It seems that something is wrong with the indices of the classes: Assertion indexValue >= 0 && indexValue < src.sizes[dim] failed.
You should debug with import ipdb; ipdb.set_trace() in ssd.pytorch/layers/modules/multibox_loss.py before line 115 to see which index is causing the error.
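A minimal sketch of that kind of check, written in plain Python so it runs without a GPU. The assertion says that every index passed to gather must satisfy 0 <= index < src.sizes[dim], where the size is the number of classes the network predicts. The helper name find_out_of_range and the sample values are hypothetical; in the real loss the same comparison would be run on the target class tensor (for example after converting it with .tolist()) just before the failing line.

```python
def find_out_of_range(class_indices, num_classes):
    """Return (position, value) pairs that would trip the gather assertion."""
    return [
        (pos, idx)
        for pos, idx in enumerate(class_indices)
        if not 0 <= idx < num_classes
    ]


if __name__ == "__main__":
    # Hypothetical targets for a 2-class model (valid indices: 0 and 1).
    targets = [0, 1, 2, 0, -1]
    print(find_out_of_range(targets, num_classes=2))
```

An empty result means all targets are in range, and the problem is more likely elsewhere (for example NaN values in the predictions).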
On Mon, Sep 3, 2018 at 18:33, 1csu wrote:
Traceback (most recent call last):
iter 0 || Loss: 31.0600 ||
  File "/media/csu/新加卷/AI Competition/Baidu Competition/intermediary_contest/ssd.pytorch/train.py", line 262, in
    train()
  File "/media/csu/新加卷/AI Competition/Baidu Competition/intermediary_contest/ssd.pytorch/train.py", line 183, in train
    loss_l, loss_c = criterion(out, targets)
  File "/home/csu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/csu/新加卷/AI Competition/Baidu Competition/intermediary_contest/ssd.pytorch/layers/modules/multibox_loss.py", line 115, in forward
    loss_c[pos] = 0 # filter out pos boxes for now
RuntimeError: device-side assert triggered
@Cadene, I have tried many solutions, but the problem still can't be solved. I hope someone can help me solve it, thanks!
@Cadene, I have started debugging. To my surprise, execution paused in ssd.py at the following check:
# apply vgg up to conv4_3 relu
for k in range(23):
    x = self.vgg[k](x)
    if isnan(x[0].sum()):
        print('k=', k)
        pdb.set_trace()
I found that some elements of the tensor x are nan or -inf, which confuses me. Can you tell me what causes this and how I can solve it? Thanks a lot.
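One thing worth noting about the check above: it sums only the first item in the batch and tests the sum with isnan, so it fires on NaN but not on a row that contains only -inf. A small sketch of a stricter scan, assuming plain Python lists stand in for tensors and layers is a hypothetical list of callables:

```python
import math


def first_nonfinite_layer(layers, x):
    """Apply layers in order; return the index of the first layer whose
    output contains NaN or +/-inf, or None if every output is finite."""
    for k, layer in enumerate(layers):
        x = layer(x)
        if any(not math.isfinite(v) for v in x):
            return k
    return None


if __name__ == "__main__":
    # Hypothetical stand-ins for network layers operating on lists of floats.
    layers = [
        lambda xs: [v * 2.0 for v in xs],
        lambda xs: [math.log(v) if v > 0 else -math.inf for v in xs],
        lambda xs: [v + 1.0 for v in xs],
    ]
    print(first_nonfinite_layer(layers, [1.0, 0.0]))  # layer 1 yields -inf
```

Knowing the first layer that produces a non-finite value narrows the search to that layer's inputs and weights.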
Hi @1csu, I hit the same problem as you, 'device-side assert triggered'. Have you solved it? I would like to discuss it with you and am waiting for your reply. Thank you so much!
@1csu I solved this problem by changing my class labels. My labels were bkg = 0 and person = 1, which leads to the same error you saw. The match function in box_utils.py contains "conf = labels[best_truth_idx] + 1", which turned my person label into 2. When I set person = 0, it worked.
I used the VOC dataset to run train.py and hit the same problem as you.
I encountered the same error, and then found that my dataset labels did not start from 0. After changing the labels, it works.
@Weacera What did you change, and in which file? Thank you very much for your reply.
Change the match function in box_utils.py. If your labels already use index 0 for background, use conf = labels[best_truth_idx]. Otherwise, if label 0 is an object class such as 'aeroplane' rather than background, use conf = labels[best_truth_idx] + 1.
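The two cases above can be sketched as follows. This is an illustration in plain Python, not the repository's code: to_conf_label and the labels_include_background flag are hypothetical names, and the real match function works on tensors.

```python
def to_conf_label(label, labels_include_background):
    """Map a dataset label to the class index used by the loss.

    Index 0 is reserved for background, so object classes must end up
    in the range 1 .. num_classes - 1.
    """
    if labels_include_background:
        # Dataset already uses 0 for background: use the label as-is.
        return label
    # Dataset starts object classes at 0: shift them past background.
    return label + 1


if __name__ == "__main__":
    num_classes = 2  # background + person

    # Labels already reserve 0 for background (person = 1), yet +1 is applied:
    shifted = to_conf_label(1, labels_include_background=False)
    print(shifted, shifted < num_classes)

    # Same labels, no shift:
    kept = to_conf_label(1, labels_include_background=True)
    print(kept, kept < num_classes)
```

The first case reproduces the situation reported earlier in the thread: person = 1 becomes 2, which is outside the valid range for a 2-class model.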
I have the same problem on Ubuntu, but the same code runs fine on Windows 10! I don't know why.
Me too
Make sure you have changed the number of classes in config.py.
The +1 doesn't help; I had to add background to the classes to solve this cross-entropy issue.
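One reading of the last two replies, as a rough sketch: the class count given to the model has to include the background slot, otherwise the largest label falls outside the valid index range. The class list and variable names below are hypothetical and are not taken from config.py.

```python
# Hypothetical object classes for a custom dataset (no background entry).
OBJECT_CLASSES = ("person", "car")

# Index 0 is reserved for background, so the model needs one extra class.
NUM_CLASSES = len(OBJECT_CLASSES) + 1


def class_index(name):
    """Return the training index of an object class (background is 0)."""
    return OBJECT_CLASSES.index(name) + 1


if __name__ == "__main__":
    for name in OBJECT_CLASSES:
        idx = class_index(name)
        # Every index must satisfy 0 <= idx < NUM_CLASSES.
        print(name, idx, 0 <= idx < NUM_CLASSES)
```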