ssd.pytorch

RuntimeError: The shape of the mask [32, 8732] at index 0 does not match the shape of the indexed tensor [279424, 1] at index 0

Open 17764591637 opened this issue 6 years ago • 54 comments

rps@rps:~/桌面/ssd.pytorch$ python3 train.py
/home/rps/桌面/ssd.pytorch/ssd.py:34: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
  self.priors = Variable(self.priorbox.forward(), volatile=True)
/home/rps/桌面/ssd.pytorch/layers/modules/l2norm.py:17: UserWarning: nn.init.constant is now deprecated in favor of nn.init.constant_.
  init.constant(self.weight,self.gamma)
Loading base network...
Initializing weights...
train.py:214: UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.
  init.xavier_uniform(param)
Loading the dataset...
Training SSD on: VOC0712
Using the specified args:
Namespace(basenet='vgg16_reducedfc.pth', batch_size=32, cuda=True, dataset='VOC', dataset_root='/home/rps/data/VOCdevkit/', gamma=0.1, lr=0.001, momentum=0.9, num_workers=4, resume=None, save_folder='weights/', start_iter=0, visdom=False, weight_decay=0.0005)
train.py:169: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
  targets = [Variable(ann.cuda(), volatile=True) for ann in targets]
Traceback (most recent call last):
  File "train.py", line 255, in <module>
    train()
  File "train.py", line 178, in train
    loss_l, loss_c = criterion(out, targets)
  File "/home/rps/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/rps/桌面/ssd.pytorch/layers/modules/multibox_loss.py", line 97, in forward
    loss_c[pos] = 0 # filter out pos boxes for now
RuntimeError: The shape of the mask [32, 8732] at index 0 does not match the shape of the indexed tensor [279424, 1] at index 0

Can anyone help, please?

Jun 04 '18 11:06 17764591637
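For readers arriving from the error message: the two shapes it reports hold the same number of elements, only laid out differently. A quick check in plain Python, using the numbers from the traceback (the variable names are illustrative, not taken from the repository):

```python
# Shapes copied from the error message above.
mask_shape = (32, 8732)       # pos: one row per image in the batch
tensor_shape = (279424, 1)    # loss_c: flattened into a single column

mask_elems = mask_shape[0] * mask_shape[1]
tensor_elems = tensor_shape[0] * tensor_shape[1]

# Same element count, so loss_c can be viewed as (32, 8732) before masking.
print(mask_elems, tensor_elems, mask_elems == tensor_elems)
```

This suggests the fix is a matter of reshaping loss_c before the mask is applied, which is what the replies below arrive at.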

I have the same error, using PyTorch 0.4 + Python 3.5.

Jun 04 '18 12:06 isaactalx

No problem with Python 3.5 and PyTorch 0.3.0.

Jun 07 '18 03:06 bobo0810

I have the same error. If I switch lines 96 and 97 in multibox_loss.py so that they read

loss_c = loss_c.view(num, -1)
loss_c[pos] = 0

this error disappears, but another one appears:

File "/home/.../ssd.pytorch/layers/modules/multibox_loss.py", line 115, in forward
    loss_l /= N
RuntimeError: Expected object of type torch.cuda.FloatTensor but found type torch.cuda.LongTensor for argument #3 'other'

The tensor types do not match. How can I fix it?

Jun 07 '18 10:06 xscjun

@xscjun Change the line

N = num_pos.data.sum()

to:

N = num_pos.data.sum().double()
loss_l = loss_l.double()
loss_c = loss_c.double()

This should work.

Jun 08 '18 11:06 slomrafgrav
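A minimal sketch of the type issue behind that suggestion, using made-up tensors rather than the real multibox_loss.py (the names num_pos, loss_l and loss_c mirror the thread; the values are invented). Instead of casting everything to double, it casts N to the dtype the losses already have, which is an alternative and not what the comment above prescribes:

```python
import torch

# Invented stand-ins: positive-box counts per image and summed losses.
num_pos = torch.tensor([[3], [5]])     # integer (Long) counts
loss_l = torch.tensor(12.0)            # float32 scalar loss
loss_c = torch.tensor(20.0)

# Summing Long counts yields a Long tensor; cast it before dividing.
N = num_pos.sum().to(loss_l.dtype)

loss_l = loss_l / N
loss_c = loss_c / N
print(N.dtype, loss_l.item(), loss_c.item())
```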

Has anyone solved this problem? Please help me, thanks.

Jul 25 '18 08:07 gtwell

@xscjun Here pos has size torch.Size([32, 8732]) and loss_c has size torch.Size([279424, 1]). I added one line before the mask assignment, as follows:

        loss_c = loss_c.view(pos.size()[0], pos.size()[1]) #add line 
        loss_c[pos] = 0  # filter out pos boxes for now
        loss_c = loss_c.view(num, -1)

Then it worked.

Sep 27 '18 03:09 Lin-Zhipeng
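To see why the extra view call helps, here is a small self-contained sketch with toy sizes (2 images and 4 priors instead of 32 and 8732; all values are invented, and it assumes a recent PyTorch that supports boolean masks). Indexing the flat column with the 2-D mask fails, while reshaping first succeeds:

```python
import torch

# Toy stand-ins for the [279424, 1] loss and the [32, 8732] mask.
loss_c = torch.arange(1.0, 9.0).view(-1, 1)          # shape [8, 1]
pos = torch.tensor([[True, False, False, True],
                    [False, True, False, False]])    # shape [2, 4]

try:
    loss_c[pos] = 0                    # mask and tensor shapes disagree
    failed = False
except (IndexError, RuntimeError):     # exception type varies by version
    failed = True

# Reshape to the mask's shape first, then zero out the positive boxes.
loss_c = loss_c.view(pos.size(0), pos.size(1))
loss_c[pos] = 0
print(failed, loss_c.tolist())
```

Since pos.size(0) is the batch size, this has the same effect as moving loss_c = loss_c.view(num, -1) above the mask assignment.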

@xscjun I have the same error. How did you finally solve it?

Nov 01 '18 13:11 zxt-triumph

@xscjun I have the same error. How did you figure it out in the end?

Nov 01 '18 13:11 zxt-triumph

What file should be updated?

Nov 07 '18 22:11 matthewarthur

@xscjun Change the data type of N to FloatTensor.

Nov 12 '18 01:11 queryor

@matthewarthur You may try updating your file /home/.../ssd.pytorch/layers/modules/multibox_loss.py and adding the one line that @LZP4GitHub posted above.

Nov 12 '18 04:11 usherbob

@usherbob With Python 3.6 + PyTorch 0.4.1, I added the line

loss_c = loss_c.view(pos.size()[0], pos.size()[1]) #add line

but now I have another issue:

RuntimeError: copy_if failed to synchronize: device-side assert triggered

Nov 13 '18 10:11 subicWang

Finally, I succeeded.

Step 1: switch lines 97 and 98 so that they read:
loss_c = loss_c.view(num, -1)
loss_c[pos] = 0 # filter out pos boxes for now

Step 2: change line 144 from
N = num_pos.data.sum()
to
N = num_pos.data.sum().double()
loss_l = loss_l.double()
loss_c = loss_c.double()

Nov 14 '18 02:11 subicWang

@subicWang I made these changes, but I still get a RuntimeError:

RuntimeError: device-side assert triggered

How can I fix it? Looking forward to your reply. Thank you!

Dec 14 '18 02:12 CJJ-717
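The thread does not identify the cause of this assert. As general CUDA debugging advice rather than something proposed here: device-side asserts are reported asynchronously, so the Python line in the traceback is often not the one that failed. Making kernel launches synchronous (or re-running the same batch on the CPU) usually gives a more precise location. A minimal sketch, assuming it is placed at the very top of train.py:

```python
import os

# Must be set before CUDA is initialised, so do it before importing torch.
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'

print(os.environ['CUDA_LAUNCH_BLOCKING'])
```

The same effect can be had by setting the variable in the shell before launching the script.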

Changing the order of lines 97 and 98 throws a new error for me:

Traceback (most recent call last):
  File "train.py", line 254, in <module>
    train()
  File "train.py", line 182, in train
    loc_loss += loss_l.data[0]
IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number

Any suggestions?

PS: I also tried converting the loss to double as mentioned above, and I still get the same error!


### Solved

Apparently 'loss_l.data[0]' should be replaced with 'loss_l.item()'. The same replacement applies to every loss_x.data[0] in the file!

Mar 22 '19 13:03 wisdomk
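A small sketch of that replacement, using an invented scalar in place of the real loss (loc_loss mirrors the accumulator name in the traceback; it assumes a PyTorch version that raises the IndexError shown above):

```python
import torch

loss_l = torch.tensor(0.75)      # a 0-dim (scalar) tensor, like a reduced loss

try:
    value = loss_l.data[0]       # old idiom; 0-dim tensors cannot be indexed
except IndexError:
    value = loss_l.item()        # returns a plain Python float

loc_loss = 0.0
loc_loss += value
print(type(value).__name__, loc_loss)
```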

@subicWang Great, but there is one small mistake: it is line 114, not line 144.

Mar 26 '19 02:03 leaf918

If your PyTorch version is 0.4.1, you can make the following changes.

Step 1: switch lines 97 and 98 so that they read:
loss_c = loss_c.view(num, -1)
loss_c[pos] = 0 # filter out pos boxes for now

Step 2: change line 114 from
N = num_pos.data.sum()
to
N = num_pos.data.sum().double()
loss_l = loss_l.double()
loss_c = loss_c.double()

But if your PyTorch version is 1.0.1, these changes alone do not help.

Mar 26 '19 03:03 TianSong1991

I solved the problem for PyTorch 1.0.1. The solution has three steps: steps 1 and 2 change multibox_loss.py, and step 3 changes train.py.

Step 1: switch lines 97 and 98 so that they read:
loss_c = loss_c.view(num, -1)
loss_c[pos] = 0 # filter out pos boxes for now

Step 2: change line 114 from
N = num_pos.data.sum()
to
N = num_pos.data.sum().double()
loss_l = loss_l.double()
loss_c = loss_c.double()

Step 3: change lines 188, 189, 193 and 196:
loss_l.data[0] >> loss_l.data
loss_c.data[0] >> loss_c.data
loss.data[0] >> loss.data

Mar 26 '19 03:03 TianSong1991
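The steps above can be sketched end to end on toy data. This is an illustration modeled on the hard-negative-mining idea, not a copy of multibox_loss.py: the function name hard_negative_mask, the negpos_ratio parameter as used here, and all values are assumptions, and .item() is used for the logged value where the comment above uses .data.

```python
import torch

def hard_negative_mask(loss_c, pos, negpos_ratio=3):
    """Toy sketch: keep the highest-loss negatives, negpos_ratio per positive."""
    num = pos.size(0)
    loss_c = loss_c.view(num, -1).clone()    # step 1: reshape first ...
    loss_c[pos] = 0                          # ... then filter out pos boxes
    _, loss_idx = loss_c.sort(1, descending=True)
    _, idx_rank = loss_idx.sort(1)           # rank of each box by loss
    num_pos = pos.long().sum(1, keepdim=True)
    num_neg = torch.clamp(negpos_ratio * num_pos, max=pos.size(1) - 1)
    return idx_rank < num_neg.expand_as(idx_rank), num_pos

# Invented data: 2 images, 4 priors each, flat [8, 1] loss as in the error.
loss_c = torch.tensor([0.9, 0.1, 0.5, 0.3, 0.2, 0.8, 0.4, 0.6]).view(-1, 1)
pos = torch.tensor([[True, False, False, False],
                    [False, False, False, True]])

neg, num_pos = hard_negative_mask(loss_c, pos, negpos_ratio=1)
N = num_pos.sum().double()                        # step 2: floating-point N
total = (torch.tensor(3.0).double() / N).item()   # step 3: scalar for logging
print(neg.tolist(), N.item(), total)
```

With one positive per image and a ratio of 1, the mask selects exactly one negative per row: the non-positive box with the largest loss.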

The loss is increasing, as shown below:

timer: 2.2050 sec.
iter 0 || Loss: 153.4730 || timer: 1.8316 sec.
iter 10 || Loss: 48.9679 || timer: 1.8920 sec.
iter 20 || Loss: 191.8098 || timer: 2.0969 sec.
iter 30 || Loss: 110.8081 || timer: 1.8849 sec.
iter 40 || Loss: 106.9749 || timer: 1.9373 sec.
iter 50 || Loss: 134.3674 || timer: 2.0012 sec.
...

Please help me solve this issue.

Mar 26 '19 11:03 charan1561

@TianSong1991 Thanks, that was useful for me. For step 3, though, the lines are 183, 184, 188 and 191, five occurrences in total: loss_x.data[0] >> loss_x.data or loss.data[0] >> loss.data.

Mar 31 '19 14:03 litianciucas

Wouldn't loss_x.data[0] >> loss_x.item() be better?

Apr 04 '19 22:04 blueardour
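For comparison, a short sketch of what each option returns, on an invented scalar: .data is still a tensor, while .item() is a plain Python float, which is usually what logging and running totals want.

```python
import torch

loss = torch.tensor(2.5, requires_grad=True) * 2   # scalar with autograd history

as_data = loss.data      # still a tensor, detached from the graph
as_item = loss.item()    # plain Python float

print(type(as_data).__name__, type(as_item).__name__)
print('%.4f' % as_item)  # formatted like the training log
```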

@TianSong1991 Thanks a lot. PyTorch 1.0 + Python 3.5: success!

Apr 16 '19 07:04 espectre

@wisdomk Much obliged!

May 09 '19 02:05 zz10001

@TianSong1991 But the loss is NaN.

May 20 '19 09:05 mk123qwe

@espectre But the loss is NaN.

May 20 '19 09:05 mk123qwe

@mk123qwe I have the same problem. Why is the loss NaN?

May 20 '19 13:05 xafarranxera

@TianSong1991 Hi, why isn't loss_l divided by N?

May 24 '19 07:05 OberstWB
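On that question: the traceback quoted earlier in the thread shows loss_l /= N at that spot in multibox_loss.py, so one reading of the steps is that the casts are added and the division stays. The thread does not confirm this. A toy sketch of that reading, with invented values:

```python
import torch

num_pos = torch.tensor([[2], [6]])     # invented positive counts per image
loss_l = torch.tensor(40.0)            # invented summed localization loss
loss_c = torch.tensor(64.0)            # invented summed confidence loss

N = num_pos.sum().double()             # cast suggested in the thread
loss_l = loss_l.double()
loss_c = loss_c.double()

loss_l /= N                            # keep the normalization by N
loss_c /= N
print(loss_l.item(), loss_c.item())
```

Without the two divisions the losses would stay as sums over every matched box in the batch, which makes them much larger than the per-box averages.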

Same problem here.

I used @LZP4GitHub's solution and it works fine, but I don't understand the difference between that solution and the one in https://github.com/amdegroot/ssd.pytorch/pull/322.

Jul 11 '19 14:07 SalahAdDin

I have the same error, using PyTorch 1.1 + Python 3.6:

    loss_c[pos] = 0 # filter out pos boxes for now
IndexError: The shape of the mask [32, 8732] at index 0 does not match the shape of the indexed tensor [279424, 1] at index 0

Jul 30 '19 08:07 mm1327

PyTorch version:

>>> import torch
>>> print(torch.__version__)
1.1.0

Python version:

Python 3.6.7 (default, Oct 22 2018, 11:32:17)
[GCC 8.2.0] on linux

multibox_loss.py:

Switch lines 97 and 98 so that they read:
loss_c = loss_c.view(num, -1)
loss_c[pos] = 0 # filter out pos boxes for now
Change line 114:
N = num_pos.data.sum() -> N = num_pos.data.sum().double()
and change the following two lines to:
loss_l = loss_l.double()
loss_c = loss_c.double()

train.py

loss_l.data[0] >> loss_l.data 
loss_c.data[0] >> loss_c.data 
loss.data[0] >> loss.data

And here is my output:

timer: 11.9583 sec.
iter 0 || Loss: 11728.9388 || timer: 0.2955 sec.
iter 10 || Loss: nan || timer: 0.2843 sec.
iter 20 || Loss: nan || timer: 0.2890 sec.
iter 30 || Loss: nan || timer: 0.2934 sec.
iter 40 || Loss: nan || timer: 0.2865 sec.
iter 50 || Loss: nan || timer: 0.2855 sec.
iter 60 || Loss: nan || timer: 0.2889 sec.
iter 70 || Loss: nan || timer: 0.2857 sec.
iter 80 || Loss: nan || timer: 0.2843 sec.
iter 90 || Loss: nan || timer: 0.2835 sec.
iter 100 || Loss: nan || timer: 0.2846 sec.
iter 110 || Loss: nan || timer: 0.2946 sec.
iter 120 || Loss: nan || timer: 0.2860 sec.
iter 130 || Loss: nan || timer: 0.2846 sec.
iter 140 || Loss: nan || timer: 0.2962 sec.
iter 150 || Loss: nan || timer: 0.2989 sec.
iter 160 || Loss: nan || timer: 0.2857 sec.

Aug 29 '19 17:08 ashleylid
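The thread does not settle why the loss becomes NaN. One way to at least stop at the first bad iteration and inspect that batch is a finiteness check on the logged value. A sketch in plain Python; check_loss is a hypothetical helper, and the two sample values are taken from the log above:

```python
import math

def check_loss(value, iteration):
    """Raise as soon as a logged loss value is NaN or infinite."""
    if not math.isfinite(value):
        raise FloatingPointError(
            'non-finite loss %r at iter %d' % (value, iteration))
    return value

logged = [(0, 11728.9388), (10, float('nan'))]   # first two log lines above
first_bad = None
for it, val in logged:
    try:
        check_loss(val, it)
    except FloatingPointError:
        first_bad = it
        break
print(first_bad)
```

In a training loop this could be called as check_loss(loss.item(), iteration) before the optimizer step.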