Error after changing to 6 classes

Open xie199389 opened this issue 4 years ago • 13 comments

Traceback (most recent call last):
  File "D:/python project/study/CNN/ssd-pytorch-flower/train.py", line 37, in <module>
    model.load_state_dict(pretrained_dict)
  File "D:\工作\python3.6\lib\site-packages\torch\nn\modules\module.py", line 830, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for SSD:
    size mismatch for conf.0.weight: copying a param with shape torch.Size([84, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([28, 512, 3, 3]).
    size mismatch for conf.0.bias: copying a param with shape torch.Size([84]) from checkpoint, the shape in current model is torch.Size([28]).
    size mismatch for conf.1.weight: copying a param with shape torch.Size([126, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([42, 1024, 3, 3]).
    size mismatch for conf.1.bias: copying a param with shape torch.Size([126]) from checkpoint, the shape in current model is torch.Size([42]).
    size mismatch for conf.2.weight: copying a param with shape torch.Size([126, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([42, 512, 3, 3]).
    size mismatch for conf.2.bias: copying a param with shape torch.Size([126]) from checkpoint, the shape in current model is torch.Size([42]).
    size mismatch for conf.3.weight: copying a param with shape torch.Size([126, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([42, 256, 3, 3]).
    size mismatch for conf.3.bias: copying a param with shape torch.Size([126]) from checkpoint, the shape in current model is torch.Size([42]).
    size mismatch for conf.4.weight: copying a param with shape torch.Size([84, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([28, 256, 3, 3]).
    size mismatch for conf.4.bias: copying a param with shape torch.Size([84]) from checkpoint, the shape in current model is torch.Size([28]).
    size mismatch for conf.5.weight: copying a param with shape torch.Size([84, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([28, 256, 3, 3]).
    size mismatch for conf.5.bias: copying a param with shape torch.Size([84]) from checkpoint, the shape in current model is torch.Size([28]).

I get the error above after switching to a 6-class problem. I have already updated the classes in config.py, voc_annotation.py, and voc_classes.txt, but the error persists. Is ssd_weights.pth the problem, and how should I fix it?

xie199389 avatar Mar 26 '20 03:03 xie199389

Fixed. Please re-copy the contents of the train file.

bubbliiiing avatar Mar 26 '20 03:03 bubbliiiing

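bubbliiiing wrote: "Fixed. Please re-copy the contents of the train file."

I see you changed model.load_state_dict(pretrained_dict) to model.load_state_dict(model_dict), but it still seems to raise the same error. It only goes away if I change num_classes in config.py back to 21. Also, could you share your contact details?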

xie199389 avatar Mar 26 '20 04:03 xie199389

I forgot to copy one line: pretrained_dict = {k: v for k, v in pretrained_dict.items() if np.shape(model_dict[k]) == np.shape(v)}
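
A minimal sketch of where that line fits, written so it runs without PyTorch: it only needs values that expose a `.shape` attribute, so the same function should also work on the tensors returned by `torch.load(...)` and `model.state_dict()`. The helper name `filter_by_shape`, the stand-in `fake`, and the `vgg.0.weight` entry are illustrative assumptions, not code from the repository; the `k in model_dict` guard is an addition that avoids a KeyError on keys the model does not have.

```python
from types import SimpleNamespace


def filter_by_shape(pretrained_dict, model_dict):
    """Keep only checkpoint entries whose key exists in the model with the same shape."""
    return {
        k: v
        for k, v in pretrained_dict.items()
        if k in model_dict and tuple(v.shape) == tuple(model_dict[k].shape)
    }


def fake(*shape):
    # Stand-in for a tensor: only the .shape attribute matters here.
    return SimpleNamespace(shape=shape)


# conf.0.weight shapes are taken from the traceback; vgg.0.weight is a made-up
# backbone entry that has the same shape on both sides.
pretrained_dict = {
    "vgg.0.weight": fake(64, 3, 3, 3),
    "conf.0.weight": fake(84, 512, 3, 3),   # 21-class checkpoint
}
model_dict = {
    "vgg.0.weight": fake(64, 3, 3, 3),
    "conf.0.weight": fake(28, 512, 3, 3),   # 7-class model (6 + background)
}

kept = filter_by_shape(pretrained_dict, model_dict)
model_dict.update(kept)  # mismatched conf.* entries keep the model's own values
print(sorted(kept))
```

With a real model, the step after `model_dict.update(...)` would be `model.load_state_dict(model_dict)`, so matching layers reuse the pretrained weights while the resized conf layers start from their fresh initialization.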

bubbliiiing avatar Mar 26 '20 05:03 bubbliiiing

bubbliiiing wrote: "I forgot to copy one line: pretrained_dict = {k: v for k, v in pretrained_dict.items() if np.shape(model_dict[k]) == np.shape(v)}"

Thanks, training with train.py works now, but predict.py still fails with the following error:

Using TensorFlow backend.
2020-03-26 14:51:12.004041: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cudart64_100.dll'; dlerror: cudart64_100.dll not found
2020-03-26 14:51:12.004170: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Traceback (most recent call last):
  File "D:/python project/study/CNN/ssd-pytorch-flower/predict.py", line 5, in <module>
    ssd = SSD()
  File "D:\python project\study\CNN\ssd-pytorch-flower\ssd.py", line 37, in __init__
    self.generate()
  File "D:\python project\study\CNN\ssd-pytorch-flower\ssd.py", line 62, in generate
    model.load_state_dict(torch.load(self.model_path,map_location='cpu'))
  File "D:\工作\python3.6\lib\site-packages\torch\nn\modules\module.py", line 830, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for SSD:
    size mismatch for conf.0.weight: copying a param with shape torch.Size([84, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([28, 512, 3, 3]).
    size mismatch for conf.0.bias: copying a param with shape torch.Size([84]) from checkpoint, the shape in current model is torch.Size([28]).
    size mismatch for conf.1.weight: copying a param with shape torch.Size([126, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([42, 1024, 3, 3]).
    size mismatch for conf.1.bias: copying a param with shape torch.Size([126]) from checkpoint, the shape in current model is torch.Size([42]).
    size mismatch for conf.2.weight: copying a param with shape torch.Size([126, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([42, 512, 3, 3]).
    size mismatch for conf.2.bias: copying a param with shape torch.Size([126]) from checkpoint, the shape in current model is torch.Size([42]).
    size mismatch for conf.3.weight: copying a param with shape torch.Size([126, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([42, 256, 3, 3]).
    size mismatch for conf.3.bias: copying a param with shape torch.Size([126]) from checkpoint, the shape in current model is torch.Size([42]).
    size mismatch for conf.4.weight: copying a param with shape torch.Size([84, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([28, 256, 3, 3]).
    size mismatch for conf.4.bias: copying a param with shape torch.Size([84]) from checkpoint, the shape in current model is torch.Size([28]).
    size mismatch for conf.5.weight: copying a param with shape torch.Size([84, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([28, 256, 3, 3]).
    size mismatch for conf.5.bias: copying a param with shape torch.Size([84]) from checkpoint, the shape in current model is torch.Size([28]).

xie199389 avatar Mar 26 '20 07:03 xie199389

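This still looks like a model mismatch. Did you forget to change num_classes for the model you trained? The checkpoint shapes here are those of a 21-class model.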
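
The 21-class reading follows from the shapes in the traceback: each conf layer outputs (anchors per location) × (number of classes, background included) channels. Below is a small sketch of that arithmetic, assuming the per-layer anchor counts 4, 6, 6, 6, 4, 4 that the shapes imply (84 / 21 = 4, 126 / 21 = 6). The function names are illustrative and do not come from the repository.

```python
ANCHORS_PER_LOCATION = [4, 6, 6, 6, 4, 4]  # assumed; inferred from the traceback shapes


def conf_channels(num_classes):
    """Output channels of each conf layer; num_classes includes the background class."""
    return [a * num_classes for a in ANCHORS_PER_LOCATION]


def classes_from_channels(channels):
    """Invert the relation; return None if the channels do not fit a single class count."""
    counts = set()
    for c, a in zip(channels, ANCHORS_PER_LOCATION):
        if c % a != 0:
            return None
        counts.add(c // a)
    return counts.pop() if len(counts) == 1 else None


checkpoint = [84, 126, 126, 126, 84, 84]  # "from checkpoint" sizes in the error
current = [28, 42, 42, 42, 28, 28]        # "current model" sizes in the error

print(classes_from_channels(checkpoint))  # 21 -> 20 classes + background
print(classes_from_channels(current))     # 7  -> 6 classes + background
print(conf_channels(7) == current)        # True
```

Read this way, the model built by predict.py already has 7 outputs per anchor, while the file it loads is still a 21-class checkpoint. This suggests the model path may still point at the original weights rather than the weights saved by the 6-class training run.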

bubbliiiing avatar Mar 27 '20 06:03 bubbliiiing

I think that to train your own model you need to comment out the line model_dict.update(pretrained_dict).

zkm98 avatar Apr 01 '20 16:04 zkm98

After commenting it out, I get this error:

Epoch:1/50 iter:0/3 || Loc_Loss: 3.2771 || Conf_Loss: 7.9131 ||
Traceback (most recent call last):
  File "[redacted]/code/train.py", line 82, in <module>
    loss_l, loss_c = criterion(out, targets)
  File "[redacted]\torch\nn\modules\module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "[redacted]code\nets\ssd_training.py", line 80, in forward
    loss_c = log_sum_exp(batch_conf) - batch_conf.gather(1, conf_t.view(-1, 1))
RuntimeError: Invalid index in gather at C:\w\1\s\tmp_conda_3.7_100118\conda\conda-bld\pytorch_1579082551706\work\aten\src\TH/generic/THTensorEvenMoreMath.cpp:657

zkm98 avatar Apr 01 '20 16:04 zkm98

I think the problem is in your settings. You can see it from the error yourself: you are loading a 6-class model into a 20-class one.

bubbliiiing avatar Apr 01 '20 16:04 bubbliiiing

The conf layers on my side have indeed changed and do match the num_classes I set. I also printed them to check.

zkm98 avatar Apr 01 '20 17:04 zkm98

I found the cause: conf_t is not being computed correctly.

zkm98 avatar Apr 01 '20 17:04 zkm98

OK.

bubbliiiing avatar Apr 02 '20 03:04 bubbliiiing

It's a bit odd: when match updates conf_t here, some of the values end up greater than num_classes.

zkm98 avatar Apr 02 '20 04:04 zkm98

To add to the above: one issue is that the background has to be included among the classes. Otherwise problems occur.
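
The `Invalid index in gather` error above is consistent with this: `gather` along dimension 1 needs every label in `conf_t` to be a valid column index, i.e. in the range [0, num_classes). Below is a PyTorch-free sketch of that check, assuming label 0 is background and object labels run from 1 to 6 for six classes. `check_labels` is a hypothetical helper, not part of the repository.

```python
def check_labels(labels, num_classes):
    """Return the labels that cannot index a score row of width num_classes."""
    return [y for y in labels if not 0 <= y < num_classes]


object_classes = 6
labels = [0, 3, 6, 1]  # 0 = background, 1..6 = object classes (assumed convention)

# Background not counted: score rows have 6 columns, so label 6 is out of range.
print(check_labels(labels, num_classes=object_classes))      # [6]

# Background counted: score rows have 7 columns, so every label is valid.
print(check_labels(labels, num_classes=object_classes + 1))  # []
```

Running a check like this on the targets before the loss is computed turns an opaque indexing error into a list of the offending labels.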

zkm98 avatar Apr 06 '20 06:04 zkm98