ssd-pytorch

Overfitting problem with the initial model

Open zhjygit opened this issue 1 year ago • 2 comments

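I am using ssd_weights.pth for prediction to test industrial helmet detection, but people with long hair are also detected as helmets, with a score of 1.0. Is the initial model overfitted? I trained on 1,600 industrial helmet images on top of this model, and it still detects long black hair as a helmet. Do I need to start from the original vgg16-397923af.pth weights and build the initial model from those instead?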

zhjygit, Dec 19 '24 03:12

Using the provided ssd_weights.pth directly for prediction raises an error:

(ssd-pytorch) E:\ssd-pytorch\ssd-pytorch>python predict.py
Traceback (most recent call last):
  File "predict.py", line 13, in <module>
    ssd = SSD()
  File "E:\ssd-pytorch\ssd-pytorch\ssd.py", line 108, in __init__
    self.generate()
  File "E:\ssd-pytorch\ssd-pytorch\ssd.py", line 119, in generate
    self.net.load_state_dict(torch.load(self.model_path, map_location=device))
  File "D:\ProgramData\anaconda3\envs\ssd-pytorch\lib\site-packages\torch\nn\modules\module.py", line 1483, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for SSD300:
    size mismatch for conf.0.weight: copying a param with shape torch.Size([84, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([92, 512, 3, 3]).
    size mismatch for conf.0.bias: copying a param with shape torch.Size([84]) from checkpoint, the shape in current model is torch.Size([92]).
    size mismatch for conf.1.weight: copying a param with shape torch.Size([126, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([138, 1024, 3, 3]).
    size mismatch for conf.1.bias: copying a param with shape torch.Size([126]) from checkpoint, the shape in current model is torch.Size([138]).
    size mismatch for conf.2.weight: copying a param with shape torch.Size([126, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([138, 512, 3, 3]).
    size mismatch for conf.2.bias: copying a param with shape torch.Size([126]) from checkpoint, the shape in current model is torch.Size([138]).
    size mismatch for conf.3.weight: copying a param with shape torch.Size([126, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([138, 256, 3, 3]).
    size mismatch for conf.3.bias: copying a param with shape torch.Size([126]) from checkpoint, the shape in current model is torch.Size([138]).
    size mismatch for conf.4.weight: copying a param with shape torch.Size([84, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([92, 256, 3, 3]).
    size mismatch for conf.4.bias: copying a param with shape torch.Size([84]) from checkpoint, the shape in current model is torch.Size([92]).
    size mismatch for conf.5.weight: copying a param with shape torch.Size([84, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([92, 256, 3, 3]).
    size mismatch for conf.5.bias: copying a param with shape torch.Size([84]) from checkpoint, the shape in current model is torch.Size([92]).

How should I match or modify the model in this case?
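The channel counts in the error can be decoded with a little arithmetic. The sketch below assumes the standard SSD300 layout of 4, 6, 6, 6, 4, 4 anchors per location on the six feature maps, and that each `conf` head outputs anchors × classes channels with background counted as a class. The anchor list and the helper name `infer_num_classes` are illustrative assumptions, not code taken from this repository.

```python
# ASSUMPTION: standard SSD300 anchor counts per feature-map location.
ANCHORS_PER_LOCATION = [4, 6, 6, 6, 4, 4]

# First dimension of conf.0 ... conf.5, copied from the error message above.
CHECKPOINT_CHANNELS = [84, 126, 126, 126, 84, 84]
MODEL_CHANNELS = [92, 138, 138, 138, 92, 92]


def infer_num_classes(channels, anchors=ANCHORS_PER_LOCATION):
    """Return the class count (background included) implied by the conf heads.

    Hypothetical helper: divides each head's channel count by its anchor
    count and checks that every head agrees on the result.
    """
    counts = set()
    for out_channels, num_anchors in zip(channels, anchors):
        if out_channels % num_anchors != 0:
            raise ValueError(f"{out_channels} is not a multiple of {num_anchors}")
        counts.add(out_channels // num_anchors)
    if len(counts) != 1:
        raise ValueError(f"heads disagree on the class count: {sorted(counts)}")
    return counts.pop()


checkpoint_classes = infer_num_classes(CHECKPOINT_CHANNELS)
model_classes = infer_num_classes(MODEL_CHANNELS)

print(checkpoint_classes, checkpoint_classes - 1)  # 21 20
print(model_classes, model_classes - 1)            # 23 22
```

Under that assumption, the checkpoint was trained for 20 object classes plus background, while the model being constructed expects 22 object classes plus background, so the class list used to build the model does not match the one the weights were trained with.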

zhjygit, Dec 19 '24 03:12

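The cause is probably an incorrect voc_classes.txt. I used the weights recommended by the project, ssd_weights.pth (102,690 KB), and the voc_classes.txt that goes with them contains: aeroplane bicycle bird boat bottle bus car cat chair cow diningtable dog horse motorbike person pottedplant sheep sofa train tvmonitor.

My target class is helmet, so I appended helmet after tvmonitor and trained. The dataset is about 1,600 photos of workers' helmets, manually cleaned, and nearly all of them show people wearing hard hats. Training ran at only about 15 it/s on my 4060 Ti, and I don't know why it is so slow. After about 20 minutes I interrupted training and tested with a checkpoint whose total loss was around 2. Helmets are detected correctly, but people with black hair are still detected as helmets, and I don't know why.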
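When a class is appended to the list, the pretrained classification heads no longer fit the new model, while layers that do not depend on the class count still do. A common pattern is to keep only the checkpoint tensors whose shapes match the new model and let the rest train from their fresh initialization. The sketch below illustrates the shape filter on stand-in objects so that it runs without PyTorch; `split_by_shape`, `FakeTensor`, and the key name `vgg.0.weight` are hypothetical, and the 88-channel head assumes 4 anchors × (20 VOC classes + helmet + background).

```python
from collections import namedtuple

# Stand-in for a tensor: only the .shape attribute is used here.
FakeTensor = namedtuple("FakeTensor", "shape")


def split_by_shape(checkpoint, model_state):
    """Split checkpoint keys into reusable and skipped ones.

    A key is reusable when it exists in the model and its shape matches.
    Hypothetical helper; works on any mappings whose values expose .shape.
    """
    compatible, skipped = {}, []
    for name, tensor in checkpoint.items():
        target = model_state.get(name)
        if target is not None and tuple(tensor.shape) == tuple(target.shape):
            compatible[name] = tensor
        else:
            skipped.append(name)
    return compatible, sorted(skipped)


# Checkpoint trained on 20 classes + background: 4 anchors * 21 = 84 channels.
checkpoint = {
    "vgg.0.weight": FakeTensor((64, 3, 3, 3)),
    "conf.0.weight": FakeTensor((84, 512, 3, 3)),
    "conf.0.bias": FakeTensor((84,)),
}
# New model with helmet appended: 4 anchors * 22 = 88 channels.
model_state = {
    "vgg.0.weight": FakeTensor((64, 3, 3, 3)),
    "conf.0.weight": FakeTensor((88, 512, 3, 3)),
    "conf.0.bias": FakeTensor((88,)),
}

compatible, skipped = split_by_shape(checkpoint, model_state)
print(sorted(compatible))  # ['vgg.0.weight']
print(skipped)             # ['conf.0.bias', 'conf.0.weight']
```

With real PyTorch objects, the two mappings would be the result of `torch.load(...)` and `model.state_dict()`; the compatible entries can then be merged into the model's state dict before calling `load_state_dict`. Whether this repository's training script already applies such a filter is not stated in the thread.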

zhjygit, Dec 19 '24 09:12