ssd-pytorch: is the initial model overfitting?

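I ran predictions with ssd_weights.pth to test industrial helmet detection, but people with long hair are also detected as helmets, with a score of 1.0. Is the initial model overfitting? I trained on 1,600 industrial helmet images starting from this model, and long black hair is still detected as a helmet. Do I need to start from the original vgg16-397923af.pth and turn it into an initial model instead?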

Dec 19 '24 03:12 zhjygit

Running prediction directly with the provided ssd_weights.pth fails with an error:

python predict.py
Traceback (most recent call last):
  File "predict.py", line 13, in <module>
    ssd = SSD()
  File "E:\ssd-pytorch\ssd-pytorch\ssd.py", line 108, in __init__
    self.generate()
  File "E:\ssd-pytorch\ssd-pytorch\ssd.py", line 119, in generate
    self.net.load_state_dict(torch.load(self.model_path, map_location=device))
  File "D:\ProgramData\anaconda3\envs\ssd-pytorch\lib\site-packages\torch\nn\modules\module.py", line 1483, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for SSD300:
    size mismatch for conf.0.weight: copying a param with shape torch.Size([84, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([92, 512, 3, 3]).
    size mismatch for conf.0.bias: copying a param with shape torch.Size([84]) from checkpoint, the shape in current model is torch.Size([92]).
    size mismatch for conf.1.weight: copying a param with shape torch.Size([126, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([138, 1024, 3, 3]).
    size mismatch for conf.1.bias: copying a param with shape torch.Size([126]) from checkpoint, the shape in current model is torch.Size([138]).
    size mismatch for conf.2.weight: copying a param with shape torch.Size([126, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([138, 512, 3, 3]).
    size mismatch for conf.2.bias: copying a param with shape torch.Size([126]) from checkpoint, the shape in current model is torch.Size([138]).
    size mismatch for conf.3.weight: copying a param with shape torch.Size([126, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([138, 256, 3, 3]).
    size mismatch for conf.3.bias: copying a param with shape torch.Size([126]) from checkpoint, the shape in current model is torch.Size([138]).
    size mismatch for conf.4.weight: copying a param with shape torch.Size([84, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([92, 256, 3, 3]).
    size mismatch for conf.4.bias: copying a param with shape torch.Size([84]) from checkpoint, the shape in current model is torch.Size([92]).
    size mismatch for conf.5.weight: copying a param with shape torch.Size([84, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([92, 256, 3, 3]).
    size mismatch for conf.5.bias: copying a param with shape torch.Size([84]) from checkpoint, the shape in current model is torch.Size([92]).
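The sizes in this traceback encode the class counts. In an SSD head, conf.i.weight has shape [anchors_i × num_classes, in_channels, 3, 3], so dividing the first dimension by the number of anchors per location recovers the number of classes. The sketch below assumes the standard SSD300 anchor layout [4, 6, 6, 6, 4, 4] (consistent with the 84:126 = 4:6 ratio in the traceback) and that the class count includes one background class; `infer_num_classes` is a hypothetical helper, not part of ssd-pytorch.

```python
# Anchors per feature-map location for the six SSD300 conf heads.
# Assumption: standard SSD300 layout; it matches the 4:6 ratio of conf.0 to conf.1.
ANCHORS_PER_LOCATION = [4, 6, 6, 6, 4, 4]


def infer_num_classes(conf_out_channels):
    """Infer the class count (background included) from each conf.i output size.

    conf.i.weight has shape [anchors_i * num_classes, in_ch, 3, 3], so dividing
    the first dimension by anchors_i must give the same number for every head.
    """
    counts = set()
    for channels, anchors in zip(conf_out_channels, ANCHORS_PER_LOCATION):
        if channels % anchors != 0:
            raise ValueError(f"{channels} is not divisible by {anchors}")
        counts.add(channels // anchors)
    if len(counts) != 1:
        raise ValueError(f"inconsistent class counts across heads: {sorted(counts)}")
    return counts.pop()


# First dimension of conf.0.weight ... conf.5.weight, copied from the traceback.
checkpoint = [84, 126, 126, 126, 84, 84]
current_model = [92, 138, 138, 138, 92, 92]

print(infer_num_classes(checkpoint))     # 21 -> 20 VOC classes + background
print(infer_num_classes(current_model))  # 23 -> 22 class entries + background
```

With the real file, the input list can be built from `state = torch.load("ssd_weights.pth", map_location="cpu")` as `[state[f"conf.{i}.weight"].shape[0] for i in range(6)]`. Under the background assumption, the checkpoint was trained for the 20 VOC classes, while the model being built expects 22 class entries, so the classes file and the weights do not belong together.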

How should I make the model match the weights in this case, and what should I change?
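When fine-tuning for a different class list, a common approach is to copy only the checkpoint tensors whose names and shapes match the new model, and let the mismatched conf.* heads keep their fresh initialization. Note that `load_state_dict(..., strict=False)` only tolerates missing or unexpected keys; size mismatches like the ones in the traceback still raise, so those entries have to be filtered out first. A minimal sketch under that assumption: `split_compatible_keys` is a hypothetical helper, and all shapes except conf.0 are illustrative.

```python
def split_compatible_keys(checkpoint_shapes, model_shapes):
    """Split checkpoint parameter names into (loadable, skipped).

    A name is loadable only if the model has a parameter with the same name
    and exactly the same shape. Everything else, e.g. conf.* heads trained
    for a different number of classes, is skipped.
    """
    loadable, skipped = [], []
    for name, shape in checkpoint_shapes.items():
        if model_shapes.get(name) == tuple(shape):
            loadable.append(name)
        else:
            skipped.append(name)
    return loadable, skipped


# Illustrative shapes: conf.0 mirrors the traceback, the other two are examples.
demo_ckpt = {
    "vgg.0.weight": (64, 3, 3, 3),
    "conf.0.weight": (84, 512, 3, 3),
    "loc.0.weight": (16, 512, 3, 3),
}
demo_model = {
    "vgg.0.weight": (64, 3, 3, 3),
    "conf.0.weight": (92, 512, 3, 3),
    "loc.0.weight": (16, 512, 3, 3),
}
keep, skip = split_compatible_keys(demo_ckpt, demo_model)
print(keep)  # ['vgg.0.weight', 'loc.0.weight']
print(skip)  # ['conf.0.weight']

# With PyTorch (not run here; assumes `net` is the SSD300 instance, e.g. self.net in ssd.py):
#   ckpt = torch.load("ssd_weights.pth", map_location="cpu")
#   keep, _ = split_compatible_keys(
#       {k: tuple(v.shape) for k, v in ckpt.items()},
#       {k: tuple(v.shape) for k, v in net.state_dict().items()},
#   )
#   net.load_state_dict({k: ckpt[k] for k in keep}, strict=False)
```

The heads skipped this way are untrained, so a model loaded like this is only a starting point for training on the helmet data. For predict.py, the classes file has to list exactly the classes that the loaded weights were trained on.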

Dec 19 '24 03:12 zhjygit

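The problem is probably voc_classes.txt. I used the weights recommended by the project, ssd_weights.pth (102,690 KB), whose matching voc_classes.txt contains: aeroplane bicycle bird boat bottle bus car cat chair cow diningtable dog horse motorbike person pottedplant sheep sofa train tvmonitor. My target is helmet, so I appended helmet after tvmonitor and trained. The dataset is about 1,600 photos of workers' helmets, cleaned by hand; almost all of them show people wearing hard hats. Training ran at only about 15 it/s on my RTX 4060 Ti, and I don't know why it is so slow. After about 20 minutes I stopped training and tested a checkpoint with a total loss of roughly 2. Helmets are detected correctly, but people with black hair are still detected as helmets, and I don't know why.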
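The conf head sizes a model will expect follow directly from the classes file. This sketch assumes the SSD300 anchor layout [4, 6, 6, 6, 4, 4] and one extra background class; `read_class_names` and `conf_out_channels` are hypothetical helpers, and skipping blank lines is a choice made here, not necessarily how ssd-pytorch reads voc_classes.txt.

```python
# Assumption: standard SSD300 anchors per location for the six conf heads.
ANCHORS_PER_LOCATION = [4, 6, 6, 6, 4, 4]


def read_class_names(text):
    """Return one class name per non-empty line of a voc_classes.txt-style file."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def conf_out_channels(class_names):
    """Expected conf.i output channels: anchors_i * (len(class_names) + 1 background)."""
    num_classes = len(class_names) + 1
    return [anchors * num_classes for anchors in ANCHORS_PER_LOCATION]


VOC = ("aeroplane bicycle bird boat bottle bus car cat chair cow diningtable "
       "dog horse motorbike person pottedplant sheep sofa train tvmonitor").split()

voc_only = read_class_names("\n".join(VOC))
voc_plus_helmet = read_class_names("\n".join(VOC + ["helmet"]))

print(conf_out_channels(voc_only))         # [84, 126, 126, 126, 84, 84]
print(conf_out_channels(voc_plus_helmet))  # [88, 132, 132, 132, 88, 88]
```

Appending helmet to the 20 VOC classes gives 88/132 channels, which matches neither the 84/126 of ssd_weights.pth nor the 92/138 reported in the earlier traceback. Under the same assumptions, 92/138 corresponds to 22 entries, so it may be worth checking exactly which lines the classes file contained when that error occurred.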

Dec 19 '24 09:12 zhjygit