pytorch-ssd
evaluation of ssd_mobilenetv1_coco
I'm trying to import ssd_mobilenetv1_coco_2018 by converting it from TensorFlow (.pb) to PyTorch (.pth). After the conversion, I wanted to evaluate it on webcam input, but I noticed a mismatch between some layer settings in the SSD class and the pretrained model, specifically in the last EXTRA conv layers / classification_headers / regression_headers.
I had to edit your code in create_mobilenetv1_ssd like this:
```python
extras = ModuleList([
    Sequential(
        Conv2d(in_channels=1024, out_channels=256, kernel_size=1),
        ReLU(),
        Conv2d(in_channels=256, out_channels=512, kernel_size=3, stride=2, padding=1),
        ReLU()
    ),
    Sequential(
        Conv2d(in_channels=512, out_channels=128, kernel_size=1),
        ReLU(),
        Conv2d(in_channels=128, out_channels=256, kernel_size=3, stride=2, padding=1),
        ReLU()
    ),
    Sequential(
        Conv2d(in_channels=256, out_channels=128, kernel_size=1),
        ReLU(),
        Conv2d(in_channels=128, out_channels=256, kernel_size=3, stride=2, padding=1),
        ReLU()
    ),
    Sequential(
        Conv2d(in_channels=256, out_channels=64, kernel_size=1),
        ReLU(),
        Conv2d(in_channels=64, out_channels=128, kernel_size=3, stride=2, padding=1),
        ReLU()
    )
])

regression_headers = ModuleList([
    Conv2d(in_channels=512, out_channels=3 * 4, kernel_size=1, padding=1),
    Conv2d(in_channels=1024, out_channels=6 * 4, kernel_size=1, padding=1),
    Conv2d(in_channels=512, out_channels=6 * 4, kernel_size=1, padding=1),
    Conv2d(in_channels=256, out_channels=6 * 4, kernel_size=1, padding=1),
    Conv2d(in_channels=256, out_channels=6 * 4, kernel_size=1, padding=1),
    Conv2d(in_channels=128, out_channels=6 * 4, kernel_size=1, padding=1),
])

classification_headers = ModuleList([
    Conv2d(in_channels=512, out_channels=3 * num_classes, kernel_size=1, padding=1),
    Conv2d(in_channels=1024, out_channels=6 * num_classes, kernel_size=1, padding=1),
    Conv2d(in_channels=512, out_channels=6 * num_classes, kernel_size=1, padding=1),
    Conv2d(in_channels=256, out_channels=6 * num_classes, kernel_size=1, padding=1),
    Conv2d(in_channels=256, out_channels=6 * num_classes, kernel_size=1, padding=1),
    Conv2d(in_channels=128, out_channels=6 * num_classes, kernel_size=1, padding=1),
])
```
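A rough way to locate mismatches like this is to diff the tensor shapes in the converted checkpoint against the shapes the model's state_dict expects. The sketch below uses two small stand-in modules instead of the real checkpoint and SSD model, so the channel numbers here are illustrative only:

```python
import torch
from torch import nn

def shape_mismatches(reference_sd, model_sd):
    """Return {key: (reference_shape, model_shape)} for keys whose shapes differ,
    with None as the model shape when a key is missing from the model."""
    issues = {}
    for key, ref in reference_sd.items():
        cur = model_sd.get(key)
        if cur is None:
            issues[key] = (tuple(ref.shape), None)
        elif cur.shape != ref.shape:
            issues[key] = (tuple(ref.shape), tuple(cur.shape))
    return issues

# Stand-ins: a "checkpoint" built with 512->256 channels and a model built
# with 512->128 channels, mimicking a mismatched extra layer.
checkpoint_model = nn.Sequential(nn.Conv2d(512, 256, 1), nn.Conv2d(256, 512, 3))
current_model = nn.Sequential(nn.Conv2d(512, 128, 1), nn.Conv2d(128, 512, 3))

diff = shape_mismatches(checkpoint_model.state_dict(), current_model.state_dict())
for key, (ref, cur) in diff.items():
    print(f"{key}: checkpoint {ref} vs model {cur}")
```

With a real checkpoint you would compare `torch.load(path)` (or its `"state_dict"` entry, depending on how it was saved) against `net.state_dict()` the same way.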
This caused the execution of run_converted_pytorch_ssd_live_demo.py to crash with this error:

```
RuntimeError: The size of tensor a (2781) must match the size of tensor b (3000) at non-singleton dimension 1
```
Is it possible that the SSD MobileNet architecture has been modified over time, so that some adjustments are needed to keep the code correct? Or is it just me missing something?
Thanks
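For context on where the 3000 in that error comes from: if I read the default mobilenetv1_ssd_config correctly, it uses six feature maps of sizes 19, 10, 5, 3, 2, 1 with aspect ratios [2, 3] at every scale, giving 2 + 2 * len(ratios) = 6 priors per location. A quick sketch of the arithmetic (my own, not code from the repo):

```python
# Prior count for the default mobilenetv1_ssd_config (assumed layout:
# six feature maps, aspect ratios [2, 3] everywhere).
feature_map_sizes = [19, 10, 5, 3, 2, 1]
priors_per_location = 2 + 2 * len([2, 3])  # small box, big box, 2 per ratio

per_map = [priors_per_location * s * s for s in feature_map_sizes]
total = sum(per_map)

print(per_map)  # [2166, 600, 150, 54, 24, 6]
print(total)    # 3000 -- the "tensor b (3000)" side of the error
```

So 3000 is the number of generated anchors, while 2781 is what the modified heads actually produce; the two disagree because the head outputs no longer line up with the anchor generation.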
Hi @kamauz, the priors/anchors needed by your model and the way the paths branch out to the detection heads might be different.
@kamauz Hi, did you find a solution? I also changed the network structure and ran into the same problem as you.
@qfgaohao Hi, in this situation, should I change the SSDSpec parameters in the config file? Thank you!
@qfgaohao I abandoned this repository a while ago because of this problem. I remember trying to change SSDSpec, but I couldn't make it work. I don't exclude that the solution was there, though. It can be tricky.
@kamauz @notabigfish You can also change the number of channels of the extra layers to make the network output consistent with the generated anchors.
Thank you for the answers!! Actually, similar to @kamauz, I deleted all the BatchNorm layers after the pointwise conv layers and got locations with size [..., 1434] but priors with size [..., 3000]. The reason is that the first feature map size in vision/ssd/ssd.py line 58 is [1, 576, 10, 10], but it should be [1, 576, 19, 19]. So I changed line 14 in vision/ssd/config/mobilenetv1_ssd_config.py to

```python
SSDSpec(10, 16, SSDBoxSizes(60, 105), [2, 3])
```
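The reported sizes are consistent with that change. A sketch of the arithmetic, assuming 6 priors per location on every feature map: shrinking the first map from 19x19 to 10x10 changes the total prior count, which must match the concatenated head outputs.

```python
def total_priors(fmap_sizes, per_location=6):
    """Total anchors: per_location priors at each cell of each feature map."""
    return sum(per_location * s * s for s in fmap_sizes)

print(total_priors([19, 10, 5, 3, 2, 1]))  # 3000 (default config)
print(total_priors([10, 10, 5, 3, 2, 1]))  # 1434 (first map 10x10, as reported)
```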
Then the error that @kamauz mentioned is fixed. However, a new error shows up:

```
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 41 and 47 in dimension 0 at /pytorch/aten/src/TH/generic/THTensor.cpp:711
```
It turns out that after changing the configuration, some labels got size torch.Size([0]). I did not change any convolution layer, only deleted the BatchNorm layers, so in theory the output channels are unchanged, right? Or maybe I'm missing something? Thank you!! @qfgaohao
@notabigfish Are you trying the same setup I tried, i.e. with the .pth file obtained by converting the official TensorFlow model? By the way, months ago I assumed that maybe the Google developers experimented with a version different from the official SSD paper. I don't know whether removing the BatchNorm layers is a good idea. Maybe it's a matter of implementation choices that we can't know unless we can see how they actually trained the network. Keep me updated if you find a solution.