tensorflow-yolo-v3
Mistake in your version of tiny yolo v3
Hi,
I am studying your implementation of yolo_v3-tiny. Thanks for your work!
But it seems that you have introduced an error compared to the original network.
According to Darknet's config file (here), the last max-pooling layer has a size of [2, 2] with a stride of 1.
But in your network, every max-pooling layer has a size of [2, 2] with a stride of 2!
So the tensor resized by the upsampling layer has shape [6, 6, 128] instead of [13, 13, 128] as in Darknet, and I am fairly sure this noticeably degrades recognition performance.
Could you please fix this issue? (I'm not really comfortable with Slim, so I can't make the modification myself.)
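For reference, here is the arithmetic behind those two shapes as a minimal Python sketch. It assumes TensorFlow's standard VALID/SAME output-size rules and the defaults of slim.max_pool2d (stride 2, padding 'VALID'); the helper name tf_pool_out is hypothetical and not part of the repo.

```python
import math


def tf_pool_out(size_in, kernel, stride, padding="VALID"):
    """Spatial output size of a TensorFlow pooling op under the VALID/SAME rules."""
    if padding == "VALID":
        # Only windows that fit entirely inside the input are kept.
        return (size_in - kernel) // stride + 1
    if padding == "SAME":
        # The input is padded so that the output is ceil(input / stride).
        return math.ceil(size_in / stride)
    raise ValueError(f"unknown padding: {padding!r}")


# Last max pool of the tiny backbone, applied to a 13 x 13 feature map.
print(tf_pool_out(13, 2, 2))                  # 6  (slim defaults: stride 2, VALID)
print(tf_pool_out(13, 2, 1, padding="SAME"))  # 13 (Darknet's 2 x 2 / 1 pool)
```

With stride 2 and no padding, only 6 windows fit across 13 positions, which is where the [6, 6, ...] shape comes from.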
Would this modification be sufficient? Change line 62 of yolo_v3_tiny.py
from:
    inputs = slim.max_pool2d(inputs, [2, 2], scope='pool2')
to:
    if i < 5:
        inputs = slim.max_pool2d(inputs, [2, 2], scope='pool2')
    else:
        inputs = slim.max_pool2d(inputs, [2, 2], stride=1, padding="SAME", scope='pool2')
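A quick way to sanity-check the proposed change is to trace the spatial shapes through the backbone loop. This is a hypothetical sketch, not code from yolo_v3_tiny.py: it assumes the loop runs i = 0..5 with one 3x3 conv (which keeps the spatial size) followed by one 2x2 max pool per iteration, and that the filter count doubles from 16 as in yolov3-tiny.cfg. The names trace_tiny_backbone and tf_pool_out are illustrative only.

```python
def tf_pool_out(size_in, kernel, stride, padding):
    """Spatial output size of a TensorFlow pooling op (VALID or SAME)."""
    if padding == "VALID":
        return (size_in - kernel) // stride + 1
    return -(-size_in // stride)  # SAME: ceil(size_in / stride)


def trace_tiny_backbone(input_size=416, fixed=True):
    """Hypothetical shape trace of the six conv + max-pool stages.

    Assumes each 3x3 conv keeps the spatial size, so only the pools change it.
    Returns (height, width, channels) after each max pool.
    """
    size = input_size
    shapes = []
    for i in range(6):
        filters = 16 * 2 ** i  # 16, 32, 64, 128, 256, 512
        if i < 5 or not fixed:
            # slim.max_pool2d defaults: stride=2, padding="VALID"
            size = tf_pool_out(size, 2, 2, "VALID")
        else:
            # proposed change for the last pool: stride=1, padding="SAME"
            size = tf_pool_out(size, 2, 1, "SAME")
        shapes.append((size, size, filters))
    return shapes


print(trace_tiny_backbone()[-1])             # (13, 13, 512)
print(trace_tiny_backbone(fixed=False)[-1])  # (6, 6, 512)
```

With the condition in place, the spatial sizes go 208, 104, 52, 26, 13, 13, which matches Darknet's behaviour for the last pool.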
@LucasMahieu Sorry for the late response. Yes, this modification looks good. Can you submit a pull request with this change?
Yes, I will, thanks!
Before we close the issue: there are more mistakes. The following log is from Darknet, using yolov3-tiny.cfg.
layer     filters    size              input                output
    0 conv     16  3 x 3 / 1   416 x 416 x   3   ->   416 x 416 x  16  0.150 BFLOPs
    1 max          2 x 2 / 2   416 x 416 x  16   ->   208 x 208 x  16
    2 conv     32  3 x 3 / 1   208 x 208 x  16   ->   208 x 208 x  32  0.399 BFLOPs
    3 max          2 x 2 / 2   208 x 208 x  32   ->   104 x 104 x  32
    4 conv     64  3 x 3 / 1   104 x 104 x  32   ->   104 x 104 x  64  0.399 BFLOPs
    5 max          2 x 2 / 2   104 x 104 x  64   ->    52 x  52 x  64
    6 conv    128  3 x 3 / 1    52 x  52 x  64   ->    52 x  52 x 128  0.399 BFLOPs
    7 max          2 x 2 / 2    52 x  52 x 128   ->    26 x  26 x 128
    8 conv    256  3 x 3 / 1    26 x  26 x 128   ->    26 x  26 x 256  0.399 BFLOPs
    9 max          2 x 2 / 2    26 x  26 x 256   ->    13 x  13 x 256
   10 conv    512  3 x 3 / 1    13 x  13 x 256   ->    13 x  13 x 512  0.399 BFLOPs
   11 max          2 x 2 / 1    13 x  13 x 512   ->    13 x  13 x 512
   12 conv   1024  3 x 3 / 1    13 x  13 x 512   ->    13 x  13 x1024  1.595 BFLOPs
   13 conv    256  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x 256  0.089 BFLOPs
   14 conv    512  3 x 3 / 1    13 x  13 x 256   ->    13 x  13 x 512  0.399 BFLOPs
   15 conv    255  1 x 1 / 1    13 x  13 x 512   ->    13 x  13 x 255  0.044 BFLOPs
   16 yolo
   17 route  13
   18 conv    128  1 x 1 / 1    13 x  13 x 256   ->    13 x  13 x 128  0.011 BFLOPs
   19 upsample            2x    13 x  13 x 128   ->    26 x  26 x 128
   20 route  19 8
   21 conv    256  3 x 3 / 1    26 x  26 x 384   ->    26 x  26 x 256  1.196 BFLOPs
   22 conv    255  1 x 1 / 1    26 x  26 x 256   ->    26 x  26 x 255  0.088 BFLOPs
   23 yolo
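The BFLOPs column and the route depths in this log can be reproduced by hand, which is handy when comparing a port against Darknet. The sketch assumes Darknet counts a convolution as 2 × filters × size × size × (input channels / groups) × output height × output width, divided by 10^9; that formula matches the conv rows of this log as far as I checked, but treat it as an assumption. conv_bflops is a hypothetical helper.

```python
def conv_bflops(filters, size, in_c, out_h, out_w, groups=1):
    """Darknet-style conv cost in billions of FLOPs (multiply and add counted separately)."""
    weights = filters * size * size * in_c // groups
    return 2.0 * weights * out_h * out_w / 1e9


# (layer, filters, kernel size, input channels, output height/width) taken from the log
rows = [
    (0, 16, 3, 3, 416),
    (12, 1024, 3, 512, 13),
    (15, 255, 1, 512, 13),
    (21, 256, 3, 384, 26),
]
for layer, filters, size, in_c, hw in rows:
    print(layer, f"{conv_bflops(filters, size, in_c, hw, hw):.3f} BFLOPs")

# Route 20 concatenates upsampled layer 19 (26 x 26 x 128) with layer 8 (26 x 26 x 256).
route_channels = 128 + 256
print("route 20 channels:", route_channels)  # 384, the input depth of conv 21
```

This prints 0.150, 1.595, 0.044 and 1.196 BFLOPs for layers 0, 12, 15 and 21, the same values as the log.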
Max pool no. 11 has stride 1, which is currently awaiting a fix from the pull request.
EDIT: Sorry, I made a mistake when I reported that conv no. 15 and conv no. 22 are missing. Both are actually included in the _detection_layer() function, because both have a linear activation function. @LucasMahieu You can take back your like, man. XD
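For context on why those two linear layers have 255 filters: assuming the 80 COCO classes and 3 anchors per [yolo] layer from yolov3-tiny.cfg, each grid cell predicts 3 × (4 box coordinates + 1 objectness score + 80 class scores) = 255 values. The sketch below only does this bookkeeping; the reshape to one row per anchor box is how I assume _detection_layer() lays out its output, not something taken from the repo, and both helper names are hypothetical.

```python
NUM_CLASSES = 80       # COCO, as in yolov3-tiny.cfg (classes=80)
ANCHORS_PER_SCALE = 3  # each [yolo] layer uses a mask of 3 anchors


def detection_channels(num_classes=NUM_CLASSES, num_anchors=ANCHORS_PER_SCALE):
    """Filters of the linear 1x1 conv feeding a [yolo] layer."""
    # per anchor: 4 box coordinates + 1 objectness score + class scores
    return num_anchors * (4 + 1 + num_classes)


def num_predictions(grid_size, num_anchors=ANCHORS_PER_SCALE):
    """Rows after reshaping [grid, grid, anchors * (5 + C)] to [-1, 5 + C]."""
    return grid_size * grid_size * num_anchors


print(detection_channels())                      # 255
print(num_predictions(13), num_predictions(26))  # 507 2028
```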
The stride fix for the last max-pooling layer (layer 11) has been pushed.
The pull request is pending.
@i3oi3o Thanks for your edit.
Hi there. Based on the output-size formula, shouldn't a 2x2/1 max pool reduce each spatial dimension by 1? I have been writing my own implementation of YOLOv3-tiny, and I get the following output tensor shapes:
torch.Size([1, 16, 416, 416])
torch.Size([1, 16, 208, 208])
torch.Size([1, 32, 208, 208])
torch.Size([1, 32, 104, 104])
torch.Size([1, 64, 104, 104])
torch.Size([1, 64, 52, 52])
torch.Size([1, 128, 52, 52])
torch.Size([1, 128, 26, 26])
torch.Size([1, 256, 26, 26])
torch.Size([1, 256, 13, 13])
torch.Size([1, 512, 13, 13])
torch.Size([1, 512, 12, 12])
torch.Size([1, 1024, 12, 12])
torch.Size([1, 256, 12, 12])
torch.Size([1, 512, 12, 12])
torch.Size([1, 255, 12, 12])
torch.Size([1, 432, 85])
torch.Size([1, 256, 12, 12])
torch.Size([1, 128, 12, 12])
torch.Size([1, 128, 24, 24])
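One likely explanation, as far as I can tell from Darknet's maxpool layer source (treat this as an assumption): when the cfg does not set padding, Darknet defaults it to size - 1, computes out = (in + padding - size) / stride + 1 with integer division, and treats positions past the right/bottom edge as -infinity. A 2 x 2 / 1 pool therefore keeps 13 x 13, while an unpadded pool such as torch.nn.MaxPool2d(2, 1) (default padding 0) gives 12 x 12. The pure-Python sketch below compares both rules; all helper names are hypothetical.

```python
def darknet_maxpool_out(size_in, kernel, stride, padding=None):
    """Output size under the assumed Darknet rule (default padding = kernel - 1)."""
    if padding is None:
        padding = kernel - 1
    return (size_in + padding - kernel) // stride + 1


def unpadded_maxpool_out(size_in, kernel, stride):
    """Output size of a max pool with no padding, e.g. nn.MaxPool2d(kernel, stride)."""
    return (size_in - kernel) // stride + 1


def maxpool2x2_stride1_same(grid):
    """2x2 / stride-1 max pool that skips positions past the right/bottom edge
    (equivalent to -inf padding), so the output keeps the input's height and width."""
    h, w = len(grid), len(grid[0])
    return [
        [
            max(grid[rr][cc] for rr in (r, r + 1) for cc in (c, c + 1) if rr < h and cc < w)
            for c in range(w)
        ]
        for r in range(h)
    ]


print(darknet_maxpool_out(13, 2, 1), unpadded_maxpool_out(13, 2, 1))  # 13 12
print(darknet_maxpool_out(26, 2, 2), unpadded_maxpool_out(26, 2, 2))  # 13 13
print(maxpool2x2_stride1_same([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
# [[5, 6, 6], [8, 9, 9], [8, 9, 9]]
```

The stride-2 pools agree under both rules for even input sizes, which is why the mismatch only appears at the stride-1 pool. In PyTorch, one way to mimic the Darknet behaviour is to pad one column on the right and one row at the bottom before the stride-1 pool, e.g. torch.nn.functional.pad(x, (0, 1, 0, 1), mode='replicate') followed by nn.MaxPool2d(2, 1). Replicate padding only duplicates values already in each window, so the max is unchanged; zero padding can differ where leaky-ReLU activations are negative.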