tensorflow-yolo-v3

Mistake in your version of tiny yolo v3

Open LucasMahieu opened this issue 5 years ago • 6 comments

Hi,

I am studying your implementation of yolo_v3-tiny. Thanks for your work!

But it seems that you have introduced an error compared to the original network.

According to Darknet's config file (here), the last max-pooling layer has a size of [2, 2] with a stride of 1.

But in your network, every max-pooling layer has a size of [2, 2] with a stride of 2!

So the tensor fed to the upsampling layer does not have shape [13, 13, 128] as in Darknet, but [6, 6, 128]. I am pretty sure this is really hurting recognition performance.

Could you please fix this issue? (I'm not really comfortable with Slim, so I can't make the modification myself.)

LucasMahieu avatar Jul 24 '18 15:07 LucasMahieu
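
The shape arithmetic in the report above can be checked with a short sketch. It assumes `slim.max_pool2d` is called with its default `'VALID'` padding (no padding argument is passed in the quoted line) and uses the usual TensorFlow output-size formulas; the helper names are hypothetical and not part of the repository.

```python
import math

def pool_out_valid(size, kernel, stride):
    """Output length of a pooling window with no padding ('VALID')."""
    return (size - kernel) // stride + 1

def pool_out_same(size, stride):
    """Output length with TensorFlow-style 'SAME' padding."""
    return math.ceil(size / stride)

# The last max pool in tiny YOLOv3 receives a 13 x 13 feature map.
last_pool_in = 13

# Stride 2 without padding, as in the implementation being reported.
stride2_out = pool_out_valid(last_pool_in, kernel=2, stride=2)

# Stride 1 with 'SAME' padding, matching the Darknet output size.
stride1_out = pool_out_same(last_pool_in, stride=1)

print("stride 2, VALID:", stride2_out)
print("stride 1, SAME: ", stride1_out)
```

With stride 2 the map shrinks to 6 x 6, so every later layer on that branch, including the one feeding the upsampling layer, works on 6 x 6 instead of 13 x 13.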

Do you think this modification is sufficient? It changes line 62 of yolo_v3_tiny.py from inputs = slim.max_pool2d(inputs, [2, 2], scope='pool2') to:

if i < 5:
    inputs = slim.max_pool2d(inputs, [2, 2], scope='pool2')
else:
    inputs = slim.max_pool2d(inputs, [2, 2], stride=1, padding="SAME", scope='pool2')

LucasMahieu avatar Jul 24 '18 15:07 LucasMahieu
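
To see what the proposed condition does to the feature-map size, here is a pure-Python trace of the pooling stages. It is only a sketch: it assumes the surrounding loop runs six times (`i` from 0 to 5, one pool per iteration, which matches the six max-pool layers in the Darknet config) and that the unmodified calls use `'VALID'` padding. `pool_out` and `trace_pools` are illustrative names, not functions from the repository.

```python
import math

def pool_out(size, kernel=2, stride=2, padding="VALID"):
    """Spatial output size of a max pool, TensorFlow conventions."""
    if padding == "SAME":
        return math.ceil(size / stride)
    return (size - kernel) // stride + 1

def trace_pools(input_size=416, apply_fix=True, num_pools=6):
    """Return the feature-map size after each of the pooling stages."""
    sizes = []
    size = input_size
    for i in range(num_pools):
        if apply_fix and i >= 5:
            # Proposed change: last pool keeps the resolution.
            size = pool_out(size, stride=1, padding="SAME")
        else:
            # Original behaviour: every pool halves the resolution.
            size = pool_out(size)
        sizes.append(size)
    return sizes

print("original:", trace_pools(apply_fix=False))
print("with fix:", trace_pools(apply_fix=True))
```

Only the last entry differs between the two traces: 6 without the change and 13 with it.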

@LucasMahieu Sorry for the late response. Yes, this modification looks good. Can you submit a pull request with this change?

mystic123 avatar Aug 22 '18 09:08 mystic123

Yes, I will, thanks

LucasMahieu avatar Aug 22 '18 09:08 LucasMahieu

Before we close the issue, there are more mistakes. The following log is from Darknet, using yolov3-tiny.cfg.

layer     filters    size              input                output
    0 conv     16  3 x 3 / 1   416 x 416 x   3   ->   416 x 416 x  16  0.150 BFLOPs
    1 max          2 x 2 / 2   416 x 416 x  16   ->   208 x 208 x  16
    2 conv     32  3 x 3 / 1   208 x 208 x  16   ->   208 x 208 x  32  0.399 BFLOPs
    3 max          2 x 2 / 2   208 x 208 x  32   ->   104 x 104 x  32
    4 conv     64  3 x 3 / 1   104 x 104 x  32   ->   104 x 104 x  64  0.399 BFLOPs
    5 max          2 x 2 / 2   104 x 104 x  64   ->    52 x  52 x  64
    6 conv    128  3 x 3 / 1    52 x  52 x  64   ->    52 x  52 x 128  0.399 BFLOPs
    7 max          2 x 2 / 2    52 x  52 x 128   ->    26 x  26 x 128
    8 conv    256  3 x 3 / 1    26 x  26 x 128   ->    26 x  26 x 256  0.399 BFLOPs
    9 max          2 x 2 / 2    26 x  26 x 256   ->    13 x  13 x 256
   10 conv    512  3 x 3 / 1    13 x  13 x 256   ->    13 x  13 x 512  0.399 BFLOPs
   11 max          2 x 2 / 1    13 x  13 x 512   ->    13 x  13 x 512
   12 conv   1024  3 x 3 / 1    13 x  13 x 512   ->    13 x  13 x1024  1.595 BFLOPs
   13 conv    256  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x 256  0.089 BFLOPs
   14 conv    512  3 x 3 / 1    13 x  13 x 256   ->    13 x  13 x 512  0.399 BFLOPs
   15 conv    255  1 x 1 / 1    13 x  13 x 512   ->    13 x  13 x 255  0.044 BFLOPs
   16 yolo
   17 route  13
   18 conv    128  1 x 1 / 1    13 x  13 x 256   ->    13 x  13 x 128  0.011 BFLOPs
   19 upsample            2x    13 x  13 x 128   ->    26 x  26 x 128
   20 route  19 8
   21 conv    256  3 x 3 / 1    26 x  26 x 384   ->    26 x  26 x 256  1.196 BFLOPs
   22 conv    255  1 x 1 / 1    26 x  26 x 256   ->    26 x  26 x 255  0.088 BFLOPs
   23 yolo

Max pool no. 11 has stride 1, which is currently awaiting a fix from the pull request.

EDIT: I'm sorry, I made a mistake in reporting that both conv no. 15 and conv no. 22 are missing. Both are actually included in the _detection_layer() function because both have a linear activation function. @LucasMahieu You can unlike me, man. XD

i3oi3o avatar Oct 08 '18 07:10 i3oi3o

The fix for the stride of the last (6th) max-pooling layer (layer 11) has been pushed.

The pull request is pending.

@i3oi3o Thanks for your edit.

LucasMahieu avatar Oct 22 '18 12:10 LucasMahieu

Hi there. Based on the formula, shouldn't a 2x2/1 max pool reduce the dimensions by 1 on both axes? I have been trying my own implementation of v3 tiny and I have been getting the following output tensor shapes.

torch.Size([1, 16, 416, 416])
torch.Size([1, 16, 208, 208])
torch.Size([1, 32, 208, 208])
torch.Size([1, 32, 104, 104])
torch.Size([1, 64, 104, 104])
torch.Size([1, 64, 52, 52])
torch.Size([1, 128, 52, 52])
torch.Size([1, 128, 26, 26])
torch.Size([1, 256, 26, 26])
torch.Size([1, 256, 13, 13])
torch.Size([1, 512, 13, 13])
torch.Size([1, 512, 12, 12])
torch.Size([1, 1024, 12, 12])
torch.Size([1, 256, 12, 12])
torch.Size([1, 512, 12, 12])
torch.Size([1, 255, 12, 12])
torch.Size([1, 432, 85])
torch.Size([1, 256, 12, 12])
torch.Size([1, 128, 12, 12])
torch.Size([1, 128, 24, 24])

Abhi-Mollera avatar Aug 09 '23 09:08 Abhi-Mollera
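
On the question above: the unpadded formula does give 12, but the Darknet log shows 13 x 13 after layer 11, which suggests the layer is padded. The sketch below compares the two size formulas. It assumes, from my reading of the Darknet maxpool layer and not from anything in this thread, that Darknet pads by `size - 1` by default and computes the output as `(w + padding - size) / stride + 1`; the function names are hypothetical.

```python
def unpadded_maxpool_out(size, kernel, stride):
    """Output size of a max pool with no padding (the textbook formula)."""
    return (size - kernel) // stride + 1

def darknet_maxpool_out(size, kernel, stride, padding=None):
    """Output size under the assumed Darknet rule: padding defaults to kernel - 1."""
    if padding is None:
        padding = kernel - 1
    return (size + padding - kernel) // stride + 1

# Layer 11 in the log: 2 x 2 / 1 max pool on a 13 x 13 map.
unpadded = unpadded_maxpool_out(13, kernel=2, stride=1)
padded = darknet_maxpool_out(13, kernel=2, stride=1)

# Layer 19 upsamples by 2x and layer 20 concatenates with layer 8 (26 x 26).
print("unpadded:", unpadded, "-> upsampled", unpadded * 2)
print("padded:  ", padded, "-> upsampled", padded * 2)

# Number of predictions at the coarse scale with 3 anchors per cell.
print("coarse predictions:", unpadded * unpadded * 3, "vs", padded * padded * 3)
```

The unpadded path upsamples to 24 x 24, which cannot be concatenated with the 26 x 26 output of layer 8, and it yields 432 coarse predictions, matching the `[1, 432, 85]` shape listed above. One common workaround in PyTorch, offered here as a suggestion and not as the reference behaviour, is to pad one pixel on the right and bottom before the pool, for example `F.pad(x, (0, 1, 0, 1), mode="replicate")` followed by `nn.MaxPool2d(2, stride=1)`.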