
AvgPool2d (kernel_size=1)

minhlab opened this issue 4 years ago · 5 comments

In vgg.py I found this line: layers += [nn.AvgPool2d(kernel_size=1, stride=1)].

Do I understand correctly that AvgPool2d layers with kernel_size=1 just return the input as it is? Why do we need them?

minhlab avatar Jun 20 '20 15:06 minhlab

I have the same doubt. I tried the following code to confirm that it gives an output of the same shape:



import torch
import torch.nn as nn

# adapted from https://pytorch.org/docs/stable/generated/torch.nn.AvgPool2d.html

# pool with a 1x1 window and stride 1 (the layer in question)
m = nn.AvgPool2d(1, stride=1)
x = torch.randn(20, 16, 50, 32)  # avoid shadowing the Python builtin `input`
output = m(x)
print(output.shape)  # torch.Size([20, 16, 50, 32]) -- same shape as the input
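Checking the values, not just the shape, makes it explicit that the layer is the identity: with kernel_size=1 and stride=1, each output element is the average over a window containing only that single element. A minimal check:

import torch
import torch.nn as nn

pool = nn.AvgPool2d(kernel_size=1, stride=1)
x = torch.randn(20, 16, 50, 32)
# Averaging a single element divides its sum by 1, so the result is exact:
print(torch.equal(pool(x), x))  # True -- the output is identical to the input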

kewlcoder avatar Dec 27 '20 12:12 kewlcoder

I think the avg pooling can be understood with the following code:

# First install pytorch_model_summary (e.g. pip install pytorch-model-summary)
import torch
from pytorch_model_summary import summary
from vgg import VGG  # the VGG class from this repo's vgg.py; adjust the import path if needed

net = VGG('VGG11')
print(summary(net, torch.randn(2, 3, 32, 32), show_input=True))
print(summary(net, torch.randn(2, 3, 32, 32), show_input=True, show_hierarchical=True))

This gives the following output:

------------------------------------------------------------------------
 out = out.view(out.size(0), -1) = torch.Size([2, 512])
------------------------------------------------------------------------
      Layer (type)          Input Shape         Param #     Tr. Param #
========================================================================
          Conv2d-1       [2, 3, 32, 32]           1,792           1,792
     BatchNorm2d-2      [2, 64, 32, 32]             128             128
            ReLU-3      [2, 64, 32, 32]               0               0
       MaxPool2d-4      [2, 64, 32, 32]               0               0
          Conv2d-5      [2, 64, 16, 16]          73,856          73,856
     BatchNorm2d-6     [2, 128, 16, 16]             256             256
            ReLU-7     [2, 128, 16, 16]               0               0
       MaxPool2d-8     [2, 128, 16, 16]               0               0
          Conv2d-9       [2, 128, 8, 8]         295,168         295,168
    BatchNorm2d-10       [2, 256, 8, 8]             512             512
           ReLU-11       [2, 256, 8, 8]               0               0
         Conv2d-12       [2, 256, 8, 8]         590,080         590,080
    BatchNorm2d-13       [2, 256, 8, 8]             512             512
           ReLU-14       [2, 256, 8, 8]               0               0
      MaxPool2d-15       [2, 256, 8, 8]               0               0
         Conv2d-16       [2, 256, 4, 4]       1,180,160       1,180,160
    BatchNorm2d-17       [2, 512, 4, 4]           1,024           1,024
           ReLU-18       [2, 512, 4, 4]               0               0
         Conv2d-19       [2, 512, 4, 4]       2,359,808       2,359,808
    BatchNorm2d-20       [2, 512, 4, 4]           1,024           1,024
           ReLU-21       [2, 512, 4, 4]               0               0
      MaxPool2d-22       [2, 512, 4, 4]               0               0
         Conv2d-23       [2, 512, 2, 2]       2,359,808       2,359,808
    BatchNorm2d-24       [2, 512, 2, 2]           1,024           1,024
           ReLU-25       [2, 512, 2, 2]               0               0
         Conv2d-26       [2, 512, 2, 2]       2,359,808       2,359,808
    BatchNorm2d-27       [2, 512, 2, 2]           1,024           1,024
           ReLU-28       [2, 512, 2, 2]               0               0
      MaxPool2d-29       [2, 512, 2, 2]               0               0
      AvgPool2d-30       [2, 512, 1, 1]               0               0
         Linear-31             [2, 512]           5,130           5,130
========================================================================
Total params: 9,231,114
Trainable params: 9,231,114
Non-trainable params: 0
------------------------------------------------------------------------


========================================== Hierarchical Summary ==========================================

VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)), 1,792 params
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True), 128 params
    (2): ReLU(inplace=True), 0 params
    (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False), 0 params
    (4): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)), 73,856 params
    (5): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True), 256 params
    (6): ReLU(inplace=True), 0 params
    (7): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False), 0 params
    (8): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)), 295,168 params
    (9): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True), 512 params
    (10): ReLU(inplace=True), 0 params
    (11): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)), 590,080 params
    (12): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True), 512 params
    (13): ReLU(inplace=True), 0 params
    (14): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False), 0 params
    (15): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)), 1,180,160 params
    (16): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True), 1,024 params
    (17): ReLU(inplace=True), 0 params
    (18): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)), 2,359,808 params
    (19): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True), 1,024 params
    (20): ReLU(inplace=True), 0 params
    (21): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False), 0 params
    (22): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)), 2,359,808 params
    (23): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True), 1,024 params
    (24): ReLU(inplace=True), 0 params
    (25): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)), 2,359,808 params
    (26): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True), 1,024 params
    (27): ReLU(inplace=True), 0 params
    (28): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False), 0 params
    (29): AvgPool2d(kernel_size=1, stride=1, padding=0), 0 params
  ), 9,225,984 params
  (classifier): Linear(in_features=512, out_features=10, bias=True), 5,130 params
), 9,231,114 params


==========================================================================================================

Here, you can see the avg pooling helps reshape/convert the (2, 512, 1, 1) tensor into a (2, 512) tensor.

kewlcoder avatar Dec 28 '20 09:12 kewlcoder

No, that's the flatten call in the forward() function.
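For reference, the forward() in this repo's vgg.py looks roughly like this:

def forward(self, x):
    out = self.features(x)           # ends with the AvgPool2d(kernel_size=1) no-op
    out = out.view(out.size(0), -1)  # this flatten turns (N, 512, 1, 1) into (N, 512)
    out = self.classifier(out)       # Linear(512, 10)
    return out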

minhlab avatar Dec 28 '20 13:12 minhlab

Oh, yes! You are right. Sorry, I missed this code snippet:

        out = out.view(out.size(0), -1)

This brings us back to the question.

kewlcoder avatar Dec 28 '20 15:12 kewlcoder

As per the original PyTorch implementation, we don't have any average pooling layers present. I think this line is a mistake.
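If the layer really is a no-op, dropping that line should leave the model unchanged. A sketch, assuming the repo's cfg-based layer builder in vgg.py:

def _make_layers(self, cfg):
    layers = []
    in_channels = 3
    for x in cfg:
        if x == 'M':
            layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
        else:
            layers += [nn.Conv2d(in_channels, x, kernel_size=3, padding=1),
                       nn.BatchNorm2d(x),
                       nn.ReLU(inplace=True)]
            in_channels = x
    # layers += [nn.AvgPool2d(kernel_size=1, stride=1)]  # identity op; safe to drop
    return nn.Sequential(*layers)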

Hrushikesh-github avatar Feb 15 '21 11:02 Hrushikesh-github