pytorch-cifar
AvgPool2d (kernel_size=1)
In vgg.py I found this line:
layers += [nn.AvgPool2d(kernel_size=1, stride=1)]
Do I understand correctly that an AvgPool2d layer with kernel_size=1 and stride=1 just returns the input as it is? Why do we need it?
I have the same doubt. With kernel_size=1, stride=1, and padding=0, the pooling output size H_out = (H_in + 2*padding - kernel_size) / stride + 1 reduces to H_out = H_in, so the shape cannot change. I tried the following code to confirm that it gives the same output shape:
import torch
import torch.nn as nn

# Examples adapted from https://pytorch.org/docs/stable/generated/torch.nn.AvgPool2d.html
# pool of square window of size=3, stride=2
m1 = nn.AvgPool2d(3, stride=2)
# pool of size=1, stride=1 -- the layer in question
m2 = nn.AvgPool2d(1, stride=1)
# pool of non-square window
m3 = nn.AvgPool2d((3, 2), stride=(2, 1))

x = torch.randn(20, 16, 50, 32)
output = m2(x)
print(output.shape)  # torch.Size([20, 16, 50, 32]) -- same shape as the input
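A matching shape alone does not prove the layer is an identity; comparing values does. A minimal sketch (standard PyTorch only) that checks element-wise equality:

import torch
import torch.nn as nn

pool = nn.AvgPool2d(kernel_size=1, stride=1)
x = torch.randn(4, 8, 16, 16)

# A 1x1 window with stride 1 averages exactly one element per output
# position, so the output equals the input element-wise.
assert torch.equal(pool(x), x)
print("AvgPool2d(kernel_size=1, stride=1) acts as an identity")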
I think the avg pooling can be understood with the following code:

# First install pytorch_model_summary, e.g. via pip or conda
import torch
from pytorch_model_summary import summary
from vgg import VGG  # adjust this import to where vgg.py lives in your checkout

net = VGG('VGG11')
print(summary(net, torch.randn(2, 3, 32, 32), show_input=True))
print(summary(net, torch.randn(2, 3, 32, 32), show_input=True, show_hierarchical=True))
This gives the following output (the subsequent out = out.view(out.size(0), -1) yields torch.Size([2, 512])):
------------------------------------------------------------------------
Layer (type) Input Shape Param # Tr. Param #
========================================================================
Conv2d-1 [2, 3, 32, 32] 1,792 1,792
BatchNorm2d-2 [2, 64, 32, 32] 128 128
ReLU-3 [2, 64, 32, 32] 0 0
MaxPool2d-4 [2, 64, 32, 32] 0 0
Conv2d-5 [2, 64, 16, 16] 73,856 73,856
BatchNorm2d-6 [2, 128, 16, 16] 256 256
ReLU-7 [2, 128, 16, 16] 0 0
MaxPool2d-8 [2, 128, 16, 16] 0 0
Conv2d-9 [2, 128, 8, 8] 295,168 295,168
BatchNorm2d-10 [2, 256, 8, 8] 512 512
ReLU-11 [2, 256, 8, 8] 0 0
Conv2d-12 [2, 256, 8, 8] 590,080 590,080
BatchNorm2d-13 [2, 256, 8, 8] 512 512
ReLU-14 [2, 256, 8, 8] 0 0
MaxPool2d-15 [2, 256, 8, 8] 0 0
Conv2d-16 [2, 256, 4, 4] 1,180,160 1,180,160
BatchNorm2d-17 [2, 512, 4, 4] 1,024 1,024
ReLU-18 [2, 512, 4, 4] 0 0
Conv2d-19 [2, 512, 4, 4] 2,359,808 2,359,808
BatchNorm2d-20 [2, 512, 4, 4] 1,024 1,024
ReLU-21 [2, 512, 4, 4] 0 0
MaxPool2d-22 [2, 512, 4, 4] 0 0
Conv2d-23 [2, 512, 2, 2] 2,359,808 2,359,808
BatchNorm2d-24 [2, 512, 2, 2] 1,024 1,024
ReLU-25 [2, 512, 2, 2] 0 0
Conv2d-26 [2, 512, 2, 2] 2,359,808 2,359,808
BatchNorm2d-27 [2, 512, 2, 2] 1,024 1,024
ReLU-28 [2, 512, 2, 2] 0 0
MaxPool2d-29 [2, 512, 2, 2] 0 0
AvgPool2d-30 [2, 512, 1, 1] 0 0
Linear-31 [2, 512] 5,130 5,130
========================================================================
Total params: 9,231,114
Trainable params: 9,231,114
Non-trainable params: 0
------------------------------------------------------------------------
========================================== Hierarchical Summary ==========================================
VGG(
(features): Sequential(
(0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)), 1,792 params
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True), 128 params
(2): ReLU(inplace=True), 0 params
(3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False), 0 params
(4): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)), 73,856 params
(5): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True), 256 params
(6): ReLU(inplace=True), 0 params
(7): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False), 0 params
(8): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)), 295,168 params
(9): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True), 512 params
(10): ReLU(inplace=True), 0 params
(11): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)), 590,080 params
(12): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True), 512 params
(13): ReLU(inplace=True), 0 params
(14): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False), 0 params
(15): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)), 1,180,160 params
(16): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True), 1,024 params
(17): ReLU(inplace=True), 0 params
(18): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)), 2,359,808 params
(19): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True), 1,024 params
(20): ReLU(inplace=True), 0 params
(21): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False), 0 params
(22): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)), 2,359,808 params
(23): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True), 1,024 params
(24): ReLU(inplace=True), 0 params
(25): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)), 2,359,808 params
(26): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True), 1,024 params
(27): ReLU(inplace=True), 0 params
(28): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False), 0 params
(29): AvgPool2d(kernel_size=1, stride=1, padding=0), 0 params
), 9,225,984 params
(classifier): Linear(in_features=512, out_features=10, bias=True), 5,130 params
), 9,231,114 params
==========================================================================================================
Here you can see that the avg pooling helps reshape/convert the (2, 512, 1, 1) tensor into a (2, 512) tensor.
No, that's the flatten call in the forward() function.
Oh, yes! You are right. Sorry, I missed this line in the forward pass:
out = out.view(out.size(0), -1)
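For context, here is a rough sketch of how that flatten sits in this repo's VGG.forward() (a sketch; the exact code may differ slightly):

def forward(self, x):
    out = self.features(x)           # ends with AvgPool2d(1) -> shape (N, 512, 1, 1)
    out = out.view(out.size(0), -1)  # flatten -> shape (N, 512)
    out = self.classifier(out)       # Linear(512, 10)
    return out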
This brings us back to the question.
As per the original PyTorch implementation of VGG, we don't have any such average pooling layers present. I think this is a mistake.
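If the layer really is a no-op, swapping it for nn.Identity should leave the network's output unchanged bit for bit. A quick sketch to verify (the import path and the position of the AvgPool2d inside features are assumptions about this repo):

import torch
import torch.nn as nn
from vgg import VGG  # adjust this import to your checkout

net = VGG('VGG11').eval()
x = torch.randn(2, 3, 32, 32)

with torch.no_grad():
    ref = net(x)
    # Assumption: the AvgPool2d(kernel_size=1) is the last module in features.
    net.features[-1] = nn.Identity()
    out = net(x)

print(torch.equal(ref, out))  # expected: True -- the layer changes nothing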