
Number of parameters of the model

danihinjos opened this issue 2 years ago · 2 comments

Hello!

I have a small doubt regarding the number of parameters of the EfficientNet-B2 model with 4 attention heads. The paper reports 13.64M. However, in practice, after 'removing' the final classification layer from EfficientNet and adding the multi-head attention module, I get 7.71M instead of 13.64M. As you can see in the screenshot below, the EfficientNet-B2 parameter count drops to 7.7M as soon as the classification layer is removed. On top of that, the multi-head module only has around 11,000 parameters, resulting in 7.71M in total.

[Screenshot (2022-07-04 13:27): parameter summary showing ~7.7M parameters after removing the classification layer]
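For reference, this is roughly how I count the backbone parameters (just a sketch; replacing `_fc` with an identity is only one way to 'remove' the classifier in efficientnet_pytorch, and my attention module is counted separately):

```python
import torch.nn as nn
from efficientnet_pytorch import EfficientNet

# Load EfficientNet-B2 and 'remove' the final classification layer by
# replacing it with an identity, then count the remaining parameters.
model = EfficientNet.from_name('efficientnet-b2')
model._fc = nn.Identity()

n_params = sum(p.numel() for p in model.parameters())
print(f'{n_params / 1e6:.2f}M')  # ~7.70M, matching the screenshot
```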

Am I missing something? I need to report the number of parameters of this model for my project, but I am a bit confused about it. Could you clarify this for me? :)

danihinjos · Jul 04 '22 11:07

Hi there,

You are correct that the EfficientNet-B2 model without attention has 7.7M parameters. However, the number of parameters of the multi-head attention module depends on the number of classes of the task, so it changes from task to task. In the paper, we report the model size for AudioSet (527 classes). Below is the detailed calculation:

- The ~9.2M model is the original EfficientNet-B2 for 1,000-class image classification, which does not contain an attention module. In the efficientnet_pytorch implementation, the exact number of parameters is 9.109M.
- The last fully connected layer for image classification has 1,408 × 1,000 + 1,000 = 1.409M parameters (input size 1,408, output size 1,000, plus biases). After removing it, the EfficientNet-B2 feature extractor has 9.109M − 1.409M = 7.700M parameters.
- In the attention module, each head has an attention branch and a classification branch, each with 1,408 × 527 ≈ 0.742M parameters. Hence, the four-headed attention module has 0.742M × 2 × 4 = 5.936M parameters.
- The total model size is 7.700M + 5.936M = 13.64M parameters.
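If it helps, here is the same arithmetic as a quick sanity check (plain numbers only; the 9.109M figure is taken from the efficientnet_pytorch implementation, and the attention branches' biases are ignored since they do not change the rounded totals):

```python
full_b2 = 9.109e6                      # EfficientNet-B2 with the 1,000-class head
imagenet_fc = 1408 * 1000 + 1000       # removed classifier: weights + biases ≈ 1.409M
feature_extractor = full_b2 - imagenet_fc          # ≈ 7.700M

n_classes, n_heads = 527, 4            # AudioSet
per_branch = 1408 * n_classes          # one 1,408 -> 527 projection ≈ 0.742M
attention = per_branch * 2 * n_heads   # attention + classification branch per head ≈ 5.936M

total = feature_extractor + attention
print(f'{feature_extractor / 1e6:.3f}M + {attention / 1e6:.3f}M = {total / 1e6:.2f}M')
# 7.700M + 5.936M = 13.64M
```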

Does this help?

-Yuan

YuanGongND · Jul 04 '22 16:07

Oh my, I see!!

I am sorry, I completely forgot to include the number of classes in the computation. Now everything makes sense. Thank you again for explaining everything so clearly, for answering so quickly, and for your help and consideration.

Regards from Switzerland!

danihinjos · Jul 04 '22 20:07