
Investigating overestimation of total parameter counts for multiple models

Open 6DammK9 opened this issue 1 year ago • 3 comments

Describe the bug Total params of the model may be overestimated by up to 2.37x for several models, while other models remain accurate.

I wonder if there is something common across these models that causes double counting and hence the overestimated result.

To Reproduce

  • Run this code snippet directly. Ref
from ultralytics import YOLO
from torchinfo import summary

# Load the yolov10n model
model = YOLO("yolov10n.pt")

# It is a PyTorch model, so we can count its parameters directly.
pytorch_total_params = sum(p.numel() for p in model.parameters())

# Passing `model` itself would trigger model.train, so pass model.model instead.
model_summary = summary(model.model, 
    #input_data="path/to/bus.jpg",
    input_size=(1,3,640,640), 
    col_names=("input_size", "output_size", "num_params")
)

with open('summary.txt', 'w', encoding='utf-8') as the_file:
    the_file.write(f"{str(model_summary)}\r\nTotal Params (torch): {str(pytorch_total_params)}\r\nTotal Params (info): {model.info()}")
  • My ipynb is a lot more sophisticated, including intercepting the input data within the pipeline, but the counts should remain the same. The inaccurate results are roughly 2.0B vs 860M (SD1), 2.1B vs 865M (SD2), and 5.3B vs 2.6B (SDXL).

Expected behavior The generated summary.txt shows an inconsistent result, overestimated by 1.78x. The official figure is 2.3M params, but model.info(), which uses the same numel() approach, also gives 2775520.

...
│    │    └─DFL: 3-137                                       [1, 64, 8400]             [1, 4, 8400]              (16)
=======================================================================================================================================
Total params: 4,932,416
Trainable params: 0
Non-trainable params: 4,932,416
Total mult-adds (Units.GIGABYTES): 4.29
=======================================================================================================================================
Input size (MB): 4.92
Forward/backward pass size (MB): 362.66
Params size (MB): 11.10
Estimated Total Size (MB): 378.68
=======================================================================================================================================

Total Params (torch): 2775520

Total Params (info): (385, 2775520, 0, 8.7404288)
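For reference, the 1.78x overestimation quoted in this issue follows directly from the two totals in the output above:

```python
# Quick sanity check (plain Python): the overestimation factor implied by the
# two totals reported above.
torchinfo_total = 4_932_416  # "Total params" from the torchinfo summary
numel_total = 2_775_520      # sum(p.numel() for p in model.parameters())

ratio = torchinfo_total / numel_total
print(f"overestimated by {ratio:.2f}x")  # overestimated by 1.78x
```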

Runtime environment

  • Python 3.10 under conda
torch==2.4.0+cu124
diffusers==0.30.0
transformers==4.44.0

6DammK9 avatar Sep 25 '24 08:09 6DammK9

Hmm, perhaps recursive layers? Not sure what this could be

TylerYep avatar Sep 26 '24 00:09 TylerYep

> Hmm, perhaps recursive layers? Not sure what this could be

It has something to do with shared embeddings: for many transformers models, you can set tie_word_embeddings so that the embeddings in the encoder, the decoder, and lm_head share the same weights. To better estimate the number of params, I wish torchinfo would track the id of each model param to remove duplicates, and report counts both with and without duplication.
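To illustrate the double-counting mechanism without pulling in torch, here is a minimal pure-Python sketch (the `Tensor`, `Module`, and counting functions below are hypothetical stand-ins, not torchinfo APIs): a per-module traversal counts a shared weight once per owning module, while tracking `id()` counts it only once.

```python
# Hypothetical stand-ins for a tensor and a module that owns parameters.
class Tensor:
    def __init__(self, numel):
        self.numel = numel

class Module:
    def __init__(self, params, children=()):
        self.params = params
        self.children = children

# Tied embeddings: the encoder embedding and lm_head share the same weight.
shared = Tensor(numel=50_000 * 768)
encoder = Module([shared])
lm_head = Module([shared])
model = Module([], children=[encoder, lm_head])

def count_naive(m):
    # Per-module traversal: the shared tensor is counted once per module,
    # so it gets double counted.
    total = sum(p.numel for p in m.params)
    return total + sum(count_naive(c) for c in m.children)

def count_dedup(m, seen=None):
    # Track id() of each tensor so shared weights are counted only once.
    if seen is None:
        seen = set()
    total = 0
    for p in m.params:
        if id(p) not in seen:
            seen.add(id(p))
            total += p.numel
    for c in m.children:
        total += count_dedup(c, seen)
    return total

print(count_naive(model))  # 76800000 (shared weight counted twice)
print(count_dedup(model))  # 38400000 (counted once, matching sum(p.numel()))
```

Note that PyTorch's own `model.parameters()` already deduplicates shared tensors, which is why `sum(p.numel())` gives the smaller, accurate figure.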

alephpi avatar Sep 23 '25 09:09 alephpi

I opened a new issue #377 summarizing this; please take a look, @TylerYep.

alephpi avatar Sep 23 '25 09:09 alephpi