Megatron-DeepSpeed
Need model size dumped at init
We need a diagnostic dump of the total model size during framework init. Currently we only get a per-rank report, not the total:
> number of parameters on (tensor, pipeline) model parallel rank (0, 1): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (2, 1): 1745293312
> number of parameters on (tensor, pipeline) model parallel rank (3, 0): 1986465792
> number of parameters on (tensor, pipeline) model parallel rank (3, 7): 1986498560
Later on, the ZeRO engine does dump the right number, but it is buried among multiple other figures and repeated on every rank:
[2021-10-02 16:08:53,028] [INFO] [engine.py:134:__init__] RANK=0 STAGE=0 LAYERS=7 [0, 7) STAGE_PARAMS=1986465792 (1986.466M) TOTAL_PARAMS=57778896896 (57778.897M) UNIQUE_PARAMS=56814206976 (56814.207M)
But ideally we just want a print like:
Model size: 57B (57778896896 params)
Just on rank 0.
Thanks.
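A minimal sketch of such a print, assuming PyTorch; the helper name and placement are illustrative, not an existing Meg-DS function:

```python
import torch
import torch.distributed as dist


def log_model_size(model):
    """Print the total parameter count once, on global rank 0 only.

    Each underlying storage is counted once (two Parameters sharing a
    storage are deduplicated via data_ptr()). Caveat: this counts only
    locally held parameters; under tensor/pipeline parallelism you would
    still need to sum across model-parallel ranks to get the full size.
    """
    total = sum(dict((p.data_ptr(), p.numel()) for p in model.parameters()).values())
    if not dist.is_initialized() or dist.get_rank() == 0:
        print(f"Model size: {total // 10**9}B ({total} params)")
```

With `total = 57778896896` this would print `Model size: 57B (57778896896 params)`, matching the requested format.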
I think I can try to take this issue. However, I need to know: what do you do to get the diagnostics dump?
Also, does the dump happen when starting the workflows?
Thank you for offering to work on this, @jtboing
We, the BS group, haven't added anything for this functionality yet, so it's entirely up to you how you do it. Please have a look at the various info logged during Meg-DS startup and add it where you feel is right. The best place is probably where the model is created, since you can then easily query the params.
I don't think it really matters where, other than that we could easily grep for something like:
grep "Model size" log.txt
Here is my cheatsheet, if it helps:
# calculate the number of parameters:
#
# 1. count all params
sum(p.numel() for p in model.parameters())
#
# 2. avoid double counting shared parameters (only needed when distinct Parameters share a storage(); normally tied vars don't have this issue, since model.parameters() doesn't return duplicates)
sum(dict((p.data_ptr(), p.numel()) for p in model.parameters()).values())
#
# 3. count only the trainable parameters:
pytorch_total_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
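The three counts above can be exercised on a toy module; the `Demo` class here is purely illustrative, built so that the naive, deduplicated, and trainable-only counts all differ:

```python
import torch
import torch.nn as nn


class Demo(nn.Module):
    def __init__(self):
        super().__init__()
        storage = torch.zeros(10, 4)
        # Two distinct Parameter objects backed by the same storage --
        # parameters() returns both, so a naive sum double-counts them.
        self.a = nn.Parameter(storage)                               # 40 elements
        self.b = nn.Parameter(storage)                               # same 40 elements
        self.frozen = nn.Parameter(torch.zeros(5), requires_grad=False)  # 5 elements


model = Demo()

# 1. naive count: 40 + 40 + 5
naive = sum(p.numel() for p in model.parameters())

# 2. dedup by data_ptr(): the shared storage is counted once: 40 + 5
unique = sum(dict((p.data_ptr(), p.numel()) for p in model.parameters()).values())

# 3. trainable only: both shared Parameters, frozen one excluded: 40 + 40
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)

print(naive, unique, trainable)  # 85 45 80
```

For the issue at hand, variant 2 is the one that matches the engine's UNIQUE_PARAMS figure.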
Hello. Sorry this hasn't been done sooner; I am trying to get through it now. I am looking for the Meg-DS startup script/process. Can you point me to which script/process initiates the framework init?
We have already started sorting it out here: https://github.com/bigscience-workshop/Megatron-DeepSpeed/pull/204 (as a side effect of another need).