Megatron-DeepSpeed
Avoid re-computing model parameter count every iteration
In `training.py`, we have:
https://github.com/bigscience-workshop/Megatron-DeepSpeed/blob/fd1e1da967c74e598acfc011031474663ef5845e/megatron/training.py#L818
However, this appears to be wasted compute, since the model's parameter count does not change across iterations. We can refactor the code so that `get_parameters_in_billions` is called only once and its result reused.
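
A minimal sketch of the idea: compute the count once before the training loop and reuse the cached value in per-iteration logging. The `get_parameters_in_billions` body below is a simplified stand-in for the real helper (which also handles distributed model chunks), and the loop structure is illustrative, not the actual `training.py` code:

```python
import torch

def get_parameters_in_billions(model):
    # Simplified placeholder for the real helper: `model` is a list of
    # module chunks; sum parameter element counts across all of them.
    return sum(p.numel() for chunk in model for p in chunk.parameters()) / 1e9

def train(model, num_iterations):
    # Hoisted out of the loop: parameter shapes are fixed during training,
    # so the count is constant and only needs to be computed once.
    params_in_billions = get_parameters_in_billions(model)

    for iteration in range(num_iterations):
        # ... forward/backward/optimizer step would go here ...
        # Reuse the cached value instead of re-walking every parameter.
        print(f"iteration {iteration + 1}: {params_in_billions:.3f}B params")

train([torch.nn.Linear(1024, 1024)], num_iterations=3)
```

Alternatively, the value could be computed once at setup time and threaded through to the logging function as an argument, which avoids touching the loop at all.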