ColossalAI
[FEATURE]: add grad_norm logging to the training process
Describe the feature
I have been using the ColossalAI framework for my project and noticed that there is no way to obtain the `grad_norm` after the backward pass during training. For example, `ZeroOptimizer` computes the gradient norm with its `_calc_global_norm` method inside `step()` and then clips the gradients. If I want to obtain the `grad_norm` separately, I have to call `_calc_global_norm` again, which results in an unnecessary extra computation. In plain PyTorch, by contrast, `torch.nn.utils.clip_grad_norm_()` and `optimizer.step()` are decoupled, so this issue does not arise.
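For reference, here is the decoupled pattern in plain PyTorch: `clip_grad_norm_()` returns the total norm it computed, so the value can be logged without a second norm computation. The model, optimizer, and data below are placeholders, not part of any real training setup:

```python
import torch
import torch.nn as nn

# Placeholder model and optimizer for illustration only.
model = nn.Linear(16, 4)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

x, y = torch.randn(8, 16), torch.randn(8, 4)
loss = nn.functional.mse_loss(model(x), y)
loss.backward()

# clip_grad_norm_ clips in place and returns the pre-clipping total norm,
# so the same value can be logged for free.
grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
print(f"grad_norm: {grad_norm.item():.4f}")

optimizer.step()
optimizer.zero_grad()
```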
I would like to request a logging mechanism for the `grad_norm` during training, for example via the `logging` module or TensorBoard. This would make it easier to monitor training and verify that the gradients stay within the desired range.
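A minimal sketch of how this could look from the user's side, assuming the optimizer caches the norm it already computes in `step()` and exposes it through a hypothetical `get_grad_norm()` accessor (not part of the current API; the training loop is schematic):

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="./logs")

for step, batch in enumerate(dataloader):  # dataloader is a placeholder
    loss = model(batch)                    # model/loss are placeholders
    optimizer.backward(loss)               # ZeRO-style backward, schematic
    optimizer.step()
    optimizer.zero_grad()

    # Hypothetical accessor: returns the global grad norm already computed
    # inside step(), avoiding a second _calc_global_norm call.
    writer.add_scalar("train/grad_norm", optimizer.get_grad_norm(), step)
```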
Hi @pluiez, I'll add this feature when I am available.
Is this feature available now? Thanks!