
[FEATURE]: adding grad_norm logging in training process

Open pluiez opened this issue 1 year ago • 2 comments

Describe the feature

I have been using the colossalai framework for my project and I noticed that there is no way to obtain the grad_norm after the backward pass during training. For example, the ZeroOptimizer computes the grad_norm via the _calc_global_norm method inside step() and then clips the gradients. If I want to obtain the grad_norm separately, I have to call _calc_global_norm again, which results in unnecessary extra computation. In PyTorch, by contrast, clip_grad_norm_() and optimizer.step() are decoupled, so this issue does not exist (see the sketch below).
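
A minimal sketch of the decoupled PyTorch pattern mentioned above: `torch.nn.utils.clip_grad_norm_` returns the total gradient norm (computed before clipping), so it can be logged with no extra pass over the gradients. The model, optimizer, and inputs here are just placeholders.

```python
import torch

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

inputs = torch.randn(4, 10)
loss = model(inputs).sum()

optimizer.zero_grad()
loss.backward()
# Returns the total norm of all gradients, then clips them in place.
grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
print(f"grad_norm: {grad_norm.item():.4f}")  # readily available for logging
optimizer.step()
```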

I would like to request the addition of a logging mechanism for the grad_norm during the training process. This could be achieved using the logging module or TensorBoard. This would make it easier to monitor the training process and ensure that the gradients are within the desired range.
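
For illustration, a hypothetical usage sketch of how such a feature could feed TensorBoard. `get_last_grad_norm()` is only a placeholder name for whatever accessor the feature would expose, not an existing ColossalAI API, and `model`, `optimizer`, and `dataloader` are assumed to be set up elsewhere.

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="runs/exp1")

for step, batch in enumerate(dataloader):
    optimizer.zero_grad()
    loss = model(batch)
    loss.backward()
    optimizer.step()
    # Placeholder accessor: the requested feature would expose the grad_norm
    # already computed inside step(), instead of recomputing it here.
    writer.add_scalar("train/grad_norm", optimizer.get_last_grad_norm(), step)
```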

pluiez avatar Feb 13 '23 09:02 pluiez

Hi @pluiez

I'll add this feature when I am available.

1SAA avatar Apr 20 '23 07:04 1SAA

Is it available now? thx

ericxsun avatar Nov 20 '23 14:11 ericxsun