
Implement gradient pre-normalization in LAMB optimizer

Open jglaser opened this issue 3 years ago • 0 comments

This PR implements gradient pre-normalization (dividing each gradient by the global norm of all gradients in the model), as discussed in https://developer.nvidia.com/blog/pretraining-bert-with-layer-wise-adaptive-learning-rates/

It adds a Boolean `prenorm` option to `torch_optimizer.lamb`.
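
A minimal sketch of the pre-normalization step, kept separate from the optimizer for illustration; the helper name `prenormalize_gradients` and the `eps` guard are my own, not from the PR, which wires this logic into the optimizer's `step()` behind the `prenorm` flag:

```python
import torch

def prenormalize_gradients(parameters, eps=1e-8):
    """Divide every gradient by the global L2 norm of all gradients.

    Hypothetical standalone helper; the actual PR performs this
    inside torch_optimizer.lamb when prenorm=True.
    """
    grads = [p.grad for p in parameters if p.grad is not None]
    if not grads:
        return
    # Global norm across the whole model: sqrt of the sum of all
    # squared gradient entries, not a per-layer norm.
    global_norm = torch.sqrt(sum(torch.sum(g * g) for g in grads))
    for g in grads:
        # In-place scaling; eps avoids division by zero.
        g.div_(global_norm + eps)

# Usage: normalize after backward(), before the optimizer step.
model = torch.nn.Linear(10, 2)
model(torch.randn(4, 10)).sum().backward()
prenormalize_gradients(model.parameters())
```

Because LAMB's layer-wise trust ratio already rescales each layer's update, the pre-normalization mainly stabilizes the Adam-style moment estimates against large swings in the overall gradient magnitude, which is the motivation given in the linked NVIDIA post.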

jglaser · Jan 15 '22 17:01