pytorch-optimizer
LAMB: Differences from the paper author's official implementation
The LAMB implementation in the PyTorch version you released differs from the official TensorFlow implementation published by the paper's authors. In the official implementation, certain parameters are skipped by name when weight decay is applied, e.g. exclude_from_weight_decay=["batch_normalization", "LayerNorm", "layer_norm"]. In your implementation, however, all parameters appear to take part in the weight-decay calculation. Their implementation: https://github.com/tensorflow/addons/blob/master/tensorflow_addons/optimizers/lamb.py
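For illustration, here is a minimal sketch of how that exclusion could be reproduced on the PyTorch side by splitting parameters into groups with and without weight decay (`build_param_groups` is a hypothetical helper, not part of either library; matching on module type is used instead of TF-style name substrings, since PyTorch parameter names are attribute paths like `1.weight` rather than class names):

```python
import torch
from torch import nn

# Normalization layers whose parameters should not be decayed, mirroring the
# intent of exclude_from_weight_decay in the official TF Addons LAMB.
NORM_TYPES = (nn.LayerNorm, nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)

def build_param_groups(model: nn.Module, weight_decay: float = 0.01):
    decay, no_decay, seen = [], [], set()
    for module in model.modules():
        for name, param in module.named_parameters(recurse=False):
            if not param.requires_grad or id(param) in seen:
                continue
            seen.add(id(param))  # guard against shared parameters
            # Normalization parameters and biases get no weight decay.
            if isinstance(module, NORM_TYPES) or name == "bias":
                no_decay.append(param)
            else:
                decay.append(param)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]

model = nn.Sequential(nn.Linear(16, 16), nn.LayerNorm(16), nn.Linear(16, 4))
# Any optimizer that reads weight_decay per parameter group works here;
# torch.optim.AdamW is only a stand-in for this package's Lamb.
optimizer = torch.optim.AdamW(build_param_groups(model), lr=1e-3)
```

This assumes the Lamb in this package honors per-group weight_decay the way torch.optim optimizers do; if it does not, the exclusion would need to live inside the optimizer's step, as it does in the TF Addons version.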
I suspect something is wrong with this implementation. When I used LAMB in MXNet it always worked well, but here...
I also tried Lamb on https://github.com/coqui-ai/TTS/ with PyTorch 1.9, but it didn't even lower the training loss curve.
Not sure when I will be able to take a look, but happy to accept PRs with fixes.