pytorch-optimizer
LAMB: Differences from the paper author's official implementation
The LAMB implementation in the PyTorch version you released differs from the official TensorFlow implementation published by the paper's authors. In the official implementation, certain parameters are skipped by name when weight decay is applied, e.g. exclude_from_weight_decay=["batch_normalization", "LayerNorm", "layer_norm"]. In your implementation, however, all parameters appear to take part in the weight-decay calculation. Their implementation: https://github.com/tensorflow/addons/blob/master/tensorflow_addons/optimizers/lamb.py
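For illustration, here is a minimal sketch of how that exclusion could be reproduced on the PyTorch side by splitting parameters into groups with and without weight decay (`build_param_groups` is a hypothetical helper, not part of either library; matching on module type is used instead of TF-style name substrings, since PyTorch parameter names are attribute paths like `1.weight` rather than class names):

```python
import torch
from torch import nn

# Normalization layers whose parameters should not be decayed, mirroring the
# intent of exclude_from_weight_decay in the official TF Addons LAMB.
NORM_TYPES = (nn.LayerNorm, nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)

def build_param_groups(model: nn.Module, weight_decay: float = 0.01):
    decay, no_decay, seen = [], [], set()
    for module in model.modules():
        for name, param in module.named_parameters(recurse=False):
            if not param.requires_grad or id(param) in seen:
                continue
            seen.add(id(param))  # guard against shared parameters
            # Normalization parameters and biases get no weight decay.
            if isinstance(module, NORM_TYPES) or name == "bias":
                no_decay.append(param)
            else:
                decay.append(param)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]

model = nn.Sequential(nn.Linear(16, 16), nn.LayerNorm(16), nn.Linear(16, 4))
# Any optimizer that reads weight_decay per parameter group works here;
# torch.optim.AdamW is only a stand-in for this package's Lamb.
optimizer = torch.optim.AdamW(build_param_groups(model), lr=1e-3)
```

This assumes the Lamb in this package honors per-group weight_decay the way torch.optim optimizers do; if it does not, the exclusion would need to live inside the optimizer's step, as it does in the TF Addons version.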
I suspect something is wrong with this implementation. When I used LAMB in MXNet it always worked well, but here...
I also tried Lamb on https://github.com/coqui-ai/TTS/ with PyTorch 1.9, but it didn't even lower the training loss curve.
Not sure when I will be able to take a look, but happy to accept PRs with fixes.