OpenNMT-py
[WIP] LAMB optimizer
[DO NOT MERGE]
This is a WIP implementation of the LAMB optimizer, introduced for large-batch BERT training. It reportedly makes it possible to scale training to very large batches. There are some ambiguities: the algorithm differs between v1 and v2/v3 of the paper, some definitions are blurry, there is no official implementation yet (a few are out there, but they differ on a few points), and the paper gives no clear learning_rate schedule despite detailed experiments. Some significant tuning may also be needed to find appropriate values for our tasks. I'm opening this PR for future work, once we have more information.
The current version here is based on https://github.com/cybertronai/pytorch-lamb, which itself is based on torch.optim.Adam.
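For reference, here is a minimal plain-Python sketch of the layer-wise trust-ratio update as I understand it. It is not the code in this PR and not an official implementation. The function name `lamb_step`, the `debias` flag and the default hyperparameters are assumptions made for illustration; `debias` is only there because the paper versions and third-party implementations seem to disagree on bias correction.

```python
import math


def lamb_step(w, g, m, v, step, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-6, weight_decay=0.0, debias=False):
    """One LAMB-style update for a single layer (flat lists of floats).

    Hypothetical helper for illustration only. Returns the new weights,
    the updated moment estimates and the trust ratio that was applied.
    """
    # Adam-style first and second moment estimates.
    m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
    v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, g)]

    # Optional bias correction (assumption: exposed as a flag because
    # the available descriptions differ on whether it is applied).
    if debias:
        c1 = 1 - beta1 ** step
        c2 = 1 - beta2 ** step
        m_hat = [mi / c1 for mi in m]
        v_hat = [vi / c2 for vi in v]
    else:
        m_hat, v_hat = m, v

    # Adam direction plus decoupled weight decay.
    update = [mi / (math.sqrt(vi) + eps) + weight_decay * wi
              for mi, vi, wi in zip(m_hat, v_hat, w)]

    # Layer-wise trust ratio: ||w|| / ||update||, falling back to 1
    # when either norm is zero.
    w_norm = math.sqrt(sum(x * x for x in w))
    u_norm = math.sqrt(sum(x * x for x in update))
    trust = w_norm / u_norm if w_norm > 0 and u_norm > 0 else 1.0

    new_w = [wi - lr * trust * ui for wi, ui in zip(w, update)]
    return new_w, m, v, trust


# Usage: one step on a toy two-parameter "layer".
w, m, v, trust = lamb_step([3.0, 4.0], [1.0, 1.0], [0.0, 0.0], [0.0, 0.0],
                           step=1, debias=True)
```

With a non-zero trust ratio, the step length for the layer is `lr * ||w||` regardless of the gradient scale, which is what should make very large batches workable. Some implementations additionally clamp the weight norm; that is left out here.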
LGTM