OpenNMT-py [WIP] LAMB optimizer

Open francoishernandez opened this issue 6 years ago • 1 comments

[DO NOT MERGE]

This is a WIP implementation of the LAMB optimizer, introduced for large-batch BERT training. It reportedly allows training to scale to very large batches. There are some ambiguities: the algorithm differs between v1 and v2/v3 of the paper, some definitions are blurry, there is no official implementation yet (a few are out there, but they differ on several points), the paper gives no clear learning_rate schedule despite its detailed experiments, etc. Significant tuning may also be needed to find appropriate values for our tasks. I am opening this PR for future work, once we have more information.

The current version here is based on https://github.com/cybertronai/pytorch-lamb, which is itself based on torch.optim.Adam.
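For reference, here is a minimal sketch of the layer-wise update as I understand it, written in plain Python on lists so it runs without PyTorch. It is not the code in this PR nor the cybertronai implementation. The function name `lamb_step`, the default hyperparameters, the use of Adam-style bias correction, and the fallback to a trust ratio of 1.0 when a norm is zero are all assumptions made for illustration; as noted above, the paper versions and existing implementations differ on such points.

```python
import math


def lamb_step(w, g, m, v, step, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-6, weight_decay=0.0):
    """One LAMB-style update for a single layer (hypothetical helper).

    w, g, m, v are lists of floats: weights, gradients, and the running
    first and second moment estimates. step is the 1-based step count.
    """
    # Adam-style exponential moving averages of the gradient and its square.
    m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
    v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, g)]

    # Bias correction: an assumption here, not every variant applies it.
    m_hat = [mi / (1 - beta1 ** step) for mi in m]
    v_hat = [vi / (1 - beta2 ** step) for vi in v]

    # Adam direction plus decoupled weight decay.
    update = [mh / (math.sqrt(vh) + eps) + weight_decay * wi
              for mh, vh, wi in zip(m_hat, v_hat, w)]

    # Layer-wise trust ratio: ||w|| / ||update||, with a fallback of 1.0
    # when either norm is zero (assumption).
    w_norm = math.sqrt(sum(x * x for x in w))
    u_norm = math.sqrt(sum(x * x for x in update))
    trust = w_norm / u_norm if w_norm > 0 and u_norm > 0 else 1.0

    # Scale the step so its size is proportional to the weight norm.
    new_w = [wi - lr * trust * ui for wi, ui in zip(w, update)]
    return new_w, m, v, trust


# Example: a single step on a two-element "layer".
w0 = [3.0, 4.0]
w1, m1, v1, trust1 = lamb_step(w0, [1.0, 1.0], [0.0, 0.0], [0.0, 0.0], step=1)
```

With a non-zero trust ratio, the step taken on a layer has norm `lr * ||w||` regardless of the gradient scale, which is the property that is supposed to help with very large batches.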

Jun 05 '19 14:06 francoishernandez

LGTM

Jul 16 '19 00:07 alphadl
