
Optimiser: Implement AdamW and SGDW

Open dylanagreen opened this issue 4 years ago • 3 comments

Two variants of optimizers already implemented in Arraymancer have been gaining traction recently: AdamW and SGDW. Both were proposed in the 2017 paper Decoupled Weight Decay Regularization but have only recently seen widespread use. Mathematical formulas for both new update procedures are given on page 3 of that paper.
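For reference, the decoupled update rules look roughly like the following. This is a paraphrase from memory of the page-3 formulas, with α the learning rate, λ the weight decay factor, and η_t the schedule multiplier, so the notation should be double-checked against the paper:

```latex
% Rough paraphrase of the update rules (check against page 3 of the paper).
% SGDW: momentum step, with the decay applied directly to the weights.
\begin{align*}
  m_t      &= \beta_1 m_{t-1} + \eta_t \alpha g_t \\
  \theta_t &= \theta_{t-1} - m_t - \eta_t \lambda \theta_{t-1}
\end{align*}

% AdamW: the usual Adam moment estimates on the raw gradient g_t,
% with the decay again applied to the weights instead of the gradient.
\begin{align*}
  m_t       &= \beta_1 m_{t-1} + (1 - \beta_1)\, g_t \\
  v_t       &= \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2 \\
  \hat{m}_t &= m_t / (1 - \beta_1^t), \qquad \hat{v}_t = v_t / (1 - \beta_2^t) \\
  \theta_t  &= \theta_{t-1} - \eta_t \left( \alpha\, \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon) + \lambda\, \theta_{t-1} \right)
\end{align*}
```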

TensorFlow has had them implemented for a while as "weight decay optimizers"; for example, AdamW:

https://github.com/tensorflow/tensorflow/blob/5912f51d580551e5cee2cfde4cb882594b4d3e60/tensorflow/contrib/opt/python/training/weight_decay_optimizers.py#L356-L362

PyTorch closed a pull request implementing AdamW a month ago: https://github.com/pytorch/pytorch/blob/master/torch/optim/adamw.py, and a pull request for SGDW is currently open.

Since these two optimizers are simply modified and extended versions of optimizers already present in Arraymancer, I feel that implementing them would be an effective use of time.

AdamW and SGDW both operate on the principle of decoupling the weight decay factor from the gradient update and instead applying it directly to the weight update. At each optimizer step the previous weights are "decayed" by some weight decay factor. This keeps the weights from growing too large and helps prevent overfitting. Additionally, it makes the weight decay and learning rate hyperparameters more separable (see page 6).
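To make the distinction concrete, here is a minimal Python sketch (not Arraymancer code; parameter names and defaults are illustrative only) of a single AdamW-style step on a scalar parameter. The moment estimates are computed from the raw gradient, and the decay term is applied to the weight itself rather than being added to the gradient as in classic Adam + L2 regularization:

```python
import math

def adamw_step(theta, grad, state, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=1e-2):
    """One AdamW-style update for a single scalar parameter.

    Unlike Adam with L2 regularization (where weight_decay * theta is added
    to the gradient before the moment estimates), the decay term here is
    applied directly to the weight, outside the adaptive update.
    """
    state["t"] += 1
    t = state["t"]

    # Standard Adam moment estimates, computed from the *raw* gradient.
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    m_hat = state["m"] / (1 - beta1 ** t)
    v_hat = state["v"] / (1 - beta2 ** t)

    # Adaptive step plus the decoupled weight decay term.
    theta -= lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * theta)
    return theta

# Usage: decay theta = 1.0 under a constant gradient of 0.5.
state = {"t": 0, "m": 0.0, "v": 0.0}
theta = 1.0
for _ in range(3):
    theta = adamw_step(theta, 0.5, state)
print(theta)
```

An SGDW step would follow the same pattern: the usual momentum update is computed from the raw gradient, and the `lr * weight_decay * theta` term is subtracted separately.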

dylanagreen avatar Jul 23 '19 04:07 dylanagreen

There is also AMSGrad, which I looked into; see https://github.com/pytorch/pytorch/blob/master/torch/optim/adam.py#L20 and the paper On the Convergence of Adam and Beyond.

And this led to AdamX: https://arxiv.org/abs/1904.03590 :P

mratsim avatar Jul 23 '19 08:07 mratsim

Well, if we want to cover all our bases we'll have to collect all the Adam variants. In the appendix of the aforementioned AdamW paper they propose a further variant, AdamR (Adam with warm restarts), which can also be combined with AdamW to form AdamWR. It's a veritable alphabet soup of Adam variants!

dylanagreen avatar Jul 23 '19 16:07 dylanagreen

There is now Rectified Adam (RAdam): https://arxiv.org/abs/1908.03265, as well as RAdam + Lookahead, by Hinton's group: https://arxiv.org/abs/1907.08610v1.

mratsim avatar Aug 23 '19 19:08 mratsim