
SWA and Lookahead optimizers

Open kadarakos opened this issue 3 years ago • 3 comments

Optimizer wrappers

Currently the Optimizer is quite a large class. It implements a version of weight averaging that, starting from the first step, keeps an exponential moving average of the weights. This PR suggests moving this functionality out of the Optimizer into the SWA class.

The SWA class implements Stochastic Weight Averaging as described in https://arxiv.org/abs/1803.05407. It has the following attributes (a minimal sketch follows the list):

  1. start_step: the training-step where to start recording the moving averages from.
  2. freq: the number of steps between updating the moving average.
  3. lr: the new learning-rate for the SWA steps.
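Below is a minimal sketch of how such a wrapper could work, assuming the same `(key, weights, gradient)` call convention as Thinc's optimizers. The class and method names (`SWA`, `start_swa`) follow the description in this PR, but the exact API and implementation here are assumptions, not the PR's code:

```python
from collections import defaultdict


class SWA:
    """Sketch of an SWA-style wrapper: keeps an equal-weight running
    average of the wrapped optimizer's weights (assumed API)."""

    def __init__(self, optimizer, start_step=0, freq=1, lr=None):
        self.optimizer = optimizer    # wrapped optimizer, e.g. Thinc's Adam
        self.start_step = start_step  # training step at which averaging begins
        self.freq = freq              # number of steps between average updates
        self.lr = lr                  # optional learning rate for the SWA phase
        self.step = 0
        self.averages = {}            # parameter key -> running average
        self.n_averaged = defaultdict(int)

    def start_swa(self):
        # Begin averaging immediately, regardless of start_step.
        self.start_step = self.step

    def __call__(self, key, weights, gradient, **kwargs):
        # Run the wrapped optimizer first, then record the averaged weights.
        weights, gradient = self.optimizer(key, weights, gradient, **kwargs)
        self.step += 1
        if self.step >= self.start_step and self.step % self.freq == 0:
            if key not in self.averages:
                self.averages[key] = weights.copy()
            else:
                n = self.n_averaged[key]
                # Running average over the recorded snapshots.
                self.averages[key] += (weights - self.averages[key]) / (n + 1)
            self.n_averaged[key] += 1
        return weights, gradient
```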

The PR also contains a Lookahead optimizer wrapper that implements an algorithm similar to SWA, from https://arxiv.org/abs/1907.08610. It keeps track of "slow" weights and "fast" weights. Every freq training steps it updates the "slow" weights with an exponential moving average of the "slow" and "fast" weights, and replaces the "fast" weights with the "slow" ones. Essentially it lets the wrapped optimizer run the optimization forward for k iterates, but then pulls it back by a factor pullback.
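A minimal sketch of that idea, again assuming Thinc's optimizer call convention (this is illustrative, not the PR's implementation):

```python
class Lookahead:
    """Sketch of a Lookahead wrapper: the wrapped ("fast") optimizer runs
    freq steps, then the "slow" weights are pulled toward the fast weights
    by `pullback` and the fast weights are reset to the slow ones."""

    def __init__(self, optimizer, freq=5, pullback=0.5):
        self.optimizer = optimizer
        self.freq = freq
        self.pullback = pullback
        self.slow = {}    # parameter key -> slow weights
        self.steps = {}   # parameter key -> number of fast steps taken

    def __call__(self, key, weights, gradient, **kwargs):
        weights, gradient = self.optimizer(key, weights, gradient, **kwargs)
        self.slow.setdefault(key, weights.copy())
        self.steps[key] = self.steps.get(key, 0) + 1
        if self.steps[key] % self.freq == 0:
            # slow <- slow + pullback * (fast - slow); fast <- slow
            self.slow[key] += self.pullback * (weights - self.slow[key])
            weights = self.slow[key].copy()
        return weights, gradient
```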

The example optimizer_wrappers.py shows how to run Lookahead with Adam for a fixed number of epochs and then swap it out for SWA. This uses the SWA.start_swa() method, which lets the caller run SWA immediately without setting start_step.
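The training loop below gives a rough idea of that workflow, reusing the SWA and Lookahead sketches above. The `train_epoch` helper and the concrete hyperparameter values are hypothetical; only Adam comes from `thinc.api`:

```python
from thinc.api import Adam

# Phase 1: Lookahead-wrapped Adam for a fixed number of epochs.
optimizer = Lookahead(Adam(0.001), freq=5, pullback=0.5)
for epoch in range(5):
    train_epoch(model, train_data, optimizer)   # hypothetical training loop

# Phase 2: switch to SWA and start averaging right away.
swa = SWA(Adam(0.001), freq=10, lr=0.0005)
swa.start_swa()
for epoch in range(5):
    train_epoch(model, train_data, swa)
```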

kadarakos avatar Apr 01 '22 15:04 kadarakos

Oh, one more thing about wrapping: there are various methods in Thinc and spaCy that take Optimizer arguments. Using these new classes would not typecheck, since they do not derive from Optimizer. I am not sure whether deriving from Optimizer makes sense, since that would also inherit a lot of internal state? I guess ideally Optimizer would be a base class.
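To illustrate the concern (the `finish_training` function is hypothetical; Adam and Optimizer are from `thinc.api`):

```python
from thinc.api import Adam, Optimizer

def finish_training(optimizer: Optimizer) -> None:
    """Hypothetical function annotated to take a Thinc Optimizer."""
    ...

finish_training(Adam(0.001))             # fine
finish_training(Lookahead(Adam(0.001)))  # rejected by a type checker:
                                         # Lookahead is not an Optimizer subclass
```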

danieldk avatar Apr 07 '22 06:04 danieldk

@danieldk I don't think I'll have time to look at this in proper detail, so I'm relying on you to approve it 🙏 😄

I think the idea is good though, so I'm happy to see it merged once all the details are approved!

honnibal avatar Jun 13 '22 10:06 honnibal

> @danieldk I don't think I'll have time to look at this in proper detail, so I'm relying on you to approve it 🙏 😄
>
> I think the idea is good though, so I'm happy to see it merged once all the details are approved!

Ok, I'll do another review round (probably tomorrow).

danieldk avatar Jun 13 '22 11:06 danieldk

Closing this PR, as priorities have shifted over time.

kadarakos avatar Jun 16 '23 07:06 kadarakos