SWA and Lookahead optimizers
Optimizer wrappers
Currently, Optimizer is quite a large class. Among other things, it implements a form of weight averaging that keeps an exponential moving average of the weights from the first step onward. This PR proposes moving that functionality out of Optimizer and into a new SWA class.
SWA implements Stochastic Weight Averaging as described in https://arxiv.org/abs/1803.05407. It has the following attributes (a sketch of the update rule follows the list):
- `start_step`: the training step from which to start recording the moving averages.
- `freq`: the number of steps between updates of the moving average.
- `lr`: the new learning rate for the SWA steps.
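For concreteness, here is a minimal sketch of the update rule such a wrapper could implement. The attribute names follow the description above; the `inner` update callable, the dict-of-arrays parameter format, and the placement of `start_swa()` (mentioned further down) are assumptions for illustration, not the PR's actual API.

```python
class SWA:
    """Sketch of Stochastic Weight Averaging around an inner update step."""

    def __init__(self, inner, start_step=0, freq=1, lr=None):
        self.inner = inner            # inner update: (params, grads, lr) -> params
        self.start_step = start_step  # training step at which averaging begins
        self.freq = freq              # steps between updates of the average
        self.lr = lr                  # learning rate used during the SWA phase
        self.step = 0
        self.n_averaged = 0
        self.averages = None          # equal-weight running mean of the params

    def start_swa(self):
        # Begin averaging immediately, regardless of start_step.
        self.start_step = self.step

    def update(self, params, grads):
        in_swa = self.step >= self.start_step
        # Use the SWA learning rate once averaging has started; before
        # that, the inner update falls back to its own default.
        params = self.inner(params, grads, lr=self.lr if in_swa else None)
        if in_swa and (self.step - self.start_step) % self.freq == 0:
            if self.averages is None:
                self.averages = {k: v.copy() for k, v in params.items()}
            else:
                for k, v in params.items():
                    # Equal-weight running mean: avg += (v - avg) / (n + 1)
                    self.averages[k] += (v - self.averages[k]) / (self.n_averaged + 1)
            self.n_averaged += 1
        self.step += 1
        return params
```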
The PR also contains a Lookahead optimizer wrapper, which implements an algorithm similar to SWA. It's from https://arxiv.org/abs/1907.08610. It keeps track of "slow" weights and "fast" weights. Every `freq` training steps it updates the "slow" weights with an exponential moving average of the "slow" weights and replaces the "fast" weights with the "slow" ones. Essentially, it lets the optimizer run the optimization forward for k steps (where k is `freq`), but then pulls it back with a factor `pullback`.
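A minimal sketch of this update scheme, under the same assumptions as the SWA sketch above (an `inner` update callable and dict-of-arrays parameters); it is not the PR's implementation:

```python
class Lookahead:
    """Sketch of the Lookahead wrapper (https://arxiv.org/abs/1907.08610)."""

    def __init__(self, inner, freq=5, pullback=0.5):
        self.inner = inner        # the "fast" update: (params, grads) -> params
        self.freq = freq          # fast steps between slow-weight updates
        self.pullback = pullback  # interpolation factor towards the fast weights
        self.step = 0
        self.slow = None          # the "slow" weights

    def update(self, params, grads):
        if self.slow is None:
            # The slow weights start as a copy of the initial weights.
            self.slow = {k: v.copy() for k, v in params.items()}
        params = self.inner(params, grads)  # one "fast" step
        self.step += 1
        if self.step % self.freq == 0:
            for k in params:
                # Move the slow weights towards the fast ones ...
                self.slow[k] += self.pullback * (params[k] - self.slow[k])
                # ... then pull the fast weights back to the slow weights.
                params[k] = self.slow[k].copy()
        return params
```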
The example `optimizer_wrappers.py` shows how to run Lookahead with Adam for a fixed number of epochs and then swap it for SWA. This uses the `SWA.start_swa()` method, which lets the caller start SWA immediately without setting `start_step`.
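To illustrate the pattern, here is a hypothetical end-to-end loop built on the sketches above; it mirrors the shape of the example but is not its code. `sgd_update` stands in for the Adam update, and the data is synthetic:

```python
import numpy as np

def sgd_update(params, grads, lr=None):
    # Stand-in inner update (plain SGD); the real example wraps Adam.
    lr = 0.001 if lr is None else lr
    return {k: v - lr * grads[k] for k, v in params.items()}

params = {"W": np.zeros(4), "b": np.zeros(1)}

def fake_grads(n, rng):
    # Synthetic gradients with the same shapes as the parameters.
    for _ in range(n):
        yield {k: rng.normal(size=v.shape) for k, v in params.items()}

rng = np.random.default_rng(0)

# Phase 1: Lookahead around the fast optimizer for a fixed budget.
lookahead = Lookahead(sgd_update, freq=5, pullback=0.5)
for grads in fake_grads(100, rng):
    params = lookahead.update(params, grads)

# Phase 2: swap in SWA and start averaging right away; start_swa()
# means the caller never has to pick a start_step in advance.
swa = SWA(sgd_update, freq=1, lr=0.0005)
swa.start_swa()
for grads in fake_grads(100, rng):
    params = swa.update(params, grads)

final_params = swa.averages  # the averaged weights, e.g. for evaluation
```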
Oh, one more thing about wrapping: various methods in Thinc and spaCy take Optimizer arguments. Passing these new classes there would not typecheck, since they do not derive from Optimizer. I am not sure whether deriving from Optimizer makes sense, since that would also inherit a lot of internal state. I guess ideally Optimizer would be a base class.
@danieldk I don't think I'll have time to look at this in proper detail, so I'm relying on you to approve it 🙏 😄
I think the idea is good though, so I'm happy to see it merged once all the details are approved!
Ok, I'll do another review round (probably tomorrow).
Closed this PR because priorities shifted over time.