LookaheadOptimizer-mx
Does this implementation maintain the momentum?
Optimizers like SGD with momentum, Adam, and RMSProp keep internal state built from historical gradient information (momentum and second-moment buffers). At each outer-loop synchronization in Lookahead, does this implementation maintain, reset, or interpolate that state?
Thank you for pointing it out!
This implementation doesn't reset the momentum in the outer loop. I will try to fix it.
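For reference, here is a minimal NumPy sketch, not this repository's code, of a Lookahead wrapper around SGD with momentum that marks where each of the three state-handling choices would go at the outer-loop synchronization point. All names here (`LookaheadSGD`, `state_mode`) are hypothetical, and the "interpolate" branch shows just one simple interpretation (blending the buffer toward zero with the same weight `alpha` used for the slow weights).

```python
# Hypothetical sketch of Lookahead over SGD+momentum; not this repo's API.
import numpy as np

class LookaheadSGD:
    def __init__(self, param, lr=0.1, momentum=0.9, k=5, alpha=0.5,
                 state_mode="reset"):
        self.param = param                     # fast weights, updated every step
        self.slow = param.copy()               # slow weights, updated every k steps
        self.velocity = np.zeros_like(param)   # inner SGD momentum buffer
        self.lr, self.momentum = lr, momentum
        self.k, self.alpha = k, alpha
        self.state_mode = state_mode
        self.step_count = 0

    def step(self, grad):
        # Inner update: plain SGD with momentum on the fast weights.
        self.velocity = self.momentum * self.velocity - self.lr * grad
        self.param += self.velocity
        self.step_count += 1

        if self.step_count % self.k == 0:
            # Outer update: slow weights move toward the fast weights,
            # then the fast weights are pulled back onto the slow weights.
            self.slow += self.alpha * (self.param - self.slow)
            self.param[...] = self.slow
            # Momentum handling at the synchronization point:
            if self.state_mode == "reset":
                self.velocity[...] = 0.0       # discard the stale buffer
            elif self.state_mode == "interpolate":
                self.velocity *= self.alpha    # shrink buffer toward zero
            # "maintain": leave velocity untouched — the behavior
            # this issue reports for the current implementation.
```

The "maintain" branch (doing nothing) is what the reply above describes: the momentum buffer still points along the last fast-weight trajectory even though the weights have just jumped to the interpolated slow weights, which is why resetting or interpolating the state at each outer step is worth considering.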