cooper
Extrapolation from the past
Enhancement
Implement the extrapolation from the past algorithm (Popov, 1980). A good, modern reference is Gidel et al. (2019).
Like extragradient, this algorithm computes the direction for updating parameters from a "lookahead" step. It is computationally cheaper than extragradient and enjoys similar convergence guarantees for some classes of problems (Gidel et al., 2019).
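For reference, the updates described in Gidel et al. (2019) take the following form, where $w_t$ are the parameters, $\eta$ is the step size, and $F$ is the gradient vector field of the saddle-point problem (notation here is mine, not from a specific source):

$$
\begin{aligned}
\text{extrapolation:} \quad & w_{t+1/2} = w_t - \eta\, F(w_{t-1/2}) \\
\text{update:} \quad & w_{t+1} = w_t - \eta\, F(w_{t+1/2})
\end{aligned}
$$

The lookahead step reuses $F(w_{t-1/2})$, the gradient stored from the previous iteration (initialized as $F(w_0)$), so only one fresh gradient, $F(w_{t+1/2})$, is computed per update.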
Motivation
Whereas extragradient requires two gradient computations per parameter update, extrapolation from the past stores the gradient from the previous extrapolation step and reuses it in the current one. This reduces the cost to one fresh gradient computation per update, which may be helpful in settings where gradients are expensive.
However, storing the previous gradient still incurs a memory overhead compared to plain gradient descent-ascent.
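As a minimal sketch of the idea (not cooper's API; the function and variable names below are illustrative), the stored-gradient scheme on a toy bilinear saddle-point problem looks like this:

```python
import numpy as np

def extrapolation_from_the_past(F, w0, lr=0.3, steps=300):
    """One-gradient-per-update variant of extragradient (Popov, 1980).

    F: gradient vector field of the saddle-point problem.
    Returns the iterate after `steps` updates.
    """
    w = np.asarray(w0, dtype=float)
    g_prev = F(w)  # gradient at the previous lookahead point (init: F(w0))
    for _ in range(steps):
        w_look = w - lr * g_prev  # lookahead step reusing the stored gradient
        g_prev = F(w_look)        # the only fresh gradient this iteration
        w = w - lr * g_prev       # actual parameter update
    return w

# Toy bilinear problem min_x max_y x*y, with vector field F(x, y) = (y, -x).
# Plain gradient descent-ascent diverges on it; this scheme converges.
F = lambda w: np.array([w[1], -w[0]])
w_star = extrapolation_from_the_past(F, [1.0, 1.0])
print(w_star)  # approaches the saddle point (0, 0)
```

Note the single call to `F` inside the loop, versus the two calls per iteration that extragradient would need, at the cost of keeping `g_prev` in memory.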
References
- G. Gidel, H. Berard, G. Vignoud, P. Vincent, S. Lacoste-Julien. A Variational Inequality Perspective on Generative Adversarial Networks. In ICLR, 2019.
- L. D. Popov. A Modification of the Arrow-Hurwicz Method for Search of Saddle Points. Mathematical Notes of the Academy of Sciences of the USSR, 1980.