gradient_descent_viz
Add support for QHM and QHAdam
QHM and QHAdam are enhancements to the standard Momentum and Adam algorithms, respectively. See the paper Quasi-Hyperbolic Momentum and Adam for Deep Learning for more details. I have added these two algorithms to the interface.
QHM adds a hyperparameter `v`, also known as the discount factor, in addition to the momentum (or `decay`) hyperparameter used by the momentum algorithm. QHM can be configured with these parameters to exactly implement several common gradient descent algorithms (a sketch of the update rule follows the list below).
- Momentum: When `v` is 1, QHM is identical to momentum.
- SGD: When `v` is 0 and `decay` is 1, QHM is identical to the plain gradient descent algorithm.
- Nesterov: When `v` and `decay` are set to the same value, QHM is identical to Nesterov's accelerated gradient. It can also be configured to exactly implement the Synthesized Nesterov Variants, including the Robust Momentum method.
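For reference, here is a minimal sketch of the QHM update rule as described in the paper. This is not the code in this PR; the function and variable names are illustrative, and the way `decay` enters the momentum buffer in the actual implementation may be parameterized slightly differently.

```cpp
// Sketch of the QHM update rule (Ma & Yarats). Illustrative only; not the
// implementation in this PR.
#include <vector>

struct QHMState {
    std::vector<double> momentum;  // exponential moving average of gradients
};

void qhm_step(std::vector<double>& params, const std::vector<double>& grad,
              QHMState& state, double learning_rate, double decay, double v) {
    if (state.momentum.empty()) state.momentum.assign(params.size(), 0.0);
    for (size_t i = 0; i < params.size(); ++i) {
        // Momentum buffer: g <- decay * g + (1 - decay) * grad
        state.momentum[i] = decay * state.momentum[i] + (1.0 - decay) * grad[i];
        // QHM step: blend the raw gradient and the momentum buffer by v.
        // v = 1 uses only the momentum buffer; v = 0 uses only the raw gradient.
        params[i] -= learning_rate *
                     ((1.0 - v) * grad[i] + v * state.momentum[i]);
    }
}
```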
The basic Adam implementation, which is the one implemented in this code prior to this PR, is biased towards zero in its early steps. The Gradient Descent Wikipedia article referenced in the source code specifies a bias correction. I have added a hyperparameter to the UI that lets the user enable bias correction.
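The sketch below shows where the correction fits into a single Adam step. It is illustrative only and assumes a scalar parameter; the names `m`, `s`, and `adam_step` are not from the actual code.

```cpp
// Sketch of one Adam step with optional bias correction. Illustrative only.
#include <cmath>

// Returns the parameter delta for step t (t starts at 1).
double adam_step(double grad, double& m, double& s, int t,
                 double lr, double beta1, double beta2,
                 bool bias_correction, double eps = 1e-8) {
    m = beta1 * m + (1.0 - beta1) * grad;         // first moment estimate
    s = beta2 * s + (1.0 - beta2) * grad * grad;  // second moment estimate
    double m_hat = m, s_hat = s;
    if (bias_correction) {
        // Divide out the bias introduced by initializing m and s at zero.
        m_hat = m / (1.0 - std::pow(beta1, t));
        s_hat = s / (1.0 - std::pow(beta2, t));
    }
    return -lr * m_hat / (std::sqrt(s_hat) + eps);
}
```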
QHAdam adds two hyperparameters, `v1` (also known as the discount factor) and `v2` (also known as the squared discount factor), to the two Adam hyperparameters, `beta1` and `beta2`, plus the bias correction option. QHAdam can be configured with these parameters to exactly implement several gradient descent algorithms (a sketch of the update follows the list below).
- Adam: When `v1` and `v2` are both 1, QHAdam is identical to Adam.
- RMSProp: When `v1` is 0, `v2` is 1, and bias correction is disabled, QHAdam is identical to RMSProp.
- NAdam: When `v1` and `beta1` have the same value and `v2` is 1, QHAdam is identical to NAdam.
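Here is a minimal sketch of the QHAdam update for a scalar parameter, following the formulation in the paper. Again, this is illustrative rather than the code in this PR; the struct and function names are made up for the example.

```cpp
// Sketch of one QHAdam step (Ma & Yarats). Illustrative only.
#include <cmath>

struct QHAdamState {
    double m = 0.0;  // EMA of gradients (first moment)
    double s = 0.0;  // EMA of squared gradients (second moment)
    int t = 0;       // step count, used for bias correction
};

double qhadam_step(double grad, QHAdamState& st, double lr,
                   double beta1, double beta2, double v1, double v2,
                   bool bias_correction, double eps = 1e-8) {
    ++st.t;
    st.m = beta1 * st.m + (1.0 - beta1) * grad;
    st.s = beta2 * st.s + (1.0 - beta2) * grad * grad;
    double m_hat = st.m, s_hat = st.s;
    if (bias_correction) {
        m_hat /= (1.0 - std::pow(beta1, st.t));
        s_hat /= (1.0 - std::pow(beta2, st.t));
    }
    // Blend the raw gradient with the first moment by v1, and the squared
    // gradient with the second moment by v2. v1 = v2 = 1 recovers Adam.
    double numer = (1.0 - v1) * grad + v1 * m_hat;
    double denom = std::sqrt((1.0 - v2) * grad * grad + v2 * s_hat) + eps;
    return -lr * numer / denom;  // parameter delta
}
```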