gradient_descent_viz
Add support for QHM and QHAdam
QHM and QHAdam are enhancements to the standard Momentum and Adam algorithms, respectively. See the paper Quasi-Hyperbolic Momentum and Adam for Deep Learning for more details. I have added these two algorithms to the interface.
QHM adds a hyperparameter `v`, also known as the discount factor, in addition to the momentum (or `decay`) hyperparameter used by the momentum algorithm. QHM can be configured with these parameters to exactly implement several common gradient descent algorithms (a sketch of the update rule follows the list below).
- Momentum: When `v` is 1, QHM is identical to momentum.
- SGD: When `v` is 0 and `decay` is 1, QHM is identical to the plain gradient descent algorithm.
- Nesterov: When `v` and `decay` are set to the same value, QHM is identical to Nesterov's accelerated gradient. It can also be configured to exactly implement the Synthesized Nesterov Variants, including the Robust Momentum method.
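For reference, here is a minimal sketch of the QHM update rule as described in the paper. This is not the code in this PR; the function and variable names are illustrative, and the way `decay` enters the momentum buffer in the actual implementation may be parameterized slightly differently.

```cpp
// Sketch of the QHM update rule (Ma & Yarats). Illustrative only; not the
// implementation in this PR.
#include <vector>

struct QHMState {
    std::vector<double> momentum;  // exponential moving average of gradients
};

void qhm_step(std::vector<double>& params, const std::vector<double>& grad,
              QHMState& state, double learning_rate, double decay, double v) {
    if (state.momentum.empty()) state.momentum.assign(params.size(), 0.0);
    for (size_t i = 0; i < params.size(); ++i) {
        // Momentum buffer: g <- decay * g + (1 - decay) * grad
        state.momentum[i] = decay * state.momentum[i] + (1.0 - decay) * grad[i];
        // QHM step: blend the raw gradient and the momentum buffer by v.
        // v = 1 uses only the momentum buffer; v = 0 uses only the raw gradient.
        params[i] -= learning_rate *
                     ((1.0 - v) * grad[i] + v * state.momentum[i]);
    }
}
```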
The basic Adam implementation, which is the one implemented in this code prior to this PR, is biased towards zero in its early steps. The Gradient Descent Wikipedia article referenced in the source code specifies a bias correction. I have added a hyperparameter to the UI that lets the user enable bias correction.
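The sketch below shows where the correction fits into a single Adam step. It is illustrative only and assumes a scalar parameter; the names `m`, `s`, and `adam_step` are not from the actual code.

```cpp
// Sketch of one Adam step with optional bias correction. Illustrative only.
#include <cmath>

// Returns the parameter delta for step t (t starts at 1).
double adam_step(double grad, double& m, double& s, int t,
                 double lr, double beta1, double beta2,
                 bool bias_correction, double eps = 1e-8) {
    m = beta1 * m + (1.0 - beta1) * grad;         // first moment estimate
    s = beta2 * s + (1.0 - beta2) * grad * grad;  // second moment estimate
    double m_hat = m, s_hat = s;
    if (bias_correction) {
        // Divide out the bias introduced by initializing m and s at zero.
        m_hat = m / (1.0 - std::pow(beta1, t));
        s_hat = s / (1.0 - std::pow(beta2, t));
    }
    return -lr * m_hat / (std::sqrt(s_hat) + eps);
}
```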
QHAdam adds two hyperparameters, `v1` (also known as the discount factor) and `v2` (also known as the squared discount factor), to the two Adam hyperparameters, `beta1` and `beta2`, plus the bias correction option. QHAdam can be configured with these parameters to exactly implement several gradient descent algorithms (a sketch of the update follows the list below).
- Adam: When `v1` and `v2` are both 1, QHAdam is identical to Adam.
- RMSProp: When `v1` is 0, `v2` is 1, and bias correction is disabled, QHAdam is identical to RMSProp.
- NAdam: When `v1` and `beta1` have the same value and `v2` is 1, QHAdam is identical to NAdam.
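Here is a minimal sketch of the QHAdam update for a scalar parameter, following the formulation in the paper. Again, this is illustrative rather than the code in this PR; the struct and function names are made up for the example.

```cpp
// Sketch of one QHAdam step (Ma & Yarats). Illustrative only.
#include <cmath>

struct QHAdamState {
    double m = 0.0;  // EMA of gradients (first moment)
    double s = 0.0;  // EMA of squared gradients (second moment)
    int t = 0;       // step count, used for bias correction
};

double qhadam_step(double grad, QHAdamState& st, double lr,
                   double beta1, double beta2, double v1, double v2,
                   bool bias_correction, double eps = 1e-8) {
    ++st.t;
    st.m = beta1 * st.m + (1.0 - beta1) * grad;
    st.s = beta2 * st.s + (1.0 - beta2) * grad * grad;
    double m_hat = st.m, s_hat = st.s;
    if (bias_correction) {
        m_hat /= (1.0 - std::pow(beta1, st.t));
        s_hat /= (1.0 - std::pow(beta2, st.t));
    }
    // Blend the raw gradient with the first moment by v1, and the squared
    // gradient with the second moment by v2. v1 = v2 = 1 recovers Adam.
    double numer = (1.0 - v1) * grad + v1 * m_hat;
    double denom = std::sqrt((1.0 - v2) * grad * grad + v2 * s_hat) + eps;
    return -lr * numer / denom;  // parameter delta
}
```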