variance_reduced_neural_networks
Why is the norm of the gradient not used at all in the SAGA optimisation process?
Where exactly is equation (3) of the main paper implemented in the SAGA algorithm?