StochasticOptimization.jl
Add SVRG parameter updater
This is an interesting stochastic optimizer with some nice theoretical guarantees for convex problems. It would be worth comparing to the updaters we have already implemented.
https://papers.nips.cc/paper/4937-accelerating-stochastic-gradient-descent-using-predictive-variance-reduction.pdf
Just trying to make sense of the paper. It seems like they are proposing that the gradient update for weight w[i] is:
θ[i] -= lr * (∇[i] - V)
where V is the difference: E(grad(mean(θ))) - mean(E(grad(θ)))
Do you agree this is their method?
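For reference, here is my transcription of the update in Algorithm 1 of the linked paper (using their ψ_i for the per-sample loss; worth double-checking against the paper itself):

\tilde{\mu} = \frac{1}{n} \sum_{i=1}^{n} \nabla \psi_i(\tilde{w})

w_t = w_{t-1} - \eta \left( \nabla \psi_{i_t}(w_{t-1}) - \nabla \psi_{i_t}(\tilde{w}) + \tilde{\mu} \right)

where \tilde{w} is a snapshot of the weights held fixed over each inner epoch and i_t is the index sampled at step t. If I'm reading it right, the correction term is the per-sample gradient at the snapshot minus the full-batch gradient at the snapshot, rather than a difference of expectations.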
This paper has a maybe slightly better summary of it: https://arxiv.org/pdf/1603.06160v2.pdf (see Algorithm 2).
I think you have the basic idea... It might be slightly more difficult to fit this into the ParameterUpdater framework since there is a nested loop. Here is some pseudocode -- w holds the parameters of the model. (Please double-check that I got this right!)
function svrg_pseudocode(data)
    # hold the current parameters and the params from the last snapshot
    w = initialize_weights()
    w_prev = similar(w)
    # main loop
    for s = 1:iterations
        # store the snapshot weights
        copy!(w_prev, w)
        # full-batch gradient at the snapshot
        mu = mean([ grad(w_prev, target, output) for (target,output) in data ])
        for t = 1:epoch_length
            (target, output) = rand_sample(data)
            # calc gradients at the current and snapshot weights
            ∇w = grad(w, target, output)
            ∇w_prev = grad(w_prev, target, output)
            # variance-reduced update
            w -= learnrate*(∇w - ∇w_prev + mu)
        end
    end
    return w
end
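Here is a minimal runnable sketch of the same loop, specialized to least-squares regression so it can actually be executed. The setup below (the X/y data, the squared-loss grad, and the svrg! name) is only illustrative and is not part of StochasticOptimization.jl or the ParameterUpdater framework:

using Random, Statistics, LinearAlgebra

# per-sample gradient of the squared loss 0.5*(dot(x, w) - y)^2
grad(w, x, y) = (dot(x, w) - y) .* x

function svrg!(w, X, y; learnrate = 0.1, iterations = 50, epoch_length = 100)
    n = length(y)
    w_prev = similar(w)
    for s in 1:iterations
        copyto!(w_prev, w)                       # snapshot of the weights
        # full-batch gradient at the snapshot
        mu = mean([grad(w_prev, X[i, :], y[i]) for i in 1:n])
        for t in 1:epoch_length
            i = rand(1:n)                        # sample one data point
            ∇w      = grad(w, X[i, :], y[i])
            ∇w_prev = grad(w_prev, X[i, :], y[i])
            w .-= learnrate .* (∇w .- ∇w_prev .+ mu)   # variance-reduced step
        end
    end
    return w
end

# toy usage: recover a known weight vector from noiseless data
Random.seed!(1)
X = randn(200, 5)
w_true = randn(5)
y = X * w_true
w = svrg!(zeros(5), X, y)
@show norm(w - w_true)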
Related? https://arxiv.org/abs/1604.07070v2