StochasticOptimization.jl
Add SVRG parameter updater
This is an interesting stochastic optimizer with some nice theoretical guarantees for convex problems. It would be worth comparing to the updaters we have already implemented.
https://papers.nips.cc/paper/4937-accelerating-stochastic-gradient-descent-using-predictive-variance-reduction.pdf
Just trying to make sense of the paper. It seems like they are proposing that the gradient update for weight w[i] is:
θ[i] -= lr * (∇[i] - V)
where V is the difference: E(grad(mean(θ))) - mean(E(grad(θ)))
Do you agree this is their method?
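For reference, here is my transcription of the update in Algorithm 1 of the linked paper (using their ψ_i for the per-sample loss; worth double-checking against the paper itself):

\tilde{\mu} = \frac{1}{n} \sum_{i=1}^{n} \nabla \psi_i(\tilde{w})

w_t = w_{t-1} - \eta \left( \nabla \psi_{i_t}(w_{t-1}) - \nabla \psi_{i_t}(\tilde{w}) + \tilde{\mu} \right)

where \tilde{w} is a snapshot of the weights held fixed over each inner epoch and i_t is the index sampled at step t. If I'm reading it right, the correction term is the per-sample gradient at the snapshot minus the full-batch gradient at the snapshot, rather than a difference of expectations.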
This paper has a maybe slightly better summary of it: https://arxiv.org/pdf/1603.06160v2.pdf (see Algorithm 2).
I think you have the basic idea... It might be slightly more difficult to fit this into the ParameterUpdater framework since there is a nested loop. Here is some pseudocode -- w holds the parameters of the model. (Please double-check that I got this right!)
function svrg_pseudocode(data)
    # hold the current parameters and the params from the last snapshot
    w = initialize_weights()
    w_prev = similar(w)
    # main loop
    for s = 1:iterations
        # store the snapshot weights
        copy!(w_prev, w)
        # full-batch gradient at the snapshot
        mu = mean([ grad(w_prev, target, output) for (target,output) in data ])
        for t = 1:epoch_length
            (target, output) = rand_sample(data)
            # calc gradients at the current and snapshot weights
            ∇w = grad(w, target, output)
            ∇w_prev = grad(w_prev, target, output)
            # variance-reduced update
            w -= learnrate*(∇w - ∇w_prev + mu)
        end
    end
    return w
end
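Here is a minimal runnable sketch of the same loop, specialized to least-squares regression so it can actually be executed. The setup below (the X/y data, the squared-loss grad, and the svrg! name) is only illustrative and is not part of StochasticOptimization.jl or the ParameterUpdater framework:

using Random, Statistics, LinearAlgebra

# per-sample gradient of the squared loss 0.5*(dot(x, w) - y)^2
grad(w, x, y) = (dot(x, w) - y) .* x

function svrg!(w, X, y; learnrate = 0.1, iterations = 50, epoch_length = 100)
    n = length(y)
    w_prev = similar(w)
    for s in 1:iterations
        copyto!(w_prev, w)                       # snapshot of the weights
        # full-batch gradient at the snapshot
        mu = mean([grad(w_prev, X[i, :], y[i]) for i in 1:n])
        for t in 1:epoch_length
            i = rand(1:n)                        # sample one data point
            ∇w      = grad(w, X[i, :], y[i])
            ∇w_prev = grad(w_prev, X[i, :], y[i])
            w .-= learnrate .* (∇w .- ∇w_prev .+ mu)   # variance-reduced step
        end
    end
    return w
end

# toy usage: recover a known weight vector from noiseless data
Random.seed!(1)
X = randn(200, 5)
w_true = randn(5)
y = X * w_true
w = svrg!(zeros(5), X, y)
@show norm(w - w_true)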
Related? https://arxiv.org/abs/1604.07070v2