StochasticOptimization.jl

Add SVRG parameter updater

Open ahwillia opened this issue 9 years ago • 3 comments

This is an interesting stochastic optimizer with some nice theoretical guarantees for convex problems. It would be interesting to compare it to the others we have already implemented.

https://papers.nips.cc/paper/4937-accelerating-stochastic-gradient-descent-using-predictive-variance-reduction.pdf

ahwillia avatar Oct 08 '16 16:10 ahwillia

Just trying to make sense of the paper. It seems like they are proposing that the gradient update for weight w[i] is:

θ[i] -= lr * (∇[i] - V)

where V is the difference: E(grad(mean(theta))) - mean(E(grad(theta)))

Do you agree this is their method?
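For reference, my reading of Algorithm 1 in the linked paper is that the correction is built from a snapshot of the weights (refreshed every m inner steps), not from an expectation over the current weights. Keeping the same pseudo-notation (w_snapshot, μ, lr, grad, and data are placeholder names), each inner step looks roughly like:

μ = mean([ grad(w_snapshot, target, output) for (target, output) in data ])   # full gradient at the snapshot
θ -= lr * (grad(θ, target, output) - grad(w_snapshot, target, output) + μ)    # one randomly drawn (target, output)

so the V above would correspond to grad(w_snapshot, target, output) - μ.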

tbreloff avatar Oct 09 '16 15:10 tbreloff

This paper may have a slightly better summary of it: https://arxiv.org/pdf/1603.06160v2.pdf (see Algorithm 2).

I think you have the basic idea... It might be slightly more difficult to fit this into the ParameterUpdater framework since there is a nested loop. Here is some pseudocode -- w holds the parameters of the model. (Please double-check that I got this right!)

function svrg_pseudocode(data)
    # hold the current parameters and the snapshot from the last outer iteration
    w = initialize_weights()
    w_prev = similar(w)

    # outer loop: refresh the snapshot and its full gradient
    for s = 1:iterations

        # store the snapshot weights
        copy!(w_prev, w)

        # full gradient at the snapshot -- this is the variance-reduction term
        mu = mean([ grad(w_prev, target, output) for (target, output) in data ])

        # inner loop: cheap stochastic steps using the variance-reduced gradient
        for t = 1:epoch_length
            (target, output) = rand_sample(data)

            # gradients at the current weights and at the snapshot
            ∇w = grad(w, target, output)
            ∇w_prev = grad(w_prev, target, output)

            # update
            w -= learnrate * (∇w - ∇w_prev + mu)
        end
    end
    return w
end
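For what it's worth, here is a self-contained toy version one could actually run to sanity-check the update (plain least-squares, nothing ParameterUpdater-specific; svrg_toy and the other names below are made up for illustration):

using LinearAlgebra

# SVRG sketch for the objective (1/2n) * sum_i (dot(A[i,:], w) - b[i])^2
function svrg_toy(A, b; lr = 0.02, outer = 50, inner = 2 * length(b))
    n, d = size(A)
    gradi(w, i) = (dot(A[i, :], w) - b[i]) .* A[i, :]    # per-sample gradient
    w = zeros(d)
    for s in 1:outer
        w_snap = copy(w)                                 # snapshot of the weights
        μ = sum(gradi(w_snap, i) for i in 1:n) ./ n      # full gradient at the snapshot
        for t in 1:inner
            i = rand(1:n)
            w .-= lr .* (gradi(w, i) .- gradi(w_snap, i) .+ μ)
        end
    end
    return w
end

A = randn(200, 5); w_true = randn(5); b = A * w_true
norm(svrg_toy(A, b) - w_true)    # should end up close to zero for this noiseless problem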

ahwillia avatar Oct 09 '16 17:10 ahwillia

Related? https://arxiv.org/abs/1604.07070v2

tbreloff avatar Oct 13 '16 11:10 tbreloff