
Auto increase gamma

aplavin opened this issue May 14 '23 · 3 comments

Currently, the step size parameter gamma can only decrease, e.g. at https://github.com/JuliaFirstOrder/ProximalAlgorithms.jl/blob/dc0342ca6fa3f6730168b3dcad9a1bd64187d875/src/algorithms/fast_forward_backward.jl#L92. However, larger step sizes sometimes become admissible later in the iteration. I tried doing `state.gamma *= 1.02` in the iteration loop, and seem to get ~2x faster convergence on some problems. The gamma vs. iterations plot looks like this:

[plot: gamma vs. iteration count]
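For concreteness, here's a self-contained sketch of what I mean, in plain Julia rather than the package's actual iterator internals (`gradf` and `prox` are placeholder callables; the backtracking safeguard mimics what the adaptive mode already does):

```julia
using LinearAlgebra

# Plain forward-backward loop with an optimistic step size increase.
# Backtracking on the standard quadratic upper bound keeps the increase safe.
function fbs_with_increase(x0, f, gradf, prox; gamma=1.0, incr=1.02, maxit=500)
    x = x0
    for _ in 1:maxit
        gamma *= incr                                # let gamma creep up
        g = gradf(x)
        z = prox(x .- gamma .* g, gamma)
        # shrink gamma until the sufficient decrease condition holds again
        while f(z) > f(x) + dot(g, z .- x) + sum(abs2, z .- x) / (2gamma)
            gamma /= 2
            z = prox(x .- gamma .* g, gamma)
        end
        x = z
    end
    return x
end

# Tiny LASSO-style example: f(x) = ½‖x − b‖², g = 0.1‖·‖₁
b = [1.0, -2.0, 0.05]
soft(y, γ) = sign.(y) .* max.(abs.(y) .- 0.1γ, 0)    # prox of 0.1‖·‖₁
x̂ = fbs_with_increase(zero(b), x -> 0.5sum(abs2, x .- b), x -> x .- b, soft)
```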

Does this modification make sense in general? For some algorithms? Should it be included in the package implementation itself?

aplavin · May 14 '23

@aplavin I think this makes sense to include as an option pretty much anywhere a backtracking routine is adopted. Backtracking is used to make sure gamma is small enough with respect to the local geometry of the objective, in order to guarantee a sufficient decrease condition and hence convergence: as long as it runs at every iteration, having it start from the previous gamma or from something larger makes no difference for convergence (rates).
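Concretely, with $x^+ = \operatorname{prox}_{\gamma g}(x - \gamma \nabla f(x))$, one common form of the condition that backtracking enforces is

$$f(x^+) \le f(x) + \langle \nabla f(x),\, x^+ - x \rangle + \frac{1}{2\gamma}\,\|x^+ - x\|^2,$$

which holds automatically whenever $\gamma \le 1/L$ for $L$-Lipschitz $\nabla f$; since gamma is shrunk until the inequality holds, the guarantees don't depend on where gamma starts each iteration.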

Do you want to open a PR?

The increase factor (your 1.02) would be a configurable coefficient of the algorithm: just like the backtracking coefficient, I'm not sure there is an a priori optimal setting for it. Too aggressive an increase will trigger backtracking more often, which has a cost.

For a proximal gradient iteration with a fully adaptive stepsize and no backtracking logic, you can check out this paper https://arxiv.org/abs/2301.04431 with associated code https://github.com/pylat/adaptive-proximal-algorithms. We found it to often outperform Nesterov extrapolation, and it requires no tuning, although it doesn't come with the same convergence rate guarantees. (I haven't found the time to include it in ProximalAlgorithms yet.)
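To give the flavor of that kind of rule, here's a rough sketch using the simpler Malitsky–Mishchenko-style stepsize estimate (the rule in the paper above differs in its details, so treat this purely as an illustration):

```julia
using LinearAlgebra

# Proximal gradient with a fully adaptive step size: no backtracking,
# gamma is driven by a local curvature estimate from successive gradients.
function adaptive_proxgrad(x0, gradf, prox; gamma=1e-3, maxit=1000)
    x_prev, g_prev, gamma_prev = x0, gradf(x0), gamma
    x = prox(x0 .- gamma .* g_prev, gamma)
    for _ in 1:maxit
        g = gradf(x)
        dx, dg = norm(x .- x_prev), norm(g .- g_prev)
        dx == 0 && break                     # iterates stopped moving
        theta = gamma / gamma_prev           # previous growth ratio
        # grow gamma, but cap it by the local inverse-curvature estimate
        gamma_prev, gamma = gamma, min(sqrt(1 + theta) * gamma, dx / (2dg))
        x_prev, g_prev = x, g
        x = prox(x .- gamma .* g, gamma)
    end
    return x
end
```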

lostella · May 14 '23

Thanks for confirming that the modification makes sense! I'm not closely familiar with proximal algorithms, just saw this trick used somewhere.

Also, I noticed some algorithms (e.g. PANOC) do a state reset when gamma changes. Would this modification play well with those?

However, I didn't find any of the other algorithms to perform better than the accelerated gradient (FastForwardBackward) in cursory testing on my problems. Definitely looking forward to trying the new algorithm you linked!

aplavin · May 14 '23