Optimisers.jl
L-BFGS algorithm request
Motivation and description
Can we implement L-BFGS? It's a quasi-second-order method that can converge much faster than first-order methods, and it suits computationally intensive models with a moderate number of parameters. I work in inverse design and topology optimization with differentiable simulation, where L-BFGS is the go-to method.
https://github.com/baggepinnen/FluxOptTools.jl has a partial implementation, but it'd be nice to have one natively within FluxML.
Thanks!
Possible Implementation
No response
One implementation which makes few assumptions about the data/gradient format is https://github.com/Jutho/OptimKit.jl
However, there is a mismatch: OptimKit.jl wants to control when the function/model is called, whereas with Optimisers.jl you call it yourself and the package just handles the update. That's true of all the L-BFGS implementations I know of. I'm not an expert, but I think this is largely to allow for linesearch, which will typically call f(x) several times per accepted update of x. It also seems that OptimKit.jl's interface has no way to call f(x) rather than withgradient(f, x):
> The (objective function) is specified as a function `fval, gval = fg(x)` that returns both the function value and its gradient at a given point `x`. The function value `fval` is assumed to be a real number of some type `T<:Real`. Both `x` and the gradient `gval` can be of any type, including tuples and named tuples.
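For concreteness, here is a minimal sketch of the two calling conventions side by side, using a toy quadratic in place of an expensive model. This is illustrative only: `loss`, the iteration count, and the destructured return values of `OptimKit.optimize` (as I read its README) are placeholders, not a proposal for a final API:

```julia
using OptimKit, Optimisers, Zygote

loss(x) = sum(abs2, x .- 1)  # toy objective, standing in for an expensive model

# OptimKit.jl convention: hand over a closure and let `optimize` decide when
# to call it -- typically several evaluations per accepted step, for linesearch.
function fg(x)
    val, grads = Zygote.withgradient(loss, x)
    return val, grads[1]
end
x, fx, gx, numfg, history = OptimKit.optimize(fg, randn(5), OptimKit.LBFGS())

# Optimisers.jl convention: the user drives the loop and calls the model;
# the package only turns each gradient into an update.
θ = randn(5)
state = Optimisers.setup(Optimisers.Descent(0.1), θ)
for _ in 1:100
    g = Zygote.gradient(loss, θ)[1]
    state, θ = Optimisers.update!(state, θ, g)
end
```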
I guess that's not impossible within the current interface: sometimes update!(state, model, grad) would be a linesearch step. Will it be a problem to stop not when e.g. OptimKit.optimize thinks you should, but simply after 1000 calls? Will it be a problem that each call is typically on a different minibatch of data? That's not obligatory with this package, but it is typical.
Thanks! I found an up-to-date implementation in Optim.jl: https://github.com/JuliaNLSolvers/Optim.jl/blob/master/src/multivariate/solvers/first_order/l_bfgs.jl. It doesn't look too hard to port to Optimisers.jl? We can omit the linesearch, since Flux assumes each function evaluation is expensive.
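To make the shape of such a port concrete, here is a minimal sketch of a linesearch-free L-BFGS rule written against Optimisers.jl's documented custom-rule interface (`Optimisers.init` / `Optimisers.apply!`). Everything here is hypothetical: `LBFGSRule`, `two_loop`, and the fixed step size `eta` (standing in for the omitted linesearch) are illustrative names, not existing API:

```julia
using Optimisers, LinearAlgebra

# Hypothetical rule, not part of Optimisers.jl: takes an L-BFGS step with a
# fixed step size `eta` instead of a linesearch, keeping the last `m`
# curvature pairs for each parameter array.
struct LBFGSRule <: Optimisers.AbstractRule
    eta::Float64  # fixed step size, in place of a linesearch
    m::Int        # number of (s, y) history pairs to keep
end
LBFGSRule(; eta = 0.1, m = 10) = LBFGSRule(eta, m)

function Optimisers.init(o::LBFGSRule, x::AbstractArray)
    v = vec(copy(x))
    # x_prev/g_prev: previous point and gradient; S/Y: curvature pair history
    return (x_prev = v, g_prev = zero(v), S = typeof(v)[], Y = typeof(v)[], step = Ref(0))
end

# Standard two-loop recursion (Nocedal & Wright, Algorithm 7.4): approximates
# H * g from the stored (s, y) pairs, where H is the inverse-Hessian estimate.
function two_loop(g, S, Y)
    q, k = copy(g), length(S)
    ρ = [inv(dot(Y[i], S[i])) for i in 1:k]
    α = zeros(eltype(q), k)
    for i in k:-1:1
        α[i] = ρ[i] * dot(S[i], q)
        q .-= α[i] .* Y[i]
    end
    if k > 0  # γ = sᵀy / yᵀy as the initial inverse-Hessian scaling
        q .*= dot(S[k], Y[k]) / dot(Y[k], Y[k])
    end
    for i in 1:k
        β = ρ[i] * dot(Y[i], q)
        q .+= (α[i] - β) .* S[i]
    end
    return q
end

function Optimisers.apply!(o::LBFGSRule, state, x, dx)
    g = vec(copy(dx))  # assumes dx is a plain array matching x
    if state.step[] > 0
        s = vec(x) .- state.x_prev  # s_k = x_{k+1} - x_k
        y = g .- state.g_prev       # y_k = g_{k+1} - g_k
        if dot(s, y) > 0  # keep only pairs satisfying the curvature condition
            push!(state.S, s); push!(state.Y, y)
            length(state.S) > o.m && (popfirst!(state.S); popfirst!(state.Y))
        end
    end
    d = two_loop(g, state.S, state.Y)  # quasi-Newton direction, ≈ H * ∇f
    state.x_prev .= vec(x)
    state.g_prev .= g
    state.step[] += 1
    return state, eltype(x)(o.eta) .* reshape(d, size(x))  # update! subtracts this
end
```

It would be used like any other rule, `state = Optimisers.setup(LBFGSRule(), model)` followed by the usual update! loop. Two caveats that would need real design work: applied through setup, this keeps a separate history per parameter array, i.e. a block-diagonal approximation of the inverse Hessian, whereas FluxOptTools (and Optim.jl) flatten all parameters into one vector; and with a fixed eta and noisy minibatch gradients, the (s, y) pairs can easily violate the assumptions behind the two-loop recursion, which is exactly the minibatch concern raised above.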