Optimisers.jl
L-BFGS algorithm request
Motivation and description
Can we implement L-BFGS? It's a quasi-second-order method that can converge much faster than first-order methods, and it suits computationally intensive models with a moderate number of parameters. I work in inverse design and topology optimization with differentiable simulation, where L-BFGS is the go-to method.
https://github.com/baggepinnen/FluxOptTools.jl has a partial implementation, but it'd be nice to have one natively within FluxML.
Thanks!
Possible Implementation
No response
One implementation which makes few assumptions about the data/gradient format is https://github.com/Jutho/OptimKit.jl
However, there is a mismatch: OptimKit.jl wants to control when the function/model is called, whereas with Optimisers.jl you call it yourself and the package just handles the update. That's true of all the L-BFGS implementations I know of. I'm not an expert, but I think this is largely to allow for linesearch, which will typically call f(x) several times per accepted update of x. It also seems that OptimKit.jl's interface has no way to call f(x) rather than withgradient(f, x):
> The (objective function) is specified as a function `fval, gval = fg(x)` that returns both the function value and its gradient at a given point `x`. The function value `fval` is assumed to be a real number of some type `T<:Real`. Both `x` and the gradient `gval` can be of any type, including tuples and named tuples.
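For concreteness, here is a minimal sketch of the two calling conventions side by side, using a toy quadratic in place of an expensive model. This is illustrative only: `loss`, the iteration count, and the destructured return values of `OptimKit.optimize` (as I read its README) are placeholders, not a proposal for a final API:

```julia
using OptimKit, Optimisers, Zygote

loss(x) = sum(abs2, x .- 1)  # toy objective, standing in for an expensive model

# OptimKit.jl convention: hand over a closure and let `optimize` decide when
# to call it -- typically several evaluations per accepted step, for linesearch.
function fg(x)
    val, grads = Zygote.withgradient(loss, x)
    return val, grads[1]
end
x, fx, gx, numfg, history = OptimKit.optimize(fg, randn(5), OptimKit.LBFGS())

# Optimisers.jl convention: the user drives the loop and calls the model;
# the package only turns each gradient into an update.
θ = randn(5)
state = Optimisers.setup(Optimisers.Descent(0.1), θ)
for _ in 1:100
    g = Zygote.gradient(loss, θ)[1]
    state, θ = Optimisers.update!(state, θ, g)
end
```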
I guess that's not impossible within the current interface: sometimes update!(state, model, grad) would be a linesearch step. Will it be a problem to stop not when e.g. OptimKit.optimize thinks you should, but simply after 1000 calls? Will it be a problem that each call is typically on a different minibatch of data? That's not obligatory with this package, but it is typical.
Thanks! I found an up-to-date implementation in Optim.jl: https://github.com/JuliaNLSolvers/Optim.jl/blob/master/src/multivariate/solvers/first_order/l_bfgs.jl. It doesn't look too hard to port to Optimisers.jl? We can omit the linesearch, since Flux assumes each function evaluation is expensive.
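To make the shape of such a port concrete, here is a minimal sketch of a linesearch-free L-BFGS rule written against Optimisers.jl's documented custom-rule interface (`Optimisers.init` / `Optimisers.apply!`). Everything here is hypothetical: `LBFGSRule`, `two_loop`, and the fixed step size `eta` (standing in for the omitted linesearch) are illustrative names, not existing API:

```julia
using Optimisers, LinearAlgebra

# Hypothetical rule, not part of Optimisers.jl: takes an L-BFGS step with a
# fixed step size `eta` instead of a linesearch, keeping the last `m`
# curvature pairs for each parameter array.
struct LBFGSRule <: Optimisers.AbstractRule
    eta::Float64  # fixed step size, in place of a linesearch
    m::Int        # number of (s, y) history pairs to keep
end
LBFGSRule(; eta = 0.1, m = 10) = LBFGSRule(eta, m)

function Optimisers.init(o::LBFGSRule, x::AbstractArray)
    v = vec(copy(x))
    # x_prev/g_prev: previous point and gradient; S/Y: curvature pair history
    return (x_prev = v, g_prev = zero(v), S = typeof(v)[], Y = typeof(v)[], step = Ref(0))
end

# Standard two-loop recursion (Nocedal & Wright, Algorithm 7.4): approximates
# H * g from the stored (s, y) pairs, where H is the inverse-Hessian estimate.
function two_loop(g, S, Y)
    q, k = copy(g), length(S)
    ρ = [inv(dot(Y[i], S[i])) for i in 1:k]
    α = zeros(eltype(q), k)
    for i in k:-1:1
        α[i] = ρ[i] * dot(S[i], q)
        q .-= α[i] .* Y[i]
    end
    if k > 0  # γ = sᵀy / yᵀy as the initial inverse-Hessian scaling
        q .*= dot(S[k], Y[k]) / dot(Y[k], Y[k])
    end
    for i in 1:k
        β = ρ[i] * dot(Y[i], q)
        q .+= (α[i] - β) .* S[i]
    end
    return q
end

function Optimisers.apply!(o::LBFGSRule, state, x, dx)
    g = vec(copy(dx))  # assumes dx is a plain array matching x
    if state.step[] > 0
        s = vec(x) .- state.x_prev  # s_k = x_{k+1} - x_k
        y = g .- state.g_prev       # y_k = g_{k+1} - g_k
        if dot(s, y) > 0  # keep only pairs satisfying the curvature condition
            push!(state.S, s); push!(state.Y, y)
            length(state.S) > o.m && (popfirst!(state.S); popfirst!(state.Y))
        end
    end
    d = two_loop(g, state.S, state.Y)  # quasi-Newton direction, ≈ H * ∇f
    state.x_prev .= vec(x)
    state.g_prev .= g
    state.step[] += 1
    return state, eltype(x)(o.eta) .* reshape(d, size(x))  # update! subtracts this
end
```

It would be used like any other rule, `state = Optimisers.setup(LBFGSRule(), model)` followed by the usual update! loop. Two caveats that would need real design work: applied through setup, this keeps a separate history per parameter array, i.e. a block-diagonal approximation of the inverse Hessian, whereas FluxOptTools (and Optim.jl) flatten all parameters into one vector; and with a fixed eta and noisy minibatch gradients, the (s, y) pairs can easily violate the assumptions behind the two-loop recursion, which is exactly the minibatch concern raised above.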