Support user-specified inverse Hessian initialization with L-BFGS
Currently, LBFGS supports two initializations of the inverse Hessian: the identity matrix (scaleinvH0=false) and the scalar matrix of Nocedal & Wright (scaleinvH0=true). But there are other initialization strategies; for example, https://link.springer.com/article/10.1007%2FBF01589113 describes several options and compares them on benchmarks.
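For reference, the two existing choices are selected via the scaleinvH0 keyword (a minimal illustration):

using Optim
LBFGS(scaleinvH0 = false)  # invH0 = I (identity)
LBFGS(scaleinvH0 = true)   # invH0 = (s'y / y'y) * I (Nocedal & Wright scaling)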
It would be nice if LBFGS took a user-provided function for initializing invH0.
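For concreteness, one way such an interface could look (the keyword name initialinvH0 and the callback signature below are purely hypothetical, not part of Optim.jl):

using Optim, LinearAlgebra
# user-supplied rule for invH0, given the latest step s = x_k - x_{k-1}
# and gradient difference y = g_k - g_{k-1}; this particular choice
# reproduces the Nocedal & Wright scalar scaling
my_invH0(s, y) = dot(s, y) / dot(y, y)
LBFGS(initialinvH0 = my_invH0)  # hypothetical keyword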
I agree. I'll be happy to review if anyone's up for it :)
Okay! I may open a PR in the coming weeks.
Alternatively, as noted in the code, the initial Hessian is a preconditioner. We could reuse the preconditioner machinery here, but the preconditioner is currently updated using only the latest position. If it were also updated using the latest gradient, we would have enough information.
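To illustrate the limitation, a minimal sketch of the existing P/precondprep machinery (the callback only receives the latest position, so the gradient-dependent scaling of scaleinvH0 cannot be computed in it):

using Optim
f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
x0 = [0.0, 0.0]
P = Optim.InverseDiagonal(1.0)
# current signature: only the position x is available here
precondprep(P, x) = (P.diag = one(P.diag))
optimize(f, x0, LBFGS(; P, precondprep))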
What if the preconditioner were instead updated with the latest optimization state? Then there would be enough information, and the following works:
using Optim, LinearAlgebra

# Rosenbrock function
f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
x0 = [0.0, 0.0]
options = Optim.Options(; store_trace=true, extended_trace=true)

# reference run: built-in Nocedal & Wright scaling of the initial inverse Hessian
sol1 = optimize(f, x0, LBFGS(; scaleinvH0 = true), options)

# the same scaling expressed as a preconditioner, assuming precondprep is
# passed the optimization state as proposed above
P = Optim.InverseDiagonal(1.0)
function precondprep(P, state)
    if state.pseudo_iteration > 1
        # gamma = s'y / y'y, the Nocedal & Wright scaling factor
        dg = state.dg
        P.diag = real(dot(state.dx, dg)) / sum(abs2, dg)
    else
        P.diag = one.(P.diag)
    end
end
sol2 = optimize(f, x0, LBFGS(; P, precondprep, scaleinvH0=false), options)

# the two runs should take identical steps
@assert Optim.x_trace(sol1) == Optim.x_trace(sol2)
I imagine this could be useful for other optimizers as well. But is there any way to do this without it being a major breaking change?
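One non-breaking possibility, sketched below with made-up names: keep position-based callbacks working unchanged and let users opt in to the state-based signature through a small wrapper type.

# sketch of an opt-in, backwards-compatible mechanism; all names are hypothetical
struct StatePrecondPrep{F}
    f::F
end

# opt-in: wrapped callbacks receive the full optimization state
apply_precondprep(pp::StatePrecondPrep, P, state) = pp.f(P, state)
# default: plain callbacks keep receiving only the latest position
apply_precondprep(pp, P, state) = pp(P, state.x)

Existing user code would be untouched, while something like LBFGS(; P, precondprep = StatePrecondPrep(precondprep)) could opt in to the richer signature.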
@pkofod do you have any feedback on my previous comment?
Bump @pkofod: do you have feedback on the suggestion in https://github.com/JuliaNLSolvers/Optim.jl/issues/955#issuecomment-1030450835?