
[FEATURE REQUEST] LBFGSB via LBFGSB.jl

Open rht opened this issue 4 years ago • 16 comments

There is already a functioning Julia wrapper for L-BFGS-B: https://github.com/Gnimuc/LBFGSB.jl, which wraps the same lbfgsb.f that scipy.optimize is based on. Note that scipy.optimize uses a modified version of L-BFGS-B based on this 2011 paper:

> Jorge Nocedal and Jose Luis Morales, Remark on "Algorithm 778: L-BFGS-B: Fortran Subroutines for Large-Scale Bound Constrained Optimization" (2011). To appear in ACM Transactions on Mathematical Software.

See #923.

It would be great to have an Optim.jl interface for LBFGSB.jl.

rht avatar Jun 09 '21 07:06 rht

I think I will do this myself since @pkofod is busy.

rht avatar Jun 09 '21 07:06 rht

> I think I will do this myself since @pkofod is busy.

Are you thinking about a package like discussed in the other issue?

pkofod avatar Jun 09 '21 08:06 pkofod

Yes, I will do it as OptimLBFGSB.jl.

rht avatar Jun 09 '21 09:06 rht

I can start a repo here that you can PR against, or we can transfer it after?

pkofod avatar Jun 09 '21 10:06 pkofod

We can transfer it after.

rht avatar Jun 09 '21 10:06 rht

I'd love to see this for use in GRAPE (our original GRAPE Fortran code is using L-BFGS-B as the backend).

Conceptually, though, it looks like the LBFGS method currently in Optim can do constraints (although I haven't tried out that feature yet). Would LBFGSB be expected to behave very differently when enforcing the bounds?
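For context, the existing box-constrained path in Optim wraps the unconstrained solver in `Fminbox`, which enforces the bounds via a barrier around repeated inner solves; L-BFGS-B instead projects directly onto the feasible box, so the two can take quite different paths on the same problem. A minimal sketch of the `Fminbox` route (using the Rosenbrock function as an illustrative objective; bounds and starting point chosen for the example):

```julia
using Optim

# Rosenbrock function; unconstrained minimum at (1, 1), inside the box below
f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2

lower = [-2.0, -2.0]
upper = [2.0, 2.0]
x0 = [0.0, 0.0]

# Fminbox runs LBFGS on a barrier-augmented objective to keep the
# iterates inside the bounds (a different mechanism from L-BFGS-B's
# gradient projection).
res = optimize(f, lower, upper, x0, Fminbox(LBFGS()))
```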

goerz avatar Oct 17 '21 06:10 goerz

@rht not certain if you made an OptimLBFGSB.jl, but I implemented one here: https://github.com/sethaxen/OptimLBFGSB.jl.

sethaxen avatar Feb 16 '24 16:02 sethaxen

I ended up using https://github.com/Gnimuc/LBFGSB.jl/issues/11, because I wanted the result to be identical to SciPy's L-BFGS-B.

rht avatar Feb 16 '24 18:02 rht

The problem with "wrapping" it in Optim IMO would be that we can accommodate a similar interface on the calling side of things, but it's harder to wrap the results to be accessed in the same way. But we could try with an extension?

pkofod avatar Feb 18 '24 11:02 pkofod

> harder to wrap the results to be accessed in the same way.

What do you mean by this?

sethaxen avatar Feb 18 '24 12:02 sethaxen

> harder to wrap the results to be accessed in the same way.
>
> What do you mean by this?

The output of Optim has a specific format. I'm not sure that it is possible to construct the complete "results" type from what we get from LBFGSB and the trace printing and storage will also be different. But we could try.

pkofod avatar Feb 19 '24 07:02 pkofod

> The output of Optim has a specific format. I'm not sure that it is possible to construct the complete "results" type from what we get from LBFGSB and the trace printing and storage will also be different. But we could try.

Ah, I see what you mean. For OptimLBFGSB.jl I didn't wrap LBFGSB.jl. Instead I wrapped L_BFGS_B_jll. L-BFGS-B offers an iterative API that LBFGSB.jl only uses in an internal function, so we can't use it. The iterative API fits Optim.jl well, so long as we use Optim's own convergence checks (L-BFGS-B has different ones). L-BFGS-B is just responsible then for the line search, two-loop recursion, and constraint handling.

Here's an example:

julia> using Optim, OptimLBFGSB

julia> f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2;

julia> x0 = [0.0, 0.0];

julia> options = Optim.Options(; store_trace=true, extended_trace=true);

julia> sol = optimize(f, x0, LBFGSB(), options)
 * Status: success

 * Candidate solution
    Final objective value:     4.089857e-17

 * Found with
    Algorithm:     LBFGSB{Nothing, Nothing}

 * Convergence measures
    |x - x'|               = 0.00e+00 ≤ 0.0e+00
    |x - x'|/|x'|          = 0.00e+00 ≤ 0.0e+00
    |f(x) - f(x')|         = 0.00e+00 ≤ 0.0e+00
    |f(x) - f(x')|/|f(x')| = 0.00e+00 ≤ 0.0e+00
    |g(x)|                 = 2.35e-08 ≰ 1.0e-08

 * Work counters
    Seconds run:   0  (vs limit Inf)
    Iterations:    23
    f(x) calls:    32
    ∇f(x) calls:   32

julia> Optim.x_trace(sol)
24-element Vector{Vector{Float64}}:
 [0.0, 0.0]
 [0.17485520507589397, 0.0]
 [0.2611363404175081, 0.04906635389155636]
 [0.3239441412158067, 0.11056436224568814]
 [0.4240470960987517, 0.16523903911939064]
 ⋮
 [0.9999850618932064, 0.9999669398908719]
 [0.9999996981562516, 0.9999993829747845]
 [0.9999999936276822, 0.9999999872013148]
 [0.9999999936276822, 0.9999999872013148]

If one wants to wrap L-BFGS-B using its own convergence checks, creating an Optimization.jl wrapper might make more sense, since it's designed to wrap optimization packages without controlling their behavior.

The main complications I ran into in implementation are:

  1. L-BFGS-B is a constrained optimizer, but Optim types distinguish between AbstractConstrainedOptimizer and FirstOrderOptimizer. I went with FirstOrderOptimizer.
  2. To support Optim's API for specifying bounds, we would want something like
function Optim.optimize(
    df::NLSolversBase.OnceDifferentiable,
    l::Union{Number,AbstractVector},
    u::Union{Number,AbstractVector},
    initial_x::AbstractVector,
    F::Optim.Fminbox{LBFGSB{Nothing,Nothing}},  # no bounds specified
    options::Optim.Options = Optim.Options()
)
    optimizer = LBFGSB(; m=F.method.m, lower=l, upper=u)
    return Optim.optimize(df, initial_x, optimizer, options)
end

I haven't tested this though and am not certain if it would catch all cases where a user specified bounds.

  3. Optim expects its first-order optimizers to have a manifold property and states with x_previous, g_previous, and fx_previous properties, but doesn't document these requirements.
  4. One must implement Optim.trace! for a new solver, but this isn't documented, and this function does not seem to be part of the public API.

sethaxen avatar Feb 19 '24 08:02 sethaxen

Oh, okay. Thank you for all the work you did looking into it. So you are relying directly on L_BFGS_B_jll. Okay, I'm wondering whether it should live in Optim proper or be a package extension. The reason is that some people may expect any Optim optimizer to be Julia to the core (it is in the tagline, after all), but I can certainly see that it may be useful to wrap it in the Optim interface if people are already in Optim and want to try L-BFGS-B. I actually don't think L-BFGS-B is super hard to code up, but it's just a bandwidth issue, to be honest.

pkofod avatar Feb 19 '24 11:02 pkofod

> Oh, okay. Thank you for all the work you did looking into it.

No problem! Was actually pretty straightforward.

> Okay, I'm wondering whether it should live in Optim proper or be a package extension. The reason is that some people may expect any Optim optimizer to be Julia to the core (it is in the tagline, after all), but I can certainly see that it may be useful to wrap it in the Optim interface if people are already in Optim and want to try L-BFGS-B. I actually don't think L-BFGS-B is super hard to code up, but it's just a bandwidth issue, to be honest.

I agree it's not ideal that it's not pure Julia. So e.g. it won't support complex numbers or matrices like LBFGS does (although I guess this could be done with some work).

I implemented an algorithm from a paper that uses L-BFGS-B without constraints. When I compare results with LBFGS(; linesearch=MoreThuente()), which I expected to be the same as L-BFGS-B, LBFGSB() performs significantly better. I haven't been able to find a combination of parameters for LBFGS that produces the same traces as LBFGSB, so I assume they're doing something meaningfully different. I'd of course prefer a pure Julia implementation, so long as it can be made equivalent, but unfortunately I also don't have the bandwidth to dig into why they're different.

I guess my preference for now is for LBFGSB to live outside of Optim, but maybe be in the same org and linked to in the docs. An extension might be a little awkward, because loading it would require a user manually loading L_BFGS_B_jll. Then when someone is able to add a pure Julia LBFGSB to Optim, they could check against this implementation for consistency.

sethaxen avatar Feb 20 '24 08:02 sethaxen

@pkofod Any further thoughts on this? If we keep OptimLBFGSB.jl, then I would finish the tests and register it.

sethaxen avatar Feb 26 '24 12:02 sethaxen

I'll go ahead and write the tests and register OptimLBFGSB.jl.

sethaxen avatar Mar 03 '24 07:03 sethaxen