
LBFGS moves away from minimum?!

Open cseveren opened this issue 3 years ago • 10 comments

Thanks for the package. I am using Optim's LBFGS algorithm with autodiff = :forward to minimize a somewhat high-dimensional objective function (120 parameters). For the first few hundred iterations, the algorithm executes as expected: the function value goes down, the gradient norm is generally decreasing, and iterations consistently take 40-80 seconds each.

But then, at some point two things change:

  • The function value begins to increase (i.e., the algorithm is no longer moving toward a local minimum)
  • Iteration execution length increases dramatically (to about 1200 seconds/iteration).

I'm not sure how to create an MWE because I'm not sure where to look for what could cause this unusual behavior. I've tried going through the source code of the package to figure it out, but that hasn't helped. I would appreciate any help. Thanks!

cseveren avatar Dec 29 '21 20:12 cseveren

Useful reference https://discourse.julialang.org/t/gradient-rise-was-obtained-by-optim-jl-package-optimization/69548/11

cseveren avatar Dec 29 '21 20:12 cseveren

This happens even with the default option for these solvers, allow_f_increases = false. When I instead set allow_f_increases = true, the divergence away from the local minimum and the much longer evaluation times happen much sooner. What explains the much longer iteration times?

cseveren avatar Dec 29 '21 22:12 cseveren

When you say longer iterations, is:

  1. Your function being called more times per iteration of L-BFGS?
  2. Your function's execution taking longer?

johnmyleswhite avatar Dec 30 '21 15:12 johnmyleswhite

Thanks John -- Time for each iteration goes from ~40 secs to ~1200 secs. I imagine it's 1. and not 2., but I'm not sure how to tell the difference. I have show_trace set to true, showing every iteration, but I don't think it reports the number of function evaluations. If you know of a way to get it to report that, I'll try it and report back.

The key thing seems to be: really long iteration time <-> increases in objective function.
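(For reference: the call counters can also be read off the result object returned by optimize, assuming a recent Optim.jl. A minimal sketch with a toy objective standing in for the real one:)

```julia
using Optim

f(x) = sum(abs2, x)   # toy objective standing in for the real one

res = optimize(f, randn(5), LBFGS(); autodiff = :forward)

# Accessors from Optim.jl's public API:
println(Optim.iterations(res))   # number of L-BFGS iterations
println(Optim.f_calls(res))      # total objective evaluations
println(Optim.g_calls(res))      # total gradient evaluations
```

Comparing f_calls across runs (as done further down in this thread) is one way to tell case 1 from case 2.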

cseveren avatar Dec 30 '21 16:12 cseveren

I'd expect it to be a long time spent in the line search. You might notice a very small or very large step (if you use a line search that tries to satisfy the (approximate) Wolfe conditions). Can you show an example output? You can also try to print the current state from the objective.
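(If the line search is indeed the bottleneck, one thing to try is swapping the default HagerZhang line search for a cheaper backtracking one via LineSearches.jl. A sketch with a toy objective, not tested on the problem in this thread:)

```julia
using Optim, LineSearches

f(x) = sum(abs2, x .- 1)   # toy objective standing in for the real one

# BackTracking only enforces a sufficient-decrease (Armijo) condition, so it
# typically needs fewer function evaluations per iteration than the default
# HagerZhang line search, which targets the (approximate) Wolfe conditions.
method = LBFGS(linesearch = LineSearches.BackTracking())

res = optimize(f, zeros(5), method, Optim.Options(show_trace = false))
```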

pkofod avatar Jan 02 '22 19:01 pkofod

Thanks much for the help. Here's the output from the REPL showing the trace each iteration, running with allow_f_increases = true and @time for time-keeping, starting from an initial point that was the last candidate vector in a prior run that reached maximum iterations. The fval decreases through iteration 11, then increases by a small amount along with a much longer time for the next iteration, then decreases for two iterations, then goes into a slow and long increasing period (which lasts quite a few iterations, I'm only providing 17 iterations for brevity).

Note that the dimension of the problem is about 100 parameters, and each function evaluation (timed separately using @btime outside of Optim.jl) takes about 0.2-0.5 seconds -- I've tried and don't think I can further optimize objective function performance.

Iter     Function value   Gradient norm 
     0     6.455021e-05     1.341821e-05
 * time: 0.021000146865844727
     1     6.455012e-05     6.209657e-06
 * time: 41.98800015449524
     2     6.455002e-05     2.262688e-06
 * time: 125.80299997329712
     3     6.454674e-05     3.180252e-05
 * time: 231.52100014686584
     4     6.454620e-05     4.935921e-06
 * time: 295.67400002479553
     5     6.454519e-05     2.444610e-05
 * time: 402.53600001335144
     6     6.454108e-05     4.777337e-06
 * time: 488.38899993896484
     7     6.454084e-05     2.226425e-05
 * time: 572.654000043869
     8     6.452795e-05     1.646926e-05
 * time: 679.8870000839233
     9     6.452635e-05     5.264460e-06
 * time: 744.5700001716614
    10     6.452409e-05     2.739239e-05
 * time: 829.8760001659393
    11     6.452352e-05     3.085255e-06
 * time: 893.9900000095367
    12     6.452359e-05     3.399898e-06
 * time: 2109.0920000076294
    13     6.452329e-05     2.607508e-05
 * time: 2173.4049999713898
    14     6.451462e-05     9.550695e-06
 * time: 2260.1820001602173
    15     6.451469e-05     3.796692e-06
 * time: 3365.085000038147
    16     6.451475e-05     4.008317e-06
 * time: 4544.650000095367
    17     6.451482e-05     1.260894e-06
 * time: 5725.456000089645

If this is indeed due to a slow line search, is there any way to improve performance?

How do I print the current state from the objective function? Do you mean like including a println() in the objective function body?

cseveren avatar Jan 03 '22 17:01 cseveren

Have you tried using extended_trace = true to get more information?

I also think it would be helpful to confirm the function call count. You don't need to change the objective function per se; you can create a wrapper like this example of tracking a total count: https://discourse.julialang.org/t/macro-for-counting-the-number-of-times-a-function-is-called/3129/3

But adding a println inside the body works as well and might be easier.
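(A minimal closure-based variant of that counting idea, with hypothetical names:)

```julia
myobjective(x) = sum(abs2, x)   # stand-in for the real objective

# A Ref holds the running count; the wrapper increments it on every call.
const ncalls = Ref(0)
counted(x) = (ncalls[] += 1; myobjective(x))

# Pass `counted` to optimize(...) in place of the objective;
# `ncalls[]` then holds the total number of evaluations.
```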

johnmyleswhite avatar Jan 03 '22 18:01 johnmyleswhite

Using the same inputs as the prior example, I started the optimization routine twice, stopping once with maxiter=11 and once with maxiter=12, hoping to capture a big difference in f(x) calls at exit (I couldn't get the wrapper approach to work, see bottom). It is indeed a much higher number of function evals. This may be out of scope, but do you have any suggestions for improving performance (i.e., spending less time on line search, etc.)?

For 11 iterations:

 * Work counters
    Seconds run:   878  (vs limit Inf)
    Iterations:    11
    f(x) calls:    43
    ∇f(x) calls:   43

And for 12:

 * Work counters
    Seconds run:   2094  (vs limit Inf)
    Iterations:    12
    f(x) calls:    100
    ∇f(x) calls:   100

Note that I was unable to get the counting wrapper function to work, perhaps because of autodiff. I defined:

mutable struct CountingF{TF}
    f::TF
    counter::Int
end

CountingF(f) = CountingF(f, 0)

function (c::CountingF)(args...)
    c.counter += 1
    c.f(args...)
end

Then called

td = TwiceDifferentiable(vars -> CountingF(critEvalSmooth_Wald(vars, Xmat, w_t, 0.1, 400, d_MatT, s_MatT400, nobs, n_d, n_x, n_z1, n_z2, θ1t, θ2t, W_mat)), vrun_002_1; autodiff = :forward)
@time res_trial1 = optimize(td, vrun_002_1, method = LBFGS(), iterations = 2, show_trace = true, show_every=1, x_tol=1e-8, f_tol=1e-8)

This threw the following error:

ERROR: LoadError: MethodError: no method matching zero(::CountingF{ForwardDiff.Dual{ForwardDiff.Tag{var"#19#20", Float64}, Float64, 12}})
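(For reference: the MethodError likely comes from the anonymous function above, which calls the objective and then wraps its return value, a ForwardDiff.Dual, in a fresh CountingF, so the "objective" handed to TwiceDifferentiable returns a CountingF object rather than a number. A sketch of what the wrapper usage was presumably meant to look like, reusing the identifiers from the snippet above:)

```julia
# Construct the counting wrapper ONCE, around a function of `vars` alone,
# and pass the wrapper itself (a callable struct) to TwiceDifferentiable:
cf = CountingF(vars -> critEvalSmooth_Wald(vars, Xmat, w_t, 0.1, 400, d_MatT,
                                           s_MatT400, nobs, n_d, n_x, n_z1,
                                           n_z2, θ1t, θ2t, W_mat))
td = TwiceDifferentiable(cf, vrun_002_1; autodiff = :forward)

# cf.counter now increments on every objective evaluation,
# including the Dual-valued calls made by ForwardDiff.
```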

cseveren avatar Jan 04 '22 00:01 cseveren

Please try with extended_trace=true as John suggested.

There must be a bug if this happens with allow_f_increases=false.

pkofod avatar Feb 04 '22 09:02 pkofod

It seems that allow_f_increases is not set to false by default for every algorithm, at least not for L-BFGS. I encountered the same issue (in Optim v1.7.2) with the objective function's value going up after several steps, even though I did not specify allow_f_increases, assuming it was set to false as stated in the documentation. However, the problem was resolved after I manually added the keyword argument allow_f_increases=false in Optim.Options. I wonder if there's a bug regarding the default value of allow_f_increases. One thing to note is that I also imposed a box constraint using Optim.Fminbox.
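(A sketch of the workaround described above, with a toy bounded problem standing in for the real one:)

```julia
using Optim

f(x) = sum(abs2, x .- 0.5)   # toy objective standing in for the real one
lower = fill(-1.0, 5)
upper = fill(1.0, 5)

# Passing the option explicitly, rather than relying on the documented
# default, is what resolved the behavior reported above:
res = optimize(f, lower, upper, fill(0.2, 5), Fminbox(LBFGS()),
               Optim.Options(allow_f_increases = false))
```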

frankwswang avatar Sep 14 '22 01:09 frankwswang