Optim.jl
LBFGS moves away from minimum?!
Thanks for the package. I am using Optim's LBFGS algorithm with autodiff = :forward to minimize a somewhat high-dimensional (120-parameter) objective function. For the first few hundred iterations, the algorithm executes as expected: the function value goes down, the gradient norm generally decreases, and iterations consistently take 40-80 seconds each.
But then, at some point two things change:
- The function value begins to increase (i.e., the algorithm is no longer moving toward a local minimum)
- Iteration execution time increases dramatically (to about 1200 seconds/iteration).
I'm not sure how to create an MWE because I'm not sure where to look for what could cause this unusual behavior. I've tried going through the package's source code to figure it out, but that hasn't helped. I would appreciate any help. Thanks!
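For reference, my call looks roughly like this (a simplified sketch, with a toy objective and starting vector standing in for my actual ones):

using Optim

# Toy stand-ins for the real objective and 120-parameter starting vector (sketch only).
f(x) = sum(abs2, x .- 1.0)
x0 = zeros(120)

res = optimize(f, x0, LBFGS(), Optim.Options(show_trace = true); autodiff = :forward)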
Useful reference https://discourse.julialang.org/t/gradient-rise-was-obtained-by-optim-jl-package-optimization/69548/11
This is even with the default option for this solver of allow_f_increases = false. When I instead set allow_f_increases = true, the divergence away from the local minimum and the much longer evaluation times happen much sooner. What explains the much longer iteration times?
When you say longer iterations, do you mean:
1. Your function being called more times per iteration of L-BFGS?
2. Your function's execution taking longer?
Thanks John -- time for each iteration goes from ~40 secs to ~1200 secs. I imagine it's 1. and not 2., but I'm not sure how to tell the difference. I have show_trace set to true, showing every iteration, but I don't think it reports the number of function evaluations. If you know of a way to get it to report that, I'll try it and report back.
The key pattern seems to be that the really long iteration times go hand in hand with increases in the objective function.
I'd expect it to be a long time spent in the line search. This might show up as a very small or very large step (if you use a line search that tries to satisfy the (approximate) Wolfe conditions). Can you show an example output? You can also try printing the current state from inside the objective.
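One thing that might be worth trying (a sketch with a toy objective, not tested on your problem) is a cheaper backtracking line search from LineSearches.jl in place of the default HagerZhang:

using Optim, LineSearches

# Sketch: swap the default HagerZhang line search for backtracking, which typically
# needs fewer function/gradient evaluations per step.
f(x) = sum(abs2, x .- 1.0)          # placeholder for the real objective
x0 = zeros(120)                     # placeholder starting point
algo = LBFGS(linesearch = LineSearches.BackTracking(order = 3))
res = optimize(f, x0, algo, Optim.Options(show_trace = true); autodiff = :forward)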
Thanks much for the help. Here's the output from the REPL showing the trace for each iteration, running with allow_f_increases = true and @time for time-keeping, starting from an initial point that was the last candidate vector in a prior run that reached maximum iterations. The fval decreases through iteration 11, then increases by a small amount along with a much longer time for the next iteration, then decreases for two iterations, then goes into a slow and long increasing period (which lasts quite a few iterations; I'm only providing 17 iterations for brevity).
Note that the dimension of the problem is about 100 parameters, and each function evaluation (timed separately using @btime outside of Optim.jl) takes about 0.2-0.5 seconds -- I've tried and don't think I can further optimize the objective function's performance.
Iter Function value Gradient norm
0 6.455021e-05 1.341821e-05
* time: 0.021000146865844727
1 6.455012e-05 6.209657e-06
* time: 41.98800015449524
2 6.455002e-05 2.262688e-06
* time: 125.80299997329712
3 6.454674e-05 3.180252e-05
* time: 231.52100014686584
4 6.454620e-05 4.935921e-06
* time: 295.67400002479553
5 6.454519e-05 2.444610e-05
* time: 402.53600001335144
6 6.454108e-05 4.777337e-06
* time: 488.38899993896484
7 6.454084e-05 2.226425e-05
* time: 572.654000043869
8 6.452795e-05 1.646926e-05
* time: 679.8870000839233
9 6.452635e-05 5.264460e-06
* time: 744.5700001716614
10 6.452409e-05 2.739239e-05
* time: 829.8760001659393
11 6.452352e-05 3.085255e-06
* time: 893.9900000095367
12 6.452359e-05 3.399898e-06
* time: 2109.0920000076294
13 6.452329e-05 2.607508e-05
* time: 2173.4049999713898
14 6.451462e-05 9.550695e-06
* time: 2260.1820001602173
15 6.451469e-05 3.796692e-06
* time: 3365.085000038147
16 6.451475e-05 4.008317e-06
* time: 4544.650000095367
17 6.451482e-05 1.260894e-06
* time: 5725.456000089645
If this is indeed due to a slow line search, is there any way to improve performance?
How do I print the current state from the objective function? Do you mean like including a println() in the objective function body?
Have you tried using extended_trace = true to get more information?
I also think it would be helpful to confirm the function call count. You don't need to change the objective function per se; you can create a wrapper like this example of tracking a total count: https://discourse.julialang.org/t/macro-for-counting-the-number-of-times-a-function-is-called/3129/3
But adding a println inside the body works as well and might be easier.
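If I remember correctly, something along these lines turns on the extra trace information and lets you read off the evaluation counts afterwards (a toy sketch, not your actual objective):

using Optim

# Sketch: extended_trace = true records the current x and gradient each iteration;
# f_calls/g_calls on the result report the total evaluation counts.
f(x) = sum(abs2, x .- 1.0)          # placeholder objective
x0 = zeros(100)                     # placeholder starting point
opts = Optim.Options(show_trace = true, extended_trace = true, store_trace = true)
res = optimize(f, x0, LBFGS(), opts; autodiff = :forward)
Optim.f_calls(res), Optim.g_calls(res)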
Using the same inputs as the prior example, I started the optimization routine twice, stopping once with maxiter=11 and once with maxiter=12, hoping to capture a big difference in f(x) calls at exit (I couldn't get the wrapper approach to work; see bottom). It is indeed a much higher number of function evals. This may be out of scope, but do you have any suggestions for improving performance (i.e., spending less time on the line search, etc.)?
For 11 iterations:
* Work counters
Seconds run: 878 (vs limit Inf)
Iterations: 11
f(x) calls: 43
∇f(x) calls: 43
And for 12:
* Work counters
Seconds run: 2094 (vs limit Inf)
Iterations: 12
f(x) calls: 100
∇f(x) calls: 100
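(So the difference is about 100 − 43 = 57 f(x)/∇f(x) calls in iteration 12 alone, versus roughly 43/11 ≈ 4 per iteration before it, which matches the jump in wall time for iteration 12 in the trace above.)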
Note I was unable to get the counting wrapper function to work, perhaps because of autodiff? I defined:
mutable struct CountingF{TF}
    f::TF
    counter::Int
end

CountingF(f) = CountingF(f, 0)

function (c::CountingF)(args...)
    c.counter += 1
    c.f(args...)
end
Then called:
td = TwiceDifferentiable(vars -> CountingF(critEvalSmooth_Wald(vars, Xmat, w_t, 0.1, 400, d_MatT, s_MatT400, nobs, n_d, n_x, n_z1, n_z2, θ1t, θ2t, W_mat)), vrun_002_1; autodiff = :forward)
@time res_trial1 = optimize(td, vrun_002_1, method = LBFGS(), iterations = 2, show_trace = true, show_every=1, x_tol=1e-8, f_tol=1e-8)
This threw the following error:
ERROR: LoadError: MethodError: no method matching zero(::CountingF{ForwardDiff.Dual{ForwardDiff.Tag{var"#19#20", Float64}, Float64, 12}})
Please try with extended_trace=true, as John suggested.
There must be a bug if this happens with allow_f_increases=false.
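Also, the MethodError looks like it comes from wrapping the return value of critEvalSmooth_Wald rather than the function itself, so TwiceDifferentiable ends up handling a CountingF object where it expects a number. A minimal sketch of the intended usage, keeping your argument names, would be:

# Sketch: wrap the closure itself, then pass the counting wrapper to TwiceDifferentiable.
counted = CountingF(vars -> critEvalSmooth_Wald(vars, Xmat, w_t, 0.1, 400, d_MatT, s_MatT400, nobs, n_d, n_x, n_z1, n_z2, θ1t, θ2t, W_mat))
td = TwiceDifferentiable(counted, vrun_002_1; autodiff = :forward)
# ...then call optimize on td as before; counted.counter holds the number of objective
# evaluations (Dual-valued calls made by ForwardDiff are counted as well).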
It seems that allow_f_increases is not set to false by default for every chosen algorithm, at least not for L-BFGS. I encountered the same issue (in Optim v1.7.2) with the objective function's value going up after several steps, even though I did not specify allow_f_increases, assuming it was set to false as mentioned in the documentation. However, the problem was resolved after I manually added the keyword argument allow_f_increases=false in Optim.Options. I wonder if there's some bug regarding the default value of allow_f_increases. One thing to note is that I also imposed a box constraint using Optim.Fminbox.
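For reference, the call that worked for me looked roughly like this (a simplified sketch with placeholder names, not my actual problem):

using Optim

# Sketch: box-constrained L-BFGS with allow_f_increases set explicitly in Optim.Options.
f(x) = sum(abs2, x .- 0.5)           # placeholder objective
lb, ub = zeros(10), ones(10)         # placeholder box constraints
x0 = fill(0.25, 10)                  # placeholder starting point inside the box
res = optimize(f, lb, ub, x0, Fminbox(LBFGS()),
               Optim.Options(allow_f_increases = false, show_trace = true))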