NLsolve.jl

problems with trust region method

Open tpapp opened this issue 8 years ago • 14 comments

I have been using this package extensively for solving collocation problems in economics. In my previous experience (with other code), trust region methods have almost always dominated quasi-Newton methods, and certainly plain Newton methods. This is in accordance with what the Nocedal-Wright book says.

However, with NLsolve, trust region methods frequently don't converge, while plain vanilla Newton converges very well. An MWE is

using NLsolve

"Freudenstein and Roth function."
function f!(x, fval)
    fval[1] = -13 + x[1] + ((5-x[2])*x[2]-2)*x[2]
    fval[2] = -29 + x[1] + ((x[2] + 1)*x[2]-14)*x[2]
end

o1 = nlsolve(f!, [0.5, -2]; autodiff = true)
NLsolve.converged(o1)           # false
o1.zero                         # "bad root"  11.4128, -0.896805

o2 = nlsolve(f!, [0.5, -2]; autodiff = true, method = :newton)
NLsolve.converged(o2)           # true
o2.zero                         # root 5, 4

I don't think there is anything wrong with the trust region method in theory, so I am wondering if there is a bug in the implementation.
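For reference, one can verify by hand that (5, 4) is an exact root, while the point the trust region method returns is a well-known stationary point of the least-squares merit function at which the residuals do not vanish. A minimal standalone check (the helper name `freudenstein_roth` is my own, not part of NLsolve):

```julia
using LinearAlgebra: norm

# Freudenstein-Roth residuals, written as a plain function for checking.
freudenstein_roth(x) = [
    -13 + x[1] + ((5 - x[2]) * x[2] - 2) * x[2],
    -29 + x[1] + ((x[2] + 1) * x[2] - 14) * x[2],
]

# (5, 4) is the exact root: both residuals vanish.
freudenstein_roth([5.0, 4.0])                       # [0.0, 0.0]

# The "bad root" is not a root at all: the residual norm there is about 7.
norm(freudenstein_roth([11.4128, -0.896805]))
```

This makes the failure mode clear: the trust region iteration is converging to a local minimizer of ‖f(x)‖ rather than to a zero of f.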

I would like to help out. I could:

  1. factor out parts of the source and write tests for them (exact references of which method is implemented would help though),
  2. code up a library of well-known test cases, both for optimization and rootfinding (this would help Optim.jl too). I am not aware of an existing library/collection like this for Julia.

tpapp avatar Apr 02 '17 08:04 tpapp

Hey @tpapp thanks for opening this issue. It sounds like we have similar use cases for this library :)

One thing that we (@pkofod and I) have talked about is breaking out the tests in this package into another package we can require when running tests. This would follow the style of what Optim.jl has done with the OptimTests.jl package. I think we have a decent start here, as we have the MINPACK test suite plus a number of other problems.

Personally, I have never implemented the trust region method so I wouldn't be able to speak to whether or not there are potential coding errors in this implementation. Do you know of a reference that would be best to look at for understanding how trust region methods work?

sglyon avatar Apr 05 '17 14:04 sglyon

Nocedal and Wright is a good starting point. Let us hunt :bug:s

pkofod avatar Apr 05 '17 15:04 pkofod

Let me add something. I invited NLsolve to JuliaNLSolvers because I really want to see some development going into the package. We've had great success increasing productivity and collaboration with just a few new contributors in Optim, so I'd be thrilled to have more people looking at NLsolve than is currently the case.

Also, I agree that the trust region method usually works very well for the problems I work with (I think my research interests align quite nicely with yours), so let's find out what the problem is.

pkofod avatar Apr 05 '17 15:04 pkofod

IMO the best starting point for debugging would be factoring the code into smaller functions that one can test separately (dogleg, etc).
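To make the suggestion concrete, here is a minimal sketch of the textbook dogleg step (following Nocedal & Wright), written independently of NLsolve's internals. This is what a factored-out, separately testable function might look like; the name `dogleg_step` and its signature are my own, not the package's actual code. Here `g` is the gradient of the merit function ½‖f‖² (i.e. J'f), `B` is the model Hessian (e.g. J'J), and `p_newton` solves the Newton system:

```julia
using LinearAlgebra

# Textbook dogleg: combine the Cauchy (steepest-descent) point and the
# Newton step inside a trust region of radius Δ.
function dogleg_step(g, B, p_newton, Δ)
    if norm(p_newton) <= Δ
        return p_newton                  # Newton step is interior: take it
    end
    # Cauchy point: minimizer of the quadratic model along -g.
    p_cauchy = -(dot(g, g) / dot(g, B * g)) * g
    if norm(p_cauchy) >= Δ
        return -(Δ / norm(g)) * g        # even steepest descent hits the boundary
    end
    # Otherwise walk from the Cauchy point toward the Newton step and stop
    # where the path crosses the boundary: solve
    # ‖p_cauchy + τ (p_newton - p_cauchy)‖ = Δ for τ ∈ (0, 1).
    d = p_newton - p_cauchy
    a, b, c = dot(d, d), 2 * dot(p_cauchy, d), dot(p_cauchy, p_cauchy) - Δ^2
    τ = (-b + sqrt(b^2 - 4a * c)) / (2a)
    return p_cauchy + τ * d
end
```

A function of this shape can be unit-tested directly against all three branches (interior Newton step, truncated Cauchy step, genuine dogleg segment) without ever running the full solver.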

In the long run, a package with nothing but test functions would be nice, with the understanding that if f(x)=0 is used for testing nonlinear solvers, min ||f(x)|| can be used to test the optimizers. A nice paper with lots of functions is

@article{more1981testing,
  title={Testing unconstrained optimization software},
  author={Mor{\'e}, Jorge J and Garbow, Burton S and Hillstrom, Kenneth E},
  journal={ACM Transactions on Mathematical Software (TOMS)},
  volume={7},
  number={1},
  pages={17--41},
  year={1981},
  publisher={ACM}
}

tpapp avatar Apr 05 '17 17:04 tpapp

Hi All, this might be loosely related to the issue. Thought it might be worth mentioning an outside use case from robotics, which also makes extensive use of the trust region method: around 3-15 dimensions, on fairly well-behaved but non-convex functions, and with possibly weird initial conditions. I have some fair quality assurance testing around aggregate output from batch calls for a variety of cases.

In the last year or two, I've seen a few non convergence issues due to non-start or trivial cases. Usually a small bump to the estimate x+=1e-10*randn(n) solves my non-convergence cases. I had wondered if there might be an untested corner case somewhere. For example when x0=zeros(n) I've seen it get stuck.
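The workaround described above amounts to something like the following (my own sketch; the perturbation scale is problem-dependent and `perturb_start` is a hypothetical helper name):

```julia
using Random

# Nudge a degenerate starting point (e.g. exactly zeros(n)) off any exact
# symmetry before handing it to the solver. Purely a workaround for the
# suspected corner case, not a fix.
perturb_start(x0; scale = 1e-10) = x0 .+ scale .* randn(length(x0))
```

If `x0 = zeros(n)` really is an untested corner case, a perturbation this small should not change which root is found, only whether the solver gets stuck.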

For completeness, I've also been using gradient-free methods from Optim.jl in some setups, for minimization rather than root-finding cases, thereby carrying two dependencies. Thanks!

dehann avatar Apr 05 '17 22:04 dehann

I have a problem where the trust region method is going crazy. It's a non-stochastic problem, with the same initial conditions, but I can get wildly different values out each time. For example, the residuals:

[8.86293e-5,1.19202e-5,0.000330256,-919.125,32.0626,-323.151]
[-1.18138e-5,-4.73636e-6,-1.32164e-5,-5464.3,190.616,-1921.17]
[-0.0008622,-0.000345666,-0.000964572,-919.142,32.0623,-323.158]
[0.00180817,0.000712957,0.00200938,0.0078686,0.00149706,0.0045823]
[0.00345202,0.00138402,0.00386197,-919.143,32.0666,-323.154]
[-0.000480206,-0.000192523,-0.000537226,0.009301,-0.000797501,0.00278132]
[0.0,0.0,0.0,-0.0126197,0.000440224,-0.00443692]
[0.000837957,0.000286074,0.000861251,-0.00115442,0.000712696,0.00031013]
[1.37184e-6,5.50121e-7,1.53482e-6,-4.98481e-5,3.09059e-6,-1.61291e-5]
[0.0040354,0.00159477,0.00451992,-5406.55,188.605,-1900.86]
[1.61864e-6,6.48841e-7,1.81049e-6,1.27032e-5,1.15111e-6,6.11227e-6]
[-0.00124305,0.00119902,0.00218492,0.000610128,0.00150093,0.0043943]
[0.0,0.0,0.0,0.0331242,-0.0011555,0.011646]
[-0.00289402,0.000288416,-0.00125191,24808.6,-865.419,8722.35]
[0.0,0.0,0.0,0.033643,-0.0011736,0.0118284]
[0.000579008,0.000202527,0.000630362,-755.516,26.3558,-265.628]
[3.71598e-7,7.17875e-7,7.72998e-8,-8.85129e-6,2.00432e-6,-1.73855e-6]

For now I put it in a loop and just let it run repeatedly until the residual is low enough

save_sol = 0
while true
  bvp = BVProblem(f, domin, cur_bc!, init)
  resid_f = Vector{Float64}(undef, 6)
  sol = solve(bvp, Shooting(Tsit5(),nlsolve=(f,u0) -> NLsolve.nlsolve(f,u0,autodiff=true)),force_dtmin=true,abstol=1e-10)
  cur_bc!(resid_f,sol)
  println(resid_f)
  if maximum(abs.(resid_f)) < 1e-5
    save_sol = sol
    break
  end
end

but that's just a hack to get around the fact that it's acting very non-deterministically and diverges randomly. (Using finite differencing is even worse on this problem!)

ChrisRackauckas avatar Apr 17 '17 20:04 ChrisRackauckas

Since it's the trust region solver, we can at least rule out that this is a problem with the changes in LineSearches. I hope to have time to take a look soon.

pkofod avatar Apr 17 '17 20:04 pkofod

Are things getting initialized correctly?

KristofferC avatar Apr 17 '17 20:04 KristofferC

In the solver I mean

KristofferC avatar Apr 17 '17 20:04 KristofferC

Are things getting initialized correctly?

I haven't checked, but I don't think so. I saw this exact same problem with some of the methods in Optim a while back, where some arrays were assumed (in the code) to be initialized as zeros(n), but were actually initialized using Array{T,1}(n) constructors. This led to unpredictable behaviour: very often they would have very small elements, but sometimes they would contain values large enough to throw off the results.
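That failure mode is easy to demonstrate outside any solver: an uninitialized array contains whatever bytes happen to be in memory, so any buffer that is read before being written must be explicitly zeroed. An illustrative snippet (in current Julia syntax; the 2017-era `Array{T,1}(n)` constructor behaves the same way):

```julia
# An uninitialized buffer holds arbitrary values; code that assumes it
# starts at zero behaves nondeterministically from run to run.
buf = Vector{Float64}(undef, 4)   # contents are unspecified

# Deterministic alternatives:
buf_zeroed = zeros(4)             # allocate and zero in one step
fill!(buf, 0.0)                   # or zero an existing buffer in place

@assert all(iszero, buf)
@assert all(iszero, buf_zeroed)
```

This would also explain why the residuals above are wildly different across runs of a non-stochastic problem: the garbage in the buffer changes between allocations.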

pkofod avatar Apr 17 '17 20:04 pkofod

I cannot reproduce my original problematic example anymore.

ChrisRackauckas avatar Oct 26 '17 18:10 ChrisRackauckas

Revisiting this issue, I can still reproduce the problem (updated to the new calling conventions):

using NLsolve

"Freudenstein and Roth function."
function f!(fval, x)
    fval[1] = -13 + x[1] + ((5-x[2])*x[2]-2)*x[2]
    fval[2] = -29 + x[1] + ((x[2] + 1)*x[2]-14)*x[2]
end

o1 = nlsolve(f!, [0.5, -2]; autodiff = :forward)
NLsolve.converged(o1)           # false
o1.zero                         # "bad root"  11.4128, -0.896805

o2 = nlsolve(f!, [0.5, -2]; autodiff = :forward, method = :newton)
NLsolve.converged(o2)           # true
o2.zero                         # root 5, 4

I am happy to dig into this, but suggestions about where to start would be appreciated.

tpapp avatar May 14 '19 08:05 tpapp

I'd never tried showing the trace, but if you do, you'll notice that at some point it doesn't take a positive step at all: the step norm is 0.0. So yes, there's a bug somewhere, and it has to be related to the calculation of the search direction.

pkofod avatar May 14 '19 13:05 pkofod

Okay, so I've been looking into approaching a new version of these packages, and looking at the code for the trust region Newton's method here, there are some issues. First, the dogleg is implemented in a way where the actual dogleg step (the case where the Cauchy point is not taken and the Newton step is not interior to the trust region) is never actually taken, and second, the trust region update seems completely strange (and, as far as I can see, very wrong).
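For comparison, the standard radius update (again following Nocedal & Wright, with conventional constants; a sketch for reference, not the package's code) is driven by the ratio ρ of the actual reduction in the objective to the reduction predicted by the quadratic model:

```julia
# Standard trust region radius update: ρ = actual reduction / predicted
# reduction. Shrink on a poor model fit; grow only when the fit is good
# and the step was limited by the boundary.
function update_radius(Δ, ρ, step_norm, Δ_max)
    if ρ < 0.25
        Δ = 0.25 * Δ                       # poor model fit: shrink the region
    elseif ρ > 0.75 && isapprox(step_norm, Δ)
        Δ = min(2Δ, Δ_max)                 # good fit on the boundary: grow
    end
    accept = ρ > 0  # accept the step only if it actually reduced the objective
    return Δ, accept
end
```

Any update rule that deviates substantially from this pattern (e.g. growing the radius after a failed step, or never shrinking it) would explain both the non-convergence on Freudenstein-Roth and the zero-norm steps seen in the trace.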

pkofod avatar Oct 23 '19 21:10 pkofod