
`NewtonTrustRegion` algorithm deviates from reference

Open · devmotion opened this issue 1 month ago · 1 comment

When looking through the code of NewtonTrustRegion, I noticed that the implementation seems to differ a bit from Algorithm 4.1 in N&W, which it refers to. Is there an explanation for these differences?

  • The code handles special cases of the predicted reduction (-m) separately in https://github.com/JuliaNLSolvers/Optim.jl/blob/f9fe22206c6cdbc1a5ecf4e082b08a22fe0448ee/src/multivariate/solvers/second_order/newton_trust_region.jl#L353-L365, whereas Algorithm 4.1 is based only on rho = actual_reduction / predicted_reduction = f_x_diff / (-m). N&W also explain why that should be sufficient.

  • N&W propose to decrease the trust region radius delta, as implemented in https://github.com/JuliaNLSolvers/Optim.jl/blob/f9fe22206c6cdbc1a5ecf4e082b08a22fe0448ee/src/multivariate/solvers/second_order/newton_trust_region.jl#L368, if the ratio rho of actual to predicted reduction is below a pre-set threshold rho_min (fixed to 0.25 by N&W, configurable in Optim). However, when the step is rejected - which happens if rho <= eta < rho_min, i.e. in a subset of the cases where the trust region radius was already reduced - the code in Optim shrinks the radius even further in https://github.com/JuliaNLSolvers/Optim.jl/blob/f9fe22206c6cdbc1a5ecf4e082b08a22fe0448ee/src/multivariate/solvers/second_order/newton_trust_region.jl#L382, based on the Euclidean norm of the difference between the rejected and the previous state (which should always be <= delta, so this is a more drastic reduction). A sketch of the plain N&W update follows this list.
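For reference, here is a minimal Julia sketch of the radius update as N&W state it in Algorithm 4.1, driven purely by rho. This is illustrative, not Optim's code; the function name `nw_update` and the keyword argument `eta` are my own naming:

```julia
using LinearAlgebra

# Plain N&W Algorithm 4.1-style radius update (illustrative sketch).
# `f_x_diff` is the actual reduction f(x) - f(x + s), `m_decrease` the
# predicted reduction -m(s) from the quadratic model.
function nw_update(delta, delta_max, f_x_diff, m_decrease, s; eta = 0.1)
    rho = f_x_diff / m_decrease            # ratio of actual to predicted reduction
    if rho < 0.25
        delta = delta / 4                  # poor agreement: shrink the radius
    elseif rho > 0.75 && norm(s) ≈ delta
        delta = min(2 * delta, delta_max)  # good agreement on the boundary: grow
    end
    accepted = rho > eta                   # accept the step only if rho > eta
    return delta, accepted
end
```

Note that this rule never shrinks delta by more than a factor of 4 per iteration, regardless of whether the step is accepted, which is the point of contrast with Optim's rejection branch.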

devmotion · Nov 27 '25

For the second point, N&W propose a quite simple algorithm and do not really highlight all aspects of the radius update. They focus a lot on the version of the subproblem solver used in Optim, which solves the eigenproblem explicitly, but that is actually a quite inefficient approach. In NLSolvers I mostly followed Conn, Gould and Toint's textbook, whose basic trust region update is not too different from the one in Optim, but there are several extensions and modifications. I also followed Yuan [1999]: A Review of Trust Region Algorithms for Optimization. That paper summarizes some of this, but is not focused on practicalities. The CGT book focuses a lot on theory but also on practical considerations, for example Section 10.5.2, Basing the Radius Update on the Steplength, where they write

[Image: excerpt from Conn, Gould and Toint, Section 10.5.2, "Basing the Radius Update on the Steplength"]

There's more of course, but reading the comment that was in Optim from the original implementation, I think it may have been motivated by this section or by a paper that covers the same idea - though I did not write that code. The idea is quite old: in the Notes and References they cite Powell [1970]: A new algorithm for unconstrained optimization, together with its implementation in Powell [1970]: A Fortran subroutine for unconstrained minimization requiring first derivatives of the objective function, but I have not looked those papers up yet. They also cite other papers from 1975 up to the 90s with similar ideas of using the step length to quickly reduce the radius to something smaller than ||s|| when s is interior and rejected, in order to avoid unnecessary computations.

Reducing it by 75% is maybe too much, but I think it is generally not something we hope occurs here, because unlike the general treatment in CGT and Yuan, the model is based on the Hessian, not on some potential approximation. So if you actually have a solution (based on the real Hessian) that is interior but not accepted, then your objective is probably not too well-behaved (in the sense that it is not "locally" quadratic all the way out to x+s), and you may want to quickly reduce the radius to make sure that the model function and objective function align well enough that the predicted reduction matches the realized reduction.
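To make the contrast with the rho-based rule concrete, here is a hedged sketch of that steplength-based reduction. It assumes the radius is set to a quarter of the rejected step's length, consistent with the "75%" figure above; `shrink_on_rejection` is an illustrative name, not Optim's API:

```julia
using LinearAlgebra

# Sketch of the steplength-based shrink on a rejected step (my reading of the
# CGT Section 10.5.2 idea; not a verbatim copy of Optim's code).
function shrink_on_rejection(x_previous, x_proposed)
    s = x_proposed - x_previous
    # The rejected step satisfies norm(s) <= delta, so 0.25 * norm(s) can fall
    # well below delta / 4 when the step was interior - a more drastic
    # reduction than the plain rho-based rule would allow.
    return 0.25 * norm(s)
end
```

The design intent is exactly what is described above: when an interior step computed from the true Hessian is rejected, collapse the radius toward the region where the quadratic model can plausibly match the objective, rather than spending several iterations shrinking delta by factors of 4.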

pkofod · Nov 27 '25