`NewtonTrustRegion` algorithm deviates from reference
When looking through the code of NewtonTrustRegion, I noticed that the implementation seems to differ somewhat from Algorithm 4.1 in N&W, which it refers to. Is there an explanation for these differences?
- The code handles special cases of the predicted reduction (`-m`) separately in https://github.com/JuliaNLSolvers/Optim.jl/blob/f9fe22206c6cdbc1a5ecf4e082b08a22fe0448ee/src/multivariate/solvers/second_order/newton_trust_region.jl#L353-L365, whereas Algorithm 4.1 is based only on `rho = actual_reduction / predicted_reduction = f_x_diff / (-m)`. N&W also explain why that should be sufficient.
- N&W propose to decrease the trust region radius `delta`, as implemented in https://github.com/JuliaNLSolvers/Optim.jl/blob/f9fe22206c6cdbc1a5ecf4e082b08a22fe0448ee/src/multivariate/solvers/second_order/newton_trust_region.jl#L368, if the ratio `rho` of actual to predicted reduction is below a pre-set threshold `rho_min` (fixed to 0.25 by N&W, configurable in Optim). However, when the step is rejected - which happens if `rho <= eta < rho_min`, i.e. in a subset of the cases where the trust region radius was already reduced - the code in Optim shrinks the radius even further in https://github.com/JuliaNLSolvers/Optim.jl/blob/f9fe22206c6cdbc1a5ecf4e082b08a22fe0448ee/src/multivariate/solvers/second_order/newton_trust_region.jl#L382, based on the Euclidean norm of the difference between the rejected and the previous state (which is always `<= delta`, hence this is a more drastic reduction). A sketch contrasting the two update rules follows below.
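For concreteness, here is a minimal sketch of the two update rules as I understand them. This is not the actual Optim code; the function names, the `eta`/`rho_lower`/`rho_upper` thresholds, and the exact shrink factors are illustrative (the 0.25/0.75/2 values follow N&W's defaults).

```julia
using LinearAlgebra

# N&W Algorithm 4.1 style: the radius update depends only on rho (and on whether
# the step hit the boundary); the step is accepted whenever rho > eta.
function radius_update_nw(rho, delta, s, delta_hat; eta=0.125, rho_lower=0.25, rho_upper=0.75)
    if rho < rho_lower
        delta = 0.25 * delta                 # shrink the current radius
    elseif rho > rho_upper && norm(s) ≈ delta
        delta = min(2 * delta, delta_hat)    # grow, capped at the maximum radius
    end
    return delta, rho > eta                  # (new radius, step accepted?)
end

# Step-length-based variant (in the spirit of the line linked above): when the
# step is rejected, tie the new radius to the length of the rejected step itself,
# which can be much smaller than the already-shrunk delta.
function radius_update_steplength(rho, delta, s; eta=0.125, rho_lower=0.25)
    if rho < rho_lower
        delta = 0.25 * delta
    end
    accepted = rho > eta
    if !accepted
        delta = 0.25 * norm(s)               # ||s|| <= delta, so this cuts harder
    end
    return delta, accepted
end
```

The only difference is the extra branch on rejection: the first rule never looks at the step length, the second one does.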
For the second point, N&W propose a quite simple algorithm and do not really discuss all aspects of the radius update. They focus a lot on the version of the subproblem solver used in Optim, which solves the eigenproblem explicitly, but that is actually quite an inefficient approach. In NLSolvers I mostly followed Conn, Gould and Toint's (CGT) textbook, whose basic trust region update is not too different from the one in Optim, but there are several extensions and modifications. I also followed Yuan [1999]: "A Review of Trust Region Algorithms for Optimization". That paper summarizes some of this, but it is not focused on practicalities. The CGT book focuses a lot on theory, but also on practical considerations, for example Section 10.5.2, "Basing the Radius Update on the Steplength", where they write
There's more of course, but reading the comment in Optim from the original implementation, I think it may have been motivated by this section, or by a paper covering the same idea - though I did not write that code. The idea is quite old: in the Notes and References they cite Powell (1970), "A new algorithm for unconstrained optimization", and its accompanying implementation paper, also from 1970, "A Fortran subroutine for unconstrained minimization requiring first derivatives of the objective function", but I have not looked those up yet. They also cite other papers from 1975 up to the 90s with similar ideas of using the step length to quickly reduce the radius to something smaller than `||s||` when `s` is interior and rejected, to avoid unnecessary computations.
Reducing it by 75% is maybe too much, but I think this is generally not a situation we expect to occur here, because unlike in the general treatment of CGT and Yuan, the model is based on the actual Hessian, not on some approximation. So if you have a solution (based on the real Hessian) that is interior but not accepted, then your objective is probably not too well-behaved (in the sense that it is not "locally" quadratic all the way out to `x + s`), and you may want to reduce the radius quickly to make sure that the model function and the objective function align well, such that the predicted reduction matches the realized reduction.
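As a toy illustration of why the step-length rule can save subproblem solves (made-up numbers, not taken from Optim):

```julia
# Made-up numbers: a rejected *interior* step of length 0.1 inside a radius of 1.0.
function shrinks_until_binding(delta, s_norm; factor=0.25)
    n = 0
    while delta > s_norm        # radius still does not constrain the step
        delta *= factor         # plain rho-based shrink
        n += 1
    end
    return n, delta
end

shrinks_until_binding(1.0, 0.1) # (2, 0.0625): two more shrinks (and subproblem solves)
0.25 * 0.1                      # 0.025: the step-length rule drops below ||s|| in one update
```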