LineSearches.jl
Suboptimal choice of alpha in Hager-Zhang
I've encountered a weird issue with alpha value selection in Hager-Zhang. Let me illustrate. Here are the values computed during bracketing:
alphas = [0.0, 1.0, 0.5, 0.25, 0.11797881082031754, 0.05898940541015877, 0.029494702705079385, 0.014747351352539692, 0.007373675676269846]
values = [32867.0, 33089.907756650115, 33027.619212962294, 32749.607153359673, 33107.9648525175, 33056.34280854859, 32990.02494986772, 32936.18207166336, 32831.66557482673]
slopes = [-171068.0, -29730.999177119797, -74242.25565689849, 191429.29678267744, -7900.963612183026, -51707.79133751766, -103642.35349017732, -139136.2036028941, 182648.13853509305]
The search ends on this line https://github.com/JuliaNLSolvers/LineSearches.jl/blob/ded667a80f47886c77d67e8890f6adb127679ab4/src/hagerzhang.jl#L287 , with iA=1, ia=1, ib=4 and iB=9. Clearly, out of these four, iA is the worst choice (it has the highest cost value), yet it is the value returned by the procedure. My function isn't particularly well behaved, but shouldn't the line search try a bit harder to pick the best value? It could at least look at ib and iB instead of just returning iA on that line. What's worse, the returned alpha is 0, which throws off the direction-selection algorithm I'm using.
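To make this concrete, a quick check over the arrays above (plain Julia, nothing specific to LineSearches.jl) shows that the bracketing data already contains a strictly better candidate than the one returned:

alphas = [0.0, 1.0, 0.5, 0.25, 0.11797881082031754, 0.05898940541015877,
          0.029494702705079385, 0.014747351352539692, 0.007373675676269846]
values = [32867.0, 33089.907756650115, 33027.619212962294, 32749.607153359673,
          33107.9648525175, 33056.34280854859, 32990.02494986772,
          32936.18207166336, 32831.66557482673]

ibest = argmin(values)  # 4, which is exactly ib
alphas[ibest]           # 0.25, with value 32749.6 < values[iA] = 32867.0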
I've forked H-Z with a fix: https://github.com/mateuszbaran/ImprovedHagerZhangLinesearch.jl .
Either this case wasn't considered in the original implementation, or perhaps there's a bug here. I'll have to look at the code, and maybe at my other implementation in NLSolvers.jl. Maybe the paper as well.
For reference, here's a plot of the values:
[plot of the values omitted]
I think HZ assumes the function is differentiable. If it isn't (obviously I can't tell from the plot), then another line search method might be better.
But that exit line you identified looks like an "I'm confused, let's bail" case, so it would make sense to take argmin(values) and return the corresponding alpha.
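A minimal sketch of that fallback (a hypothetical helper, not the actual patch in LineSearches.jl or the fork): on the bail-out path, return the best trial point recorded so far instead of unconditionally returning the pair at iA:

# Hypothetical fallback for the bail-out exit: pick the trial with the
# lowest recorded function value rather than returning alphas[iA].
function best_trial(alphas::AbstractVector, values::AbstractVector)
    ibest = argmin(values)
    return alphas[ibest], values[ibest]
end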
The original function was differentiable, and taking the minimum is a solution that works for me. I can make a PR if it's an acceptable fix, though this issue may be a symptom of a bug in HZ.
Interesting. Is it a case you can share? I can't promise to spend time on it, but I am intrigued.
It was some variant of Rosenbrock similar to this one: https://github.com/JuliaManifolds/Manopt.jl/pull/339/files#diff-3f760f49cdd6fe09857899f193de7724904da2185f32c61378c068841df87cebR245-R267 , likely with a different N. Anyway, I just realized that I set the initial alpha guess and alpha_max suboptimally (I forgot to divide max_stepsize by the norm of the descent direction).
It's something I encounter sometimes in optimization on compact manifolds: the gradient is much longer than the injectivity radius, and then an initial alpha of 1 is a really poor guess.
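For illustration, the scaling in question would look roughly like this (a sketch; initial_alpha_guess, max_stepsize, and the direction η are hypothetical names from this discussion, not Manopt or LineSearches API):

using LinearAlgebra

# Sketch: bound the actual step alpha * η by max_stepsize (e.g. the
# injectivity radius) by dividing by the norm of the descent direction.
function initial_alpha_guess(η, max_stepsize)
    alpha_max = max_stepsize / norm(η)
    return min(1.0, alpha_max), alpha_max  # (initial alpha, alpha_max)
end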
@mateuszbaran, if you can't easily share a reproducer, one option is to run with option display = typemax(UInt64) and then share the output. That might help us understand better how this bad outcome happens.
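For reference, that would look something like the following (a sketch; I believe display is an Int field on HagerZhang, so typemax(Int) is used here):

using LineSearches

# Sketch, assuming the `display` option on HagerZhang controls trace
# verbosity (a large value prints every step of the line search).
ls_hz_verbose = LineSearches.HagerZhang(display=typemax(Int))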
Here is a (relatively) simple reproducer:
using Manopt
using Manifolds
using LineSearches
using LinearAlgebra

norm_inf(M::AbstractManifold, p, X) = norm(X, Inf)

# Rosenbrock-type objective evaluated at a point x of the manifold
function f_rosenbrock_manopt(::AbstractManifold, x)
    result = 0.0
    for i in 1:2:length(x)
        result += (1.0 - x[i])^2 + 100.0 * (x[i + 1] - x[i]^2)^2
    end
    return result
end

# In-place gradient: the Euclidean gradient is written into storage,
# then converted to the Riemannian gradient of M
function g_rosenbrock_manopt!(M::AbstractManifold, storage, x)
    for i in 1:2:length(x)
        storage[i] = -2.0 * (1.0 - x[i]) - 400.0 * (x[i + 1] - x[i]^2) * x[i]
        storage[i + 1] = 200.0 * (x[i + 1] - x[i]^2)
    end
    riemannian_gradient!(M, storage, x, storage)
    return storage
end

# Quasi-Newton (memory 2) on a sphere, with Hager-Zhang line search
function test_case_manopt()
    N = 4000
    mem_len = 2
    M = Manifolds.Sphere(N - 1; parameter=:field)
    ls_hz = LineSearches.HagerZhang()
    x0 = zeros(N)
    x0[1] = 1
    manopt_sc = StopWhenGradientNormLess(1e-6; norm=norm_inf) | StopAfterIteration(1000)
    return quasi_Newton(
        M,
        f_rosenbrock_manopt,
        g_rosenbrock_manopt!,
        x0;
        stepsize=Manopt.LineSearchesStepsize(ls_hz),
        evaluation=InplaceEvaluation(),
        vector_transport_method=ProjectionTransport(),
        return_state=true,
        memory_size=mem_len,
        stopping_criterion=manopt_sc,
    )
end

test_case_manopt()
It gives different numbers from the original example, but the issue is essentially the same. The problem happens during the third iteration of the minimization in Manopt.
Thanks to both of you for taking the initiative here!