
Suboptimal choice of alpha in Hager-Zhang

Open mateuszbaran opened this issue 1 year ago • 10 comments

I've encountered a weird issue with alpha value selection in Hager-Zhang. Let me illustrate. Here are the values computed during bracketing:

alphas = [0.0, 1.0, 0.5, 0.25, 0.11797881082031754, 0.05898940541015877, 0.029494702705079385, 0.014747351352539692, 0.007373675676269846]
values = [32867.0, 33089.907756650115, 33027.619212962294, 32749.607153359673, 33107.9648525175, 33056.34280854859, 32990.02494986772, 32936.18207166336, 32831.66557482673]
slopes = [-171068.0, -29730.999177119797, -74242.25565689849, 191429.29678267744, -7900.963612183026, -51707.79133751766, -103642.35349017732, -139136.2036028941, 182648.13853509305]

The search ends on this line: https://github.com/JuliaNLSolvers/LineSearches.jl/blob/ded667a80f47886c77d67e8890f6adb127679ab4/src/hagerzhang.jl#L287 , with iA=1, ia=1, ib=4, and iB=9. Clearly, of these four, iA is the worst choice (highest cost value), yet it is the value returned by the procedure. My function isn't particularly well-behaved, but shouldn't the line search try a bit harder to pick the best value? It could, for example, also look at ib and iB instead of just returning iA on that line. What's worse, the returned alpha is 0, which throws off the direction selection algorithm I'm using.
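For illustration, here is a minimal sketch of what "pick the best of the four" could look like; this is not the library's code, just the idea, using the alphas and values vectors listed above:

iA, ia, ib, iB = 1, 1, 4, 9
candidates = [iA, ia, ib, iB]
# choose the candidate index with the lowest objective value
ibest = candidates[argmin(values[candidates])]
alphas[ibest], values[ibest]  # (0.25, 32749.607153359673) rather than (0.0, 32867.0)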

mateuszbaran avatar Dec 30 '23 20:12 mateuszbaran

I've forked H-Z with a fix: https://github.com/mateuszbaran/ImprovedHagerZhangLinesearch.jl .

mateuszbaran avatar Jan 16 '24 10:01 mateuszbaran

Either this case wasn't considered in the original implementation, or perhaps there's a bug here. I'll have to look at the code and maybe at my other implementation in NLSolvers.jl, and possibly at the paper as well.

pkofod avatar Jan 16 '24 21:01 pkofod

For reference, here's a plot of the values:

[plot: objective values at the evaluated alphas]

I think HZ assumes the function is differentiable. If it isn't (obviously I can't tell from the plot), then another line search method might be better.

But that exit line you identified looks like an "I'm confused, let's bail" case, so it would make sense to take argmin(values) and return the corresponding alpha.
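Just to sketch the idea (not a proposed patch; the names mirror the vectors in the post above):

# return the alpha with the lowest objective value seen during bracketing
function best_seen(alphas, values)
    i = argmin(values)
    return alphas[i], values[i]
end

# For the data above, best_seen(alphas, values) gives (0.25, 32749.607153359673)
# instead of (0.0, 32867.0).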

timholy avatar Jan 22 '24 10:01 timholy

The original function was differentiable, and taking the minimum is a solution that works for me. I can make a PR if it's an acceptable fix, though this issue may be a symptom of a bug in HZ.

mateuszbaran avatar Jan 22 '24 12:01 mateuszbaran

Interesting. Is it a case you can share? I can't promise to spend time on it, but I am intrigued.

timholy avatar Jan 22 '24 20:01 timholy

It was some variant of Rosenbrock similar to this one: https://github.com/JuliaManifolds/Manopt.jl/pull/339/files#diff-3f760f49cdd6fe09857899f193de7724904da2185f32c61378c068841df87cebR245-R267 , likely with a different N. Anyway, I just realized that I set the initial alpha guess and alpha_max suboptimally (I forgot to divide max_stepsize by the norm of the descent direction).

mateuszbaran avatar Jan 22 '24 21:01 mateuszbaran

It's something I encounter sometimes in optimization on compact manifolds: the gradient is much longer than the injectivity radius, so an initial alpha of 1 is a really poor guess.
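For reference, a rough sketch of the scaling I had in mind; the max_stepsize keyword and the use of injectivity_radius are illustrative assumptions, not the exact Manopt internals:

using Manifolds, LinearAlgebra

function initial_alpha(M, p, X; max_stepsize = injectivity_radius(M, p))
    # cap alpha so that alpha * norm(X) stays within the allowed step size;
    # with alpha = 1, a very long gradient easily overshoots on a compact manifold
    nX = norm(M, p, X)
    return nX > 0 ? min(1.0, max_stepsize / nX) : 1.0
end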

mateuszbaran avatar Jan 22 '24 21:01 mateuszbaran

@mateuszbaran, if you can't easily share a reproducer, one option is to run with option display = typemax(UInt64) and then share the output. That might help us understand better how this bad outcome happens.

timholy avatar Jan 23 '24 13:01 timholy

Here is a (relatively) simple reproducer:

using Manopt
using Manifolds
using LineSearches
using LinearAlgebra

norm_inf(M::AbstractManifold, p, X) = norm(X, Inf)

function f_rosenbrock_manopt(::AbstractManifold, x)
    result = 0.0
    for i in 1:2:length(x)
        result += (1.0 - x[i])^2 + 100.0 * (x[i + 1] - x[i]^2)^2
    end
    return result
end

function g_rosenbrock_manopt!(M::AbstractManifold, storage, x)
    for i in 1:2:length(x)
        storage[i] = -2.0 * (1.0 - x[i]) - 400.0 * (x[i + 1] - x[i]^2) * x[i]
        storage[i + 1] = 200.0 * (x[i + 1] - x[i]^2)
    end
    riemannian_gradient!(M, storage, x, storage)
    return storage
end


function test_case_manopt()
    N = 4000
    mem_len = 2
    M = Manifolds.Sphere(N - 1; parameter=:field)
    ls_hz = LineSearches.HagerZhang()

    x0 = zeros(N)
    x0[1] = 1
    manopt_sc = StopWhenGradientNormLess(1e-6; norm=norm_inf) | StopAfterIteration(1000)

    return quasi_Newton(
        M,
        f_rosenbrock_manopt,
        g_rosenbrock_manopt!,
        x0;
        stepsize=Manopt.LineSearchesStepsize(ls_hz),
        evaluation=InplaceEvaluation(),
        vector_transport_method=ProjectionTransport(),
        return_state=true,
        memory_size=mem_len,
        stopping_criterion=manopt_sc,
    )
end
test_case_manopt()

It gives different numbers from the original example, but the issue is essentially the same. The problem occurs during the third iteration of the minimization with Manopt.

mateuszbaran avatar Jan 23 '24 18:01 mateuszbaran

Thanks both of you for taking the initiative here!

pkofod avatar Jan 29 '24 08:01 pkofod