
FAILURE with L-BFGS depending on system size and starting point

Open · Liozou opened this issue 1 year ago · 0 comments

Hi! I am facing a :FAILURE return when trying to minimize a function of 62339366 variables using L-BFGS with a box constraint. The issue does not appear with just one fewer variable.

The function is $f: \vec x\mapsto \sum_i \bigl(-50x_i + 300\,x_i(\log(x_i) - 1)\bigr)$, where $\vec x$ is a vector of 62339366 variables $x_i$. The box constraint requires each $x_i$ to stay above a fixed SMALL_CONSTANT, for example 0.01, so that $(\vec\nabla f(\vec x))_i = -50 + 300\log(x_i)$ does not diverge. The exact value of SMALL_CONSTANT does not matter here; the same issue occurs with SMALL_CONSTANT = 1e-100.
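For completeness, that gradient formula is a one-line product-rule computation:

$$\frac{\partial f}{\partial x_i} = -50 + 300(\log(x_i) - 1) + 300\,x_i \cdot \frac{1}{x_i} = -50 + 300\log(x_i).$$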

The numerical solution should be the constant vector with ω = exp(1/6) ≈ 1.1813604128656459 in every coordinate, since $-50 + 300\log(\omega) = 0$. The initial point is a vector of numbers drawn uniformly from $[(1-η)ω,\ (1+η)ω]$, where η = 0.001 for instance.
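As a quick numerical sanity check (an illustrative sketch, not part of the reproducer below; the helper g is just the gradient formula from above):

# Sanity check: ω = exp(1/6) is the per-coordinate stationary point of f.
ω = exp(1/6)               # ≈ 1.1813604128656459
g(x) = -50 + 300 * log(x)  # per-coordinate gradient (∇f)_i
@assert abs(g(ω)) < 1e-12  # zero up to floating-point roundoff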

Here is the minimal reproducer (I think it requires having 16 GiB of RAM; each Vector{Float64} of this length already takes about 0.5 GiB):

module TestNLoptLBFGSfailure
using NLopt

import LinearAlgebra: norm

const SMALL_CONSTANT = 1e-2
const ω = 1.1813604128656459
const η = 1e-3

function objective(point::Vector{Float64}, gradient::Vector{Float64})
    if length(gradient) > 0
        gradient .= -50 .+ 300 .* log.(point) # fill the gradient in place
    end
    value = sum(-50*r + 300*(r*(log(r) - 1)) for r in point)
    @show value, norm(gradient)
    return value
end

function test(n)
    opt = Opt(:LD_LBFGS, n)
    opt.ftol_rel = 1e-10 # or 1e-3, it does not matter
    opt.lower_bounds = fill(SMALL_CONSTANT, n) # box constraint: each x_i ≥ SMALL_CONSTANT
    opt.min_objective = objective

    init = ω .* (1 .+ η .* (2 .* rand(n) .- 1)) # uniform draw in [(1-η)ω, (1+η)ω]
    # init = fill(ω, n) # actual target

    minf, minx, ret = optimize(opt, init)
    ret
end

end # module TestNLoptLBFGSfailure

and the observations (:FAILURE is NLopt's generic failure code, NLOPT_FAILURE = -1):

julia> TestNLoptLBFGSfailure.test(62339366)
(value, norm(gradient)) = (-2.2093566731912422e10, 2369.8435877962493)
:FAILURE

julia> TestNLoptLBFGSfailure.test(62339365) # just 62339366 - 1
(value, norm(gradient)) = (-2.2093566377504475e10, 2369.843568788648)
(value, norm(gradient)) = (-2.143919631036808e10, 534364.1274153551)
(value, norm(gradient)) = (-2.209355118106552e10, 3646.620883772766)
(value, norm(gradient)) = (-2.209357736243934e10, 2.2617428385919083)
(value, norm(gradient)) = (-2.209357736243935e10, 0.001131058801969029)
(value, norm(gradient)) = (-2.209357736243935e10, 6.7321322189234e-10)
:SUCCESS

If I use the target value ω directly as the initialization (i.e. taking η = 0), then it works:

julia> TestNLoptLBFGSfailure.test(62339366)
(value, norm(gradient)) = (-2.2093577716847473e10, 2.2440440909730785e-10)
:SUCCESS

julia> TestNLoptLBFGSfailure.test(62339365)
(value, norm(gradient)) = (-2.209357736243935e10, 2.2440440729744666e-10)
:SUCCESS

And if I set the starting point too close to the target, i.e. $η$ too small, for instance η = 1e-9, then it fails for both sizes, though not after the same number of steps:

julia> TestNLoptLBFGSfailure.test(62339366)
(value, norm(gradient)) = (-2.2093577716847473e10, 0.0023686585521977225)
:FAILURE

julia> TestNLoptLBFGSfailure.test(62339365)
(value, norm(gradient)) = (-2.209357736243935e10, 0.0023686585331996264)
(value, norm(gradient)) = (-2.209357736243935e10, 0.5991391139757002)
(value, norm(gradient)) = (-2.209357736243935e10, 0.3982398117902327)
(value, norm(gradient)) = (-2.209357736243935e10, 0.2643059496298344)
(value, norm(gradient)) = (-2.209357736243935e10, 0.17501521403019882)
(value, norm(gradient)) = (-2.209357736243935e10, 0.1154857940136743)
(value, norm(gradient)) = (-2.209357736243935e10, 0.07579606004479056)
(value, norm(gradient)) = (-2.209357736243935e10, 0.049330911796086924)
(value, norm(gradient)) = (-2.209357736243935e10, 0.03167915082330152)
(value, norm(gradient)) = (-2.209357736243935e10, 0.01989798378814819)
(value, norm(gradient)) = (-2.209357736243935e10, 0.012021766484118854)
(value, norm(gradient)) = (-2.209357736243935e10, 0.006732064617095256)
:FAILURE

This may be related to https://github.com/JuliaOpt/NLopt.jl/issues/33. The last point (optimization fails when the initial point is too close to the optimum) reflects https://github.com/JuliaOpt/NLopt.jl/issues/33#issuecomment-785340103, but the dependence on the difference between 62339365 and 62339366 variables is new, I think. My real problem is of course more complex, and even larger than 62339366 variables.

Liozou · Apr 27 '23 10:04