NLopt.jl
FAILURE with L-BFGS depending on system size and starting point
Hi! I am facing a `:FAILURE` return when trying to minimize a function of 62339366 variables using L-BFGS with a box constraint. The issue does not appear with just one fewer variable.

The function is $f: \vec x\mapsto \sum_i -50x_i + 300 x_i(\log(x_i) - 1)$, where $\vec x$ is a vector of 62339366 variables $x_i$. The box constraint is that each $x_i$ must stay above a fixed `SMALL_CONSTANT`, for example 0.01, so that $(\vec\nabla f(\vec x))_i = -50 + 300\log(x_i)$ does not diverge. The exact value of `SMALL_CONSTANT` does not matter here; the same issue occurs with `SMALL_CONSTANT = 1e-100`.

The numerical solution should be a vector filled with `ω = 1.1813604128656459`, since $-50 + 300\log(\omega) \approx 0$. The initial point is a vector of numbers taken uniformly in $[(1-η)ω;\ (1+η)ω]$, where `η = 0.001` for instance.
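As a quick sanity check on that target value (a sketch independent of NLopt): $f$ is separable, so each coordinate is minimized where its gradient component vanishes, i.e. at $x_i = e^{50/300} = e^{1/6}$:

```julia
# Per-coordinate objective: f_i(x) = -50x + 300x*(log(x) - 1)
# Stationarity: f_i'(x) = -50 + 300*log(x) = 0  =>  x = exp(50/300) = exp(1/6)
ω = exp(1 / 6)
@show ω                    # ≈ 1.1813604128656459
@show -50 + 300 * log(ω)   # gradient component at ω, ≈ 0 up to rounding
```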
Here is a minimal reproducer (I think it requires about 16 GiB of RAM):
```julia
module TestNLoptLBFGSfailure

using NLopt
import LinearAlgebra: norm

const SMALL_CONSTANT = 1e-2
const ω = 1.1813604128656459
const η = 1e-3

function objective(point::Vector{Float64}, gradient::Vector{Float64})
    if length(gradient) > 0
        gradient .= -50 .+ 300 .* log.(point)
    end
    value = sum(-50*r + 300*(r*(log(r) - 1)) for r in point)
    @show value, norm(gradient)
    return value
end

function test(n)
    opt = Opt(:LD_LBFGS, n)
    opt.ftol_rel = 1e-10  # or 1e-3, it does not matter
    opt.lower_bounds = fill(SMALL_CONSTANT, n)
    opt.min_objective = objective
    init = ω .* (1 .- η .+ 2η .* rand(n))  # uniform in [(1-η)ω, (1+η)ω]
    # init = fill(ω, n)  # actual target
    minf, minx, ret = optimize(opt, init)
    return ret
end

end
```
and the observations:
```
julia> TestNLoptLBFGSfailure.test(62339366)
(value, norm(gradient)) = (-2.2093566731912422e10, 2369.8435877962493)
:FAILURE

julia> TestNLoptLBFGSfailure.test(62339365) # just 62339366 - 1
(value, norm(gradient)) = (-2.2093566377504475e10, 2369.843568788648)
(value, norm(gradient)) = (-2.143919631036808e10, 534364.1274153551)
(value, norm(gradient)) = (-2.209355118106552e10, 3646.620883772766)
(value, norm(gradient)) = (-2.209357736243934e10, 2.2617428385919083)
(value, norm(gradient)) = (-2.209357736243935e10, 0.001131058801969029)
(value, norm(gradient)) = (-2.209357736243935e10, 6.7321322189234e-10)
:SUCCESS
```
If I use the target value `ω` directly as the initialization (i.e. taking `η = 0`), then it works:
```
julia> TestNLoptLBFGSfailure.test(62339366)
(value, norm(gradient)) = (-2.2093577716847473e10, 2.2440440909730785e-10)
:SUCCESS

julia> TestNLoptLBFGSfailure.test(62339365)
(value, norm(gradient)) = (-2.209357736243935e10, 2.2440440729744666e-10)
:SUCCESS
```
And if I set the starting point too close to the target, i.e. $η$ too small, for instance `η = 1e-9`, then it fails in both cases, but not after the same number of steps:
```
julia> TestNLoptLBFGSfailure.test(62339365)
(value, norm(gradient)) = (-2.2093577716847473e10, 0.0023686585521977225)
:FAILURE

julia> TestNLoptLBFGSfailure.test(62339365)
(value, norm(gradient)) = (-2.209357736243935e10, 0.0023686585331996264)
(value, norm(gradient)) = (-2.209357736243935e10, 0.5991391139757002)
(value, norm(gradient)) = (-2.209357736243935e10, 0.3982398117902327)
(value, norm(gradient)) = (-2.209357736243935e10, 0.2643059496298344)
(value, norm(gradient)) = (-2.209357736243935e10, 0.17501521403019882)
(value, norm(gradient)) = (-2.209357736243935e10, 0.1154857940136743)
(value, norm(gradient)) = (-2.209357736243935e10, 0.07579606004479056)
(value, norm(gradient)) = (-2.209357736243935e10, 0.049330911796086924)
(value, norm(gradient)) = (-2.209357736243935e10, 0.03167915082330152)
(value, norm(gradient)) = (-2.209357736243935e10, 0.01989798378814819)
(value, norm(gradient)) = (-2.209357736243935e10, 0.012021766484118854)
(value, norm(gradient)) = (-2.209357736243935e10, 0.006732064617095256)
:FAILURE
```
This may be related to https://github.com/JuliaOpt/NLopt.jl/issues/33. The last observation (the optimization fails if the initial point is too close to the optimum) matches https://github.com/JuliaOpt/NLopt.jl/issues/33#issuecomment-785340103, but the difference between 62339365 and 62339366 variables is new, I think. My real problem is, of course, more complex and even larger than 62339366 variables.
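For reference, a closed-form check of the value the solver should converge to (assuming, as above, that the box bound is inactive at the optimum, so every coordinate sits at $ω = e^{1/6}$):

```julia
# At the optimum x_i = ω we have -50 + 300*log(ω) = 0, so each term of f reduces to
#   -50ω + 300ω*(log(ω) - 1) = ω*(-50 + 300*log(ω)) - 300ω = -300ω,
# and the whole objective is -300*ω*n.
ω = exp(1 / 6)
n = 62339366
f_min = -300 * ω * n
@show f_min  # ≈ -2.20935777e10, matching the converged `value` in the logs
             # to about 8 significant digits (the run sums 62M rounded terms)
```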