MixedModels.jl

Issue with convergence when using PRAXIS

Open • liamlundy opened this issue on Sep 29, 2023 • 4 comments

Models run with PRAXIS as the optimizer do not terminate. In the simple example below, the fit spins indefinitely, but if it is stopped manually it returns the expected coefficients. BOBYQA works correctly in this case.

Version Info:

Julia Version 1.9.1
Commit 147bdf428cd (2023-06-07 08:27 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin22.4.0)
  CPU: 10 × Apple M1 Pro
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, westmere)
  Threads: 1 on 10 virtual cores
Environment:
  JULIA_PROJECT = /Users/liam/Development/attribution-engine/ma-model/SignConstrainedMixedModels

Example code:

using MixedModels, DataFrames, Random

# Arrange
id1 = Int[1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4]
var1 = Float64[-0.968, -1.568, -4.101, 2.616, 2.658, 1.352, 3.836, -0.234, -1.257, -0.482,
               3.810, -2.812, 1.382, -4.764, 3.977, 1.302, 2.186, -4.237, 4.803, -1.026]
var2 = Float64[2.633, 2.565, 2.685, 2.661, 0.048, 1.285, 2.127, 1.687, 1.542, 0.932,
               1.650, 2.239, 0.239, 0.247, 0.989, 1.800, 0.667, 1.412, 2.497, 2.004]
var3 = Float64[-4.805, -4.048, -2.160, -0.233, -2.655, -2.062, -0.395, -2.706, -3.509, -2.800,
               -2.927, -1.843, -2.401, -3.332, -1.413, -4.351, -4.917, -0.977, -0.303, -1.072]

# Response: a per-group slope on var1 plus common slopes on var2 and var3
vol = [
    0.25 * var1[1:5] + 0.3 * var2[1:5] + -0.6 * var3[1:5]
    0.15 * var1[6:10] + 0.3 * var2[6:10] + -0.6 * var3[6:10]
    0.05 * var1[11:15] + 0.3 * var2[11:15] + -0.6 * var3[11:15]
    -0.05 * var1[16:20] + 0.3 * var2[16:20] + -0.6 * var3[16:20]
]

noise = rand(MersenneTwister(5234325), size(vol, 1))
vol = vol .+ noise
data = DataFrames.DataFrame(id1 = id1, var1 = var1, var2 = var2, var3 = var3, vol = vol)

f = @formula(vol ~ 1 + var1 + var3 + var2 + (0 + var1 | id1))

model = LinearMixedModel(f, data)
model.optsum.optimizer = :LN_PRAXIS  # with the default :LN_BOBYQA this fit converges
model_fit = fit!(model)              # with PRAXIS this call never terminates

liamlundy • Sep 29, 2023

Thanks for providing an MWE!

Looking at the NLopt docs, I'm not sure PRAXIS is an algorithm I'd recommend:

This algorithm was originally designed for unconstrained optimization. In NLopt, bound constraints are "implemented" in PRAXIS by the simple expedient of returning infinity (Inf) when the constraints are violated (this is done automatically—you don't have to do this in your own function). This seems to work, more-or-less, but appears to slow convergence significantly. If you have bound constraints, you are probably better off using COBYLA or BOBYQA.

The optimization problem here is indeed bounded (we don't allow variances to be negative). :LN_COBYLA and :LN_NELDERMEAD also converge for this example and, in my limited testing, seem to work well in cases where BOBYQA fails.
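Switching optimizers follows the same pattern as in the MWE; a minimal sketch, reusing f and data from the example above:

# Minimal sketch: the same model as the MWE, with one of the suggested optimizers.
model = LinearMixedModel(f, data)
model.optsum.optimizer = :LN_COBYLA  # or :LN_NELDERMEAD
fit!(model)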

palday • Oct 3, 2023

@palday thanks for getting back to me about this! I definitely appreciate your work maintaining this project. For our use case, we use BOBYQA for all of our final models but PRAXIS for preliminary screening of variables. Our datasets can become very large, and PRAXIS is substantially faster than BOBYQA (at least in our test cases). Do you know if there is a way to get PRAXIS to terminate by modifying the stopping conditions?

liamlundy • Oct 11, 2023

The problem in the MWE is that the optimizer winds up oscillating around the optimum.

[UnicodePlots trace, objective vs. evaluation number (0 to 30): the objective oscillates in a band so narrow that both axis limits display as 11.0467.]

[UnicodePlots trace, theta vs. evaluation number (0 to 30): theta oscillates between axis limits of roughly 0.5517 and 0.552.]

Examining the change in values, I get

julia> minimum(abs, diff(obj))
2.0605739337042905e-13

julia> minimum(abs, diff(theta))
1.0706571629270911e-7
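For reference, the obj and theta vectors above can be captured along these lines (a sketch, assuming a MixedModels release in which fit! accepts a thin keyword that logs every evaluation to optsum.fitlog, and with maxfeval capped so the PRAXIS fit actually returns; maxfeval is discussed further below):

model = LinearMixedModel(f, data)
model.optsum.optimizer = :LN_PRAXIS
model.optsum.maxfeval = 100                  # hard cap so fit! returns
fit!(model; thin = 1)                        # thin = 1 records every evaluation

obj   = [last(entry) for entry in model.optsum.fitlog]      # objective at each evaluation
theta = [first(entry)[1] for entry in model.optsum.fitlog]  # the single θ component here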

This suggests that you could achieve convergence by setting either xtol_abs = [1e-6] or ftol_abs = 1e-12 in the optsum. (See here for the list of settings you can change.) Since you're using this for an initial exploration, the logic would be similar to the motivation for the reduced precision bootstrap.

However, setting these for :LN_PRAXIS seems to have no effect, and I have no idea why. maxfeval does seem to work, but a good value for it will depend on your real data.
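A sketch of the settings under discussion (field names as in MixedModels.OptSummary; the tolerance values are the ones mentioned above, and the maxfeval cap is purely illustrative):

model = LinearMixedModel(f, data)
model.optsum.optimizer = :LN_PRAXIS
model.optsum.ftol_abs  = 1e-12     # absolute tolerance on the change in the objective
model.optsum.xtol_abs  = [1e-6]    # per-parameter absolute tolerance on θ
model.optsum.maxfeval  = 200       # reportedly the only stopping criterion PRAXIS honors here
fit!(model)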

palday • Oct 25, 2023

To follow up on this issue and #742, using PRIMA.bobyqa (from the PRIMA.jl package) produces

julia> x, info = bobyqa(objective!(model), model.optsum.initial; xl=model.optsum.lowerbd)
([0.5518592883381098], PRIMA.Info(11.04666952249386, 15, PRIMA.SMALL_TR_RADIUS, 0.0, Float64[], Float64[]))

In other words, it took only 15 evaluations to converge to what seems to be a good optimum.
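A self-contained version of that call might look like the following (a sketch, assuming PRIMA.jl is installed and reusing f, data, and objective! as above):

using MixedModels, PRIMA

model = LinearMixedModel(f, data)      # f and data from the MWE above
x, info = PRIMA.bobyqa(objective!(model), model.optsum.initial;
                       xl = model.optsum.lowerbd)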

dmbates • Feb 7, 2024