AD error with rat42
Even though the AD backend for the residual Jacobian is defined, it returns an empty Jacobian:
julia> rat42_nls = rat42(; use_nls = true)
ADNLSModel - Nonlinear least-squares model with automatic differentiation backend ADModelBackend{
ForwardDiffADGradient,
ForwardDiffADHvprod,
EmptyADbackend,
EmptyADbackend,
EmptyADbackend,
SparseADHessian,
EmptyADbackend,
ForwardDiffADHvprod,
ForwardDiffADJprod,
ForwardDiffADJtprod,
SparseADJacobian,
SparseADHessian,
}
julia> jac_structure(rat42_nls)
(Int64[], Int64[])
julia> x = [99.94710393753867, 1.6483148108416925, -12.429444473714828]
julia> jac(rat42_nls, x)
0×3 SparseMatrixCSC{Float64, Int64} with 0 stored entries
There is also an error with the gradient of the NLP model:
julia> rat42_model = rat42()
ADNLPModel - Model with automatic differentiation backend ADModelBackend{
ForwardDiffADGradient,
ForwardDiffADHvprod,
EmptyADbackend,
EmptyADbackend,
EmptyADbackend,
SparseADHessian,
EmptyADbackend,
}
julia> obj(rat42_model, x)
9111.7101
julia> grad(rat42_model, x)
3-element Vector{Float64}:
NaN
NaN
NaN
@amontoison @tmigot Any ideas?
@dpo rat42 is an unconstrained optimization problem, so we have an empty backend for the Jacobian in rat42_nls.
What you want is jac_structure_residual(rat42_nls).
I think we should display more clearly which backend is associated with which operation when we show an ADNLPModel / ADNLSModel.
For the gradient of rat42_model, it seems to be a bug in ForwardDiff.jl; I will investigate.
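For completeness, a minimal sketch of the residual-side calls with the standard NLPModels.jl NLS API (output omitted), plus one way to see which backend slot corresponds to which operation in the printed list:
julia> rows, cols = jac_structure_residual(rat42_nls)            # sparsity pattern of the residual Jacobian
julia> vals = jac_coord_residual(rat42_nls, rat42_nls.meta.x0)   # its nonzeros at the starting point
julia> fieldnames(ADNLPModels.ADModelBackend)                    # names of the backend slots shown above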
Sorry, you’re right. However:
julia> jac_residual(rat42_nls, x)
9×3 SparseMatrixCSC{Float64, Int64} with 27 stored entries:
-5.03261e-50 5.02995e-48 -4.52696e-47
-5.14752e-77 5.1448e-75 -7.20271e-74
-8.42023e-115 8.41577e-113 -1.76731e-111
-1.37737e-152 1.37664e-150 -3.85459e-149
-3.68555e-228 -0.0 -0.0
NaN NaN NaN
NaN NaN NaN
NaN NaN NaN
NaN NaN NaN
I suspect that it's related to this vector of Rational:
https://github.com/JuliaSmoothOptimizers/OptimizationProblems.jl/blob/main/src/ADNLPProblems/rat42.jl#L9
Me too. That's very weird modeling. For info, with JuMP:
julia> model = MathOptNLPModel(rat42())
julia> obj(model, x)
9111.7101
julia> grad(model, x)
3-element Vector{Float64}:
-4.494124501515864e-49
4.491747286612452e-47
-4.042572557951207e-46
The JuMP/NLS version is not in the repository, but (assuming y is the 9×2 Rat42 data matrix from rat42.jl, observations in column 1 and abscissae in column 2):
julia> using JuMP, NLPModelsJuMP
julia> model = Model()
julia> @variable(model, x[j = 1:3])
julia> set_start_value.(x, [100, 1, 0.1])
julia> for i = 1:9
@eval @NLexpression(model, $(Symbol("F$i")), y[$i, 1] - x[1] / (1 + exp(x[2] - x[3] * y[$i, 2])))
end
julia> rat42_nls = MathOptNLSModel(model, [F1, F2, F3, F4, F5, F6, F7, F8, F9])
julia> xx = [99.94710393753867, 1.6483148108416925, -12.429444473714828]
julia> obj(rat42_nls, xx)
9111.7101
julia> jac_residual(rat42_nls, xx)
9×3 SparseMatrixCSC{Float64, Int64} with 27 stored entries:
-5.03261e-50 5.02995e-48 -4.52696e-47
-5.14752e-77 5.1448e-75 -7.20271e-74
-8.42023e-115 8.41577e-113 -1.76731e-111
-1.37737e-152 1.37664e-150 -3.85459e-149
-3.68555e-228 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
Actually, converting the array y to type T produces the same errors.
ReverseDiff is only marginally better:
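Presumably the model was rebuilt here with the ReverseDiff gradient backend, e.g. (a guess, mirroring the Zygote construction below):
julia> model = rat42(; gradient_backend = ADNLPModels.ReverseDiffADGradient)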
julia> grad(model, xx)
3-element Vector{Float64}:
-4.494124501515864e-49
NaN
NaN
And, finally,
julia> model = rat42(; gradient_backend = ADNLPModels.ZygoteADGradient)
julia> grad(model, xx)
3-element Vector{Float64}:
-4.494124501515864e-49
NaN
NaN
I suppose the problem here is simply that we need problem scaling. The term exp(1.6 + 12 * 79) overflows. JuMP must be scaling the problem.
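For illustration, the overflow is easy to reproduce with the values of x above (Float64 exp overflows for arguments beyond roughly 709.8, and the exponent here is about 983):
julia> t = exp(1.6483148108416925 - (-12.429444473714828) * 79)   # Inf
julia> r = 99.94710393753867 / (1 + t)                            # 0.0: the residual value itself is harmless
In forward mode, the dual part of exp(...) is also Inf, and the quotient rule then produces terms like Inf - Inf and Inf / Inf^2, hence the NaN entries in the gradient and residual Jacobian.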
I am not aware of any scaling in JuMP. They probably just do something smart if they encounter a division by Inf.
Why would they have different arithmetic than Julia itself?
Because they compute the derivative with a reverse pass directly on their expression tree.
They can decide whether or not to propagate a NaN or an Inf.
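A minimal sketch of the kind of guard such a reverse pass can apply (illustrative only, not JuMP's actual code): if the adjoint reaching a node is exactly zero, skip the multiplication by the local partial, so that 0 * Inf never turns into NaN.
julia> guarded_pullback(adjoint, partial) = iszero(adjoint) ? zero(adjoint) : adjoint * partial
julia> guarded_pullback(-0.0, Inf)   # 0.0
julia> -0.0 * Inf                    # NaN with plain Float64 arithmetic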