AD error with rat42
Even though the AD backend for the residual Jacobian is defined, it returns an empty Jacobian:
julia> rat42_nls = rat42(; use_nls = true)
ADNLSModel - Nonlinear least-squares model with automatic differentiation backend ADModelBackend{
ForwardDiffADGradient,
ForwardDiffADHvprod,
EmptyADbackend,
EmptyADbackend,
EmptyADbackend,
SparseADHessian,
EmptyADbackend,
ForwardDiffADHvprod,
ForwardDiffADJprod,
ForwardDiffADJtprod,
SparseADJacobian,
SparseADHessian,
}
julia> jac_structure(rat42_nls)
(Int64[], Int64[])
julia> x = [99.94710393753867, 1.6483148108416925, -12.429444473714828]
julia> jac(rat42_nls, x)
0×3 SparseMatrixCSC{Float64, Int64} with 0 stored entries
There is also an error with the gradient of the NLP model:
julia> rat42_model = rat42()
ADNLPModel - Model with automatic differentiation backend ADModelBackend{
ForwardDiffADGradient,
ForwardDiffADHvprod,
EmptyADbackend,
EmptyADbackend,
EmptyADbackend,
SparseADHessian,
EmptyADbackend,
}
julia> obj(rat42_model, x)
9111.7101
julia> grad(rat42_model, x)
3-element Vector{Float64}:
NaN
NaN
NaN
@amontoison @tmigot Any ideas?
@dpo rat42 is an unconstrained optimization problem, so we have an empty backend for the Jacobian in rat42_nls.
What you want is jac_structure_residual(rat42_nls).
I think we should display more clearly which backend is associated with which operation when we show an ADNLPModel / ADNLSModel.
For the gradient of rat42_model, it seems to be a bug in ForwardDiff.jl; I will investigate.
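For completeness, a minimal sketch of the residual-side calls with the standard NLPModels.jl NLS API (output omitted), plus one way to see which backend slot corresponds to which operation in the printed list:
julia> rows, cols = jac_structure_residual(rat42_nls)            # sparsity pattern of the residual Jacobian
julia> vals = jac_coord_residual(rat42_nls, rat42_nls.meta.x0)   # its nonzeros at the starting point
julia> fieldnames(ADNLPModels.ADModelBackend)                    # names of the backend slots shown above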
Sorry, you’re right. However:
julia> jac_residual(rat42_nls, x)
9×3 SparseMatrixCSC{Float64, Int64} with 27 stored entries:
-5.03261e-50 5.02995e-48 -4.52696e-47
-5.14752e-77 5.1448e-75 -7.20271e-74
-8.42023e-115 8.41577e-113 -1.76731e-111
-1.37737e-152 1.37664e-150 -3.85459e-149
-3.68555e-228 -0.0 -0.0
NaN NaN NaN
NaN NaN NaN
NaN NaN NaN
NaN NaN NaN
I suspect that it's related to this vector of Rational:
https://github.com/JuliaSmoothOptimizers/OptimizationProblems.jl/blob/main/src/ADNLPProblems/rat42.jl#L9
Me too. That's very weird modeling. For info, with JuMP:
julia> model = MathOptNLPModel(rat42())
julia> obj(model, x)
9111.7101
julia> grad(model, x)
3-element Vector{Float64}:
-4.494124501515864e-49
4.491747286612452e-47
-4.042572557951207e-46
The JuMP/NLS version is not in the repository, but (assuming y is the 9×2 Rat42 data matrix from rat42.jl, observations in column 1 and abscissae in column 2):
julia> using JuMP, NLPModelsJuMP
julia> model = Model()
julia> @variable(model, x[j = 1:3])
julia> set_start_value.(x, [100, 1, 0.1])
julia> for i = 1:9
@eval @NLexpression(model, $(Symbol("F$i")), y[$i, 1] - x[1] / (1 + exp(x[2] - x[3] * y[$i, 2])))
end
julia> rat42_nls = MathOptNLSModel(model, [F1, F2, F3, F4, F5, F6, F7, F8, F9])
julia> xx = [99.94710393753867, 1.6483148108416925, -12.429444473714828]
julia> obj(rat42_nls, xx)
9111.7101
julia> jac_residual(rat42_nls, xx)
9×3 SparseMatrixCSC{Float64, Int64} with 27 stored entries:
-5.03261e-50 5.02995e-48 -4.52696e-47
-5.14752e-77 5.1448e-75 -7.20271e-74
-8.42023e-115 8.41577e-113 -1.76731e-111
-1.37737e-152 1.37664e-150 -3.85459e-149
-3.68555e-228 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
Actually, converting the array y to type T produces the same errors.
ReverseDiff is only marginally better:
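Presumably the model was rebuilt here with the ReverseDiff gradient backend, e.g. (a guess, mirroring the Zygote construction below):
julia> model = rat42(; gradient_backend = ADNLPModels.ReverseDiffADGradient)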
julia> grad(model, xx)
3-element Vector{Float64}:
-4.494124501515864e-49
NaN
NaN
And, finally,
julia> model = rat42(; gradient_backend = ADNLPModels.ZygoteADGradient)
julia> grad(model, xx)
3-element Vector{Float64}:
-4.494124501515864e-49
NaN
NaN
I suppose the problem here is simply that we need problem scaling. The term exp(1.6 + 12 * 79) overflows. JuMP must be scaling the problem.
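For illustration, the overflow is easy to reproduce with the values of x above (Float64 exp overflows for arguments beyond roughly 709.8, and the exponent here is about 983):
julia> t = exp(1.6483148108416925 - (-12.429444473714828) * 79)   # Inf
julia> r = 99.94710393753867 / (1 + t)                            # 0.0: the residual value itself is harmless
In forward mode, the dual part of exp(...) is also Inf, and the quotient rule then produces terms like Inf - Inf and Inf / Inf^2, hence the NaN entries in the gradient and residual Jacobian.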
I am not aware of any scaling in JuMP. They probably just do something smart if they encounter a division by Inf.
Why would they have different arithmetic than Julia itself?
Because they compute the derivative with a reverse pass directly on their expression tree.
They can decide whether or not to propagate a NaN or an Inf.
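A minimal sketch of the kind of guard such a reverse pass can apply (illustrative only, not JuMP's actual code): if the adjoint reaching a node is exactly zero, skip the multiplication by the local partial, so that 0 * Inf never turns into NaN.
julia> guarded_pullback(adjoint, partial) = iszero(adjoint) ? zero(adjoint) : adjoint * partial
julia> guarded_pullback(-0.0, Inf)   # 0.0
julia> -0.0 * Inf                    # NaN with plain Float64 arithmetic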