Add new backends with DifferentiationInterface.jl
Add the following backends:
- Enzyme
- Zygote
- Mooncake
- Diffractor
- Tracker
- Symbolics
- ChainRules
- FastDifferentiation
- FiniteDiff
- FiniteDifferences
- PolyesterForwardDiff
It's only one, so basically with this change we would no longer be able to use Hessians with Enzyme and Zygote.
We can, but only for unconstrained problems. I wanted to remove what was not working before.
The user will no longer be able to use an incorrect Hessian, which is better for everyone.
@gdalle May I ask you to check what I did wrong in the file di.jl?
I have different errors with buildkite: https://buildkite.com/julialang/adnlpmodels-dot-jl/builds/243
It looks like the problem comes from forgetting to import the function grad? Not a DI thing, presumably an NLPModels thing.
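For reference, a hedged sketch of the kind of import that is usually missing in this situation; the exact file and function list in ADNLPModels may differ:

```julia
# Illustrative only: to add methods to functions owned by NLPModels,
# they must be imported (or defined with a qualified name).
import NLPModels: grad, grad!

# Alternatively, qualify the function at the definition site:
# function NLPModels.grad!(nlp::ADNLPModel, x::AbstractVector, g::AbstractVector)
#     ...
# end
```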
@dpo could you perhaps give me access to the repo so that I may help with this and future PRs?
@gdalle I invited you. Thank you for your work here!!!
@amontoison what do you think about moving away from symbols here?
It depends on the alternatives. Right now, it's useful to specify that we want optimized backends with :optimized, or only matrix-free backends with :matrix_free (no Jacobian or Hessian).
But if Enzyme.jl is stable enough, we could drop :optimized and use a boolean for matrix-free backends.
It will be easier to provide an AutoBackend() with the appropriate options.
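For context, a minimal sketch of what the symbol-based selection looks like from the user's side; the backend keyword and the exact constructor signature are assumed here, not taken from the PR:

```julia
using ADNLPModels

f(x) = sum(x .^ 2)
x0 = ones(3)

# Assumed API: a Symbol selects a predefined family of backends.
nlp_optimized   = ADNLPModel(f, x0, backend = :optimized)    # optimized Jacobian/Hessian backends
nlp_matrix_free = ADNLPModel(f, x0, backend = :matrix_free)  # only products, no Jacobian/Hessian
```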
If I'm not mistaken there are two levels here:
- the interface you present to the user (:optimized, :matrix_free)
- the way you represent the backends internally
Right now you base all of the internal representations on Symbols. But as explained here, the whole reason for ADTypes was to move beyond Symbols towards full-fledged types that are 1) more expressive and 2) dispatchable. That's why I was suggesting a similar move here. It doesn't stop you from offering :optimized autodiff options in the front end if you like.
Do you have an example of what you suggest?
I could try to show you in an alternative PR
Okay, it is a bit hard to submit a PR since there would be a lot of things to rewrite and I don't understand what each part does. But essentially I was imagining something like this:
```julia
using ADTypes
using DifferentiationInterface
using LinearAlgebra
using SparseMatrixColorings
using SparseConnectivityTracer
import ForwardDiff, ReverseDiff
# Wrap any backend in a sparse version with default sparsity detection and coloring.
function DefaultAutoSparse(backend::AbstractADType)
    return AutoSparse(
        backend;
        sparsity_detector=TracerSparsityDetector(),
        coloring_algorithm=GreedyColoringAlgorithm(),
    )
end
# One AD backend per NLPModels operation.
struct ADModelBackend
    gradient_backend
    hprod_backend
    jprod_backend
    jtprod_backend
    jacobian_backend
    hessian_backend
end

# The corresponding DifferentiationInterface preparation objects.
struct ADModelBackendPrep
    gradient_prep
    hprod_prep
    jprod_prep
    jtprod_prep
    jacobian_prep
    hessian_prep
end
# Build a default set of backends from one forward-mode and one reverse-mode backend.
function ADModelBackend(forward_backend::AbstractADType, reverse_backend::AbstractADType)
    @assert ADTypes.mode(forward_backend) isa
        Union{ADTypes.ForwardMode,ADTypes.ForwardOrReverseMode}
    @assert ADTypes.mode(reverse_backend) isa
        Union{ADTypes.ReverseMode,ADTypes.ForwardOrReverseMode}
    gradient_backend = reverse_backend
    hprod_backend = SecondOrder(forward_backend, reverse_backend)  # forward over reverse
    jprod_backend = forward_backend
    jtprod_backend = reverse_backend
    jacobian_backend = DefaultAutoSparse(forward_backend)  # or a size-dependent heuristic
    hessian_backend = DefaultAutoSparse(SecondOrder(forward_backend, reverse_backend))
    return ADModelBackend(
        gradient_backend,
        hprod_backend,
        jprod_backend,
        jtprod_backend,
        jacobian_backend,
        hessian_backend,
    )
end
# Prepare every DifferentiationInterface operator once so that later calls are fast.
function ADModelBackendPrep(
    admodel_backend::ADModelBackend,
    obj::Function,
    cons::Function,
    lag::Function,
    x::AbstractVector,
)
    (;
        gradient_backend,
        hprod_backend,
        jprod_backend,
        jtprod_backend,
        jacobian_backend,
        hessian_backend,
    ) = admodel_backend
    c = cons(x)
    λ = similar(c)   # Lagrange multipliers, passed as a Constant context
    dx = similar(x)  # seed for Jacobian-vector and Hessian-vector products
    dc = similar(c)  # seed for transposed Jacobian-vector products
    gradient_prep = prepare_gradient(lag, gradient_backend, x, Constant(λ))
    hprod_prep = prepare_hvp(lag, hprod_backend, x, (dx,), Constant(λ))
    jprod_prep = prepare_pushforward(cons, jprod_backend, x, (dx,))
    jtprod_prep = prepare_pullback(cons, jtprod_backend, x, (dc,))
    jacobian_prep = prepare_jacobian(cons, jacobian_backend, x)
    hessian_prep = prepare_hessian(lag, hessian_backend, x, Constant(λ))
    return ADModelBackendPrep(
        gradient_prep, hprod_prep, jprod_prep, jtprod_prep, jacobian_prep, hessian_prep
    )
end
# Example usage with ForwardDiff (forward mode) and ReverseDiff (reverse mode).
admodel_backend = ADModelBackend(AutoForwardDiff(), AutoReverseDiff())
obj(x) = sum(x)
cons(x) = abs.(x)
lag(x, λ) = obj(x) + dot(λ, cons(x))
admodel_backend_prep = ADModelBackendPrep(admodel_backend, obj, cons, lag, rand(3));
```
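For completeness, a sketch of how these preparation objects could then be consumed with DifferentiationInterface's prepared operators; the wrapper functions that ADNLPModels would actually expose are not shown here:

```julia
# Sketch only: evaluating the prepared operators afterwards.
x  = rand(3)
λ  = rand(3)
dx = rand(3)
dc = rand(3)

(; gradient_backend, hprod_backend, jprod_backend,
   jtprod_backend, jacobian_backend, hessian_backend) = admodel_backend
(; gradient_prep, hprod_prep, jprod_prep,
   jtprod_prep, jacobian_prep, hessian_prep) = admodel_backend_prep

g   = gradient(lag, gradient_prep, gradient_backend, x, Constant(λ))   # ∇ₓL(x, λ)
Hv  = hvp(lag, hprod_prep, hprod_backend, x, (dx,), Constant(λ))       # (∇²ₓL(x, λ) * dx,)
Jv  = pushforward(cons, jprod_prep, jprod_backend, x, (dx,))           # (J(x) * dx,)
Jtv = pullback(cons, jtprod_prep, jtprod_backend, x, (dc,))            # (J(x)' * dc,)
J   = jacobian(cons, jacobian_prep, jacobian_backend, x)               # sparse J(x)
H   = hessian(lag, hessian_prep, hessian_backend, x, Constant(λ))      # sparse ∇²ₓL(x, λ)
```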