Function as first argument?
Is there a specific reason for making the function f the second argument? It seems a bit unfortunate, since this makes it impossible to use do syntax, which one could use if f were the first argument, as in ForwardDiff or Zygote (and as the Julia style guide suggests).
gradient(ba, f, args...) just looks cleaner than gradient(f, ba, args...), since it puts the function with its args. Also, in general these functions should dispatch on the backend, not f, so it's cleaner for it to be the first argument. There's other precedent for this in AD packages as well (see e.g. ChainRulesCore.rrule_via_ad).
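For context, a do block is turned into an anonymous function that is passed as the first positional argument of the call, which is why the position of f matters here. A minimal sketch, using a hypothetical helper `apply_first` that is not part of any AD package:

```julia
# Hypothetical helper for illustration only: takes the function first.
apply_first(f, x) = f(x)

# The do block becomes the first positional argument,
# so these two calls are equivalent.
a = apply_first(3) do x
    x^2
end
b = apply_first(x -> x^2, 3)
```

With a signature like gradient(ba, f, args...), the do block would land in the backend slot instead of the f slot.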
Zygote is able to get away with this since the user never needs to provide a config object. Same with ForwardDiff.
But it would be better to support do syntax. To that point, I wonder if it would make sense to support alternate syntax for the API functions, where the backend is a keyword argument. e.g.
```julia
gradient(f, xs...; backend) = gradient(backend, f, xs...)
```
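A rough sketch of how the keyword form could sit next to a backend-first method and enable do syntax. `ToyFiniteDiff` and this local `gradient` are assumptions made up for illustration (a central finite-difference stand-in), not the package's actual types or implementation:

```julia
abstract type AbstractBackend end

# Hypothetical toy backend using central finite differences (illustration only).
struct ToyFiniteDiff <: AbstractBackend
    h::Float64
end

# Backend-first method, dispatching on the backend type.
function gradient(ba::ToyFiniteDiff, f, x::AbstractVector{<:Real})
    map(eachindex(x)) do i
        e = zeros(length(x))
        e[i] = ba.h
        (f(x .+ e) - f(x .- e)) / (2 * ba.h)
    end
end

# Function-first method with the backend as a required keyword argument.
gradient(f, xs...; backend) = gradient(backend, f, xs...)

# The do block is passed as f; the backend goes in the keyword.
g = gradient([1.0, 2.0]; backend = ToyFiniteDiff(1e-6)) do x
    sum(abs2, x)
end
```

Here g should be approximately [2.0, 4.0]. The backend-first method is more specific, so both call styles can coexist without ambiguity in this sketch.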
Implemented @sethaxen's suggestion in https://github.com/JuliaDiff/AbstractDifferentiation.jl/pull/62
What about a curried form?
```julia
gradient(ab::AbstractBackend) = FixBackend(ab, gradient)
(op::FixBackend)(f, xs...) = op.∂(op.backend, f, xs...)
```
Then one can write
```julia
gradient(SomeBackend())(a, b) do x, y
    ...
end
```
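To make the curried form concrete, here is a self-contained sketch. The `FixBackend` struct layout, the `ToyFiniteDiff` backend, and the scalar-only `gradient` method are all assumptions for illustration; only the two curried definitions follow the snippet above:

```julia
abstract type AbstractBackend end

# Hypothetical toy backend using central finite differences (illustration only).
struct ToyFiniteDiff <: AbstractBackend
    h::Float64
end

# Backend-first method, scalar arguments only; returns a tuple of partials.
function gradient(ba::ToyFiniteDiff, f, xs::Real...)
    ntuple(length(xs)) do i
        up = ntuple(j -> j == i ? xs[j] + ba.h : xs[j], length(xs))
        dn = ntuple(j -> j == i ? xs[j] - ba.h : xs[j], length(xs))
        (f(up...) - f(dn...)) / (2 * ba.h)
    end
end

# Assumed layout of the wrapper: a backend plus the operator to apply.
struct FixBackend{B<:AbstractBackend,F}
    backend::B
    ∂::F
end

# Curried form: fixing the backend returns a callable operator.
gradient(ab::AbstractBackend) = FixBackend(ab, gradient)
(op::FixBackend)(f, xs...) = op.∂(op.backend, f, xs...)

# Fix the backend once, then reuse the operator with do syntax.
∇ = gradient(ToyFiniteDiff(1e-6))
da, db = ∇(1.0, 2.0) do x, y
    x^2 + 3y
end
```

Here da and db should be approximately 2.0 and 3.0. The one-argument method cannot be confused with the backend-first method, since the latter always takes at least a backend and a function.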
@phipsgabler what do you see as the pros and cons of the curried form vs the one in https://github.com/JuliaDiff/AbstractDifferentiation.jl/issues/33#issuecomment-1018923309?
Purely subjectively, I like the looks of it better. The backend argument looks like something that should be more privileged than a mere "option hidden in a kwarg", IMHO.
Also (and this would be a question for practitioners): isn't fixing the backend and then using the same differential operator repeatedly within a package a common use case?
IMO the curried form would be nice, mainly since as mentioned above it would make it possible to fix a backend in a straightforward way without having to create anonymous functions. Also I like that keyword arguments would be reserved only for backend-specific options (not sure if that's a thing at all currently) and that the two versions of `gradient` would be more distinct.