ForwardDiff.jl

Is this code a supported use? (single-pass value and derivative)

Open gerlero opened this issue 3 years ago • 6 comments

Considering these two methods that compute the value and first derivative of a scalar function in a single pass:

import ForwardDiff
import DiffResults

@inline function value_and_derivative(f, Y::Type, x::Real)
    diffresult = ForwardDiff.derivative!(DiffResults.DiffResult(zero(Y), zero(Y)), f, x)
    return DiffResults.value(diffresult), DiffResults.derivative(diffresult)
end

@inline function value_and_derivative(f, x::Real)
    T = typeof(ForwardDiff.Tag(f, typeof(x)))
    ydual = f(ForwardDiff.Dual{T}(x, one(x)))
    return ForwardDiff.value(T, ydual), ForwardDiff.extract_derivative(T, ydual)
end
  • The first method seems to be the officially recommended way to do this. However, (1) it introduces the DiffResults dependency just for this, and (2) it needlessly requires the caller to specify f's return type ahead of the call. Neither is a dealbreaker, but IMO they add friction to something that looks like it shouldn't have any (intuition says that the value comes for free when computing a derivative with ForwardDiff).

  • The second method uses the Tag and Dual types as well as the extract_derivative function, none of which are listed in the "Differentiation API" in the docs, so I'm not sure whether they're considered part of the stable public API.

Both methods run equally fast (and significantly faster than a naive two-pass implementation), so my question is: does the second method constitute a supported use of ForwardDiff's public API?

If such a use isn't supported (and perhaps even if it is, since both implementations seem too convoluted for a pretty common use case IMO), I'd like to suggest adding a value_and_derivative function with the second method to either this package or one of the other packages in JuliaDiff (I'm willing to write a PR).

Related: #401, #391

EDITS: y -> ydual, ydual.value -> ForwardDiff.value(T, ydual), added another related issue

gerlero avatar Nov 24 '22 17:11 gerlero
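
For reference, the "naive two-pass implementation" that the opening post compares against might look like the sketch below. The name `naive_value_and_derivative` is hypothetical and does not appear in the thread; the only ForwardDiff call used is the documented `ForwardDiff.derivative(f, x)`.

```julia
import ForwardDiff

# Hypothetical name, for illustration only.
# Two passes: f is evaluated once on the plain input for the value,
# and once more on a dual number inside ForwardDiff.derivative.
naive_value_and_derivative(f, x::Real) = (f(x), ForwardDiff.derivative(f, x))

# Example call: value and derivative of x^3 at x = 2.0.
cube(x) = x^3
y, dy = naive_value_and_derivative(cube, 2.0)
```

The single-pass variants in the post avoid the second evaluation of `f`, which is presumably where the reported speed difference comes from.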

I can't comment on official API and design questions here, at least not in an official way. However, a quick note on this point:

However, (1) it introduces the DiffResults dependency just for this

DiffResults is a dependency of ForwardDiff, so it's a dependency anyway, regardless of whether you load it or not. You might even be able to load ForwardDiff.DiffResults or ForwardDiff.DiffResult directly.

devmotion avatar Nov 24 '22 18:11 devmotion
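
The suggestion in the comment above can be tried with the sketch below. It assumes that the `DiffResults` module is reachable as `ForwardDiff.DiffResults`, which would be a side effect of how ForwardDiff loads its dependency rather than documented API, so it may not hold in every version. The helper name `value_and_derivative_via_fd` and the alias `DR` are hypothetical.

```julia
import ForwardDiff

# Assumption: ForwardDiff exposes the DiffResults module under this name.
# This is not documented API and could change between releases.
const DR = ForwardDiff.DiffResults

# Hypothetical helper: same logic as the first method in the opening post,
# but without a separate `import DiffResults`.
function value_and_derivative_via_fd(f, ::Type{Y}, x::Real) where {Y}
    result = ForwardDiff.derivative!(DR.DiffResult(zero(Y), zero(Y)), f, x)
    return DR.value(result), DR.derivative(result)
end

y, dy = value_and_derivative_via_fd(sin, Float64, 1.0)
```

This only removes the explicit import; the caller still has to supply the return type `Y` up front.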

DiffResults is a dependency of ForwardDiff, so it's a dependency anyway, regardless of whether you load it or not. You might even be able to load ForwardDiff.DiffResults or ForwardDiff.DiffResult directly.

Thanks for the comment. That means that no extra packages are installed just for this simple use case (which is a good thing!). Unfortunately, it doesn't mean that one can avoid having to explicitly add DiffResults as a dependency (unless this is part of the public API). Honestly, I would care a lot less about this if I were to find a better solution to my other point (putting it another way, if DiffResults were less constraining for this simple case; or if I didn't have to use it at all).

Regarding that other point (i.e., the fact that, when using DiffResults, the return type must be known before the call), I'm thinking that an alternative to my initial suggestion could be to add a ForwardDiff.derivative[!] method that also returns a DiffResult but does not require a DiffResult as an input, for use in this scenario.

EDIT: By that I mean adding a new method like this:

import ForwardDiff
import DiffResults

@inline function ForwardDiff.derivative!(::Nothing, f, x::Real) # Or just value_and_derivative(f, x::Real)
    T = typeof(ForwardDiff.Tag(f, typeof(x)))
    ydual = f(ForwardDiff.Dual{T}(x, one(x)))
    return DiffResults.DiffResult(ForwardDiff.value(T, ydual), ForwardDiff.extract_derivative(T, ydual))
end

The ::Nothing parameter used for dispatch could be replaced with some other type (even some kind of "empty" DiffResults object) for the same purpose.

~~Note: a quick (and definitely non-scientific) benchmark I did showed that wrapping the value and derivative in a DiffResult object carries an overhead, so I'd still prefer to use a function that just returns a plain tuple.~~ (After running the same benchmark again many times, I no longer see a difference).

gerlero avatar Nov 24 '22 19:11 gerlero

I've found a couple of implementations of the second method in SciML packages, so there's definitely demand for such a method, as well as some precedent of treating Dual (and related types/functions) as part of ForwardDiff's API.

gerlero avatar Nov 28 '22 23:11 gerlero

A small caveat: SimpleNonlinearSolve was extracted, and to a large extent copied, from NonlinearSolve just a few days ago, so the links are basically a single example of the second method.

devmotion avatar Nov 28 '22 23:11 devmotion

I am doing something similar in a private package. I don’t think it’s official API, so I just have a few tests that will flag it when things break.

I guess whether you are fine with something like this depends on your risk tolerance and application area.

thomvet avatar Dec 18 '22 06:12 thomvet
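
A guard of the kind described in the comment above could be sketched with Julia's `Test` standard library as follows. The test set name and the choice of `sin` as the test function are illustrative assumptions; `value_and_derivative` is the second method from the opening post, repeated here so the block is self-contained.

```julia
import ForwardDiff
using Test

# Second method from the opening post; relies on names that are not
# listed in ForwardDiff's documented "Differentiation API".
function value_and_derivative(f, x::Real)
    T = typeof(ForwardDiff.Tag(f, typeof(x)))
    ydual = f(ForwardDiff.Dual{T}(x, one(x)))
    return ForwardDiff.value(T, ydual), ForwardDiff.extract_derivative(T, ydual)
end

results = @testset "single-pass value_and_derivative" begin
    # The names the helper depends on still exist.
    @test isdefined(ForwardDiff, :Tag)
    @test isdefined(ForwardDiff, :Dual)
    @test isdefined(ForwardDiff, :extract_derivative)

    # The single-pass result agrees with the documented API.
    for x in (0.5, 1.0, 2.0)
        y, dy = value_and_derivative(sin, x)
        @test y ≈ sin(x)
        @test dy ≈ ForwardDiff.derivative(sin, x)
    end
end
```

Such tests do not prevent breakage, but running them in CI against new ForwardDiff releases would show quickly when an internal name or behavior has changed.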

@thomvet Well, I ended up doing the same. I still hope this type of usage can be included in the public API (or clarified as such if it's already meant to be API) so as to be able to avoid future breakage.

gerlero avatar Mar 06 '23 16:03 gerlero