ForwardDiff.jl

Calculating J_f(x) . y efficiently


As far as I understand, if you have a function f: R^n -> R^n, there should be a way to calculate J_f(x) . y (the Jacobian of f at the point x, multiplied by the vector y) more efficiently than computing the whole Jacobian matrix and then doing the matrix-vector multiplication.

How can this be done using ForwardDiff?

dpsanders avatar Apr 24 '18 23:04 dpsanders

It's similar to the classic Hessian-vector product trick, just without the nested differentiation:

julia> using ForwardDiff

julia> x, y = rand(3), rand(3)
([0.243633, 0.330416, 0.632913], [0.0525042, 0.381907, 0.738228])

julia> duals = map(ForwardDiff.Dual, x, y)
3-element Array{ForwardDiff.Dual{Void,Float64,1},1}:
 Dual{Void}(0.243633,0.0525042)
 Dual{Void}(0.330416,0.381907)
 Dual{Void}(0.632913,0.738228)

julia> ForwardDiff.partials.(cumprod(duals), 1)
3-element Array{Float64,1}:
 0.0525042
 0.110393
 0.129297

julia> ForwardDiff.jacobian(cumprod, x) * y
3-element Array{Float64,1}:
 0.0525042
 0.110393
 0.129297

Supporting this comes down to either having a built-in jacvec method or, more generally, an API for custom perturbation seeding (https://github.com/JuliaDiff/ForwardDiff.jl/issues/43).

jrevels avatar Apr 25 '18 20:04 jrevels
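The seeding shown above packages naturally into a small helper. A minimal sketch (the name jacvec is hypothetical, and the Dual carries no custom tag, which is fine standalone but not for nested differentiation):

using ForwardDiff

# Compute J_f(x) * y in a single forward pass by seeding each input
# with a one-epsilon Dual whose perturbation is y[i], i.e. x[i] + y[i]*ε.
function jacvec(f, x::AbstractVector, y::AbstractVector)
    duals = map(ForwardDiff.Dual, x, y)
    return ForwardDiff.partials.(f(duals), 1)
end

x, y = rand(3), rand(3)
jacvec(cumprod, x, y) ≈ ForwardDiff.jacobian(cumprod, x) * y  # true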

There could be jacvec and gradvec operations, or even "lazy" Jacobian/gradient objects (e.g. so you could apply IterativeSolvers.jl directly to the object).

simonbyrne avatar Mar 05 '19 18:03 simonbyrne

@jrevels Can you explain this code? I know it implements what is described in #263. I am a newcomer to Julia and want to implement this for a function f'(x) * v.

In particular, the line below is not very clear:

julia> ForwardDiff.partials.(cumprod(duals), 1)

What does this partials call with the argument 1 do?

pavanakumar avatar Jun 06 '20 13:06 pavanakumar
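For context, partials(d, i) reads the i-th perturbation coefficient off a dual number, so with a single seed the argument 1 extracts the only one. A minimal REPL illustration:

using ForwardDiff

d = ForwardDiff.Dual(2.0, 3.0)   # represents 2.0 + 3.0ε
ForwardDiff.value(d)             # 2.0
ForwardDiff.partials(d, 1)       # 3.0, the coefficient of the (single) ε

After seeding duals[i] = Dual(x[i], y[i]), each output of cumprod(duals) carries the corresponding entry of J_f(x) * y as its first partial, which is what the broadcasted partials.(..., 1) collects.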

SparseDiffTools.jl provides a JacVec object which does this.

simonbyrne avatar Sep 01 '20 16:09 simonbyrne

And it exposes the functions:

https://github.com/JuliaDiff/SparseDiffTools.jl#jacobian-vector-and-hessian-vector-products

You can see how they're implemented here: https://github.com/JuliaDiff/SparseDiffTools.jl/blob/master/src/differentiation/jaches_products.jl#L3-L13

ChrisRackauckas avatar Sep 01 '20 16:09 ChrisRackauckas
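A rough usage sketch, hedged against the API documented at those links (exact names and signatures may differ across SparseDiffTools versions):

using SparseDiffTools

f(x) = cumprod(x)
x, v = rand(3), rand(3)

# One-shot Jacobian-vector product via dual-number seeding.
auto_jacvec(f, x, v)

# Or a lazy operator, e.g. to hand to an iterative solver.
L = JacVec(f, x)
L * v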

I was playing around, and this gets really easy with StructArrays.jl:

using ForwardDiff, StructArrays

# View a StructArray of Duals column-wise: the values and the partials
# live in separate plain arrays rather than in an array of structs.
function ForwardDiff.value(dx::StructArray{D}) where {D<:ForwardDiff.Dual}
    dx.value
end
function ForwardDiff.partials(dx::StructArray{D}, i) where {D<:ForwardDiff.Dual}
    getproperty(dx.partials.values, i)
end

# Build a StructArray of one-epsilon Duals from a value vector x and a
# seed vector w, keeping both as plain contiguous arrays.
function StructDual(x::AbstractVector{T}, w::AbstractVector{T}) where {T}
    @assert length(x) == length(w)
    partials = StructArray{ForwardDiff.Partials{1,T}}(
        (StructArray{Tuple{T}}(
            (w,)
        ),)
    )
    duals = StructArray{ForwardDiff.Dual{Nothing,T,1}}(
        (x, partials)
    )
    return duals
end

then you can do stuff like

x = rand(10)
w = rand(10)
dx = StructDual(x, w)
dy = similar(dx)
f!(dy, dx)                    # f! is any in-place function writing f(dx) into dy
ForwardDiff.value(dy)         # f(x)
ForwardDiff.partials(dy, 1)   # J_f(x) * w

which lets you avoid copying to and from arrays of structs.

simonbyrne avatar Sep 01 '20 17:09 simonbyrne

The real win would be operating the AD at the array level, though, which ForwardDiff doesn't do. That's the reason for the newer AD systems.

ChrisRackauckas avatar Sep 01 '20 17:09 ChrisRackauckas

Why would that be better?

simonbyrne avatar Sep 01 '20 17:09 simonbyrne

It would use BLAS, for example.

ChrisRackauckas avatar Sep 01 '20 17:09 ChrisRackauckas
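To make the point concrete, here is an illustration with a hypothetical f(x) = A*x: at the array level the Jacobian-vector product is a single BLAS gemv, while scalar Dual propagation has to flow through a generic, non-BLAS matrix multiply.

using ForwardDiff, LinearAlgebra

A = rand(100, 100); x = rand(100); v = rand(100)
f(x) = A * x

# Scalar-level route: Duals flow elementwise through a generic matmul.
duals = map(ForwardDiff.Dual, x, v)
jv_dual = ForwardDiff.partials.(f(duals), 1)

# Array-level route: here J_f(x) = A, so J*v is one BLAS gemv call.
jv_blas = A * v

jv_dual ≈ jv_blas   # true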

Ah, OK. I was considering this in the case of matrix-free operators, but yes, if you're doing array operations you would probably want something more array-aware.

simonbyrne avatar Sep 01 '20 17:09 simonbyrne