ForwardDiff.jl
Calculating J_f(x) . y efficiently
As far as I understand, if you have a function f: R^n -> R^n, there should be a way to calculate J_f(x) . y (the Jacobian of f at the point x, multiplied by the vector y) more efficiently than computing the whole matrix and then doing the multiplication.
How can this be done using ForwardDiff?
It's similar to the classic Hessian-vector product trick, just without the nested differentiation:
julia> using ForwardDiff
julia> x, y = rand(3), rand(3)
([0.243633, 0.330416, 0.632913], [0.0525042, 0.381907, 0.738228])
julia> duals = map(ForwardDiff.Dual, x, y)
3-element Array{ForwardDiff.Dual{Void,Float64,1},1}:
Dual{Void}(0.243633,0.0525042)
Dual{Void}(0.330416,0.381907)
Dual{Void}(0.632913,0.738228)
julia> ForwardDiff.partials.(cumprod(duals), 1)
3-element Array{Float64,1}:
0.0525042
0.110393
0.129297
julia> ForwardDiff.jacobian(cumprod, x) * y
3-element Array{Float64,1}:
0.0525042
0.110393
0.129297
Supporting this comes down to either having a built-in jacvec method or, more generally, an API for custom perturbation seeding (https://github.com/JuliaDiff/ForwardDiff.jl/issues/43).
There could be jacvec and gradvec operations, or even "lazy" Jacobian/Gradient objects (e.g. so you could apply IterativeSolvers.jl directly to the object).
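For illustration, here is a minimal sketch of what such a jacvec helper could look like, just wrapping the dual-seeding trick from the example above (the name jacvec and its signature are hypothetical, not part of ForwardDiff's API):

using ForwardDiff

# Compute J_f(x) * v with a single dual-seeded evaluation of f,
# without ever forming the Jacobian.
function jacvec(f, x::AbstractVector, v::AbstractVector)
    duals = map(ForwardDiff.Dual, x, v)        # seed each entry's single partial with v
    return ForwardDiff.partials.(f(duals), 1)  # extract the directional derivative
end

jacvec(cumprod, rand(3), rand(3))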
@jrevels Can you explain this code? I know this implements what is said in #263. I am a newcomer to Julia and want to implement this for a function f'(x) * v.
In particular, the line below is not very clear to me:
julia> ForwardDiff.partials.(cumprod(duals), 1)
What does this partials, 1 do?
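In case it helps: ForwardDiff.Dual(x[i], y[i]) builds a dual number whose value is x[i] and whose single partial is seeded with y[i]. Evaluating the function on such a vector of duals propagates that seed, so partials.(cumprod(duals), 1) extracts the first (and only) partial of each output element, which is exactly an entry of J_f(x) * y. A tiny scalar illustration:

julia> d = ForwardDiff.Dual(2.0, 3.0);  # value 2.0, single partial seeded with 3.0

julia> ForwardDiff.partials(d^2, 1)     # derivative of x^2 at 2.0, times the seed: 2*2.0*3.0
12.0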
SparseDiffTools.jl provides a JacVec object which does this, and exposes the relevant functions here:
https://github.com/JuliaDiff/SparseDiffTools.jl#jacobian-vector-and-hessian-vector-products
You can see how they're implemented here: https://github.com/JuliaDiff/SparseDiffTools.jl/blob/master/src/differentiation/jaches_products.jl#L3-L13
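Roughly, usage looks something like the following for an out-of-place function (the exact signatures may differ between versions, so check the README linked above; the lazy JacVec operator mentioned earlier wraps the same computation):

using SparseDiffTools

x, v = rand(3), rand(3)
auto_jacvec(cumprod, x, v)  # forward-mode J*v without materializing the Jacobian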
I was playing around, and this gets really easy with StructArrays.jl:
using ForwardDiff, StructArrays

# Read the values / i-th partials back out of a StructArray of Duals without copying.
function ForwardDiff.value(dx::StructArray{D}) where {D<:ForwardDiff.Dual}
    dx.value
end

function ForwardDiff.partials(dx::StructArray{D}, i) where {D<:ForwardDiff.Dual}
    getproperty(dx.partials.values, i)
end

# Build a StructArray of Duals whose values are x and whose single partial is seeded with w.
function StructDual(x::AbstractVector{T}, w::AbstractVector{T}) where {T}
    @assert length(x) == length(w)
    partials = StructArray{ForwardDiff.Partials{1,T}}(
        (StructArray{Tuple{T}}(
            (w,)
        ),)
    )
    duals = StructArray{ForwardDiff.Dual{Nothing,T,1}}(
        (x, partials)
    )
    return duals
end
then you can do stuff like
x = rand(10)
w = rand(10)
dx = StructDual(x, w)
dy = similar(dx)
f!(dy, dx)                     # any in-place function f!(y, x)
ForwardDiff.value(dy)          # the values f(x)
ForwardDiff.partials(dy, 1)    # the directional derivative J_f(x) * w
which lets you avoid copying to and from arrays of structs.
The real win would be operating the AD at the array level though, which ForwardDiff doesn't do. That's the reason for the newer AD systems.
Why would that be better?
It would use BLAS, for example.
Ah, ok. I was considering this in the case of matrix-free operators, but yes, if you're doing array operations you would probably want something more array-aware.