SparseDiffTools.jl
In-place `*_jacvec!` assumes square jacobian
- `auto_jacvec!` and `num_jacvec!` only work when the output and input dimensions are the same, i.e., when the jacobian is square. The issue is this line (and the equivalent for `num_jacvec!`), which should probably use both the length and eltype of `dy`: https://github.com/JuliaDiff/SparseDiffTools.jl/blob/2dbe4206d107d55538eb72b33aca727169130849/src/differentiation/jaches_products.jl#L10
- Manual cache allocation works (see the preallocation sketch after the MWE below) but is surprisingly hard to get right; the implicit prescription in the README gets the wrong tag type for the duals: https://github.com/JuliaDiff/SparseDiffTools.jl#jacobian-vector-and-hessian-vector-products
- The corresponding lines in `JacVec` should similarly be fixed: https://github.com/JuliaDiff/SparseDiffTools.jl/blob/2dbe4206d107d55538eb72b33aca727169130849/src/differentiation/jaches_products.jl#L201
- Besides, shouldn't `auto_jacvec(!)` be upstreamed to ForwardDiff? Strange to have to pull in a sparse-related package to get this functionality. JuliaDiff/ForwardDiff.jl#319
MWE:
using SparseDiffTools

function f!(dx, x)
    dx[1] = x[2] + x[3]
    dx[2] = x[1]
end

x = randn(3)
v = randn(3)
dx = zeros(2)
auto_jacvec!(dx, f!, x, v)  # errors: the internal caches are sized from x (length 3), not from dx (length 2)
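For reference, a preallocation that does work for this non-square MWE might look roughly like the following. Treat it as a sketch: the cache-accepting method `auto_jacvec!(dy, f, x, v, cache1, cache2)` and the `DeivVecTag`-based tag construction are taken from the linked source/README and may not match every version exactly.

```julia
using ForwardDiff, SparseDiffTools
using ForwardDiff: Dual, Partials, Tag
using SparseDiffTools: DeivVecTag  # assumes the tag struct is reachable like this

# cache1 is sized/typed from the input x and seeded with the direction v;
# cache2 is sized/typed from the output dx, so the 2x3 jacobian is handled.
cache1 = Dual{typeof(Tag(DeivVecTag(), eltype(x))), eltype(x), 1}.(
    x, Partials.(tuple.(v)))
cache2 = Dual{typeof(Tag(DeivVecTag(), eltype(dx))), eltype(dx), 1}.(
    dx, Partials.(tuple.(dx)))

auto_jacvec!(dx, f!, x, v, cache1, cache2)  # dx ≈ J_f(x) * v = [v[2] + v[3], v[1]]
```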
I think manual cache allocation is what needs to be done, because that's how the output vector would then be defined. Does that make this a documentation issue?
> Besides, shouldn't `auto_jacvec(!)` be upstreamed to ForwardDiff? Strange to have to pull in a sparse-related package to get this functionality. https://github.com/JuliaDiff/ForwardDiff.jl/issues/319
Possibly, or it should be an AbstractDifferentiation.jl interface piece, and all AD packages should then adopt that interface.
A documentation fix would definitely make it easier to preallocate, which is useful regardless, but I think it's possible to make the allocating version work too. Just replace `cache2 = similar(cache1)` with `cache2 = Dual{typeof(ForwardDiff.Tag(DeivVecTag(), eltype(dy))), eltype(dy), 1}.(dy, ForwardDiff.Partials.(tuple.(dy)))`.
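To unpack that one-liner: it builds a dual cache whose length, element type, and tag all come from `dy`, which is what the non-square case needs. A standalone check, assuming `DeivVecTag` can be imported from SparseDiffTools as in the sketch above:

```julia
using ForwardDiff: Dual, Partials, Tag
using SparseDiffTools: DeivVecTag  # assumed import path

dy = zeros(2)
cache2 = Dual{typeof(Tag(DeivVecTag(), eltype(dy))), eltype(dy), 1}.(
    dy, Partials.(tuple.(dy)))

length(cache2)  # 2: sized from dy, not from x
eltype(cache2)  # a Dual tagged with Tag{DeivVecTag, Float64}, the same tag pattern as cache1
```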
Here's the adaptation I landed on for my own use. I factored out the cache allocation into a function that can be used separately, making preallocation easy. If upstreamed to ForwardDiff, this would of course turn into something like a `ForwardDiff.JVPConfig` type.
using ForwardDiff

struct MyTag end

function jvp!(f!, dy, x, v, duals=nothing)
    if duals === nothing
        duals = jvpduals(dy, x, v)
    else  # jvpduals returns initialized duals, so this re-seeding is only needed when preallocated
        duals.x .= ForwardDiff.Dual{_tagtype(x), eltype(x), 1}.(
            x, ForwardDiff.Partials.(tuple.(v))
        )
    end
    f!(duals.dy, duals.x)
    dy .= ForwardDiff.partials.(duals.dy, 1)  # extract J_f(x) * v from the output duals
    return dy
end

function jvpduals(dy, x, v)
    # Output cache: the partials are placeholders; only the length and eltype of dy matter
    dualdy = ForwardDiff.Dual{_tagtype(dy), eltype(dy), 1}.(
        dy, ForwardDiff.Partials.(tuple.(dy))
    )
    # This correctly initializes dualx for computing J_f(x) * v
    dualx = ForwardDiff.Dual{_tagtype(x), eltype(x), 1}.(
        x, ForwardDiff.Partials.(tuple.(v))
    )
    return (; dy=dualdy, x=dualx)
end

_tagtype(x) = typeof(ForwardDiff.Tag(MyTag(), eltype(x)))
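As a usage sketch, reusing `f!` from the MWE above and the definitions just given (variable values are illustrative):

```julia
x  = randn(3)
v  = randn(3)
dy = zeros(2)                  # output length differs from the input length

jvp!(f!, dy, x, v)             # allocating path: dual caches are built internally
                               # dy ≈ J_f(x) * v = [v[2] + v[3], v[1]]

duals = jvpduals(dy, x, v)     # build the dual caches once ...
jvp!(f!, dy, x, v, duals)      # ... and reuse them on subsequent calls
```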
That does seem like a good fix. Mind opening a PR?
Been procrastinating on this, since my own version is still developing along with my project (including things like allocation-free 5-arg accumulating jvp, and consistent treatment of in-place vs. immutable versions), and I notice that the landscape of linear operator types is in rapid flux at the moment, at least on the SciML end of things (stoked for https://github.com/SciML/SciMLOperators.jl :fire:). So unless an improvement on this is urgent for anyone else I'll hold off a little and come back when it's more clear how things should look.
There's been a lot of motion in this ocean in the last few weeks. Indeed, this change might want to wait until the end of the month, when SciMLOperators is complete and registered.