FiniteDiff.jl
Suggestion for non-allocating Jacobian
I'm aware of #104, which was closed by #113, but Jacobians still allocate when using StaticArrays, even with a preallocated cache. Here's a barebones solution for a cache-free non-allocating Jacobian:
using StaticArrays
using BenchmarkTools

const RELSTEP = sqrt(eps(Float64))
const ABSSTEP = RELSTEP

# Relative step size with an absolute floor, so that step(0.0) == ABSSTEP
# rather than 0 (which would make the finite difference divide by zero)
function step(x)
    return max(RELSTEP * abs(x), ABSSTEP)
end

function finite_difference_jacobian(f, x)
    return finite_difference_jacobian(f, x, f(x))
end

@generated function finite_difference_jacobian(f, x::SVector{L}, fx) where {L}
    if L > 32
        # dimension too large for non-allocating Jacobian calculation;
        # fall back to an allocating implementation (not defined in this snippet)
        return :(finite_difference_jacobian_fallback(f, x, fx))
    end
    # unroll into hcat(column_1, ..., column_L) with a compile-time-known argument count
    ex = Expr(:call, :hcat)
    for i in 1:L
        push!(ex.args, :(finite_difference_jacobian_column(f, x, fx, $i)))
    end
    return ex
end

function finite_difference_jacobian_column(f, x::SVector{L}, fx, i) where {L}
    # floating-point accuracy trick: use the step that is actually representable
    # https://twitter.com/willkurt/status/1330183861452541953
    dxi = (x[i] + step(x[i])) - x[i]
    xdx = SVector{L}(j == i ? x[j] + dxi : x[j] for j in 1:L)
    return (f(xdx) - fx) / dxi
end
f(x) = @SVector [x[2]^3, x[1]^2, x[1] - 3x[2]]
x = @SVector rand(2)
println("FiniteDiff.finite_difference_jacobian:")
using FiniteDiff; cache = FiniteDiff.JacobianCache(x)
@btime FiniteDiff.finite_difference_jacobian(f, $x, $cache)
println("Custom finite_difference_jacobian:")
@btime finite_difference_jacobian(f, Ref($x)[])
Output:
FiniteDiff.finite_difference_jacobian:
  272.326 ns (21 allocations: 624 bytes)
Custom finite_difference_jacobian:
  15.568 ns (0 allocations: 0 bytes)
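For concreteness, here is the expression the @generated function returns for the L = 2 case above (a sketch, just writing out what the expression-building loop produces): the column count is known at compile time, so hcat gets a fixed number of arguments:

hcat(finite_difference_jacobian_column(f, x, fx, 1),
     finite_difference_jacobian_column(f, x, fx, 2))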
Could a specialization along these lines for small SVector problems be relevant for FiniteDiff? Unfortunately, I don't have the time to dig into the codebase and make a PR right now.
I had to use a generated function to hcat an arbitrary number of columns without splatting a generator, which apparently always allocates. Let me know if you know a better solution.
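For comparison, a non-generated version would have to look something like the sketch below (my illustration, not code from the post); splatting the generator means the number of hcat arguments isn't known at compile time, which is where the allocations come from:

# Illustrative allocating alternative: the splatted generator hides the
# argument count from the compiler
function allocating_jacobian(f, x::SVector{L}, fx) where {L}
    return hcat((finite_difference_jacobian_column(f, x, fx, i) for i in 1:L)...)
end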
Edit: Included an ABSSTEP.
Edit 2: Fixed benchmark that was optimized away.
Wow, 0.018 ns is faster than lightning! As Admiral Grace Hopper was fond of saying, in one nanosecond light travels about 0.3 meters. The only thing I can guess is that fast is CPU register memory? Maybe that is why the check for "if L > 32" works? But my CPU's L1 cache is much wider than 32, so hmmm?
And about "I had to use a generated function to hcat .. which apparently always allocates. Let me know if you know a better solution": take a look at my example using RuntimeGeneratedFunctions at this link: https://discourse.julialang.org/t/setting-approxfun-space-tried-several-tricks-but-still-receive-error-loaderror-cannot-infer-spaces-specifically-when-using-generalizedgenerated-jl-for-generalized-generated-functions/48909/17?u=marc.cox
using RuntimeGeneratedFunctions
. . .
julia> @btime f17(7)
26.122 ns (1 allocation: 16 bytes)
7.000000000000001
When @btime reports sub-nanosecond times, that implies that the actual computation is optimized away by the compiler. The correct benchmark should be
julia> @btime finite_difference_jacobian(f, Ref($x)[])
  12.866 ns (0 allocations: 0 bytes)
3×2 SMatrix{3, 2, Float64, 6} with indices SOneTo(3)×SOneTo(2):
 0.0        1.01871
 0.298806   0.0
 1.0       -3.0
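The Ref($x)[] idiom hides the value of x behind a mutable container, so the optimizer can't constant-fold the whole Jacobian computation away. Presumably the pre-edit benchmark interpolated x directly, something like this (reconstructed; the 0.018 ns figure is the one quoted above):

julia> @btime finite_difference_jacobian(f, $x)  # whole call folded at compile time
  0.018 ns (0 allocations: 0 bytes)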
Thanks, benchmark updated! 0.018 ns did seem a little too good to be true, although I guess it isn't a bad sign per se when the compiler is able to optimize away the whole thing.
I set the upper bound to L = 32 because during my initial prototyping the function seemed to start allocating when passing more than 32 arguments to hcat. But when testing the current implementation the largest non-allocating value seems to be L = 34, go figure.
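If anyone wants to reproduce that threshold, a probe along these lines should work (g and probe are illustrative names of mine, and it assumes the L > 32 guard is raised or removed so the unrolled path is taken for larger inputs):

# Hypothetical probe for the hcat allocation threshold
g(x) = x .^ 2  # any SVector -> SVector function will do

function probe(L)
    x = SVector{L}(rand(L))
    fx = g(x)
    finite_difference_jacobian(g, x, fx)  # run once to compile
    return @allocated finite_difference_jacobian(g, x, fx)
end

for L in (32, 33, 34, 35)
    println("L = $L: ", probe(L), " bytes allocated")
end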
For L > 34 this implementation is still faster and allocates less than FiniteDiff's, but I guess you gotta stop somewhere. I don't know how many versions of the generated function one should allow. In fact, I'm not sure I understand how generated functions work at all. I was expecting it to compile a new method for each L, such that, for example, after calling the function on xs of lengths 2, 3, and 7 one would get something like this:
julia> methods(finite_difference_jacobian)
# 4 methods for generic function "finite_difference_jacobian":
[1] finite_difference_jacobian(f, x::SArray{Tuple{2},T,1,2} where T, fx) in Main at <filename>:15
[2] finite_difference_jacobian(f, x::SArray{Tuple{3},T,1,3} where T, fx) in Main at <filename>:15
[3] finite_difference_jacobian(f, x::SArray{Tuple{7},T,1,7} where T, fx) in Main at <filename>:15
[4] finite_difference_jacobian(f, x) in Main at <filename>:11
But that's not what happens, you only ever see this:
julia> methods(finite_difference_jacobian)
# 2 methods for generic function "finite_difference_jacobian":
[1] finite_difference_jacobian(f, x::SArray{Tuple{L},T,1,L} where T, fx) where L in Main at <filename>:15
[2] finite_difference_jacobian(f, x) in Main at <filename>:11
> But that's not what happens, you only ever see this:
It only shows you that, but it's still specializing on each one.
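If you want to see those hidden specializations directly, the third-party MethodAnalysis.jl package can list them (my suggestion, not something from this thread):

# List the compiled MethodInstances behind the single generic method
using MethodAnalysis
methodinstances(finite_difference_jacobian)
# after calls with lengths 2, 3, and 7, this shows one MethodInstance per
# concrete input type, even though methods() prints just the two methods above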
This is a good idea and we should include such a specialization.
This is great! Do you mind making a PR and adding some tests?
I'll get to it eventually, but it doesn't look like it's gonna be a straight copy-paste, and it's gonna be a couple of days before I'm able to sit down and familiarize myself with the existing code. If anyone else wants to take a stab at it in the meantime, feel free; that's why I wanted to share the example right away.
Hi Daniel @danielwe, I'll see if I get a chance to take a stab at it. It will probably be more efficient if you elaborate on why "it doesn't look like it's gonna be a straight copy-paste". And I'm disinclined to a lot of public back-and-forth on GitHub, so feel free to message me on Discourse: https://discourse.julialang.org/u/marc.cox/summary