FiniteDiff.jl icon indicating copy to clipboard operation
FiniteDiff.jl copied to clipboard

Suggestion for non-allocating Jacobian

Open danielwe opened this issue 4 years ago • 7 comments

I'm aware of #104 which was closed by #113, but Jacobians still allocate when using StaticArrays, even with a preallocated cache. Here's a barebones solution for a cache-free non-allocating Jacobian:

using StaticArrays
using BenchmarkTools

const RELSTEP = sqrt(eps(Float64))
const ABSSTEP = RELSTEP

function step(x)
    return max(RELSTEP * abs(x), ABSSTEP)
end

function finite_difference_jacobian(f, x)
    return finite_difference_jacobian(f, x, f(x))
end

@generated function finite_difference_jacobian(f, x::SVector{L}, fx) where {L}
    if L > 32
        # dimension too large for non-allocating Jacobian calculation
        return :(finite_difference_jacobian_fallback(f, x, fx))
    end
    ex = Expr(:call, :hcat)
    for i in 1:L
        push!(ex.args, :(finite_difference_jacobian_column(f, x, fx, $i)))
    end
    return ex
end

function finite_difference_jacobian_column(f, x::SVector{L}, fx, i) where {L}
    dxi = (x[i] + step(x[i])) - x[i]  # Floating point accuracy trick https://twitter.com/willkurt/status/1330183861452541953
    xdx = SVector{L}(j == i ? x[j] + dxi : x[j] for j in 1:L)
    return (f(xdx) - fx) / dxi
end

f(x) = @SVector [x[2]^3, x[1]^2, x[1] - 3x[2]]
x = @SVector rand(2)

println("FiniteDiff.finite_difference_jacobian:")
using FiniteDiff; cache = FiniteDiff.JacobianCache(x)
@btime FiniteDiff.finite_difference_jacobian(f, $x, $cache)

println("Custom finite_difference_jacobian:")
@btime finite_difference_jacobian(f, Ref($x)[])

Output:

FiniteDiff.finite_difference_jacobian:
  272.326 ns (21 allocations: 624 bytes)
Custom finite_difference_jacobian:
  15.568 ns (0 allocations: 0 bytes)

Could a specialization along these lines for small SVector problems be relevant for ForwardDiff? Unfortunately, I don't have the time to dig into the codebase and make a PR right now.

I had to use a generated function to hcat an arbitrary number of columns without splatting a generator, which apparently always allocates. Let me know if you know a better solution.


Edit: Included an ABSSTEP. Edit 2: Fixed benchmark that was optimized away.

danielwe avatar Nov 25 '20 01:11 danielwe

Wow 0.018 ns is faster than lightning ! As Admiral Grace Hopper was fond of saying In one nanosecond light travels about 0.3 meters . The only thing I can guess that is that fast is CPU registry memory ? Maybe that is why the check for " if L > 32 " works ? But my CPU L1 memory cache is much wider than 32 , so hmmm ?

And about "I had to use a generated function to hcat .. which apparently always allocates. Let me know if you know a better solution." Suggest Take a look at >> my example using RuntimeGeneratedFunction at this link >> https://discourse.julialang.org/t/setting-approxfun-space-tried-several-tricks-but-still-receive-error-loaderror-cannot-infer-spaces-specifically-when-using-generalizedgenerated-jl-for-generalized-generated-functions/48909/17?u=marc.cox

using RuntimeGeneratedFunction
. . .
julia> @btime f17(7)
  26.122 ns (1 allocation: 16 bytes)
7.000000000000001

Marc-Cox-08 avatar Nov 25 '20 02:11 Marc-Cox-08

When @btime reports sub-nanosecond, that implies that the actual computation is optimized away by the compiler. The correct benchmark should be

julia> @btime finite_difference_jacobian(f, Ref($x)[])
  12.866 ns (0 allocations: 0 bytes)
3×2 SMatrix{3, 2, Float64, 6} with indices SOneTo(3)×SOneTo(2):
 0.0        1.01871
 0.298806   0.0
 1.0       -3.0

YingboMa avatar Nov 25 '20 03:11 YingboMa

Thanks, benchmark updated! 0.018 ns did seem a little too good to be true, although I guess it isn't a bad sign per se when the compiler is able to optimize away the whole thing.

I set the upper bound to L = 32 because during my initial prototyping the function seemed to start allocating when passing more than 32 arguments to hcat. But when testing the current implementation the largest non-allocating value seems to be L = 34, go figure.

For L > 34 this implementation is still faster and allocates less than ForwardDiff's, but I guess you gotta stop somewhere. I don't know how many versions of the generated function one should allow. In fact, I'm not sure I understand how generated functions work at all. I was expecting it to compile a new method for each L, such that, for example, after calling the function on xs of lengths 2, 3, and 7 one would get something like this:

julia> methods(finite_difference_jacobian)
# 4 methods for generic function "finite_difference_jacobian":
[1] finite_difference_jacobian(f, x::SArray{Tuple{2},T,1,2} where T, fx) in Main at <filename>:15
[2] finite_difference_jacobian(f, x::SArray{Tuple{3},T,1,3} where T, fx) in Main at <filename>:15
[3] finite_difference_jacobian(f, x::SArray{Tuple{7},T,1,7} where T, fx) in Main at <filename>:15
[4] finite_difference_jacobian(f, x) in Main at <filename>:11

But that's not what happens, you only ever see this:

julia> methods(finite_difference_jacobian)
# 2 methods for generic function "finite_difference_jacobian":
[1] finite_difference_jacobian(f, x::SArray{Tuple{L},T,1,L} where T, fx) where L in Main at <filename>:15
[2] finite_difference_jacobian(f, x) in Main at <filename>:11

danielwe avatar Nov 25 '20 04:11 danielwe

But that's not what happens, you only ever see this:

It only shows you that, but it's still specializing on each one.

This is a good idea and we should include such a specialization.

ChrisRackauckas avatar Nov 25 '20 09:11 ChrisRackauckas

This is great! Do you mind to make PR and add some tests?

YingboMa avatar Nov 25 '20 16:11 YingboMa

I'll get to it eventually, but it doesn't look like it's gonna be a straight copy&paste and it's gonna be a couple of days before I'm able to sit down and familiarize myself with the existing code. If anyone else wants to take a stab in the meantime, feel free---that's why I wanted to share the example right away.

danielwe avatar Nov 25 '20 20:11 danielwe

Hi Daniel @danielwe I'll see if I get a chance to take a stab at it. It will probably be more efficient if you elaborate about why " it doesn't look like it's gonna be a straight copy&paste ", And I'm disinclined to a lot of public back and forth in Github so feel free to message me on Discourse https://discourse.julialang.org/u/marc.cox/summary

Marc-Cox-08 avatar Nov 26 '20 05:11 Marc-Cox-08