PreallocationTools.jl
Allocations when using dualcache
Perhaps I'm doing something wrong in how I set things up, but dualcache seems to always generate allocations when requesting the dual portion of the cache. Here's the most minimal example I could think of; I posted a slightly less minimal example here. Writing my own (simpler, less flexible) version of get_tmp eliminates the issue. The following code:
using ForwardDiff, PreallocationTools

# hand-rolled alternatives to get_tmp: a plain reshape over the reinterpreted buffer
function my_get_tmp(dc::PreallocationTools.DiffCache, u::Vector{T}) where {T <: ForwardDiff.Dual}
    return reshape(reinterpret(T, dc.dual_du), size(dc.du))
end
function my_get_tmp(dc::PreallocationTools.DiffCache, u::Vector{T}) where {T <: Float64}
    return dc.du
end

function test()
    ChunkSize = 12
    for n in (10, 20)
        println("\nTesting for n=", n)
        A = zeros(n)
        DA = zeros(ForwardDiff.Dual{ForwardDiff.Tag{nothing, Float64}, Float64, ChunkSize}, n)
        cache = dualcache(A, ChunkSize)
        println("... get_tmp on Vector{Float64}")
        a = @allocated get_tmp(cache, A); println(a)
        println("... get_tmp on Vector{Dual}")
        a = @allocated get_tmp(cache, DA); println(a)
        println("... my_get_tmp on Vector{Float64}")
        a = @allocated my_get_tmp(cache, A); println(a)
        println("... my_get_tmp on Vector{Dual}")
        a = @allocated my_get_tmp(cache, DA); println(a)
    end
end
test()
gives:
julia> test()
Testing for n=10
... get_tmp on Vector{Float64}
0
... get_tmp on Vector{Dual}
1168
... my_get_tmp on Vector{Float64}
0
... my_get_tmp on Vector{Dual}
0
Testing for n=20
... get_tmp on Vector{Float64}
0
... get_tmp on Vector{Dual}
2240
... my_get_tmp on Vector{Float64}
0
... my_get_tmp on Vector{Dual}
0
That is, get_tmp allocates an amount that scales with the vector size and is slightly larger than the dual payload itself (each Dual holds ChunkSize + 1 Float64s, so 8*(ChunkSize+1)*n = 1040 bytes for n=10 and 2080 bytes for n=20). This is with Julia v1.7.3 and PreallocationTools v0.4.0 on macOS 12.5.
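For reference, the payload arithmetic can be checked directly: each Dual carries its value plus ChunkSize partials (a quick sketch; the Nothing tag is arbitrary):

using ForwardDiff
T = ForwardDiff.Dual{Nothing, Float64, 12}
sizeof(T)       # 104 bytes: (12 + 1) * 8, one value plus twelve partials
10 * sizeof(T)  # 1040 bytes of dual payload for n = 10; the observed 1168
                # additionally includes the header of a freshly allocated Array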
https://github.com/SciML/PreallocationTools.jl/blob/v0.4.0/src/PreallocationTools.jl#L57
The view probably is what's allocating? Try https://github.com/JuliaArrays/UnsafeArrays.jl
This was supposed to be solved: https://github.com/JuliaLang/julia/pull/34126 https://github.com/JuliaLang/julia/issues/14955
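One way to test that hypothesis is to measure each wrapper layer separately; a minimal sketch (the tag and sizes are arbitrary, and the exact byte counts depend on the Julia version):

using ForwardDiff
const D = ForwardDiff.Dual{Nothing, Float64, 12}
buf = zeros(130)  # 130 Float64s = 10 duals of 13 Float64s each
layer1(b) = view(b, 1:130)
layer2(b) = reinterpret(D, view(b, 1:130))
layer3(b) = reshape(reinterpret(D, view(b, 1:130)), 10)
layer1(buf); println(@allocated layer1(buf))  # SubArray wrapper alone
layer2(buf); println(@allocated layer2(buf))  # plus ReinterpretArray
layer3(buf); println(@allocated layer3(buf))  # plus ReshapedArray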
Using a simple reshape() instead of ArrayInterfaceCore.restructure() makes the allocations disappear. Copy-paste version below:
using ForwardDiff, PreallocationTools

function my_get_tmp(dc::PreallocationTools.DiffCache, u::Vector{T}) where {T <: ForwardDiff.Dual}
    return reshape(reinterpret(T, dc.dual_du), size(dc.du))
end
function my_get_tmp(dc::PreallocationTools.DiffCache, u::Vector{T}) where {T <: Float64}
    return dc.du
end

# like get_tmp in v0.4.0, but with reshape in place of ArrayInterfaceCore.restructure
function get_tmp2(dc::PreallocationTools.DiffCache, u::AbstractArray{T}) where {T <: ForwardDiff.Dual}
    nelem = div(sizeof(T), sizeof(eltype(dc.dual_du))) * length(dc.du)
    if nelem > length(dc.dual_du)
        PreallocationTools.enlargedualcache!(dc, nelem)
    end
    reshape(reinterpret(T, view(dc.dual_du, 1:nelem)), size(dc.du))
end

function test()
    ChunkSize = 12
    for n in (10, 20)
        println("\nTesting for n=", n)
        A = zeros(n)
        DA = zeros(ForwardDiff.Dual{ForwardDiff.Tag{nothing, Float64}, Float64, ChunkSize}, n)
        cache = dualcache(A, ChunkSize)
        println("... get_tmp on Vector{Float64}")
        a = @allocated get_tmp(cache, A); println(a)
        println("... get_tmp on Vector{Dual}")
        a = @allocated get_tmp(cache, DA); println(a)
        println("... my_get_tmp on Vector{Float64}")
        a = @allocated my_get_tmp(cache, A); println(a)
        println("... my_get_tmp on Vector{Dual}")
        a = @allocated my_get_tmp(cache, DA); println(a)
        println("... get_tmp2 on Vector{Dual}")
        a = @allocated get_tmp2(cache, DA); println(a)
    end
end
test()
Removing the view was the first thing I tried when writing my own version; it didn't help:
using ForwardDiff, PreallocationTools, ArrayInterfaceCore

function my_get_tmp(dc::PreallocationTools.DiffCache, u::Vector{T}) where {T <: ForwardDiff.Dual}
    return reshape(reinterpret(T, dc.dual_du), size(dc.du))
end
# restructure without the view: still allocates
function get_tmp2(dc::PreallocationTools.DiffCache, u::Vector{T}) where {T <: ForwardDiff.Dual}
    ArrayInterfaceCore.restructure(dc.du, reinterpret(T, dc.dual_du))
end

function test()
    ChunkSize = 12
    n = 10
    A = zeros(n)
    DA = zeros(ForwardDiff.Dual{ForwardDiff.Tag{nothing, Float64}, Float64, ChunkSize}, n)
    cache = dualcache(A, ChunkSize)
    println("... get_tmp on Vector{Dual}")
    a = @allocated get_tmp(cache, DA); println(a)
    println("... get_tmp2 on Vector{Dual}")
    a = @allocated get_tmp2(cache, DA); println(a)
    println("... my_get_tmp on Vector{Dual}")
    a = @allocated my_get_tmp(cache, DA); println(a)
end
test()
gives:
... get_tmp on Vector{Dual}
1168
... get_tmp2 on Vector{Dual}
1168
... my_get_tmp on Vector{Dual}
0
Since I didn't know everything that's going on under the hood in restructure, I switched it to reshape, and voilà.
Is restructure changing the view to an array?
I checked with earlier versions of PreallocationTools.jl (down to 0.2.4) where it was still ArrayInterface 5.y.z; the allocations also happen there.
And is it because of changing a view to an Array?
Is this the right way to test it?
using ArrayInterfaceCore
a = zeros(10)
ArrayInterfaceCore.restructure(a, reinterpret(Float32, view(a, 1:5)))
Then yes:
10-element Vector{Float32}:
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
There's the allocation, then. Does Adapt.jl's adapt allocate here? Maybe that's the primitive that should be used. Otherwise we could specialize restructure to keep the view here.
I'll have to look some more into it, but it looks like restructure() always allocates for non-Base Arrays. Maybe it needs a general fix?
Regarding adapt(): I have to play around more and understand it better, but at the moment I don't see how it can deliver a "reshape" (it just takes care of the wrappers?).
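A quick illustration of that reading of adapt (a sketch assuming Adapt's usual wrapper rules; worth double-checking against its docs):

using Adapt
a = zeros(10)
v = view(a, 1:5)
adapt(Array, v)  # still a SubArray: adapt rebuilds the view around the
                 # adapted parent rather than flattening it into an Array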
My proposal right now would be to fix the normal-Array case (simply doing a reshape, as I did above; the same as dbstein's version, but keeping the view) and take care of the more advanced AbstractArrays later (ArrayPartition and friends).
CuArray would still need adapt though, right? (I don't have a readily accessible CUDA GPU at hand to test.)
Yeah, we can create a specialized dispatch on Array for now. I don't think it can just be a reshape in general because, yes, it may need to be type-matching, in which case the view would get converted.
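Presumably something like the following (a sketch only, reusing the _restructure name that shows up later in the thread; not the actual PR):

# Array fast path: just reshape, keeping the reinterpret/view wrappers intact
_restructure(normal_cache::Array, duals) = reshape(duals, size(normal_cache))
# everything else keeps the type-matching (allocating) behavior for now
_restructure(normal_cache, duals) = ArrayInterfaceCore.restructure(normal_cache, duals)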
Just a note here to say that I am playing around with this. I hope to have a PR with an (at least partial) fix today or tomorrow.
Thanks for the fast work on it. Writing non-allocating Jacobians for complicated functions (or even allocating ones, for that matter) is a huge pain, and getting them for free is nothing short of magical.
https://github.com/SciML/PreallocationTools.jl/pull/32 fixes the allocations for Base Arrays now. However, I couldn't get rid of the reshape(reinterpret(..., view(...))) wrappers and stay allocation-free.
That said, I don't know whether the wrappers interfere negatively with anything downstream. At least in the cases I use PreallocationTools for, they didn't cause a performance loss (actually, a small gain, due to the allocations being gone).
I tested with Adapt, but also with unsafe_wrap (as a last resort):

function _restructure(normal_cache::Array, duals)
    unsafe_wrap(Array, pointer(duals), size(normal_cache))
end
That gets rid of the wrappers and returns a correctly sized Array of duals, but it also causes allocations (because the view() is also removed?).
Finally, the more complex AbstractArray cases still allocate.
@chriselrod do you know if there's a more general solution here?
I got:
julia> test()
... get_tmp on Vector{Dual}
0
... get_tmp2 on Vector{Dual}
1168
... my_get_tmp on Vector{Dual}
0
get_tmp2 calls ArrayInterface.restructure(::Array, ::ReinterpretArray), which calls convert(Array, ::ReinterpretArray), which is why it allocates.
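For reference, the Array method of restructure is roughly the following (paraphrased; see the ArrayInterfaceCore source for the verbatim definition):

# roughly what ArrayInterfaceCore.restructure does for an Array prototype:
restructure(x::Array, y) = reshape(convert(Array, y), size(x)...)
# convert(Array, ::ReinterpretArray) must materialize a new Array,
# which is exactly the allocation measured above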
Yes we know. But is it possible to solve?
Don't make a copy? Why the need for convert? What's wrong with the reinterpret array? Where is restructure being called?
Like, obviously we can avoid these allocations. SimpleChains uses just a single UInt8 buffer vector for all its temporaries, whether Bool, Float32, Float64, or ForwardDiff.Dual.
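A minimal sketch of that single-buffer pattern (ByteBuffer and borrow are made-up names for illustration, not SimpleChains' actual internals):

using ForwardDiff

# one byte buffer, reinterpreted on demand for whatever element type is needed
struct ByteBuffer
    mem::Vector{UInt8}
end

function borrow(b::ByteBuffer, ::Type{T}, n::Integer) where {T}
    nbytes = sizeof(T) * n
    # grow once if needed; note resize! invalidates previously borrowed views
    nbytes <= length(b.mem) || resize!(b.mem, nbytes)
    reinterpret(T, view(b.mem, 1:nbytes))
end

buf = ByteBuffer(Vector{UInt8}(undef, 1024))
xs = borrow(buf, Float64, 16)                                # 16 Float64s
ds = borrow(buf, ForwardDiff.Dual{Nothing, Float64, 12}, 8)  # aliases the same bytes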
I'd just need more context on the design and why things are the way they are now to suggest an alternative approach.
I am intrigued by this comment; it might be simpler to adapt that approach here as well. I don't think it will lead to a general solution for every AbstractArray type, though, but maybe it's a start towards one?
The main issue is that it's a missing primitive, so it needs something in ArrayInterfaceCore to capture it. There is no primitive for "restructure, but non-allocating". restructure is supposed to be type-matching: if you started with an ArrayPartition of Arrays, then restructure should turn a Vector back into an ArrayPartition of Arrays. @chriselrod's comment is of course that if you want that to be non-allocating, it needs to be an ArrayPartition of SubArrays, which is correct, but we don't have a primitive operation for "create a type that's like X but not exactly matching X, where the allowed differences are shortcuts that improve performance".
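To make the type-matching requirement concrete (a sketch; ArrayPartition comes from RecursiveArrayTools, and the exact output depends on package versions):

using RecursiveArrayTools, ArrayInterfaceCore
x = ArrayPartition(zeros(3), zeros(2))    # structured prototype: two plain Arrays
y = collect(1.0:5.0)                      # flat data, e.g. a reinterpreted buffer
z = ArrayInterfaceCore.restructure(x, y)  # expected: ArrayPartition([1,2,3], [4,5]),
                                          # allocating fresh Arrays; a non-allocating
                                          # variant would need SubArray views into y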
I tend to think this is a more fundamental issue: SubArray -> Array should just be a pointer operation and shouldn't actually allocate a new array, and I'm not sure why it does in Julia, so it would probably be best to get that fixed. But this notion of restructuring is quite important to PreallocationTools; in fact, it's how the current DiffCache works. So having some verbiage for asking for this kind of reshaping functionality is likely useful in many places, as long as this SubArray -> Array allocation exists.