StrideArraysCore.jl
StrideArraysCore.jl copied to clipboard
`freeze/thaw` interface for `StaticStrideArray`s
Currently, a StaticStrideArray must be a mutable struct so that we can get reliable pointers to it's storage in order to avoid interacting with the NTuple data.
However, this causes problems with StaticStrideArrays if they cross non-inlined function boundaries because then the storage data will end up on the heap even if it doesn't need to be there, requiring us to deal with the scariness of PtrArrays @gc_preserve.
One possible solution to this problem would be to have StaticStrideArray be an immutable struct, and then have a MutableStaticStrideArray that can be created from it on-demand in order to get a pointer, and then converted right back to the immutable version as soon as we're done.
Here's a little demo of what I'm imagining:
using StrideArrays, StrideArraysCore
using StrideArraysCore: StaticStrideArray
macro new(T, args...)
Expr(:new, T, args...) |> esc
end
struct FrozenStaticStrideArray{T, N, R, S, X, O, L} <: AbstractStrideArray{T, N, R, S, X, O}
data::NTuple{L,T}
end
function freeze(sa::StaticStrideArray{T, N, R, S, X, O, L}) where {T, N, R, S, X, O, L}
@new FrozenStaticStrideArray{T, N, R, S, X, O, L} getfield(sa, 1)
end
function thaw(fsa::FrozenStaticStrideArray{T, N, R, S, X, O, L}) where {T, N, R, S, X, O, L}
@new StaticStrideArray{T, N, R, S, X, O, L} getfield(fsa, 1)
end
function Base.getindex(fsa::FrozenStaticStrideArray, i::Int...)
thaw(fsa)[i...]
end
let
sa = StaticStrideArray{Float64}(undef, (static(1000),)) .= 1
fsa = freeze(sa)
@btime $sa[1000]
@btime $fsa[1000]
endl
#+RESULTS:
: 1.940 ns (0 allocations: 0 bytes)
: 1.939 ns (0 allocations: 0 bytes)
I think having an immutable variant like this could make this a much more convincing replacement for StaticArrays.jl.
I am in favor, please feel free to make a PR!
@ChrisRackauckas and I have been talking for some time about making a StaticArrays.jl replacement that doesn't unroll anything, using loops for all implementations, to avoid the compile time and scalability problems that StaticArrays.jl has.
I'm also in favor of making StrideArrays(Core) that library.
It should perhaps also be updated to make use of the new Pkg extensions.
Hm, it seems that julia is being a bit overly cautious about avoiding us mutating a region of stack memory that belonged to an immutable variable, so a freeze/thaw is not as cheap an operation as one might hope. All the memory needs to be moved:
julia> let v = StrideArray{Float64}(undef, static(1000)) .= 1
fv = freeze(v)
@btime freeze($(thaw(fv)))
end;
153.136 ns (0 allocations: 0 bytes)
julia> let v = StrideArray{Float64}(undef, static(1000)) .= 1
fv = freeze(v)
code_llvm(Tuple{typeof(fv)}) do fv
freeze(thaw(fv))
end
end;
; @ REPL[8]:4 within `#3`
define void @"julia_#3_2881"([1 x [1000 x double]]* noalias nocapture sret([1 x [1000 x double]]) %0, [1 x [1000 x double]]* nocapture nonnull readonly align 8 dereferenceable(8000) %1) #0 {
top:
; ┌ @ REPL[5]:2 within `freeze`
%.sroa.0.0..sroa_cast = bitcast [1 x [1000 x double]]* %1 to i8*
%2 = bitcast [1 x [1000 x double]]* %0 to i8*
call void @llvm.memcpy.p0i8.p0i8.i64(i8* noundef nonnull align 8 dereferenceable(8000) %2, i8* noundef nonnull align 8 dereferenceable(8000) %.sroa.0.0..sroa_cast, i64 8000, i1 false)
; └
ret void
}
Not a show-stopper, but having a cost to freezing and thawing does put a damper on things...
Yes, we're aware of that problem. A few people looked at it and couldn't come up with simple fixes, but we think it's still worth it despite this cost (and eventually, compiler improvements should eliminate that copy).
Okay, I'll see what I can do. What is your appetite here for a breaking version?
I'm picturing replacing what is currently StaticStrideArray with a StrideArray wrapping a Ref{<:NTuple}, i.e.
function StrideArray{T}(::UndefInitializer, sz::StaticInt...) where {T}
r = Ref{NTuple{Int(prod(sz)), T}}
p = Ptr{T}(pointer_from_objref(r))
pA = PtrArray(p, sz)
return StrideArray(pA, r)
end
and then the name StaticStrideArray would then be used for the immutable version. Does that make sense, or was there a reason that StaticStrideArray is a separate type from StrideArray with static storage / size?
I'm okay with a breaking version, as there's only a handful of packages I maintain depending on this, and none use StaticStrideArray, so it won't be too much work to update.
Does that make sense, or was there a reason that StaticStrideArray is a separate type from StrideArray with static storage / size?
I want the array passed as a pointer to memory, not a pointer to a pointer to memory. I had it closer to what you described before, but this caused a slight loss in performance. I thought there was an associated issue, but it may have only been a Zulip discussion. Perhaps @cscherrer recalls.
I think I made an issue for this in ArrayInterface a while ago
a
StaticArrays.jlreplacement that doesn't unroll anything, using loops for all implementations, to avoid the compile time and scalability problems thatStaticArrays.jlhas
How would this be different from a SizedArray?
I want the array passed as a pointer to memory, not a pointer to a pointer to memory. I had it closer to what you described before, but this caused a slight loss in performance. I thought there was an associated issue, but it may have only been a Zulip discussion. Perhaps @cscherrer recalls.
Sorry, I don't remember. Could have been Zulip.
How would this be different from a
SizedArray?
It wouldn't be backed by a Vector storage, but instead a type which can be stack allocated.
Ah right, of course. Thanks @MasonProtter