StrideArraysCore.jl icon indicating copy to clipboard operation
StrideArraysCore.jl copied to clipboard

`freeze/thaw` interface for `StaticStrideArray`s

Open MasonProtter opened this issue 2 years ago • 9 comments

Currently, a StaticStrideArray must be a mutable struct so that we can get reliable pointers to it's storage in order to avoid interacting with the NTuple data.

However, this causes problems with StaticStrideArrays if they cross non-inlined function boundaries because then the storage data will end up on the heap even if it doesn't need to be there, requiring us to deal with the scariness of PtrArrays @gc_preserve.

One possible solution to this problem would be to have StaticStrideArray be an immutable struct, and then have a MutableStaticStrideArray that can be created from it on-demand in order to get a pointer, and then converted right back to the immutable version as soon as we're done.

Here's a little demo of what I'm imagining:

using StrideArrays, StrideArraysCore
using StrideArraysCore: StaticStrideArray

macro new(T, args...)
    Expr(:new, T, args...) |> esc
end

struct FrozenStaticStrideArray{T, N, R, S, X, O, L} <: AbstractStrideArray{T, N, R, S, X, O}
    data::NTuple{L,T}
end
function freeze(sa::StaticStrideArray{T, N, R, S, X, O, L}) where {T, N, R, S, X, O, L}
    @new FrozenStaticStrideArray{T, N, R, S, X, O, L} getfield(sa, 1)
end
function thaw(fsa::FrozenStaticStrideArray{T, N, R, S, X, O, L}) where {T, N, R, S, X, O, L}
    @new StaticStrideArray{T, N, R, S, X, O, L} getfield(fsa, 1)
end

function Base.getindex(fsa::FrozenStaticStrideArray, i::Int...)
    thaw(fsa)[i...]
end
let
    sa = StaticStrideArray{Float64}(undef, (static(1000),)) .= 1
    fsa = freeze(sa)
    @btime $sa[1000]
    @btime $fsa[1000]
endl

#+RESULTS:
:   1.940 ns (0 allocations: 0 bytes)
:   1.939 ns (0 allocations: 0 bytes)

I think having an immutable variant like this could make this a much more convincing replacement for StaticArrays.jl.

MasonProtter avatar Apr 05 '23 20:04 MasonProtter

I am in favor, please feel free to make a PR!

@ChrisRackauckas and I have been talking for some time about making a StaticArrays.jl replacement that doesn't unroll anything, using loops for all implementations, to avoid the compile time and scalability problems that StaticArrays.jl has.

I'm also in favor of making StrideArrays(Core) that library.

It should perhaps also be updated to make use of the new Pkg extensions.

chriselrod avatar Apr 05 '23 20:04 chriselrod

Hm, it seems that julia is being a bit overly cautious about avoiding us mutating a region of stack memory that belonged to an immutable variable, so a freeze/thaw is not as cheap an operation as one might hope. All the memory needs to be moved:

julia> let v = StrideArray{Float64}(undef, static(1000)) .= 1
           fv = freeze(v)
           @btime freeze($(thaw(fv)))
       end;
  153.136 ns (0 allocations: 0 bytes)

julia> let v = StrideArray{Float64}(undef, static(1000)) .= 1
           fv = freeze(v)
           code_llvm(Tuple{typeof(fv)}) do fv
               freeze(thaw(fv))
           end
       end;
;  @ REPL[8]:4 within `#3`
define void @"julia_#3_2881"([1 x [1000 x double]]* noalias nocapture sret([1 x [1000 x double]]) %0, [1 x [1000 x double]]* nocapture nonnull readonly align 8 dereferenceable(8000) %1) #0 {
top:
; ┌ @ REPL[5]:2 within `freeze`
   %.sroa.0.0..sroa_cast = bitcast [1 x [1000 x double]]* %1 to i8*
   %2 = bitcast [1 x [1000 x double]]* %0 to i8*
   call void @llvm.memcpy.p0i8.p0i8.i64(i8* noundef nonnull align 8 dereferenceable(8000) %2, i8* noundef nonnull align 8 dereferenceable(8000) %.sroa.0.0..sroa_cast, i64 8000, i1 false)
; └
  ret void
}

Not a show-stopper, but having a cost to freezing and thawing does put a damper on things...

MasonProtter avatar Apr 06 '23 00:04 MasonProtter

Yes, we're aware of that problem. A few people looked at it and couldn't come up with simple fixes, but we think it's still worth it despite this cost (and eventually, compiler improvements should eliminate that copy).

chriselrod avatar Apr 06 '23 00:04 chriselrod

Okay, I'll see what I can do. What is your appetite here for a breaking version?

I'm picturing replacing what is currently StaticStrideArray with a StrideArray wrapping a Ref{<:NTuple}, i.e.

function StrideArray{T}(::UndefInitializer, sz::StaticInt...) where {T}
    r = Ref{NTuple{Int(prod(sz)), T}}
    p = Ptr{T}(pointer_from_objref(r))
    pA = PtrArray(p, sz)
    return StrideArray(pA, r)
end

and then the name StaticStrideArray would then be used for the immutable version. Does that make sense, or was there a reason that StaticStrideArray is a separate type from StrideArray with static storage / size?

MasonProtter avatar Apr 06 '23 00:04 MasonProtter

I'm okay with a breaking version, as there's only a handful of packages I maintain depending on this, and none use StaticStrideArray, so it won't be too much work to update.

Does that make sense, or was there a reason that StaticStrideArray is a separate type from StrideArray with static storage / size?

I want the array passed as a pointer to memory, not a pointer to a pointer to memory. I had it closer to what you described before, but this caused a slight loss in performance. I thought there was an associated issue, but it may have only been a Zulip discussion. Perhaps @cscherrer recalls.

chriselrod avatar Apr 06 '23 01:04 chriselrod

I think I made an issue for this in ArrayInterface a while ago

Tokazama avatar Apr 08 '23 20:04 Tokazama

a StaticArrays.jl replacement that doesn't unroll anything, using loops for all implementations, to avoid the compile time and scalability problems that StaticArrays.jl has

How would this be different from a SizedArray?

I want the array passed as a pointer to memory, not a pointer to a pointer to memory. I had it closer to what you described before, but this caused a slight loss in performance. I thought there was an associated issue, but it may have only been a Zulip discussion. Perhaps @cscherrer recalls.

Sorry, I don't remember. Could have been Zulip.

cscherrer avatar Apr 10 '23 14:04 cscherrer

How would this be different from a SizedArray?

It wouldn't be backed by a Vector storage, but instead a type which can be stack allocated.

MasonProtter avatar Apr 10 '23 21:04 MasonProtter

Ah right, of course. Thanks @MasonProtter

cscherrer avatar Apr 10 '23 22:04 cscherrer