StrideArrays.jl
Size constraints?
Hi @chriselrod , thanks for making this available, and for your nice summary here: https://github.com/cscherrer/MeasureTheory.jl/discussions/49#discussioncomment-303700
With StaticArrays, there's compiler overhead that grows with the size of the array. From some command line checks this doesn't seem to be the case for StrideArrays. Is that right? Would it be accurate that StrideArray performance will always be at least as good as vanilla Arrays, as long as you're careful about the gc concerns?
A major contributor to StaticArrays' overhead is that `NTuple`s aren't actually treated specially in Julia; that should eventually be fixed: https://github.com/JuliaLang/julia/pull/31681
> From some command line checks this doesn't seem to be the case for StrideArrays. Is that right?
It shouldn't. A StrideArray's memory is often backed by an `NTuple`, but it avoids interacting with the tuple directly.
`MArray`s also tend to do a lot better than `SArray`s for the same reason.
> Would it be accurate that StrideArray performance will always be at least as good as vanilla Arrays, as long as you're careful about the gc concerns?
It shouldn't degrade with size. But the stack-allocation trick has its limits -- namely, the stack size.
On 64-bit Julia builds, Octavian.jl's matrix multiplication (C = A * B) uses this trick to allocate memory to copy blocks of A into.
However, tests were failing on 32-bit builds because of stack overflow errors -- not from a function accidentally recursing until the stack ran out, but because these allocated matrices were using it all!
It was creating arrays of around 1 MiB in size (that is, 131_072 `Float64`s).
The stack is much smaller than the amount of RAM. Apparently, it is also smaller in 32-bit builds of Julia.
So on 32-bit builds it does a simpler version of what ProbabilityModels did: it allocates a chunk of memory up front and uses it when needed. Stacks are extremely simple, which is also why they're so fast: allocating and freeing memory is just changing the value of an integer. Octavian's version is simpler than ProbabilityModels', because Octavian only creates one of these arrays at a time per thread.
But it's nicer to let LLVM handle incrementing the pointer for you.
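To make the "allocation is just changing an integer" point concrete, here is a toy bump allocator. This is purely illustrative -- `BumpBuffer`, `alloc!`, and `free!` are made-up names, not Octavian's actual implementation:

```julia
# Toy bump ("stack") allocator: a preallocated chunk plus an integer offset.
# Allocating and freeing are just integer arithmetic -- no GC involvement.
mutable struct BumpBuffer
    data::Vector{UInt8}  # the preallocated chunk, created once per thread
    offset::Int          # current top of the "stack"
end
BumpBuffer(n::Integer) = BumpBuffer(Vector{UInt8}(undef, n), 0)

# "Allocate" nbytes by bumping the offset; return a 1-based start index.
function alloc!(buf::BumpBuffer, nbytes::Integer)
    buf.offset + nbytes <= length(buf.data) || error("buffer exhausted")
    start = buf.offset + 1
    buf.offset += nbytes
    return start
end

# "Free" by resetting the offset to just before the allocation.
free!(buf::BumpBuffer, start::Integer) = (buf.offset = start - 1; nothing)

buf = BumpBuffer(1 << 20)              # a 1 MiB chunk
p = alloc!(buf, 64 * sizeof(Float64))  # room for a 64-element Float64 block
free!(buf, p)                          # freeing just decrements the integer
```

Because frees must happen in reverse order of allocations, this only works for the last-in-first-out lifetimes Octavian's blocking scheme has -- which is exactly why a stack discipline suffices.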
As for performance, the goal is to be faster, in part thanks to providing more efficient methods, e.g.:
```julia
julia> using StrideArrays, StaticArrays, BenchmarkTools

julia> A = @MMatrix rand(8,8); B = @MMatrix rand(8,8); C = similar(A);

julia> @benchmark mul!($C, $A, $B)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     39.008 ns (0.00% GC)
  median time:      39.088 ns (0.00% GC)
  mean time:        40.785 ns (0.00% GC)
  maximum time:     82.652 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     991

julia> @benchmark mul!($(StrideArray(C)), $(StrideArray(A)), $(StrideArray(B)))
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     14.254 ns (0.00% GC)
  median time:      15.284 ns (0.00% GC)
  mean time:        15.255 ns (0.00% GC)
  maximum time:     37.802 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     998
```
```julia
julia> Abase = Array(A);

julia> @btime sum($Abase)
  17.074 ns (0 allocations: 0 bytes)
32.746376987098714

julia> @btime sum($A)
  19.874 ns (0 allocations: 0 bytes)
32.746376987098714

julia> @btime sum($(StrideArray(A)))
  3.284 ns (0 allocations: 0 bytes)
32.746376987098714

julia> @btime mapreduce(log, +, $Abase)
  256.875 ns (0 allocations: 0 bytes)
-60.07821360899319

julia> @btime mapreduce(log, +, $A)
  244.497 ns (0 allocations: 0 bytes)
-60.07821360899319

julia> @btime mapreduce(log, +, $(StrideArray(A)))
  45.248 ns (0 allocations: 0 bytes)
-60.07821360899318

julia> @btime mapreduce(log, +, $(StrideArray(Abase)))
  50.536 ns (0 allocations: 0 bytes)
-60.07821360899318

julia> Float64(mapreduce(log ∘ big, +, Abase)) # just checking which version rounded correctly ;)
-60.07821360899318
```
Of course, there are probably bugs and missing features, including functions that might not appropriately add the `@gc_preserve` needed to allow for stack allocation, so issues are welcome if/when you run into any problems!
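For reference, a minimal sketch of what that looks like in user code -- assuming the macro is exported as `@gc_preserve`, and reusing the `StrideArray` conversion from the benchmarks above:

```julia
using StrideArrays, StaticArrays

# Convert an MMatrix to a StrideArray, as in the benchmarks above.
A = StrideArray(@MMatrix rand(8, 8))

# `@gc_preserve` (assuming that exported name) roots the array's backing
# memory for the duration of the call, so the call doesn't force the
# array onto the heap.
s = @gc_preserve sum(A)
```

If a function you call doesn't do this internally, wrapping the call site yourself as above is the workaround.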