StaticArrays.jl
circshift(SArray) returns MArray
Example:
julia> sv = SVector(1,2,3);
julia> circshift(sv, 1)
3-element MArray{Tuple{3},Int64,1,3} with indices SOneTo(3):
3
1
2
I think the result should be SArray.
Agreed. The reason for this is that it's dispatching to Base and similar(::SVector) returns an MVector.
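For illustration (my own quick check; the printed type alias may vary by Julia/StaticArrays version), here is where the mutable result comes from: Base's generic circshift builds its output with similar, and for a static vector that gives the mutable variant.
using StaticArrays

sv = SVector(1, 2, 3)
# Base's fallback circshift allocates its output via `similar`, and for a
# static vector that yields an MVector (i.e. an MArray):
typeof(similar(sv))   # MVector{3, Int64}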
I looked into this as I wanted to learn a bit more about Julia and this library, but, oh boy, this was (to a novice like me) a surprisingly difficult problem.
The immutability basically forces a much more inconvenient implementation (with significant overhead), just like setindex.
So, unless one goes for an approach with a mutable temporary, you'd have to end up with something along the lines of:
circshift(a::StaticArray, shift::Int) = _circshift(Length(a), a, shift)
@generated function _circshift(::Length{L}, a::StaticArray{<:Tuple,T}, shift::Int) where {L, T}
    exprs = [:(a[1+mod($i-shift, L)]) for i = 0:(L-1)]
    return quote
        @inbounds return typeof(a)(tuple($(exprs...)))
    end
end
So, given the forced marriage of stack allocation and immutability in Julia, I have started to think of this more philosophically. Basically, the need for a circshift (or a setindex) with a dynamic shift variable indicates an underlying problem.
A "compile time" length for the shift is what we should actually want most of the time with static variables (I apologize for the messy code here; I'm new to generated functions and barely know what I'm doing):
function circshift(a::StaticArray{Tuple{L}}, ::Length{N}) where {L, N}
    vcat(_slice(Length(L-N+1), Length(L), a), _slice(Length(1), Length(L-N), a))
end
@generated function _slice(::Length{L}, ::Length{U}, a::StaticArray{<:Tuple,T}) where {L, U, T}
    exprs = [:(a[$i]) for i = L:U]
    return quote
        @inbounds return SArray{Tuple{1+U-L},T,1,1+U-L}(tuple($(exprs...)))
    end
end
which, of course, is super-fast if one specifies a static shift, e.g. circshift(a, Length(3)).
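With the definitions above in scope (names as in the sketch), a quick usage check:
a = SVector(1, 2, 3, 4, 5)
circshift(a, Length(2))   # should give SVector(4, 5, 1, 2, 3), matching Base.circshift(collect(a), 2)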
I looked into this as I wanted to learn a bit more about Julia and this library, but, oh boy, this was (to a novice like me) a surprisingly difficult problem.
Haha! Well, thanks for jumping right in! I think you're on the right track here :-)
So, unless one goes for an approach with a mutable temporary, you'd have to end up with something along the lines of [...]
Yes, I think that's basically the right way to implement it. However, it might not be as bad as you think: the compiler is quite good at constant propagation, so in cases where shift is constant in the caller, all the occurrences of mod in _circshift will be optimized away. However, for this to work you may need to encourage some extra inlining:
@inline Base.circshift(a::StaticArray, shift::Int) = _circshift(Length(a), a, shift)
@generated function _circshift(::Length{L}, a::StaticArray{<:Tuple,T}, shift::Int) where {L, T}
    exprs = [:(a[1+mod($i-shift, L)]) for i = 0:(L-1)]
    return quote
        $(Expr(:meta, :inline))
        @inbounds return typeof(a)(tuple($(exprs...)))
    end
end
Note the appearance of Expr(:meta, :inline) in the generated code, which is the confusing way you need to write @inline for @generated functions. It seemed I also needed to add @inline to circshift.
Thus we get extremely simple native code when the shift amount can be constant-propagated all the way inward. E.g., for a function foo:
julia> foo(a) = circshift(a, length(a)÷2)
foo (generic function with 1 method)
julia> foo(SA[1,2,3,4,5])
5-element SArray{Tuple{5},Int64,1,5} with indices SOneTo(5):
4
5
1
2
3
julia> @code_native foo(SA[1,2,3,4,5])
.text
; ┌ @ REPL[11]:1 within `foo'
movq %rdi, %rax
; │┌ @ REPL[9]:1 within `circshift'
; ││┌ @ REPL[3]:2 within `_circshift'
; │││┌ @ REPL[3]:5 within `macro expansion'
vmovups 24(%rsi), %xmm0
vmovups (%rsi), %xmm1
movq 16(%rsi), %rcx
; │└└└
vmovups %xmm0, (%rdi)
vmovups %xmm1, 16(%rdi)
movq %rcx, 32(%rdi)
retq
nop
; └
So, given the forced marriage of stack allocation and immutability in Julia
Luckily this isn't quite true :-) The compiler will commonly allocate MVector on the stack as well, but only if it can prove that it's a temporary which doesn't escape the stack frame. So one viable alternative to the generated function might be to use a temporary MVector, but re-wrap in the type of the input vector before returning it. This will convert to SVector if an SVector was passed in.
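To make that concrete, here is a small self-contained check of the no-escape behaviour (my own sketch, not package code; the zero-allocation result assumes the optimizer succeeds):
using StaticArrays

function add_one_to_first(v::SVector{3,Float64})
    m = MVector(v)        # mutable temporary; can live on the stack if it doesn't escape
    m[1] += 1.0
    return SVector(m)     # re-wrap as an immutable SVector before returning
end

v = SVector(1.0, 2.0, 3.0)
add_one_to_first(v)              # warm up / compile
@allocated add_one_to_first(v)   # typically 0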
Yeah, I didn't manage to get it to inline properly, so I saw very poor performance.
Though the tricky case is that performance is somewhat poor (as arrays get a bit larger) when the calls to mod can't be optimized away.
Experimenting a bit:
@inline function circshift(v::SVector{L}, shift::Integer) where L
    out = similar(v)
    shift = mod(shift, L)
    cut = L-shift
    @inbounds for i in 1:(L-cut)
        out[i] = v[i+cut]
    end
    @inbounds for i in 1:cut
        out[i+shift] = v[i]
    end
    return SVector(out)
end
So, this version seems to be sufficiently optimizer friendly, with the added benefit of not having to mess about with generated functions (a plus in I.M.H.O.) and the fact that it's fast in cases where shift isn't known at compile time.
not having to mess about with generated functions (a plus in I.M.H.O.)
Yes it's always nice to avoid @generated!
function circshift(v::SVector{L}, shift::Integer) where L
You're not using L in dispatch here, so you can simplify the signature and just use length internally, which should be cleaner and can easily be optimized since it can always be known from typeof(v):
@inline function circshift(v::StaticVector, shift::Integer)
    w = similar(v)
    L = length(v)
    shift = mod(shift, L)
    cut = L-shift
    @inbounds for i in 1:(L-cut)
        w[i] = v[i+cut]
    end
    @inbounds for i in 1:cut
        w[i+shift] = v[i]
    end
    return similar_type(v)(w)
end
Really nice - so does that end up with good code these days? Do we end up with a bunch of extra move instructions on the last line, or is it elided entirely?
What happens for big numbers and other mutable things that might not work well inside an MArray?
@c42f seeing this I feel like we are edging closer to making this package simpler or even redundant - I mean that function is basically good for AbstractArray! I'd simply replace the last line with return freeze(w) (a la Keno's suggestion). To fix MArray we could really use a "mutable tuple" which freezes to a Tuple... and then maybe constant propagation is the only "magic" we need here.
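For concreteness, a minimal sketch of what such a freeze could look like for static arrays (a hypothetical helper of mine, not an existing StaticArrays or Base function):
using StaticArrays

# Hypothetical freeze: re-wrap a (possibly mutable) static array as the
# corresponding immutable SArray, via its data tuple.
freeze(w::StaticArray) = similar_type(w)(Tuple(w))
With that, the last line of the circshift above could become return freeze(w), and the same body could in principle serve other array types if freeze gained suitable methods for them.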
What happens for big numbers and other mutable things that might not work well inside an MArray?
There's a WIP PR to Base (https://github.com/JuliaLang/julia/pull/34126) which puts pointers inline, so that could maybe help fix MArray with BigFloat etc (though we'd still need extra care to preserve GC invariants). In fact that suggests #749 might be reasonable to merge.
seeing this I feel like we are edging closer to making this package simpler or even redundant - I mean that function is basically good for AbstractArray! I'd simply replace the last line with return freeze(w) (a la Keno's suggestion).
Yes agreed things are getting simpler! I think the larger way forward is to start using freeze (or some freeze-like thing) internally in StaticArrays and see whether we can get good performance plus code which would in principle work for AbstractArray. Then try to merge some of that back into Base once we know things work.
There's a WIP PR to Base (JuliaLang/julia#34126) which puts pointers inline, so that could maybe help fix MArray with BigFloat etc (though we'd still need extra care to preserve GC invariants).
Unfortunately this doesn't affect all objects (structs with union types and unallocated fields may be treated differently).
Yes agreed things are getting simpler! I think the larger way forward is to start using freeze (or some freeze-like thing) internally in StaticArrays and see whether we can get good performance plus code which would in principle work for AbstractArray. Then try to merge some of that back into Base once we know things work.
Yeah! Let's do it :) I'd still be tempted to create a separate package with freeze and thaw and overload that, so everyone can explore.
I'd still be tempted to create a separate package with freeze and thaw and overload that, so everyone can explore.
I've been thinking about doing this for a while for structs and tuples. It's super easy to implement this with Setfield.jl: https://github.com/jw3126/Setfield.jl/issues/56#issuecomment-516696745
Yes agreed things are getting simpler! I think the larger way forward is to start using freeze (or some freeze-like thing) internally in StaticArrays and see whether we can get good performance plus code which would in principle work for AbstractArray. Then try to merge some of that back into Base once we know things work.
Yeah! Let's do it :) I'd still be tempted to create a separate package with freeze and thaw and overload that, so everyone can explore.
So... I got overexcited. I think I need this for Dictionaries.jl anyways.
https://github.com/andyferris/Freeze.jl
You beat me to it! I hope I'm not slowing down your excitement but I think issettable is too coarse-grained. It might be better to be able to freeze/thaw different entities (e.g., shape-frozen vs value-frozen). For example, you'd want shape-frozen value-mutable arrays for columns of tables. Also, I've been thinking that it'd be nice to have an ownership-as-convention kind of API. This way, you can get a from thaw(move(freeze(a))) "safely."
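Just to make the ownership-by-convention idea concrete, a toy, self-contained sketch (all three names are illustrative only, not an existing API):
using StaticArrays

freeze(v::MVector) = SVector(v)    # value-frozen copy
thaw(v::SVector)   = MVector(v)    # mutable copy
move(x) = x                        # by convention: the caller promises not to use x afterwards

a = MVector(1, 2, 3)
b = thaw(move(freeze(a)))          # mutating b is "safe" because a was moved away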
Haha!
Yes, we definitely need “shape frozen” vs “value frozen” semantics, and I meant that issettable is only for mutating values, not the shape. My experience with Dictionaries.jl is that containers whose keys change are significantly more challenging to design shared generic code for, and also all static arrays are shape frozen anyway.
If you are going to think about these things, remember that append-only datasets are a thing and that we might want, e.g., a vector that supports push! but not setindex!. Thinking of it this way, we can keep the two concepts simpler and orthogonal rather than complecting them. (That’s also why setindex! doesn’t insert new keys in Dictionaries.jl).
For dictionaries I was thinking isinsertable, freezekeys and thawkeys could be the (orthogonal) “shape” interface. What do you think?
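To illustrate how the two axes could stay orthogonal, here are some toy trait definitions (the names come from the comments above, but none of this is an existing interface):
using StaticArrays

issettable(::MVector)   = true    # values can be overwritten in place...
isinsertable(::MVector) = false   # ...but the length (shape) is fixed

issettable(::Vector)    = true    # an ordinary Vector allows both:
isinsertable(::Vector)  = true    # push!/insert!/delete! change the shape

# An append-only dataset would be the remaining useful combination:
# isinsertable = true, issettable = false.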
Note that it is possible to write something like
function matmul(a::SMatrix{I, J}, b::SMatrix{J, K}) where {I, J, K}
    c = zero(MMatrix{I, K})
    @inbounds for k in 1:K, j in 1:J
        b_jk = b[j, k]
        @simd for i in 1:I
            c[i, k] = muladd(a[i, j], b_jk, c[i, k])
        end
    end
    return SMatrix(c)
end
now, without causing allocations. It seems LLVM only unrolls the innermost loop for some reason, though.
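A quick sanity check of that matmul sketch (sizes arbitrary; assumes StaticArrays is loaded and the definition above is in scope):
a = @SMatrix rand(3, 4)
b = @SMatrix rand(4, 2)
matmul(a, b) ≈ a * b      # should hold, with the MMatrix temporary eliding any heap allocation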
(I opened https://github.com/andyferris/Freeze.jl/issues/1 to avoid derailing this issue too much.)
So I started experimenting further: https://github.com/andyferris/StaticArraysLite.jl
So far the basic functionality is quite competitive with StaticArrays for SVector{3, Float64}. It would be very interesting to see how this goes for matrices and slightly larger arrays.
I need to use circshift() on SVector, and I discovered someone submitted this issue a while ago. It turned out that it was actually myself 😁.
Back then I just reported the lack of this functionality and didn't think about the solution. Base.circshift() supports multi-dimensional arrays, but if we focus on vectors, would it be possible to add the following simple solutions?
Base.circshift(v::SVector{N,T}, shift::Integer) where {N,T} = SVector(ntuple(k->v[mod1(k-shift,N)], Val(N)))
Base.circshift(v::SVector{N,T}, ::Val{S}) where {N,T,S} = SVector(ntuple(k->v[mod1(k-S,N)], Val(N)))
Here are the benchmark results:
julia> VERSION
v"1.7.2-pre.0"
julia> v = SVector(ntuple(identity, 10))
10-element SVector{10, Int64} with indices SOneTo(10):
1
2
3
4
5
6
7
8
9
10
julia> @btime circshift($v, 3);
31.051 ns (0 allocations: 0 bytes)
julia> @btime circshift($v, Val(3));
2.641 ns (0 allocations: 0 bytes)