LoopVectorization.jl
LoopVectorization.jl copied to clipboard
Assignment vectorization
Now if I run the following I get an error:
a = collect(1:100)
items = rand(20)
indices = collect(1:20)
@avx a[indices] .= items
ERROR: MethodError: no method matching vmaterialize!(::SubArray{Int64,1,Array{Int64,1},Tuple{Array{Int64,1}},false}, ::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1},Nothing,typeof(identity),Tuple{Array{Int64,1}}}, ::Val{:Main})Closest candidates are:
vmaterialize!(::Union{LinearAlgebra.Adjoint{T,A}, LinearAlgebra.Transpose{T,A}}, ::BC, ::Val{Mod}) where {T<:Union{Float32, Float64}, N, A<:AbstractArray{T,N}, BC<:Base.Broadcast.Broadcasted, Mod} at C:\Users\yahyaaba\.julia\packages\LoopVectorization\iNfCA\src\broadcast.jl:218
vmaterialize!(::AbstractArray{T,N}, ::BC, ::Val{Mod}) where {T<:Union{Float32, Float64}, N, BC<:Base.Broadcast.Broadcasted, Mod} at C:\Users\yahyaaba\.julia\packages\LoopVectorization\iNfCA\src\broadcast.jl:194
That expression is equal to the following that works
indiceslen = length(indices)
@avx for k = 1:indiceslen
a[indices[k]] = items[k]
end
For some reason I had restricted vmaterialize! to only accept T<: Union{Float32,Float64}.
Locally (on master), I've now extended it to T <: Union{Int32,Int64,Float32,Float64}.
I'll add tests with integers before pushing to master.
However, I now get a different error:
julia> @avx a[indices] .= items
ERROR: conversion to pointer not defined for SubArray{Int64,1,Array{Int64,1},Tuple{Array{Int64,1}},false}
Stacktrace:
[1] error(::String) at ./error.jl:33
[2] unsafe_convert(::Type{Ptr{Int64}}, ::SubArray{Int64,1,Array{Int64,1},Tuple{Array{Int64,1}},false}) at ./pointer.jl:67
[3] pointer(::SubArray{Int64,1,Array{Int64,1},Tuple{Array{Int64,1}},false}) at ./abstractarray.jl:933
[4] stridedpointer at /home/c285497/.julia/dev/VectorizationBase/src/vectorizable.jl:438 [inlined]
[5] macro expansion at /home/c285497/.julia/dev/LoopVectorization/src/broadcast.jl:194 [inlined]
[6] vmaterialize!(::SubArray{Int64,1,Array{Int64,1},Tuple{Array{Int64,1}},false}, ::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1},Nothing,typeof(identity),Tuple{Array{Float64,1}}}, ::Val{:Main}) at /home/c285497/.julia/dev/LoopVectorization/src/broadcast.jl:194
[7] top-level scope at REPL[5]:1
If indices were 1:20, this should work (ie, conversion to pointer is defined for those subarrays).
However, collect(1:20) is much more difficult, because the elements aren't necessarily going to hold any particular pattern.
It would be good to add support, so I'll leave this issue open. Support will be added in VectorizationBase and SIMDPirates.
- [ ] VectorizationBase defines
stridedpointer, which is a function we can overload for special behavior on different array types. It already has two different special overloads forSubArrays, based on whether the first index is anIntor aUnitRange(SubArrays whose first index is anInt's elements will be discontiguous, so we treat them differently). I'll have to add another method and type to handle subarrays indexed by arrays. - [ ] SIMDPirates will have to add a few SIMD methods for that type.
- [x] SIMDPirates also needs a few changes to make the loop you showed fast when
indicesis a unitrange. Currently, loading from unitranges producesVecs, which forces your stores to be gathers. I'll make loading from them produce_MMs, which "know" that the elements are sequential, so stores/loads can be fast.
Note that the loop is likely to give you wrong answers (or do weird things) if indices contains repeats of the same number.
Eg, if
indices = [1,2,3,1]
a[indices] = 10:13
What should a equal if we're assigning 10:13 in parallel above?
how is it supposed to store into the first element twice in parallel (eg, with AVX2)?
Some architectures have "conflict detection" instructions that can be used to look for this.
You (or anyone else) can file an issue if you want some sort of special handling around that.
It may be best to just call it "undefined behavior". Doing it sequentially as in the above code of course yields, a[1] == 13.