LoopVectorization.jl icon indicating copy to clipboard operation
LoopVectorization.jl copied to clipboard

Assignment vectorization

Open aminya opened this issue 5 years ago • 1 comments
trafficstars

Now if I run the following I get an error:

a = collect(1:100)
items = rand(20)
indices = collect(1:20)
@avx a[indices] .= items
ERROR: MethodError: no method matching vmaterialize!(::SubArray{Int64,1,Array{Int64,1},Tuple{Array{Int64,1}},false}, ::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1},Nothing,typeof(identity),Tuple{Array{Int64,1}}}, ::Val{:Main})Closest candidates are:
  vmaterialize!(::Union{LinearAlgebra.Adjoint{T,A}, LinearAlgebra.Transpose{T,A}}, ::BC, ::Val{Mod}) where {T<:Union{Float32, Float64}, N, A<:AbstractArray{T,N}, BC<:Base.Broadcast.Broadcasted, Mod} at C:\Users\yahyaaba\.julia\packages\LoopVectorization\iNfCA\src\broadcast.jl:218
  vmaterialize!(::AbstractArray{T,N}, ::BC, ::Val{Mod}) where {T<:Union{Float32, Float64}, N, BC<:Base.Broadcast.Broadcasted, Mod} at C:\Users\yahyaaba\.julia\packages\LoopVectorization\iNfCA\src\broadcast.jl:194

That expression is equal to the following that works

indiceslen = length(indices)
@avx for k = 1:indiceslen
        a[indices[k]] = items[k]
    end

aminya avatar Feb 04 '20 07:02 aminya

For some reason I had restricted vmaterialize! to only accept T<: Union{Float32,Float64}. Locally (on master), I've now extended it to T <: Union{Int32,Int64,Float32,Float64}.

I'll add tests with integers before pushing to master.

However, I now get a different error:

julia> @avx a[indices] .= items
ERROR: conversion to pointer not defined for SubArray{Int64,1,Array{Int64,1},Tuple{Array{Int64,1}},false}
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] unsafe_convert(::Type{Ptr{Int64}}, ::SubArray{Int64,1,Array{Int64,1},Tuple{Array{Int64,1}},false}) at ./pointer.jl:67
 [3] pointer(::SubArray{Int64,1,Array{Int64,1},Tuple{Array{Int64,1}},false}) at ./abstractarray.jl:933
 [4] stridedpointer at /home/c285497/.julia/dev/VectorizationBase/src/vectorizable.jl:438 [inlined]
 [5] macro expansion at /home/c285497/.julia/dev/LoopVectorization/src/broadcast.jl:194 [inlined]
 [6] vmaterialize!(::SubArray{Int64,1,Array{Int64,1},Tuple{Array{Int64,1}},false}, ::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1},Nothing,typeof(identity),Tuple{Array{Float64,1}}}, ::Val{:Main}) at /home/c285497/.julia/dev/LoopVectorization/src/broadcast.jl:194
 [7] top-level scope at REPL[5]:1

If indices were 1:20, this should work (ie, conversion to pointer is defined for those subarrays). However, collect(1:20) is much more difficult, because the elements aren't necessarily going to hold any particular pattern.

It would be good to add support, so I'll leave this issue open. Support will be added in VectorizationBase and SIMDPirates.

  • [ ] VectorizationBase defines stridedpointer, which is a function we can overload for special behavior on different array types. It already has two different special overloads for SubArrays, based on whether the first index is an Int or a UnitRange (SubArrays whose first index is an Int's elements will be discontiguous, so we treat them differently). I'll have to add another method and type to handle subarrays indexed by arrays.
  • [ ] SIMDPirates will have to add a few SIMD methods for that type.
  • [x] SIMDPirates also needs a few changes to make the loop you showed fast when indices is a unitrange. Currently, loading from unitranges produces Vecs, which forces your stores to be gathers. I'll make loading from them produce _MMs, which "know" that the elements are sequential, so stores/loads can be fast.

Note that the loop is likely to give you wrong answers (or do weird things) if indices contains repeats of the same number. Eg, if

indices = [1,2,3,1]
a[indices] = 10:13

What should a equal if we're assigning 10:13 in parallel above? how is it supposed to store into the first element twice in parallel (eg, with AVX2)? Some architectures have "conflict detection" instructions that can be used to look for this. You (or anyone else) can file an issue if you want some sort of special handling around that. It may be best to just call it "undefined behavior". Doing it sequentially as in the above code of course yields, a[1] == 13.

chriselrod avatar Feb 04 '20 14:02 chriselrod