LoopVectorization.jl
LoopVectorization.jl copied to clipboard
vmap gives incorrect results for iszero
If I run vmap with certain functions i get wrong results in some cases. I noticed this first for an integer array with iszero:
a = rand(Int8,200,20)
@test map(iszero,a) == vmap(iszero,a) #fails
Repeated evaluations of vmap(iszero,a) give different answers each time.
The same happens for some other functions I tested such as:
a = rand(200,20) #also for floats
vmap( ==(2.0), a)
vmap( >(0.5), a)
As a pattern these all return boolean arrays, functions that return floats seem to be unaffected.
vmapreduce is unaffected:
a = rand( Int8,200,20)
@test vmapreduce(iszero,+,a) == mapreduce(iszero,+,a) #passes
@test sum(map(iszero,a)) == sum(vmap(iszero,a)) #fails
The inplace version seems affected as well:
a = rand( Int8,200,20)
out = rand( Bool,200,20)
vmap!(iszero,out,a)
@test out == map(iszero,a) #fails
Interestingly this seems to consistently return the same (wrong) result instead of the always changing values of the out of place version. Here again this seems to only affect functions that return bools.
This is just from some playing around not something I actually need but unless I'm violating some assumptions LoopVectorization makes this seems like a bug.
Maybe related, b is not like vec(a):
julia> count(map(iszero, vec(a)))
7
julia> count(vmap(iszero, vec(a)))
1142
julia> b = rand(Int8,2000);
julia> count(vmap(iszero, b))
ERROR: MethodError: no method matching vstore!(::Ptr{Bool}, ::UInt16, ::Mask{32,UInt32})
I did some further tests on the inplace version and it appears that using a BitArray for the output gives correct results
out = BitArray(zeros(Bool,200,20))
vmap!(iszero,out,a)
@test out == map(iszero,a) #passes
Thanks for the report. The vmap code is actually fairly simple and isolated from the rest of LoopVectorization. Hopefully it's rather accessible for anyone who wants to take a look.
It seems the problem is in storing the results.
I did some further tests on the inplace version and it appears that using a BitArray for the output gives correct results
That's because
julia> @which vmap!(iszero,out,a)
vmap!(f, args...) in LoopVectorization at /home/chriselrod/.julia/dev/LoopVectorization/src/map.jl:252
it just calls regular map!. Down the line, after the ongoing VectorizationBase and ArrayInterface rework, I may add actual support for BitArrays.
vmapreduce is unaffected:
So long as you don't have more than 255 zeros:
julia> vmapreduce(iszero,+,a)
0x15
julia> mapreduce(iszero,+,a) #passes
21
julia> Int(vmapreduce(iszero,+,a))
21
Maybe it'd be better to have a minimum accumulator size?
Maybe related, b is not like vec(a):
I get that error no matter the vmap example here.
Maybe related, b is not like vec(a):
That's because
julia> 200*20 % 32 # 200 * 20 = length(a)
0
julia> 2000 % 32 # 2000 = length(b)
16
With a remainder of 0, it wasn't running the undefined store code.
But that is still a clue:
ERROR: MethodError: no method matching vstore!(::Ptr{Bool}, ::UInt16, ::Mask{32,UInt32})
Why would we have ::UInt16 and a ::Mask{32}?
These numbers are arch dependent, so mine will look a little different...
julia> VectorizationBase.mask(Bool, 33) # returns a mask for the remainder
Mask{64,Bool}<1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0>
julia> W = VectorizationBase.pick_vector_width(Bool)
64
julia> V = VectorizationBase.pick_vector_width_val(Bool)
Val{16}()
Oops, it looks like pick_vector_width and pick_vector_width_val disagree for some reason.
But there's another problem that storing into Boolean vectors wasn't really supported, and the vstore! methods are still missing.
I'll address this in that VectorizationBase rewrite, but it may still be a couple weeks.
Until then, I'll make the vmap! methods just use map! for regular Boolean vectors, so that they at least get the correct answer.
I'll close the issue when the "actual" vmap starts applying to when we have Booleans.
I think it should also support BitArrays, and default to them when we have boolean output.