LoopVectorization.jl
Custom A*X .+ b returning wrong results
First of all, great package! I was trying to implement a slight alteration to the matrix multiplication example from the README.md:
```julia
function custom_gemm!(C::Matrix{T}, A::Matrix{T}, B::Matrix{T}, b::Vector{T}) where {T}
    @turbo for m ∈ axes(A,1), n ∈ axes(B,2)
        Cmn = b[m]
        for k ∈ axes(A,2)
            Cmn += A[m,k] * B[k,n]
        end
        C[m,n] = Cmn
    end
    return C
end
```
where I only changed the original line `Cmn = zero(eltype(C))` to `Cmn = b[m]`. Basically, I want this function to return `A*X .+ b`. Although this change seems trivial to me, the implementation gives:
```julia
julia> using LoopVectorization

julia> Y, A, X, b = randn(2,2), randn(2,2), randn(2,2), ones(2);

julia> A*X .+ b
2×2 Matrix{Float64}:
 1.44203   0.534377
 0.931009  1.14601

julia> custom_gemm!(Y, A, X, b)
2×2 Matrix{Float64}:
 1.0  1.0
 1.0  1.0
```
It seems that the inner `for` loop is ignored and `Cmn` is only set to `b[m]`. If I change the function slightly, the code works again:
```julia
function custom_gemm2!(C::Matrix{T}, A::Matrix{T}, B::Matrix{T}, b::Vector{T}) where {T}
    @turbo for m ∈ axes(A,1), n ∈ axes(B,2)
        Cmn = zero(T)
        Cmn += b[m]
        for k ∈ axes(A,2)
            Cmn += A[m,k] * B[k,n]
        end
        C[m,n] = Cmn
    end
    return C
end
```
```julia
julia> custom_gemm2!(Y, A, X, b)
2×2 Matrix{Float64}:
 1.44203   0.534377
 0.931009  1.14601
```
I am not sure what is causing this issue, but I can imagine that a lot of people might accidentally run into this.
```julia
julia> versioninfo()
Julia Version 1.7.0
Commit 3bf9d17731 (2021-11-30 12:12 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, skylake)
Environment:
  JULIA_NUM_THREADS = 12
```

LoopVectorization v0.12.101
> I am not sure what is causing this issue
Basically, LoopVectorization parses the expression in a single pass. If you write

```julia
for m ∈ axes(A,1), n ∈ axes(B,2)
    Cmn = zero(T)
```

then it records that `Cmn` depends on `m` and `n`.
If you write

```julia
for m ∈ axes(A,1), n ∈ axes(B,2)
    Cmn = b[m]
```

then it records that `Cmn` depends on `m`, because it doesn't yet know that you're going to sum over it.
In hindsight, single-pass parsing that also deduces the structure/behavior of the loop was not a good idea. It would be better to first build an analyzable internal representation, and then use that to derive meaning.
Long term, I am rewriting LoopVectorization, which will address this issue (among others).
Short term, the simplest fix would probably be to add an extra field to the `Operation` struct that indicates the loop level at which it was initialized, which the reduction code can later use. That should be a simple fix.
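In the meantime, a workaround is to keep the accumulator initialized to `zero(T)` (so the `k`-loop is recognized as a reduction) and add `b[m]` only at the store. This is a sketch based on the functions above; the name `custom_gemm3!` is just for illustration:

```julia
using LoopVectorization

# Workaround sketch: initialize the accumulator to zero so the k-loop
# reduction is recognized, and add the bias b[m] outside the reduction,
# when writing the result.
function custom_gemm3!(C::Matrix{T}, A::Matrix{T}, B::Matrix{T}, b::Vector{T}) where {T}
    @turbo for m ∈ axes(A,1), n ∈ axes(B,2)
        Cmn = zero(T)
        for k ∈ axes(A,2)
            Cmn += A[m,k] * B[k,n]
        end
        C[m,n] = Cmn + b[m]   # bias added at the store, not in the accumulator
    end
    return C
end
```

This avoids giving the accumulator an `m`-dependent initial value, so the single-pass parser never sees the problematic pattern.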