LoopVectorization.jl
LoopVectorization.jl copied to clipboard
@turbo can't find index when looping over sparse arrays
I am trying to use turbo in this function to multiply A'x when A is sparse
function my_mul_lv!(y::Vector{Float64}, A::SparseMatrixCSC{Float64,Int64}, x::Vector{Float64})
@turbo for i in eachindex(y)
tmp = 0.0
for j in nzrange(A, i)
tmp += A.nzval[j] * x[A.rowval[j]]
end
y[i] = tmp
end
return
end
The function works without the @turbo macro but when I add it, it throws this an error that i isn't defined: UndefVarError: i not defined
To provide further details if I define
function my_mul_lv!(y::Vector{Float64}, A::SparseMatrixCSC{Float64,Int64}, x::Vector{Float64})
@turbo for i in eachindex(y) # for each branch
tmp = 0.0
nzrange = nzrange(A, i) # non zero bus indices
for j in nzrange
tmp += A.nzval[j] * x[A.rowval[j]]
end
y[i] = tmp
end
return
end
I get the error
ERROR: LoadError: ArgumentError: `@turbo` only supports loops iterating over ranges, and not objects of type `typeof(nzrange)`.
However,
julia> typeof(nzrange(A, 1))
UnitRange{Int64}
It looks like sparse arrays are currently not supported. This is not surprising, as they lead to irregular memory access.
However, once the random access loads have been performed, the computation (in this case, a multiplication) can still be performed using SIMD instructions.
It would be great to have a special macro for loops over columns of a sparse matrix , that can parallelize those parts that can be. And also send each column to a different thread, if using @tturbo.
This is something I'd like to work on eventually in LoopModels, but it'd be a long time. LoopModels is currently very coupled with affine representations of loop nests and indices.
You may be interested in https://github.com/willow-ahrens/Finch.jl