SparseArrays.jl
dot(::SparseMatrixCSC, ::Array) should be specialized
julia> using LinearAlgebra, SparseArrays
julia> using BenchmarkTools
julia> M = rand(500, 300);
julia> S = spzeros(size(M)...);
julia> @btime dot($M, $S);
521.241 μs (0 allocations: 0 bytes)
In comparison, even materializing S as dense is better time-wise at this scale:
julia> @btime dot($S, $S);
236.493 ns (0 allocations: 0 bytes)
julia> @btime dot($M, $M);
42.695 μs (0 allocations: 0 bytes)
julia> @btime dot(collect($S), $M);
248.822 μs (2 allocations: 1.14 MiB)
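For context, the slow timing comes from the generic fallback, which visits every one of the m * n index pairs; each access into the SparseMatrixCSC then pays a binary search within the column. Roughly, the fallback behaves like this (an illustrative sketch, not the actual method in LinearAlgebra):

```julia
using LinearAlgebra, SparseArrays

# Roughly what the generic fallback does: visit all m * n index pairs,
# paying a per-element sparse getindex (binary search within a column).
# This is an illustrative sketch, not the actual code in LinearAlgebra.
function generic_style_dot(A::AbstractMatrix, B::AbstractMatrix)
    size(A) == size(B) || throw(DimensionMismatch("Size mismatch"))
    s = zero(promote_type(eltype(A), eltype(B)))
    @inbounds for I in eachindex(A, B)   # every entry, stored zeros included
        s += dot(A[I], B[I])
    end
    return s
end
```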
The specialized version should iterate only over the nonzeros of the sparse matrix.
Implementation proposal. This will need tweaking to restore the recursive behavior of dot, but it shouldn't change dramatically:
fast_dot(x, y) = dot(x, y)

function fast_dot(A::Matrix{T1}, B::SparseArrays.SparseMatrixCSC{T2}) where {T1, T2}
    T = promote_type(T1, T2)
    (m, n) = size(A)
    if (m, n) != size(B)
        throw(DimensionMismatch("Size mismatch"))
    end
    s = zero(T)
    if m * n == 0
        return s
    end
    rows = SparseArrays.rowvals(B)
    vals = SparseArrays.nonzeros(B)
    for j in 1:n
        for ridx in SparseArrays.nzrange(B, j)
            i = rows[ridx]
            v = vals[ridx]
            s += v * A[i, j]
        end
    end
    return s
end
julia> @btime fast_dot($M, $S);
389.750 ns (0 allocations: 0 bytes)
# the default version calling dot
julia> @btime fast_dot($S, $M);
632.688 μs (0 allocations: 0 bytes)
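One possible way to restore the recursive behavior mentioned above (a sketch, not the final PR code) is to replace v * A[i, j] with dot(A[i, j], v), so conjugation in the complex case and nested/block elements are handled the same way as by the generic dot. The name fast_dot_recursive is illustrative:

```julia
using LinearAlgebra, SparseArrays

# Sketch of a recursion-friendly variant: dot(A[i, j], v) instead of
# v * A[i, j], so conjugation (complex case) and nested/block elements
# are handled the same way as by the generic dot.
function fast_dot_recursive(A::AbstractMatrix, B::SparseMatrixCSC)
    size(A) == size(B) || throw(DimensionMismatch("Size mismatch"))
    s = zero(promote_type(eltype(A), eltype(B)))
    rows = rowvals(B)
    vals = nonzeros(B)
    for j in axes(B, 2)
        for ridx in nzrange(B, j)
            # dot recurses into the elements, conjugating A's entry
            s += dot(A[rows[ridx], j], vals[ridx])
        end
    end
    return s
end
```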
Make a PR?
Yes, that's on the way; I just wanted to be sure I hadn't missed ongoing work on it (I didn't have master built on my machine).
@dkarrasch may know. I am not aware of it - but I might have missed something.
There will be another PR in the same spirit for the structurally sparse matrices from LinearAlgebra: https://github.com/JuliaLang/julia/pull/39889#issuecomment-789284869
Kicking the can of worms after seeing related issues: https://github.com/JuliaLang/julia/issues/31330
I think nonzeroinds should be defined either in Base or in LinearAlgebra and re-exported by SparseArrays, so that structurally sparse types (Diagonal, ...) can implement it. This could lead to generic ways to write sparse matrix multiplication, dot products, and other operations.
This could even be used to make the generic_dot implementation faster for all structurally sparse arrays.
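As a sketch of what such an interface could enable (the structural_nonzeros helper below is hypothetical, not an existing SparseArrays or LinearAlgebra API): once a structurally sparse type can report its stored (i, j, value) entries, a single generic method covers Diagonal, SparseMatrixCSC, and friends:

```julia
using LinearAlgebra, SparseArrays

# Hypothetical interface: iterate (i, j, value) over structurally stored
# entries. Nothing here is an existing API; it only illustrates how a
# nonzeroinds-style protocol could make dot generic.
structural_nonzeros(D::Diagonal) = ((i, i, D.diag[i]) for i in eachindex(D.diag))
function structural_nonzeros(S::SparseMatrixCSC)
    rows, vals = rowvals(S), nonzeros(S)
    return ((rows[k], j, vals[k]) for j in axes(S, 2) for k in nzrange(S, j))
end

# One generic dot for any type implementing the protocol:
function generic_sparse_dot(A::AbstractMatrix, B)
    size(A) == size(B) || throw(DimensionMismatch("Size mismatch"))
    s = zero(promote_type(eltype(A), eltype(B)))
    for (i, j, v) in structural_nonzeros(B)
        s += dot(A[i, j], v)
    end
    return s
end
```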