
Vector dot is much slower than built-in operation

Open · learning-chip opened this issue 2 years ago · 1 comment

I can get decent parallel speed-up for sparse matmul and sparse matvec, but the dot product between two vectors seems very slow:

using SuiteSparseGraphBLAS
using BenchmarkTools

gbset(:nthreads, 16)

b = ones(10000)
b_gb = GBVector(b)

@btime b' * b  #  1 μs
@btime b_gb' * b_gb  # 15 μs

Is this expected? Or can it be tuned to be faster?
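(Aside: BenchmarkTools recommends interpolating non-constant globals with `$` so that dispatch on the untyped global binding is excluded from the timing. A minimal sketch with plain vectors; the same `$` interpolation applies to the `GBVector` timings above:)

```julia
using BenchmarkTools
using LinearAlgebra

b = ones(10000)

# Interpolating the global with `$` benchmarks the dot product itself,
# not dynamic dispatch on the untyped global binding.
@btime dot($b, $b)
```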

Version: [email protected]

learning-chip · Apr 20 '22 10:04

I do see this behavior (although more like 10x on my device). The big thing is that SuiteSparse:GraphBLAS is not a replacement for BLAS1 operations. It's a sparse matrix library, so it will always be somewhat slower on simple dense BLAS operations.

That being said, we can probably do better here, perhaps by unpacking the vectors, calling BLAS1 directly, and repacking the result, at least for the basic arithmetic semiring.
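A rough sketch of that fallback idea, for the standard (+, *) semiring on dense Float64 vectors. The function name here is hypothetical, and `Vector(...)` is a copying stand-in for what would really be a zero-copy unpack of the GraphBLAS buffers:

```julia
using LinearAlgebra

# Hypothetical fallback: for the standard arithmetic semiring on dense
# Float64 vectors, hand the dot product to BLAS1 via LinearAlgebra.dot.
# `Vector(x)` copies; a real implementation would unpack/repack the
# underlying GraphBLAS buffers without copying.
function blas1_dot(x::AbstractVector{Float64}, y::AbstractVector{Float64})
    return dot(Vector(x), Vector(y))
end
```

For a dense `GBVector`, this would route the inner loop through the same BLAS1 kernel that the plain-`Vector` path uses.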

We could also not be compiling at -O3 for some reason; I'll check on that, and talk to Tim Davis as well.

rayegun · Apr 20 '22 10:04