SuiteSparseGraphBLAS.jl
Vector dot is much slower than built-in operation
I can get decent parallel speed-up for sparse matmul and sparse matvec, but the dot product between two vectors seems very slow:
using SuiteSparseGraphBLAS
using BenchmarkTools
gbset(:nthreads, 16)
b = ones(10000)
b_gb = GBVector(b)
@btime b' * b # 1 μs
@btime b_gb' * b_gb # 15 μs
Is this expected, or can it be tuned to be faster?
Version: [email protected]
I do see this behavior (although more like 10x on my device). The big thing is that SuiteSparse:GraphBLAS is not a replacement for BLAS1 operations. It's a sparse matrix library, so it will always be a bit slow for simple BLAS operations.
That being said, we can probably do better here, perhaps by unpacking and repacking the result and actually doing BLAS1 for the basic arithmetic semiring.
We could also not be compiling at O3 for some reason; I'll check on that, and I'll also talk to Tim Davis.
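To make the "actually doing BLAS1 for the basic arithmetic semiring" idea concrete, here is a rough sketch in plain Julia. It does not use the SuiteSparseGraphBLAS.jl API at all: `DenseBacked` and `semiring_dot` are hypothetical names made up for illustration, and the struct only stands in for a vector whose storage has already been unpacked to a dense array. The point is just the dispatch: a generic semiring path, plus a fast path that hands `(+, *)` on BLAS element types to `LinearAlgebra.dot`.

```julia
using LinearAlgebra

# Hypothetical stand-in for a vector whose storage is fully dense.
# This is NOT the SuiteSparseGraphBLAS.jl type; it only models the idea.
struct DenseBacked{T}
    data::Vector{T}
end

# Generic path: dot product over an arbitrary (add, mul) semiring.
function semiring_dot(add, mul, x::DenseBacked, y::DenseBacked)
    length(x.data) == length(y.data) ||
        throw(DimensionMismatch("vector lengths differ"))
    # Multiply elementwise with `mul`, then reduce with `add`.
    return mapreduce(mul, add, x.data, y.data)
end

# Fast path: for the ordinary (+, *) semiring on BLAS float types,
# pass the raw storage straight to the BLAS1-backed `dot`.
function semiring_dot(::typeof(+), ::typeof(*),
                      x::DenseBacked{T}, y::DenseBacked{T}) where {T<:Union{Float32,Float64}}
    return dot(x.data, y.data)
end

x = DenseBacked(ones(10_000))
semiring_dot(+, *, x, x)      # takes the BLAS1 fast path
semiring_dot(min, +, x, x)    # falls back to the generic semiring path
```

Whether the real library can do this cheaply depends on how expensive it is to get at the underlying dense storage, which this sketch assumes is free.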