pytorch_sparse
Slow sparse-sparse multiplication with SparseTensor format
Hi, I've found that sparse-sparse multiplication with the SparseTensor format is extremely slow, even compared with doing the same computation in dense format.
The variables s and adj_t are both SparseTensor; here is their info:

Then I run the sparse-sparse multiplication 10,000 times (to average out noise), and it takes ~110s.
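Roughly, the timing loop looks like this (the setup below uses random placeholder matrices with made-up size and density, not my real data):

import time
import torch
from torch_sparse import SparseTensor

# Illustrative placeholders: random ~0.1%-dense matrices on the GPU.
# My real s and adj_t come from my own data and have similar sparsity.
n = 10000
s = SparseTensor.from_dense((torch.rand(n, n, device='cuda') < 0.001).float())
s_t = s.t()
adj_t = SparseTensor.from_dense((torch.rand(n, n, device='cuda') < 0.001).float())

torch.cuda.synchronize()
start = time.time()
for _ in range(10000):
    out = s_t @ (adj_t @ s)  # both matmuls are sparse @ sparse (spspmm)
torch.cuda.synchronize()
print(f'SparseTensor @ SparseTensor: {time.time() - start:.1f}s')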

The same computation with a simple .to_dense() decreases the time to ~5s.
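By that I mean converting to dense inside the loop and using plain dense matmul (continuing the illustrative setup above):

torch.cuda.synchronize()
start = time.time()
for _ in range(10000):
    # Convert on every iteration, then do a dense matmul.
    out = s_t.to_dense() @ (adj_t.to_dense() @ s.to_dense())
torch.cuda.synchronize()
print(f'dense (converted every iteration): {time.time() - start:.1f}s')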

Moving the .to_dense() calls out of the for-loop makes it even faster (~3s).
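That is, converting once before the loop:

# Convert once, outside the loop.
s_d, s_t_d, adj_d = s.to_dense(), s_t.to_dense(), adj_t.to_dense()

torch.cuda.synchronize()
start = time.time()
for _ in range(10000):
    out = s_t_d @ (adj_d @ s_d)
torch.cuda.synchronize()
print(f'dense (converted once): {time.time() - start:.1f}s')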

I'm confused, because SparseTensor should be faster than the dense format for such a very sparse tensor. This bug makes my program really slow 🥲, I hope you can fix it.
(All of the above experiments were run on an otherwise idle 3090 GPU, with no other programs interfering.)
Thanks for reporting! What is the timing when only running a single sparse-sparse matrix multiplication (e.g., adj_t @ s)? My assumption is that the intermediate sparse matrix is pretty dense, which might explain the overall slowdown. You can also try to densify the intermediate representation:
s_t @ (adj_t @ s).to_dense()
We are utilizing cusparse internally for this, so there is sadly not much we can do to improve its efficiency :(
It does become faster, but it still costs ~50s, and the result of adj_t @ s is also sparse (density = 0.67%). Actually, I found that SparseTensor @ dense is really fast, even faster than SparseTensor.to_dense() @ dense. I also tried the same operations with torch.sparse; it is also slow and costs ~25s. This is weird, since SparseTensor and torch.sparse are both designed for sparse tensors.
I want to ask: is there a way to do sparse @ sparse efficiently? I can't believe both SparseTensor @ SparseTensor and torch.sparse @ torch.sparse are so slow, even compared with the dense format 🤪
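For reference, the variants I compared look roughly like this (continuing the illustrative tensors above; the torch.sparse path shown here is my reconstruction using COO tensors):

# sparse @ dense (spmm) -- this path is fast for me.
out_spmm = adj_t @ s.to_dense()

# dense @ dense after converting the sparse operand -- slower than spmm for me.
out_dense = adj_t.to_dense() @ s.to_dense()

# sparse @ sparse via torch.sparse COO tensors -- also slow (~25s over the whole loop).
adj_coo = adj_t.to_torch_sparse_coo_tensor()
s_coo = s.to_torch_sparse_coo_tensor()
out_coo = torch.sparse.mm(adj_coo, s_coo)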

How dense is the final output? I think doing sparse-sparse matrix multiplication on GPUs efficiently is still an open problem. This is not really a GPU-friendly algorithm, as it involves non-contiguous access patterns and requires computing the number of NNZ entries in advance.
The final output is also sparse, with density = 0.68%.

This is not really a GPU-friendly algorithm
I also tried running it on the CPU, which reduces the time from ~110s to ~25s.
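That is, something along these lines (moving the same illustrative tensors to the CPU first):

# Move the sparse tensors to the CPU and run the same loop there.
s_c, s_t_c, adj_c = s.cpu(), s_t.cpu(), adj_t.cpu()

start = time.time()
for _ in range(10000):
    out = s_t_c @ (adj_c @ s_c)
print(f'SparseTensor @ SparseTensor on CPU: {time.time() - start:.1f}s')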
If the CPU time is much faster than the GPU time, I assume that your sparse matrices are too small to take advantage of the GPU here.
This issue has had no activity for 6 months. It will be closed in 2 weeks unless there is new activity. Is this issue already resolved?
This should be resolved.