pytorch_sparse

Slow sparse-sparse multiplication with SparseTensor format

Open realCrush opened this issue 3 years ago • 7 comments

Hi, I found that sparse-sparse multiplication with the SparseTensor format is super slow, even compared with doing the same computation in dense format.

The variables s and adj_t are both SparseTensor; here is their info: [screenshot]

Then I ran the sparse-sparse multiplication 10000 times (to average out noise) and got a time of ~110s: [screenshot]

The same computation with a simple .to_dense() decreases the time to ~5s: [screenshot]

Moving the .to_dense() out of the for-loop makes it even faster (~3s): [screenshot]
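Roughly, my benchmark looks like this (a minimal sketch; the exact expressions in the screenshots may differ slightly, and bench is just a helper written for illustration):

```python
import time
import torch

# s and adj_t are assumed to be torch_sparse.SparseTensor objects already on the GPU.

def bench(fn, n=10000):
    torch.cuda.synchronize()            # finish pending GPU work before timing
    start = time.time()
    for _ in range(n):
        fn()
    torch.cuda.synchronize()            # wait for all queued kernels to finish
    return time.time() - start

# 1) sparse @ sparse inside the loop (~110s above)
t_sparse = bench(lambda: adj_t @ s)

# 2) densify inside the loop (~5s above)
t_dense_in_loop = bench(lambda: adj_t.to_dense() @ s.to_dense())

# 3) densify once, outside the loop (~3s above)
adj_d, s_d = adj_t.to_dense(), s.to_dense()
t_dense = bench(lambda: adj_d @ s_d)

print(t_sparse, t_dense_in_loop, t_dense)
```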

I'm confused because SparseTensor should be faster than the dense format for such a very sparse tensor. This bug makes my program really slow🥲, hope you can fix it.

(All the above experiments were run on an otherwise idle 3090 GPU, without any other programs interfering.)

realCrush avatar Aug 27 '22 09:08 realCrush

Thanks for reporting! What is the timing when only running a single sparse-sparse matrix multiplication (e.g., adj_t @ s)? My assumption is that the intermediate sparse matrix is pretty dense, which might explain the overall slowdown. You can also try to densify the intermediate representation:

s_t @ (adj_t @ s).to_dense()

We are utilizing cuSPARSE internally for this, so there is sadly not much we can do to improve its efficiency :(
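Something like the following should isolate the timings (just a sketch; the explicit synchronize calls matter because GPU kernels launch asynchronously):

```python
import time
import torch

torch.cuda.synchronize()
start = time.time()
out = adj_t @ s                          # single sparse-sparse product
torch.cuda.synchronize()
print('adj_t @ s:', time.time() - start)

torch.cuda.synchronize()
start = time.time()
res = s_t @ (adj_t @ s).to_dense()       # densify the intermediate result
torch.cuda.synchronize()
print('densified intermediate:', time.time() - start)
```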

rusty1s avatar Aug 29 '22 08:08 rusty1s

Thanks for reporting! What is the timing when only running a single sparse-sparse matrix multiplication (e.g., adj_t @ s)? My assumption is that the intermediate sparse matrix is pretty dense, which might explain the overall slowdown. You can also try to densify the intermediate representation:

s_t @ (adj_t @ s).to_dense()

We are utilizing cuSPARSE internally for this, so there is sadly not much we can do to improve its efficiency :(

It does become faster, but it still costs ~50s, and the result of adj_t @ s is also sparse (density=0.67%). Actually, I found that SparseTensor@dense is really fast, even faster than SparseTensor.to_dense()@dense. I also tried the same operations with torch.sparse; it is also slow and costs ~25s. This is weird, since SparseTensor and torch.sparse are both designed for sparse tensors.

I want to ask: is there a way to do sparse@sparse efficiently? I can't believe that both SparseTensor@SparseTensor and torch.sparse@torch.sparse are so slow even compared with the dense format🤪

[screenshot]
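For reference, this is roughly how I ran the torch.sparse version (a sketch; to_coo is just a helper written here, assuming the usual .coo()/.sizes() accessors of SparseTensor):

```python
import torch

def to_coo(x):
    # convert a torch_sparse.SparseTensor to a torch.sparse COO tensor
    row, col, value = x.coo()
    if value is None:
        value = torch.ones(row.numel(), device=row.device)
    return torch.sparse_coo_tensor(torch.stack([row, col]), value, x.sizes())

adj_coo, s_coo = to_coo(adj_t), to_coo(s)
out = torch.sparse.mm(adj_coo, s_coo)    # sparse @ sparse via torch.sparse
```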

realCrush avatar Aug 29 '22 08:08 realCrush

I have uploaded the two variables as .pkl files here so you can try them:

test.zip

realCrush avatar Aug 29 '22 08:08 realCrush

How dense is the final output? I think doing sparse-sparse matrix multiplication on GPUs efficiently is still an open problem. This is not really a GPU-friendly algorithm, as it involves non-contiguous access patterns and requires computing the number of NNZ entries in advance.
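You can check this with something like (a sketch, assuming the usual nnz()/sparse_sizes() accessors of SparseTensor):

```python
out = s_t @ (adj_t @ s)                  # final output as a SparseTensor
num_rows, num_cols = out.sparse_sizes()
print('density:', out.nnz() / (num_rows * num_cols))
```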

rusty1s avatar Aug 29 '22 09:08 rusty1s

How dense is the final output? I think doing sparse-sparse matrix multiplication on GPUs efficiently is still an open problem. This is not really a GPU-friendly algorithm, as it involves non-contiguous access patterns and requires computing the number of NNZ entries in advance.

The final output is also sparse, with density=0.68%

[screenshot]

realCrush avatar Aug 29 '22 09:08 realCrush

This is not really a GPU-friendly algorithm

I also tried running it on the CPU, which reduces the time from ~110s to ~25s:

[screenshot]
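Roughly what I timed (a sketch, assuming SparseTensor supports .cpu() for moving the data off the GPU):

```python
import time

adj_cpu, s_cpu = adj_t.cpu(), s.cpu()    # move both SparseTensors to the CPU

start = time.time()
for _ in range(10000):
    out = adj_cpu @ s_cpu                # sparse @ sparse on the CPU
print('CPU time:', time.time() - start)
```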

realCrush avatar Aug 29 '22 09:08 realCrush

If the CPU time is much faster than the GPU time, I assume that your sparse matrices are too small to take advantage of the GPU here.

rusty1s avatar Aug 29 '22 11:08 rusty1s

This issue had no activity for 6 months. It will be closed in 2 weeks unless there is some new activity. Is this issue already resolved?

github-actions[bot] avatar Feb 26 '23 01:02 github-actions[bot]

This should be resolved.

rusty1s avatar Feb 27 '23 06:02 rusty1s