2:4 Sparsity acceleration does not deliver any benefit.
The conclusion of the 2:4 sparsity tutorial here claims a 1.3x-2.0x speedup of 2:4 sparse execution over dense execution. However, comparing the actual values printed in the terminal output of the dense and sparse sections gives the following table:
| Batch size | Compiled | Dense time | Sparse time | Speedup |
|---|---|---|---|---|
| 4 | n | 9.56 | 16.77 | 0.57x |
| 4 | y | 8.98 | 9.49 | 0.95x |
| 16 | n | 31.86 | 62.27 | 0.51x |
| 16 | y | 30.83 | 34.29 | 0.90x |
| 64 | n | 123.97 | 243.16 | 0.51x |
| 64 | y | 104.98 | 133.49 | 0.79x |
| 256 | n | 476.03 | 1195.23 | 0.40x |
| 256 | y | 397.13 | 542.3 | 0.73x |
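For reference, the Speedup column above is just the dense time divided by the sparse time for each row, so anything below 1.0x means the sparse path is slower. A quick sanity check over the measured values (the batch-size/compile keys are my own labeling of the rows):

```python
# Measured (dense, sparse) times from the table above, keyed by
# (batch_size, compiled). Speedup = dense / sparse, so a value
# below 1.0 means the 2:4 sparse kernel is slower than dense.
rows = {
    (4, False): (9.56, 16.77),
    (4, True): (8.98, 9.49),
    (16, False): (31.86, 62.27),
    (16, True): (30.83, 34.29),
    (64, False): (123.97, 243.16),
    (64, True): (104.98, 133.49),
    (256, False): (476.03, 1195.23),
    (256, True): (397.13, 542.3),
}

speedups = {k: round(dense / sparse, 2) for k, (dense, sparse) in rows.items()}

for (bs, compiled), s in speedups.items():
    print(f"bs={bs:<4} compile={'y' if compiled else 'n'}  speedup={s:.2f}x")

# Every ratio is below 1.0 -- none comes close to the 1.3x-2.0x
# range claimed in the tutorial's conclusion.
assert all(s < 1.0 for s in speedups.values())
```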
As the table shows, the sparse matrix computation never beats the dense one, not even once. I reran these experiments with torch 2.5.1+cu2.4 on a single H100 and observed similar results.
Why are the measured values so much worse than the tutorial's claimed speedups?