rocSOLVER
(WIP) sytrd use gemv instead of symv
Modifies sytrd() and latrd() to call gemv() (general matrix-vector multiply) instead of symv() (symmetric matrix-vector multiply).
For some implementations and problem sizes, gemv() can outperform symv(), even though symv() should perform only about half the work and touch about half the data. One possible reason is that the symv() implementation might be using atomic update operations.
The changes include:
- Allocate additional work storage in sytrd() to hold the strictly upper or strictly lower triangular part, which is conceptually untouched and must be preserved.
- Invoke kernels in sytrd() to save that triangular part on entry and restore it on exit.
- Invoke kernels in latrd() on entry to copy the strictly lower (or strictly upper) triangular part of the matrix onto the opposite triangle, enforcing symmetry so that gemv() can replace the calls to symv().
- Increase xxTRD_BLOCKSIZE from 32 to 64 to reduce the cost of the symmetry-enforcing matrix copy in latrd().
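The save, symmetrize, and restore steps listed above can be illustrated with a minimal CPU-only sketch. This is not the code from this PR, which uses GPU kernels. The helper names (`symv_lower`, `gemv`, `save_strict_upper`, `restore_strict_upper`, `symmetrize_from_lower`) and the column-major `std::vector` layout are assumptions made for illustration only, and only the lower-triangle case is shown.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helpers for illustration; not rocSOLVER or rocBLAS API.
// Column-major n x n matrix: element (i, j) is stored at a[i + j * n].
using Matrix = std::vector<double>;
using Vector = std::vector<double>;

// Reference symv: y = A * x, reading only the lower triangle (i >= j).
Vector symv_lower(std::size_t n, const Matrix& a, const Vector& x)
{
    Vector y(n, 0.0);
    for(std::size_t j = 0; j < n; ++j)
    {
        y[j] += a[j + j * n] * x[j]; // diagonal entry
        for(std::size_t i = j + 1; i < n; ++i)
        {
            y[i] += a[i + j * n] * x[j]; // stored entry A(i, j)
            y[j] += a[i + j * n] * x[i]; // implied entry A(j, i)
        }
    }
    return y;
}

// Plain gemv: y = A * x, reading every entry of the full matrix.
Vector gemv(std::size_t n, const Matrix& a, const Vector& x)
{
    Vector y(n, 0.0);
    for(std::size_t j = 0; j < n; ++j)
        for(std::size_t i = 0; i < n; ++i)
            y[i] += a[i + j * n] * x[j];
    return y;
}

// Save the strictly upper triangle (i < j) into extra work storage.
Vector save_strict_upper(std::size_t n, const Matrix& a)
{
    Vector saved;
    saved.reserve(n * (n - 1) / 2);
    for(std::size_t j = 0; j < n; ++j)
        for(std::size_t i = 0; i < j; ++i)
            saved.push_back(a[i + j * n]);
    return saved;
}

// Restore the strictly upper triangle from the saved copy (same order).
void restore_strict_upper(std::size_t n, const Vector& saved, Matrix& a)
{
    std::size_t k = 0;
    for(std::size_t j = 0; j < n; ++j)
        for(std::size_t i = 0; i < j; ++i)
            a[i + j * n] = saved[k++];
}

// Enforce symmetry: copy the strictly lower triangle onto the upper one.
void symmetrize_from_lower(std::size_t n, Matrix& a)
{
    for(std::size_t j = 0; j < n; ++j)
        for(std::size_t i = j + 1; i < n; ++i)
            a[j + i * n] = a[i + j * n];
}
```

In this sketch, calling `save_strict_upper`, then `symmetrize_from_lower`, then `gemv` gives the same product as `symv_lower` on the original matrix, and `restore_strict_upper` afterwards puts back the entries that the caller expects to remain untouched. The price is an extra pass over the matrix for the copy plus the additional storage.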
On gfx1030, using `rocsolver-bench -f sytrd --precision s --iters 5`:
| n | using gemv (us) | using symv (us) |
|---|---|---|
| 1024 | 62,201 | 73,580 |
| 2048 | 137,507 | 190,302 |
| 4096 | 335,996 | 522,818 |
| 8192 | 2,161,237 | 1,925,926 |
On MI300 (splinter-126-wr-d3, gfx942)
| n | using gemv (us) | using symv (us) |
|---|---|---|
| 1024 | 40,310 | 53,154 |
| 2048 | 94,027 | 157,111 |
| 4096 | 237,926 | 484,525 |
| 8192 | 689,551 | 1,683,223 |