rocSOLVER

(WIP) sytrd use gemv instead of symv

Open EdDAzevedo opened this issue 10 months ago • 0 comments

Code modifications to sytrd() and latrd() to use gemv() (general matrix vector multiply) instead of symv() (symmetric matrix vector multiply).

In some implementations and for some problem sizes, gemv() may outperform symv(), even though symv() should only need to touch about half the data (one triangle of the matrix). The implementation of symv() might also be using atomic update operations.
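
To make the comparison concrete, here is a minimal host-side reference sketch in pure Python. It is not rocSOLVER or rocBLAS code, and the names `symv_lower` and `gemv` are hypothetical. It shows that a symv-style product reads only one triangle but must scatter a mirrored update into `y[j]`, which is one plausible place where a parallel implementation might resort to atomics, while a gemv-style product reads the whole matrix and therefore needs both triangles to be filled in.

```python
def symv_lower(a, x):
    """y = A*x for symmetric A, reading only the lower triangle (i >= j)."""
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        for j in range(i + 1):
            y[i] += a[i][j] * x[j]
            if i != j:
                # Mirrored contribution of A[j][i] == A[i][j]. In a parallel
                # kernel this scattered write is where conflicts can arise.
                y[j] += a[i][j] * x[i]
    return y


def gemv(a, x):
    """y = A*x reading every entry of A; no symmetry is assumed."""
    n = len(x)
    return [sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]


# Full symmetric matrix, and the same matrix with an unrelated upper triangle.
full = [[2.0, 1.0, 0.0], [1.0, 3.0, 4.0], [0.0, 4.0, 5.0]]
lower_only = [[2.0, 99.0, 99.0], [1.0, 3.0, 99.0], [0.0, 4.0, 5.0]]
x = [1.0, 2.0, 3.0]

print(symv_lower(lower_only, x))  # upper triangle is ignored
print(gemv(full, x))              # same product, needs the full matrix
```

Calling `gemv(lower_only, x)` would give a different result, which is why the upper triangle has to be overwritten with the mirrored values before gemv can stand in for symv.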

The changes include:

  1. allocate more work storage in sytrd() to hold the strictly upper or strictly lower triangular part, which sytrd() conceptually leaves untouched.
  2. invoke kernels in sytrd() to save (on entry) and restore (on exit) that triangular part.
  3. invoke kernels in latrd() (on entry) to copy the strictly lower triangular part to the strictly upper triangular part, or vice versa, so that the stored matrix is explicitly symmetric. This allows gemv() to replace calls to symv().
  4. change xxTRD_BLOCKSIZE from 32 to 64 to reduce the cost of the matrix copies that enforce symmetry in latrd().
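
The save, symmetrize, and restore steps can be sketched on the host as follows. This is a simplified pure-Python illustration for the lower-triangular case, not the actual kernels: all function names are hypothetical, and it collapses into a single call what the list above splits between sytrd() (save and restore once) and latrd() (symmetrize on entry).

```python
def save_strict_upper(a):
    """Copy the strictly upper triangle into a flat buffer of n*(n-1)/2 entries."""
    n = len(a)
    return [a[i][j] for i in range(n) for j in range(i + 1, n)]


def restore_strict_upper(a, saved):
    """Write the saved entries back, in the same order they were saved."""
    n = len(a)
    it = iter(saved)
    for i in range(n):
        for j in range(i + 1, n):
            a[i][j] = next(it)


def mirror_lower_to_upper(a):
    """Enforce symmetry in place: A[i][j] = A[j][i] for j > i."""
    n = len(a)
    for i in range(n):
        for j in range(i + 1, n):
            a[i][j] = a[j][i]


def gemv(a, x):
    """General matrix-vector product y = A*x."""
    n = len(x)
    return [sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]


def symmetric_matvec_via_gemv(a, x):
    """Use gemv in place of symv, leaving the caller's upper triangle intact."""
    saved = save_strict_upper(a)     # extra work storage
    mirror_lower_to_upper(a)         # make the stored matrix symmetric
    y = gemv(a, x)                   # gemv now gives the symv result
    restore_strict_upper(a, saved)   # undo the overwrite
    return y


# Lower triangle holds the symmetric matrix; upper triangle holds caller data.
a = [[2.0, -7.0, -8.0], [1.0, 3.0, -9.0], [0.0, 4.0, 5.0]]
y = symmetric_matvec_via_gemv(a, [1.0, 2.0, 3.0])
print(y)
print(a)  # the strictly upper triangle is unchanged
```

The extra storage is n(n-1)/2 entries for an n-by-n matrix, and the mirror copy is the added cost that has to be paid back by the faster matrix-vector product.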

On gfx1030, using rocsolver-bench -f sytrd --precision s --iters 5

| n | using gemv (us) | using symv (us) |
|---|---|---|
| 1024 | 62,201 | 73,580 |
| 2048 | 137,507 | 190,302 |
| 4096 | 335,996 | 522,818 |
| 8192 | 2,161,237 | 1,925,926 |

On MI300 (splinter-126-wr-d3, gfx942)

| n | using gemv (us) | using symv (us) |
|---|---|---|
| 1024 | 40,310 | 53,154 |
| 2048 | 94,027 | 157,111 |
| 4096 | 237,926 | 484,525 |
| 8192 | 689,551 | 1,683,223 |
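
For a quick read of the two sets of timings, the ratio of symv time to gemv time can be computed directly from the reported numbers. The short script below is only a convenience sketch: the timings are copied from the tables above, and the `speedup` helper is a hypothetical name.

```python
# Reported timings in microseconds: n -> (gemv, symv)
GFX1030 = {
    1024: (62_201, 73_580),
    2048: (137_507, 190_302),
    4096: (335_996, 522_818),
    8192: (2_161_237, 1_925_926),
}
MI300 = {
    1024: (40_310, 53_154),
    2048: (94_027, 157_111),
    4096: (237_926, 484_525),
    8192: (689_551, 1_683_223),
}


def speedup(gemv_us, symv_us):
    """Ratio symv/gemv; values above 1 mean the gemv variant is faster."""
    return symv_us / gemv_us


for name, table in (("gfx1030", GFX1030), ("MI300", MI300)):
    for n, (gemv_us, symv_us) in table.items():
        print(f"{name} n={n}: {speedup(gemv_us, symv_us):.2f}x")
```

On these numbers the gemv variant is faster in every case except n = 8192 on gfx1030, where the ratio drops below 1 (about 0.89). On MI300 the ratio grows with n, from about 1.3 at n = 1024 to about 2.4 at n = 8192.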

EdDAzevedo, Feb 18 '25 20:02