
Flexible block size for SYEVJ/HEEVJ

Open EdDAzevedo opened this issue 1 year ago • 0 comments

The kernels (and their launch configurations) are modified to accept a block size nb_max that may differ from the fixed value BS2 = 32.

For the double_complex type, nb_max can be set to 22 so that the (2 * nb_max) by (2 * nb_max) "J" rotation matrix and a (2 * nb_max) by (nb_max) submatrix of the "A" matrix both fit in LDS (shared memory) for use in offd_rotate_kernel(). The same setting also allows the (2 * nb_max) by (2 * nb_max) "J" rotation matrix and a (2 * nb_max) by (2 * nb_max) submatrix of "A" to be stored in LDS for syevj_offd_kernel().
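
The sizing above can be checked with a little arithmetic. The following is only a sketch, not code from rocSOLVER: the helper names are hypothetical, the 64 KiB LDS budget per workgroup is an assumption that should be checked against the target GPU, and it counts only the two matrices named above (any other LDS use by the kernels would reduce the limit).

```cpp
#include <cstddef>

// Assumption: 64 KiB of LDS is available to one workgroup.
constexpr std::size_t lds_capacity_bytes = 64 * 1024;

// Hypothetical helper: bytes for the (2*nb) x (2*nb) "J" matrix plus a
// (2*nb) x nb submatrix of "A", as described for offd_rotate_kernel().
constexpr std::size_t offd_rotate_lds_bytes(std::size_t nb_max, std::size_t elem_bytes)
{
    return ((2 * nb_max) * (2 * nb_max) + (2 * nb_max) * nb_max) * elem_bytes;
}

// Hypothetical helper: bytes for the (2*nb) x (2*nb) "J" matrix plus a
// (2*nb) x (2*nb) submatrix of "A", as described for syevj_offd_kernel().
constexpr std::size_t syevj_offd_lds_bytes(std::size_t nb_max, std::size_t elem_bytes)
{
    return 2 * (2 * nb_max) * (2 * nb_max) * elem_bytes;
}

// Largest nb_max whose syevj_offd_kernel() footprint (the larger of the two)
// still fits in the assumed LDS budget.
constexpr std::size_t largest_nb_max(std::size_t elem_bytes,
                                     std::size_t capacity = lds_capacity_bytes)
{
    std::size_t nb = 0;
    while(syevj_offd_lds_bytes(nb + 1, elem_bytes) <= capacity)
        ++nb;
    return nb;
}

// double complex is 16 bytes per element: nb_max = 22 fits, nb_max = 23 does not.
static_assert(syevj_offd_lds_bytes(22, 16) <= lds_capacity_bytes, "22 should fit");
static_assert(syevj_offd_lds_bytes(23, 16) > lds_capacity_bytes, "23 should not fit");
```

Under these assumptions, 16-byte elements give a limit of 22, while 8-byte elements (double or single complex) give 32, which matches the existing BS2 value.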

This reduced block size may improve the performance of HEEVJ for double complex.

EdDAzevedo, Nov 26 '24 03:11