rocSOLVER
Flexible block size for SYEVJ/HEEVJ
The kernels and their launch configurations are modified to accept a block size nb_max that may differ from BS2 = 32.
For the double_complex type, nb_max can be set to 22 so that the (2 * nb_max) by (2 * nb_max) "J" rotation matrix and the (2 * nb_max) by (nb_max) submatrix of the "A" matrix can both be stored in LDS shared memory for use in offd_rotate_kernel(). The same setting also allows the (2 * nb_max) by (2 * nb_max) "J" rotation matrix and the (2 * nb_max) by (2 * nb_max) submatrix of "A" to be stored in LDS shared memory for syevj_offd_kernel().
This reduced block size may improve the performance of HEEVJ for double complex.
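The choice of 22 can be checked with a little arithmetic. The sketch below is not code from this change. It assumes a 64 KiB LDS budget per workgroup and 16 bytes per double-complex element, neither of which is stated above. The helper names `rotate_lds_bytes` and `offd_lds_bytes` are hypothetical.

```cpp
#include <cstddef>

// Assumption (not stated in the description): 64 KiB of LDS per workgroup.
constexpr std::size_t lds_limit_bytes = 64 * 1024;

// Assumption: one double-complex element is two 8-byte doubles.
constexpr std::size_t sizeof_double_complex = 16;

// Hypothetical helper: LDS bytes for offd_rotate_kernel(), which holds a
// (2*nb) x (2*nb) J matrix plus a (2*nb) x nb submatrix of A.
constexpr std::size_t rotate_lds_bytes(std::size_t nb, std::size_t elem_size)
{
    return ((2 * nb) * (2 * nb) + (2 * nb) * nb) * elem_size;
}

// Hypothetical helper: LDS bytes for syevj_offd_kernel(), which holds a
// (2*nb) x (2*nb) J matrix plus a (2*nb) x (2*nb) submatrix of A.
constexpr std::size_t offd_lds_bytes(std::size_t nb, std::size_t elem_size)
{
    return 2 * (2 * nb) * (2 * nb) * elem_size;
}

// nb_max = 22: both kernels fit under the assumed 64 KiB budget.
static_assert(rotate_lds_bytes(22, sizeof_double_complex) == 46464, "rotate, nb = 22");
static_assert(offd_lds_bytes(22, sizeof_double_complex) == 61952, "offd, nb = 22");
static_assert(offd_lds_bytes(22, sizeof_double_complex) <= lds_limit_bytes, "fits");

// nb_max = 23 already exceeds the budget in syevj_offd_kernel().
static_assert(offd_lds_bytes(23, sizeof_double_complex) > lds_limit_bytes, "too large");

// nb_max = 32 (the BS2 value) exceeds the budget in both kernels.
static_assert(rotate_lds_bytes(32, sizeof_double_complex) > lds_limit_bytes, "too large");
static_assert(offd_lds_bytes(32, sizeof_double_complex) > lds_limit_bytes, "too large");
```

Under these assumptions, 22 is the largest nb_max for which both double-complex tiles of syevj_offd_kernel() fit in LDS at once. With nb_max = 32, the "J" matrix alone would take the full 64 KiB.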