Optimize LU factorization without pivoting

EdDAzevedo opened this issue.

Optimize getrf_npvt (LU factorization without pivoting) by using a blocked algorithm similar to the one used for Cholesky factorization.
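The block equations behind this approach, written in the standard right-looking formulation, are shown below. The issue does not spell them out, and $L_{11}$ is assumed to be unit lower triangular, as in the LAPACK getrf convention:

```latex
\begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}
= \begin{pmatrix} L_{11} & 0 \\ L_{21} & L_{22} \end{pmatrix}
  \begin{pmatrix} U_{11} & U_{12} \\ 0 & U_{22} \end{pmatrix}
\quad\Longrightarrow\quad
\begin{aligned}
A_{11} &= L_{11} U_{11}          && \text{(factor the diagonal block)} \\
U_{12} &= L_{11}^{-1} A_{12}     && \text{(TRSM: row panel of } U\text{)} \\
L_{21} &= A_{21} U_{11}^{-1}     && \text{(TRSM: column panel of } L\text{)} \\
L_{22} U_{22} &= A_{22} - L_{21} U_{12} && \text{(GEMM update, then recurse)}
\end{aligned}
```

For symmetric positive definite $A$, Cholesky takes $U = L^T$. The two triangular solves then collapse into one, $L_{21} = A_{21} L_{11}^{-T}$, and the update becomes $A_{22} - L_{21} L_{21}^T$. LU without pivoting has the same structure but needs both panels.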

The diagonal block is factored by a specialized kernel that loads the block into the 64 KB of LDS (local data share) memory. rocBLAS TRSM then computes the column panel of "L" and the row panel of "U". Finally, rocBLAS GEMM updates the trailing unfactored submatrix to the lower right, and the process repeats on that submatrix.
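The following is a minimal pure-Python sketch of the three steps on the CPU. It assumes that simple loops can stand in for the LDS kernel, rocBLAS TRSM, and rocBLAS GEMM. The function names, the block size `nb`, and the packed in-place L\U storage (LAPACK style) are hypothetical illustrations, not rocSOLVER's API:

```python
from typing import List

Matrix = List[List[float]]


def factor_diag_block(a: Matrix, k: int, kb: int) -> None:
    """Unblocked LU, no pivoting, of the kb x kb diagonal block at (k, k).

    Plays the role of the LDS-resident kernel (hypothetical stand-in):
    unit-lower L overwrites the strict lower part, U the upper part.
    """
    for j in range(k, k + kb):
        pivot = a[j][j]
        if pivot == 0.0:
            # Without pivoting, a zero pivot cannot be recovered from.
            raise ZeroDivisionError(f"zero pivot at column {j}")
        for i in range(j + 1, k + kb):
            a[i][j] /= pivot                   # multiplier l_ij
            for c in range(j + 1, k + kb):
                a[i][c] -= a[i][j] * a[j][c]   # rank-1 update inside the block


def trsm_l_panel(a: Matrix, k: int, kb: int, n: int) -> None:
    """L21 = A21 * U11^{-1} (right side, upper, non-unit): column panel of L."""
    for i in range(k + kb, n):
        for j in range(k, k + kb):
            s = a[i][j]
            for p in range(k, j):
                s -= a[i][p] * a[p][j]
            a[i][j] = s / a[j][j]


def trsm_u_panel(a: Matrix, k: int, kb: int, n: int) -> None:
    """U12 = L11^{-1} * A12 (left side, unit lower): row panel of U."""
    for c in range(k + kb, n):
        for i in range(k, k + kb):
            s = a[i][c]
            for p in range(k, i):
                s -= a[i][p] * a[p][c]
            a[i][c] = s


def gemm_trailing_update(a: Matrix, k: int, kb: int, n: int) -> None:
    """A22 <- A22 - L21 * U12: update of the trailing unfactored submatrix."""
    for i in range(k + kb, n):
        for c in range(k + kb, n):
            s = 0.0
            for p in range(k, k + kb):
                s += a[i][p] * a[p][c]
            a[i][c] -= s


def blocked_lu_nopiv(a: Matrix, nb: int = 2) -> None:
    """Blocked right-looking LU without pivoting, in place (packed L\\U)."""
    n = len(a)
    for k in range(0, n, nb):
        kb = min(nb, n - k)              # last block may be smaller
        factor_diag_block(a, k, kb)
        trsm_l_panel(a, k, kb, n)
        trsm_u_panel(a, k, kb, n)
        gemm_trailing_update(a, k, kb, n)
```

Only the diagonal block has to fit in LDS. An nb x nb double-precision block needs 8·nb² bytes, so 64 KB (65,536 bytes) holds at most a 90 x 90 block (64,800 bytes), or 128 x 128 in single precision, before any other workspace. The issue does not state the block size the actual kernel uses. As with any LU without pivoting, the method is only safe for matrices such as diagonally dominant ones, where no zero or tiny pivot appears.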

The new routines are named "getf2_nopiv" and "getrf_nopiv" so that the existing code needs only minimal changes.

Feb 26 '24 17:02 EdDAzevedo