rocSOLVER
Optimize LU factorization without pivoting
Optimize getrf_npvt (LU factorization without pivoting) by using a blocked algorithm similar to the one used in Cholesky factorization.
The diagonal block is factored by a specialized kernel that loads the block into the 64 KB of LDS (shared memory). rocBLAS TRSM then generates the column panel of "L" and the row panel of "U". Finally, rocBLAS GEMM updates the remaining unfactored sub-matrix to the lower right.
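The three steps above can be illustrated with a small CPU reference sketch. This is not the rocSOLVER implementation: the function names `factor_diagonal_block` and `lu_nopiv_blocked` and the block size `nb` are hypothetical, and the plain loops only stand in for the LDS kernel, the two TRSM calls, and the GEMM update. The matrix is a list of rows and is overwritten with the unit-lower "L" (below the diagonal) and "U" (on and above the diagonal).

```python
def factor_diagonal_block(A, k, kb):
    """Unblocked LU without pivoting on A[k:k+kb, k:k+kb], in place.

    Stand-in for the specialized kernel that factors the diagonal block.
    """
    end = k + kb
    for j in range(k, end):
        pivot = A[j][j]
        if pivot == 0.0:
            raise ZeroDivisionError("zero pivot: matrix needs pivoting")
        for i in range(j + 1, end):
            A[i][j] /= pivot                      # multiplier, entry of L
            for c in range(j + 1, end):
                A[i][c] -= A[i][j] * A[j][c]      # update inside the block


def lu_nopiv_blocked(A, nb=2):
    """Blocked LU without pivoting, in place (hypothetical reference sketch)."""
    n = len(A)
    for k in range(0, n, nb):
        kb = min(nb, n - k)
        end = k + kb

        # Step 1: factor the diagonal block A11 = L11 * U11.
        factor_diagonal_block(A, k, kb)

        # Step 2a (TRSM, right side, upper): column panel L21 = A21 * inv(U11).
        for i in range(end, n):
            for j in range(k, end):
                s = A[i][j]
                for p in range(k, j):
                    s -= A[i][p] * A[p][j]
                A[i][j] = s / A[j][j]

        # Step 2b (TRSM, left side, unit lower): row panel U12 = inv(L11) * A12.
        for c in range(end, n):
            for i in range(k, end):
                s = A[i][c]
                for p in range(k, i):
                    s -= A[i][p] * A[p][c]
                A[i][c] = s

        # Step 3 (GEMM): trailing update A22 = A22 - L21 * U12.
        for i in range(end, n):
            for c in range(end, n):
                s = 0.0
                for p in range(k, end):
                    s += A[i][p] * A[p][c]
                A[i][c] -= s
    return A


# Example: factor a small diagonally dominant matrix (safe without pivoting).
M = [[4.0, 1.0, 2.0],
     [1.0, 5.0, 1.0],
     [2.0, 1.0, 6.0]]
lu_nopiv_blocked(M, nb=2)
```

Because there is no pivoting, the sketch fails on a zero pivot and can lose accuracy on small ones, so it is only appropriate for matrices known to be safe, such as diagonally dominant ones.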
The new routines are named "getf2_nopiv" and "getrf_nopiv" to keep changes to the existing code minimal.