
Potential improvement to set/restore_diag in GEQR2

Open AGonzales-amd opened this issue 1 year ago • 0 comments

This PR aims to reduce the impact of the set_diag and restore_diag kernels on the runtime of GEQR2, as indicated by profiling. This is achieved by:

  1. Combining larfg and set_diag to reduce the number of global memory reads and writes:
    • larfg is modified to write both the unit diagonal and the non-unit diagonal values, which eliminates the call to set_diag.
  2. Reducing the kernel launch overhead of set_diag and restore_diag:
    • The set_diag launch is eliminated as described above. The launch overhead of restore_diag is reduced by launching the kernel once to restore all diagonal values, at the expense of additional memory footprint.
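
To make the two changes concrete, here is a minimal CPU-only sketch in C++ of the idea, not the actual rocSOLVER kernels. The names `larfg_fused`, `apply_reflector`, `geqr2_sketch` and the `diag` buffer are hypothetical and chosen for illustration. The sketch assumes a real, column-major matrix and omits the safe-minimum rescaling that LAPACK's larfg performs. The fused larfg leaves 1 on the diagonal and parks the real diagonal value in `diag`, so no separate set_diag step is needed, and all diagonal values are restored in one pass at the end.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical fused larfg (illustration only, not rocSOLVER's kernel).
// On entry *alpha is the diagonal entry and x holds the n-1 entries below it.
// On exit x holds the tail of the Householder vector v, *alpha holds the
// unit diagonal v[0] = 1, and the real diagonal value beta is written to
// *saved_diag. Returns tau. Safe-minimum rescaling is omitted for brevity.
double larfg_fused(std::size_t n, double* alpha, double* x, double* saved_diag)
{
    double xnorm2 = 0.0;
    for (std::size_t i = 0; i + 1 < n; ++i)
        xnorm2 += x[i] * x[i];

    if (xnorm2 == 0.0) { // nothing to annihilate, H = I
        *saved_diag = *alpha;
        *alpha = 1.0;
        return 0.0;
    }

    const double beta = -std::copysign(std::sqrt(*alpha * *alpha + xnorm2), *alpha);
    const double tau = (beta - *alpha) / beta;
    const double scale = 1.0 / (*alpha - beta);
    for (std::size_t i = 0; i + 1 < n; ++i)
        x[i] *= scale;

    *saved_diag = beta; // non-unit diagonal goes to the side buffer
    *alpha = 1.0;       // unit diagonal goes to the matrix (replaces set_diag)
    return tau;
}

// Apply H = I - tau * v * v^T from the left to a rows x cols block C.
// v is read as stored in the matrix, including the explicit v[0] = 1.
void apply_reflector(std::size_t rows, std::size_t cols, const double* v,
                     double tau, double* C, std::size_t ldc)
{
    for (std::size_t c = 0; c < cols; ++c) {
        double w = 0.0;
        for (std::size_t r = 0; r < rows; ++r)
            w += v[r] * C[r + c * ldc];
        for (std::size_t r = 0; r < rows; ++r)
            C[r + c * ldc] -= tau * w * v[r];
    }
}

// Unblocked QR sketch: diag must have room for min(m, n) values, which is
// the additional memory footprint mentioned above.
void geqr2_sketch(std::size_t m, std::size_t n, double* A, std::size_t lda,
                  double* tau, double* diag)
{
    const std::size_t k = std::min(m, n);
    for (std::size_t j = 0; j < k; ++j) {
        double* ajj = A + j + j * lda;
        tau[j] = larfg_fused(m - j, ajj, ajj + 1, &diag[j]);
        if (j + 1 < n)
            apply_reflector(m - j, n - j - 1, ajj, tau[j], ajj + lda, lda);
    }
    // Single restore pass instead of one restore_diag call per column.
    for (std::size_t j = 0; j < k; ++j)
        A[j + j * lda] = diag[j];
}
```

On a GPU the final loop would correspond to a single kernel launch over all diagonal entries, which is where the launch-overhead saving would come from; the per-column loop body is where the extra set_diag read and write disappear.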

The following chart shows the speedup of geqrf with these changes on real single-precision square matrices.

[Chart: log_compare_sgeqrf_m]

Note:

  • I tried the suggestion of using larfb instead of larf, but it performed worse due to increased global memory access. I got similar results when I tried modifying larf to assume an implicit unit diagonal.
  • This is my attempt at a solution to this problem, and I am open to trying other suggestions.

AGonzales-amd, Sep 30 '24 22:09