
Remove loop unrolls from getri and trtri

tfalders opened this pull request

This PR removes the loop unrolls from the small-size getri and trtri kernels, reducing the library size and allowing them to be built even if the -n flag is passed to the install script. My tests on gfx90a show a minimal impact on performance, though I wasn't able to run the batch tests due to a hang.
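The size trade-off described above can be illustrated with a minimal sketch. It assumes small-size kernels are specialized per matrix size through a compile-time template parameter. It is plain C++, not HIP, and it is not rocSOLVER's code. The names `trtri_lower_rolled`, `trtri_lower_fixed`, `trtri_lower_dispatch`, and `max_abs_diff` are hypothetical. The example inverts a small lower-triangular matrix in two ways. The first is one runtime-size routine, which produces one compiled body. The second is a template with a constant trip count, which the compiler may fully unroll. Each supported size then instantiates a separate copy, so code size grows with the number of sizes.

```cpp
#include <cmath>

// Hypothetical runtime-size version: one compiled body serves every n.
// Computes X = inverse(L) for a row-major n x n lower-triangular L.
void trtri_lower_rolled(int n, const double* L, double* X) {
    for (int i = 0; i < n * n; ++i) X[i] = 0.0;
    for (int j = 0; j < n; ++j) {
        X[j * n + j] = 1.0 / L[j * n + j];
        for (int i = j + 1; i < n; ++i) {
            double sum = 0.0;
            for (int k = j; k < i; ++k) sum += L[i * n + k] * X[k * n + j];
            X[i * n + j] = -sum / L[i * n + i];
        }
    }
}

// Hypothetical compile-time-size version: N is constant, so the compiler
// is free to fully unroll these loops, but every N used is a new copy.
template <int N>
void trtri_lower_fixed(const double* L, double* X) {
    for (int i = 0; i < N * N; ++i) X[i] = 0.0;
    for (int j = 0; j < N; ++j) {
        X[j * N + j] = 1.0 / L[j * N + j];
        for (int i = j + 1; i < N; ++i) {
            double sum = 0.0;
            for (int k = j; k < i; ++k) sum += L[i * N + k] * X[k * N + j];
            X[i * N + j] = -sum / L[i * N + i];
        }
    }
}

// Each case instantiates a separate specialization, so binary size grows
// roughly with the number of supported sizes. Returns false if n is unsupported.
bool trtri_lower_dispatch(int n, const double* L, double* X) {
    switch (n) {
        case 1: trtri_lower_fixed<1>(L, X); return true;
        case 2: trtri_lower_fixed<2>(L, X); return true;
        case 3: trtri_lower_fixed<3>(L, X); return true;
        case 4: trtri_lower_fixed<4>(L, X); return true;
        default: return false;
    }
}

// Largest absolute elementwise difference between two length-len arrays.
double max_abs_diff(const double* a, const double* b, int len) {
    double m = 0.0;
    for (int i = 0; i < len; ++i) m = std::fmax(m, std::fabs(a[i] - b[i]));
    return m;
}
```

Both versions perform the same arithmetic, so they return the same inverse. The only difference is how much machine code the specialized path emits once it is instantiated for many sizes.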

tfalders, Jun 14 '24 20:06

Unfortunately, even with these changes, the debug build still exceeds the maximum binary size. I might look into removing the loop unrolls from getf2_npvt as well.

tfalders, Jul 23 '24 17:07

Closing until a new update can be tested.

tfalders, Oct 07 '24 20:10