rocSOLVER
Remove loop unrolls from getri and trtri
This PR removes the loop unrolls from the small-size getri and trtri kernels, reducing the library size and allowing them to be built even if the -n flag is passed to the install script. My tests on gfx90a show a minimal impact on performance, though I wasn't able to run the batch tests due to a hang.
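For readers unfamiliar with the trade-off, here is a minimal sketch of the idea in plain host C++. It is not the actual rocSOLVER kernel code: the names `invert_lower_fixed` and `invert_lower` are hypothetical, and the routine is a textbook in-place inverse of a small lower-triangular matrix. The first version takes the size as a template parameter, which lets the compiler fully unroll the loops (on a GPU this is usually paired with an unroll pragma) but emits a separate copy of the code for every size. The second takes the size at run time, so a single copy serves all sizes.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical "unrolled" shape: DIM is a compile-time constant, so all trip
// counts are known and the compiler may fully unroll the loops. Each distinct
// DIM is a separate instantiation, which grows the binary.
template <int DIM>
void invert_lower_fixed(double* a) // column-major DIM x DIM, lower triangular
{
    for (int j = DIM - 1; j >= 0; --j) {
        a[j + j * DIM] = 1.0 / a[j + j * DIM];
        const double ajj = -a[j + j * DIM];
        // Rows are processed bottom-up so that entries still needed by the
        // rows above are not overwritten too early.
        for (int i = DIM - 1; i > j; --i) {
            double sum = 0.0;
            for (int k = j + 1; k <= i; ++k)
                sum += a[i + k * DIM] * a[k + j * DIM];
            a[i + j * DIM] = sum * ajj;
        }
    }
}

// Hypothetical "rolled" shape: same arithmetic, but n is a run-time value,
// so one compiled copy handles every small size.
void invert_lower(double* a, int n)
{
    for (int j = n - 1; j >= 0; --j) {
        a[j + j * n] = 1.0 / a[j + j * n];
        const double ajj = -a[j + j * n];
        for (int i = n - 1; i > j; --i) {
            double sum = 0.0;
            for (int k = j + 1; k <= i; ++k)
                sum += a[i + k * n] * a[k + j * n];
            a[i + j * n] = sum * ajj;
        }
    }
}
```

Both versions produce the same result; the only difference is whether the size is baked in at compile time. Calling `invert_lower_fixed<3>(a)` and `invert_lower(a, 3)` on the same 3 x 3 input gives identical output, but supporting sizes 1 through N with the first form means N instantiations instead of one.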
Unfortunately, even with these changes the debug build still exceeds the maximum binary size. I might look into removing the loop unrolls from getf2_npvt as well.
Closing until the new update can be tested.