Angelo Gonzales
This PR introduces a potential optimization to the LARFT routine. The modification aims to reduce the size of the gemv computations and instead offload the block part of the computation...
This PR adds the `--emulation=smoke/regression/extended` parameter to `rtest.py`, which is required to run emulation tests on the emulator. In addition, the `--name` parameter, which selects subtests from a test set...
This PR aims to reduce the impact of the `set_diag` and `restore_diag` kernels on the runtime of GEQR2, as indicated by profiling. This is achieved by: 1. Combining `larfg` and `set_diag` to...
This adds a gemm device function that can be called from other kernels. The function is designed to be called by an entire wavefront and to compute a block of the...
Optimize larft to improve performance of downstream functions like ormtr
This PR fixes the `checkin_misc_MEMORY_MODEL.user_managed` test failure when `ROCSOLVER_USE_INTERNAL_BLAS` is set and internal trsm is used for `rocsolver_dgetrf_strided_batched`.
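For context on the LARFT entries above, here is a minimal sketch of the textbook forward, columnwise recurrence that LARFT-style routines compute: an upper triangular `T` such that `H_0 H_1 ... H_{k-1} = I - V T V^T`. It is written in dependency-free Python for readability and is not the rocSOLVER implementation or the optimization proposed in the PR; the function names (`larft_forward_columnwise`, `block_reflector`, `reflector_product`) are hypothetical, and `V` is assumed to store each Householder vector explicitly as a full column.

```python
def matmul(A, B):
    """Dense matrix product of two list-of-lists matrices."""
    return [[sum(A[i][p] * B[p][j] for p in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def larft_forward_columnwise(V, tau):
    """Build the k x k upper triangular factor T of a block reflector.

    V   : n x k matrix; column i holds Householder vector v_i in full.
    tau : k scalars, with H_i = I - tau[i] * v_i * v_i^T.
    """
    n, k = len(V), len(tau)
    T = [[0.0] * k for _ in range(k)]
    for i in range(k):
        T[i][i] = tau[i]
        # gemv-like step: w = V[:, :i]^T @ v_i
        w = [sum(V[r][j] * V[r][i] for r in range(n)) for j in range(i)]
        # trmv-like step: T[:i, i] = -tau[i] * T[:i, :i] @ w
        for p in range(i):
            T[p][i] = -tau[i] * sum(T[p][q] * w[q] for q in range(p, i))
    return T


def block_reflector(V, T):
    """Form I - V T V^T explicitly (for checking only)."""
    n = len(V)
    Vt = [list(col) for col in zip(*V)]
    VTVt = matmul(matmul(V, T), Vt)
    return [[(1.0 if i == j else 0.0) - VTVt[i][j] for j in range(n)]
            for i in range(n)]


def reflector_product(V, tau):
    """Form H_0 @ H_1 @ ... @ H_{k-1} one reflector at a time."""
    n, k = len(V), len(tau)
    H = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(k):
        v = [V[r][i] for r in range(n)]
        Hi = [[(1.0 if r == c else 0.0) - tau[i] * v[r] * v[c]
               for c in range(n)] for r in range(n)]
        H = matmul(H, Hi)
    return H
```

In this recurrence, each column of `T` needs one matrix-vector product with a growing panel of `V`. The PR descriptions above suggest that the cost of these gemv-style steps is what the proposed changes target.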