Low performance of xSY/HEGST
sygst/hegst is significantly slower in rocSOLVER than in cuSOLVER.
(Tested with rocSOLVER 3.26.2)
| size | GH200 (GF/s) | MI250x (GF/s) | MI300a (GF/s) |
|---|---|---|---|
| 1024 (typical size we use) | 1270 | 29 | 24 |
| 10240 | 16000 | 1750 | 1613 |
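
To put a number on the gap, the ratios can be read straight off the table above. A minimal sketch; the `slowdown` helper is only an illustrative name, and the figures are the ones reported in the table:

```python
# Throughput in GF/s, copied from the table above.
gflops = {
    1024: {"GH200": 1270, "MI250x": 29, "MI300a": 24},
    10240: {"GH200": 16000, "MI250x": 1750, "MI300a": 1613},
}


def slowdown(size, device, baseline="GH200"):
    """Return how many times slower `device` is than `baseline` at `size`."""
    row = gflops[size]
    return row[baseline] / row[device]


if __name__ == "__main__":
    for size in gflops:
        for device in ("MI250x", "MI300a"):
            ratio = slowdown(size, device)
            print(f"n={size:>5} {device}: {ratio:.1f}x slower than GH200")
```

That is roughly 44x to 53x at size 1024 and roughly 9x to 10x at size 10240.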
@saadrahim
Hi @rasolca. Internal ticket has been created for investigation. Thanks!
Hi @rasolca, can you provide any reproducer or sample workload that you are using to compare the performance?
We use itype = rocblas_eform_ax and both uplo = rocblas_fill_lower and uplo = rocblas_fill_upper.
I used the https://github.com/eth-cscs/DLA-Future miniapp_gen_to_std with the parameters (--matrix-size <size> --block-size <size>), which fall back to a single LAPACK/cuSOLVER/rocSOLVER call.
Anyway, as long as the matrices are valid inputs, the values of the matrix elements have no impact on performance.
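
For anyone building a standalone reproducer, "valid inputs" here means a symmetric (or Hermitian) A and the Cholesky factor of a positive definite B, since sygst/hegst expects B to be already factorized by potrf. The sketch below is a pure-Python reference for the real, lower-triangular `rocblas_eform_ax` case, C = inv(L) * A * inv(L^T). It is not rocSOLVER's algorithm and is only meant for checking results on tiny matrices; all function names are hypothetical helpers, not part of any library.

```python
import math
import random


def make_inputs(n, seed=0):
    """Build a symmetric A and a symmetric positive definite B (hypothetical helper)."""
    rng = random.Random(seed)
    m = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    a = [[0.5 * (m[i][j] + m[j][i]) for j in range(n)] for i in range(n)]
    # B = M * M^T + n * I is symmetric positive definite.
    b = [[sum(m[i][k] * m[j][k] for k in range(n)) + (n if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return a, b


def cholesky_lower(b):
    """Return lower-triangular L with B = L * L^T (what potrf would produce)."""
    n = len(b)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = b[j][j] - sum(l[j][k] ** 2 for k in range(j))
        l[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            s = b[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = s / l[j][j]
    return l


def forward_solve(l, rhs):
    """Solve L * X = RHS column by column, with L lower triangular."""
    n, ncols = len(l), len(rhs[0])
    x = [[0.0] * ncols for _ in range(n)]
    for c in range(ncols):
        for i in range(n):
            s = rhs[i][c] - sum(l[i][k] * x[k][c] for k in range(i))
            x[i][c] = s / l[i][i]
    return x


def transpose(m):
    """Return the transpose of a dense matrix stored as a list of rows."""
    return [list(row) for row in zip(*m)]


def sygst_reference(a, l):
    """Return C = inv(L) * A * inv(L^T) as a full dense matrix."""
    y = forward_solve(l, a)  # Y = inv(L) * A
    # C = Y * inv(L^T) is equivalent to C^T = inv(L) * Y^T.
    return transpose(forward_solve(l, transpose(y)))


if __name__ == "__main__":
    a, b = make_inputs(4)
    c = sygst_reference(a, cholesky_lower(b))
    for row in c:
        print(" ".join(f"{v: .6f}" for v in row))
```

The library routines overwrite only the triangle of A selected by `uplo`, so a comparison against this reference should look at that triangle only.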
This issue has been migrated to: https://github.com/ROCm/rocm-libraries/issues/1676
Imported to ROCm/rocm-libraries