Epliz
Thanks @chesik-amd for the reply. I understand the current situation, but I hope it is not permanent, as the ROCm profiling tools are far, far behind what you...
Hi @penglz, if there was an issue, it has probably been fixed since then, as collecting counters seems to work, at least for me. The only thing is that after...
Also reported at https://github.com/ROCm/clr/issues/78
Hi, I am not sure what the status is, but it looks like AMD has been working on it: https://github.com/pytorch/pytorch/pull/114309
I updated the kernel in my reproducer; it saturates memory bandwidth (unlike rocBLAS).
I see that @daineAMD replied to the other issue, so I am mentioning it here as well, in case that helps. To give the context again if needed, improving rocblas_gemm_ex for cases...
Hi @daineAMD, following up after a week. Do you have an example of a configuration where gemv is slower than gemm? If not, can you please proceed with...
Thanks @daineAMD for the reply. I still believe that, even if those cases are not always dispatched to the gemv kernel, dispatching configurations known to be faster with gemv would be...
Hi @daineAMD, thanks a lot for pushing for this. I would also be happy to give it a try as an early tester, if you want and it is possible.