rocSOLVER

Add GEMM device function

Open AGonzales-amd opened this issue 11 months ago • 4 comments

This adds a GEMM device function that can be called from within other kernels. The function is designed to be called by an entire wavefront, which cooperatively computes one block of the output matrix. Problems can therefore be decomposed into blocks, each computed by an individual wavefront. Currently, the function is used to implement an alternative rocsolver_gemm kernel.
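The wavefront-per-block decomposition can be illustrated with a small host-side C++ sketch. It assumes column-major storage and a fixed 16x16 tile. `wave_gemm_tile`, `tiled_gemm`, and `reference_gemm` are hypothetical names, not rocSOLVER's API. The per-tile function runs serially and does not use MFMA. It only marks the unit of work that one wavefront would own.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical tile shape; the block size used by the real device function is not stated in the PR.
constexpr int TILE_M = 16;
constexpr int TILE_N = 16;

// Hypothetical stand-in for the per-wavefront device function. It computes the
// tile of C = alpha * A * B + beta * C whose top-left corner is (row0, col0).
// Column-major storage: A is m x k, B is k x n, C is m x n.
// Runs serially here; a real wavefront would spread this work across its lanes.
void wave_gemm_tile(int m, int n, int k, float alpha, const float* A, int lda,
                    const float* B, int ldb, float beta, float* C, int ldc,
                    int row0, int col0) {
    assert(lda >= m && ldb >= k && ldc >= m);
    const int row_end = std::min(row0 + TILE_M, m);  // clip partial tiles at the edges
    const int col_end = std::min(col0 + TILE_N, n);
    for (int j = col0; j < col_end; ++j)
        for (int i = row0; i < row_end; ++i) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p)
                acc += A[i + p * lda] * B[p + j * ldb];
            C[i + j * ldc] = alpha * acc + beta * C[i + j * ldc];
        }
}

// Emulates the kernel launch: each (row0, col0) pair is the tile one wavefront would own.
void tiled_gemm(int m, int n, int k, float alpha, const float* A, int lda,
                const float* B, int ldb, float beta, float* C, int ldc) {
    for (int col0 = 0; col0 < n; col0 += TILE_N)
        for (int row0 = 0; row0 < m; row0 += TILE_M)
            wave_gemm_tile(m, n, k, alpha, A, lda, B, ldb, beta, C, ldc, row0, col0);
}

// Untiled reference loop nest, used to check the tiled version.
void reference_gemm(int m, int n, int k, float alpha, const float* A, int lda,
                    const float* B, int ldb, float beta, float* C, int ldc) {
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p)
                acc += A[i + p * lda] * B[p + j * ldb];
            C[i + j * ldc] = alpha * acc + beta * C[i + j * ldc];
        }
}
```

Each tile writes a disjoint block of C, so tiles need no synchronization with one another. This is what allows the work to be split across independent wavefronts.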

  • Uses MFMA (matrix fused multiply-add) instructions
  • Limited to __gfx90a__, __gfx940__, __gfx941__, and __gfx942__ architectures
  • Supports both complex data types and transpose matrix operations
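Complex types and transpose operations are usually supported through BLAS-style `op(X)` flags (no transpose, transpose, conjugate transpose). The following C++ sketch shows that convention at the element-access level, assuming column-major storage and `std::complex<double>`. `Op`, `op_elem`, and `gemm_ref` are hypothetical names and are not the PR's actual interface.

```cpp
#include <cassert>
#include <complex>
#include <vector>

// Hypothetical operation flags mirroring the usual BLAS 'N', 'T', 'C' options.
enum class Op { None, Transpose, ConjTranspose };

using cplx = std::complex<double>;

// Returns element (i, p) of op(X), where X is stored column-major with leading
// dimension ldx. For Transpose/ConjTranspose the stored element is X(p, i),
// so X itself is stored with the dimensions of op(X) swapped.
inline cplx op_elem(const cplx* X, int ldx, Op op, int i, int p) {
    switch (op) {
        case Op::None:          return X[i + p * ldx];
        case Op::Transpose:     return X[p + i * ldx];
        case Op::ConjTranspose: return std::conj(X[p + i * ldx]);
    }
    return cplx{};  // unreachable for valid Op values
}

// C (m x n) = alpha * op(A) * op(B) + beta * C, where op(A) is m x k and op(B) is k x n.
void gemm_ref(Op opA, Op opB, int m, int n, int k, cplx alpha,
              const cplx* A, int lda, const cplx* B, int ldb,
              cplx beta, cplx* C, int ldc) {
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i) {
            cplx acc{0.0, 0.0};
            for (int p = 0; p < k; ++p)
                acc += op_elem(A, lda, opA, i, p) * op_elem(B, ldb, opB, p, j);
            C[i + j * ldc] = alpha * acc + beta * C[i + j * ldc];
        }
}
```

Handling `op` at the element-access level keeps one loop nest for all nine flag combinations. An MFMA-based kernel would more likely apply the transpose or conjugation while loading tiles into registers, but the PR does not describe that detail.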

AGonzales-amd • Jan 13 '25 17:01

Hi @AGonzales-amd, I second Ed's suggestion: you should update the rocsolver-test and rocsolver-bench clients to support the internal gemm. I can help you with the tests if you want (this would also be a good opportunity to provide a concrete answer to the question you asked in #879).

jmachado-amd • Jan 16 '25 21:01

Thanks @jmachado-amd, I could use your help. I did consider updating the client programs, but I had trouble exporting the function or otherwise making it visible to the clients.

AGonzales-amd • Jan 16 '25 23:01

Hi @jmachado-amd and @EdDAzevedo, the client programs have been updated to support the internal gemm. One thing I'm not sure about is what error tolerance to use when checking the results.

AGonzales-amd • Jan 22 '25 22:01

Hi @AGonzales-amd, as long as the input matrices are "small" to "medium" sized and have positive, relatively small integer entries, the current test tolerance will work just fine! Let me know if you want to generalize the tests or just understand how those bounds work, and I can explain the important bits of the theory to you.
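Here is one way to see why small, positive integer entries make the check robust. Suppose every entry of A and B is an integer of magnitude at most M and the inner dimension is k. Then every product and every partial sum in a dot product is an integer of magnitude at most k*M^2. Single precision represents every integer up to 2^24 exactly (double precision, up to 2^53). So while k*M^2 stays under that limit, each single-precision operation is exact, and the computed GEMM equals the exact product regardless of summation order or operation fusing. Beyond that range, the standard rounding-error bound |fl(AB) - AB| <= gamma_k |A||B|, with gamma_k = k*u / (1 - k*u) and u the unit roundoff, applies instead. With nonnegative entries |A||B| = AB, so the bound becomes componentwise relative, which is one reason positive entries help. The C++ sketch that follows checks the exactness condition. It also compares dot products accumulated in two different orders against an exact integer result. The helper names are hypothetical and are not taken from rocSOLVER's test clients.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: true if a length-k dot product of integers with magnitude
// at most max_abs is guaranteed to be exact in float. Every product is bounded
// by max_abs^2 and every partial sum by k * max_abs^2; float represents all
// integers with magnitude <= 2^24 exactly.
bool float_dot_is_exact(std::int64_t k, std::int64_t max_abs) {
    const std::int64_t limit = std::int64_t{1} << 24;
    return k * max_abs * max_abs <= limit;
}

// Float dot product accumulated front-to-back (reverse == false) or back-to-front.
float dot_float(const std::vector<float>& x, const std::vector<float>& y, bool reverse) {
    assert(x.size() == y.size());
    const std::size_t n = x.size();
    float acc = 0.0f;
    for (std::size_t t = 0; t < n; ++t) {
        const std::size_t p = reverse ? n - 1 - t : t;
        acc += x[p] * y[p];
    }
    return acc;
}

// Exact reference computed in 64-bit integer arithmetic (entries must be integer-valued).
std::int64_t dot_int(const std::vector<float>& x, const std::vector<float>& y) {
    assert(x.size() == y.size());
    std::int64_t acc = 0;
    for (std::size_t p = 0; p < x.size(); ++p)
        acc += static_cast<std::int64_t>(x[p]) * static_cast<std::int64_t>(y[p]);
    return acc;
}
```

When the exactness condition holds, a test can compare results with a very tight tolerance. When it does not hold, a tolerance scaled by k times machine epsilon, in the spirit of the bound above, is the usual choice.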

On another topic, I see that many gemm tests are failing on Windows; you probably want to look at that sooner rather than later.

jmachado-amd • Jan 24 '25 17:01

The test failure in SYGVDX is likely unrelated. Forcing the merge.

tfalders • May 27 '25 20:05