
hip mfma tests

Open CRobeck opened this issue 3 years ago • 1 comment

This PR adds a basic functionality test that uses the matrix cores on AMD gfx908 and gfx90a hardware for dense matrix products.

CRobeck, May 20 '22 17:05

This is looking much better. The main thing to do now is to convert it to run in parallel on the GPU. I think it's fine if what each thread does and the block size are different between the different tunings, as long as they're still similar enough to think of as different tunings of the same algorithm.

MrBurmark, Jun 01 '22 15:06
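
To illustrate what "different tunings of the same algorithm" could look like, here is a minimal host-side sketch in plain C++. It is not the code in this PR and does not use the MFMA instructions: the names `matmul_tuned` and `tunings_agree` are hypothetical, and the two nested loops over `block` and `thread` only emulate a GPU launch in which each thread computes one element of C. The block size is a template parameter, so each instantiation is a separate tuning that maps work to threads differently but computes the same product.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: emulates a 1D GPU launch on the host.
// `block` stands in for blockIdx.x and `thread` for threadIdx.x.
// Each emulated thread computes one element of C = A * B (row-major, N x N).
template <std::size_t BLOCK_SIZE>
void matmul_tuned(const std::vector<double>& A,
                  const std::vector<double>& B,
                  std::vector<double>& C,
                  std::size_t N)
{
  const std::size_t total = N * N;
  assert(A.size() == total && B.size() == total && C.size() == total);

  // Round up so a partially filled last block covers the tail.
  const std::size_t num_blocks = (total + BLOCK_SIZE - 1) / BLOCK_SIZE;

  for (std::size_t block = 0; block < num_blocks; ++block) {
    for (std::size_t thread = 0; thread < BLOCK_SIZE; ++thread) {
      const std::size_t idx = block * BLOCK_SIZE + thread;
      if (idx >= total) continue;  // bounds guard, as in a real kernel

      const std::size_t row = idx / N;
      const std::size_t col = idx % N;
      double sum = 0.0;
      for (std::size_t k = 0; k < N; ++k) {
        sum += A[row * N + k] * B[k * N + col];
      }
      C[idx] = sum;
    }
  }
}

// Runs two tunings (block sizes 64 and 256) on the same input and
// reports whether they produce identical results.
bool tunings_agree(std::size_t N)
{
  std::vector<double> A(N * N), B(N * N);
  for (std::size_t i = 0; i < N * N; ++i) {
    A[i] = static_cast<double>(i % 7);
    B[i] = static_cast<double>(i % 5);
  }
  std::vector<double> C64(N * N, 0.0), C256(N * N, 0.0);
  matmul_tuned<64>(A, B, C64, N);
  matmul_tuned<256>(A, B, C256, N);
  return C64 == C256;
}
```

In a real HIP version the two outer loops would disappear and the index would come from the launch configuration, but the same check (every tuning produces the same C) would still be a reasonable functionality test.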