RAJAPerf
Add matrix cores test to perf suite
The AMD gfx90a architecture has built-in hardware support (matrix cores) for dense matrix operations. The two intrinsics of interest to RAJA are:
- __builtin_amdgcn_mfma_f64_4x4x4f64
- __builtin_amdgcn_mfma_f64_16x16x4f64
Done criteria:
- Add a test to the perf suite that makes use of each of these intrinsics
- Confirm performance by comparing against a hand-coded matrix-matrix multiply
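
As a starting point for the comparison, here is a minimal host-side sketch of a hand-coded reference multiply that a matrix-core kernel's output could be checked against. It assumes the 16x16x4 intrinsic accumulates a 16x16 tile as D = A * B + C, with A being 16x4 and B being 4x16, and it assumes row-major storage. The names reference_gemm and all_close are hypothetical helpers, not part of RAJA or RAJAPerf.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: reference D = A * B + C on row-major dense matrices.
// A is M x K, B is K x N, C and D are M x N.
std::vector<double> reference_gemm(const std::vector<double>& A,
                                   const std::vector<double>& B,
                                   const std::vector<double>& C,
                                   std::size_t M, std::size_t N, std::size_t K)
{
  assert(A.size() == M * K);
  assert(B.size() == K * N);
  assert(C.size() == M * N);

  std::vector<double> D(M * N);
  for (std::size_t i = 0; i < M; ++i) {
    for (std::size_t j = 0; j < N; ++j) {
      double acc = C[i * N + j];            // start from the accumulator input
      for (std::size_t k = 0; k < K; ++k) {
        acc += A[i * K + k] * B[k * N + j];
      }
      D[i * N + j] = acc;
    }
  }
  return D;
}

// Hypothetical helper: element-wise comparison with an absolute tolerance,
// for checking a device result against the reference.
bool all_close(const std::vector<double>& x,
               const std::vector<double>& y,
               double tol)
{
  if (x.size() != y.size()) return false;
  for (std::size_t i = 0; i < x.size(); ++i) {
    if (std::fabs(x[i] - y[i]) > tol) return false;
  }
  return true;
}
```

For the assumed 16x16x4 tile shape, the reference would be called as reference_gemm(A, B, C, 16, 16, 4) and its result compared with the kernel output copied back from the device using all_close. The same helper could also serve as the correctness baseline for the timed hand-coded variant.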