AMDMIGraphX
Workgroup Reversal
Add functionality to apply workgroup reversals to increase cache hits.
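As a rough illustration of the idea, the sketch below models two back-to-back kernels sharing a small LRU cache, where the second kernel reads the tiles the first one just wrote. It assumes that "reversal" means remapping workgroup `g` to `num_groups - 1 - g`, that workgroups run strictly one after another, and that each workgroup touches exactly one tile. None of this comes from the MIGraphX or MLIR sources: `reversed_workgroup_id`, `LRUCache`, and `consumer_hits` are hypothetical names, and the model ignores concurrent scheduling on a real GPU.

```python
from collections import OrderedDict


def reversed_workgroup_id(group_id, num_groups):
    """Map workgroup g to num_groups - 1 - g (assumed meaning of 'reversal')."""
    return num_groups - 1 - group_id


class LRUCache:
    """Toy fully associative LRU cache that counts hits and misses per tile."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, tile):
        if tile in self.lines:
            # Hit: mark the tile as most recently used.
            self.lines.move_to_end(tile)
            self.hits += 1
        else:
            # Miss: insert the tile and evict the least recently used one if full.
            self.misses += 1
            self.lines[tile] = True
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)


def consumer_hits(num_groups, cache_tiles, reverse):
    """Count cache hits in a consumer kernel that follows a producer kernel."""
    cache = LRUCache(cache_tiles)
    # Producer kernel: workgroup g touches tile g, in launch order.
    for g in range(num_groups):
        cache.access(g)
    # Only count the consumer's accesses.
    cache.hits = cache.misses = 0
    # Consumer kernel: workgroup g reads tile g, or the mirrored tile if reversed.
    for g in range(num_groups):
        tile = reversed_workgroup_id(g, num_groups) if reverse else g
        cache.access(tile)
    return cache.hits


if __name__ == "__main__":
    print("forward :", consumer_hits(8, 4, reverse=False))
    print("reversed:", consumer_hits(8, 4, reverse=True))
```

In this idealized model the reversed consumer starts with the tiles the producer touched last, which are the ones still resident. It is a toy and not a prediction of the profiler numbers reported here.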
Initial work resulted in no perf difference. Rocprofiler results on trimmed unet:
- Using `MIGRAPHX_MLIR_USE_SPECIFIC_OPS=attention`:
  - Cache hits are mostly the same with and without reversal (with some being considerably lower).
- Using `MIGRAPHX_MLIR_USE_SPECIFIC_OPS=convolution,dot,fused,attention`:
  - Cache hits are noticeably higher for some kernels with reversal, but overall perf is consistently worse with reversal.
Next Steps: Understand cache hits with even smaller graphs
- Performed a test with a `mul -> dot -> add` program, which is compiled as `mul -> dot_add`, where `mlir_dot_add` is reverse indexed when reversal is applied. There is no change in cache hits when reversal is applied.