
Workgroup Reversal

Open: shivadbhavsar opened this issue 1 year ago • 2 comments

Add functionality to apply workgroup reversals to increase cache hits.

shivadbhavsar, Mar 06 '24 19:03
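A minimal sketch of why the proposed workgroup reversal can raise cache hits, assuming a toy model: workgroups run one at a time, each touches exactly one tile, and the shared cache is a fully associative LRU measured in tiles. Real GPU caches and concurrent workgroup scheduling behave differently, and `LRUCache`, `workgroup_order`, and `producer_consumer_hits` are hypothetical names, not MIGraphX APIs. The idea shown: if the consumer kernel visits tiles in the reverse of the producer's order, it starts with the tiles the producer wrote last, which are the ones most likely still cached.

```python
from collections import OrderedDict


class LRUCache:
    """Toy fully associative LRU cache that counts hits on tile IDs (assumption, not real hardware)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()
        self.hits = 0

    def access(self, tile):
        if tile in self.lines:
            self.hits += 1
            self.lines.move_to_end(tile)  # mark as most recently used
        else:
            self.lines[tile] = True
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)  # evict least recently used


def workgroup_order(num_workgroups, reverse):
    """Tile visit order; reversal maps workgroup id -> num_workgroups - 1 - id."""
    return [num_workgroups - 1 - wg if reverse else wg for wg in range(num_workgroups)]


def producer_consumer_hits(num_workgroups, capacity, reverse_consumer):
    """Count consumer-kernel cache hits in the toy serial model (hypothetical helper)."""
    cache = LRUCache(capacity)
    # Producer kernel: workgroup i writes tile i, in launch order.
    for tile in workgroup_order(num_workgroups, reverse=False):
        cache.access(tile)
    cache.hits = 0  # only count hits made by the consumer kernel
    # Consumer kernel: workgroup i reads tile i, optionally reverse indexed.
    for tile in workgroup_order(num_workgroups, reverse=reverse_consumer):
        cache.access(tile)
    return cache.hits
```

With 8 workgroups and room for 4 tiles, the forward consumer order gets 0 hits and the reversed order gets 4. With room for all 8 tiles, both orders hit on every tile, so in this model reversal only matters when the intermediate tensor does not fit in cache.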

Initial work resulted in no performance difference. rocprofiler results on a trimmed UNet:

  1. Using `MIGRAPHX_MLIR_USE_SPECIFIC_OPS=attention`: cache hits are mostly the same with and without reversal (with some being considerably lower).
  2. Using `MIGRAPHX_MLIR_USE_SPECIFIC_OPS=convolution,dot,fused,attention`: cache hits are noticeably higher for some kernels with reversal, but overall performance is consistently worse with reversal.

shivadbhavsar, Apr 03 '24 17:04

Next Steps: Understand cache hits with even smaller graphs

  1. Tested a mul -> dot -> add program, which is compiled as mul -> dot_add; when reversal is applied, `mlir_dot_add` is reverse indexed. Cache hits do not change when reversal is applied.

shivadbhavsar, Apr 03 '24 17:04
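To make "reverse indexed" concrete for a GEMM-style kernel such as the `dot_add` in the test above, the sketch below maps a linear workgroup id to the output tile it computes. It assumes a row-major grid of `tiles_m` by `tiles_n` output tiles and a reversal of the form `num_groups - 1 - group_id`; `tile_for_workgroup` is a hypothetical helper for illustration and is not how MIGraphX or MLIR necessarily implement the reversal.

```python
def tile_for_workgroup(group_id, tiles_m, tiles_n, reverse):
    """Map a linear workgroup id to the (row, col) output tile it computes.

    Hypothetical helper (assumes a row-major tile grid). With reverse=True the
    id is flipped to num_groups - 1 - group_id before decomposition, so
    workgroup 0 computes the last tile.
    """
    num_groups = tiles_m * tiles_n
    if not 0 <= group_id < num_groups:
        raise ValueError("group_id out of range")
    linear = num_groups - 1 - group_id if reverse else group_id
    return divmod(linear, tiles_n)  # (row, col) in a row-major grid
```

For a 2 x 3 tile grid, the reversed mapping visits (1, 2), (1, 1), (1, 0), (0, 2), (0, 1), (0, 0): the tiles nearest the end of the producer's output are computed first.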