
GPUMaterializeEncoding: expand-to-subgroups in both M and N dimensions

Open bjacob opened this issue 4 months ago • 0 comments

The current tile-selection heuristic in GPUMaterializeEncoding only ever expands to subgroups in the N dimension, never in the M dimension. That keeps the logic a little simpler, but it is unlikely to be optimal in general: the resulting rectangular shape of the kernel at the thread level is a missed opportunity to minimize the number of load instructions. Concretely, we currently generate 10 load instructions where just 8 would be generated if we expanded to subgroups in both M and N (with a factor of 2 each) instead of only in the N dimension (with a factor of 4).
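One way to read the 10-versus-8 figure is with a toy cost model in which the number of loads grows with the sum of the M and N expansion factors (LHS rows plus RHS columns covered by the tile), so a squarer split is cheaper for the same number of subgroups. The sketch below is only an illustration: `load_count`, `best_split`, and the constant `loads_per_unit = 2` are hypothetical, chosen so that the model reproduces the numbers quoted above, and are not taken from the IREE heuristic or the HIP prototype.

```python
def load_count(m_factor: int, n_factor: int, loads_per_unit: int = 2) -> int:
    """Toy model: loads scale with LHS rows plus RHS columns covered by the tile.

    loads_per_unit is an assumed constant, picked to match the issue's figures.
    """
    return loads_per_unit * (m_factor + n_factor)


def factorizations(total_subgroups: int) -> list[tuple[int, int]]:
    """All (m_factor, n_factor) pairs whose product is total_subgroups."""
    return [
        (m, total_subgroups // m)
        for m in range(1, total_subgroups + 1)
        if total_subgroups % m == 0
    ]


def best_split(total_subgroups: int) -> tuple[int, int]:
    """Pick the split with the fewest loads under the toy model."""
    return min(factorizations(total_subgroups), key=lambda mn: load_count(*mn))


if __name__ == "__main__":
    print("N-only (1x4):", load_count(1, 4))  # current behavior
    print("M and N (2x2):", load_count(2, 2))  # proposed behavior
    print("best split for 4 subgroups:", best_split(4))
```

Under this model, the N-only split (1, 4) costs 10 loads and the balanced split (2, 2) costs 8, which is consistent with the counts reported in this issue. A real heuristic would also have to account for register pressure and the intrinsic's tile shape, which this sketch ignores.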

This is prototyped in this stand-alone HIP kernel: https://github.com/bjacob/hip-matmul/commit/79847ab5b052ff0c8a29df730cc7e2891bf30ec5

bjacob · Oct 21 '24 14:10