[gfx950][mxfp4] Verify the state of current heuristics
Use amdsharktuner to collect performance data on how knobs such as workgroup thread count, subgroup count, and tile size affect the best achievable performance at the shapes of interest listed below. This will help us verify the reliability of our existing heuristics. The intention is to compare the tuned results against the performance obtained by copying the configs of a handwritten assembly kernel, and to note whether we can do better.
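
| M | N | K/2 | K/32 |
|---|---|---|---|
| 512 | 1024 | 8192 | 512 |
| 512 | 16384 | 8192 | 512 |
| 512 | 53248 | 8192 | 512 |
| 1024 | 16384 | 8192 | 512 |
| 1024 | 1024 | 8192 | 512 |
| 1024 | 53248 | 8192 | 512 |
| 2048 | 1024 | 8192 | 512 |
| 2048 | 16384 | 8192 | 512 |
| 2048 | 53248 | 8192 | 512 |
| 512 | 16384 | 26624 | 1664 |
| 1024 | 16384 | 26624 | 1664 |
| 2048 | 16384 | 26624 | 1664 |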
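
To compare tuner results across these shapes on a common scale, it may help to recover the logical K for each row and convert measured times to throughput. The following is a minimal Python sketch, not part of amdsharktuner. It assumes that `K/2` is the packed extent along K (two FP4 values per byte) and that `K/32` is the number of scale blocks (one scale per 32 elements, as in the MX formats). The helper names `parse_shapes` and `tflops` are hypothetical, and the FLOP count uses the conventional `2 * M * N * K` for a matmul.

```python
import csv
import io

# Shape list from this issue, columns: M, N, K/2, K/32.
SHAPES_CSV = """\
M,N,K/2,K/32
512,1024,8192,512
512,16384,8192,512
512,53248,8192,512
1024,16384,8192,512
1024,1024,8192,512
1024,53248,8192,512
2048,1024,8192,512
2048,16384,8192,512
2048,53248,8192,512
512,16384,26624,1664
1024,16384,26624,1664
2048,16384,26624,1664
"""


def parse_shapes(text):
    """Parse the shape list and recover the logical K for each row.

    Assumption: K/2 is the packed K extent (two FP4 values per byte) and
    K/32 is the scale-block count (block size 32). Both must agree on K.
    """
    shapes = []
    for rec in csv.DictReader(io.StringIO(text)):
        m, n = int(rec["M"]), int(rec["N"])
        k_from_packed = 2 * int(rec["K/2"])
        k_from_scales = 32 * int(rec["K/32"])
        if k_from_packed != k_from_scales:
            raise ValueError(f"inconsistent K for row {rec}")
        k = k_from_packed
        # Conventional matmul FLOP count: one multiply and one add per MAC.
        shapes.append({"M": m, "N": n, "K": k, "flops": 2 * m * n * k})
    return shapes


def tflops(flops, seconds):
    """Convert a FLOP count and a measured time in seconds to TFLOP/s."""
    return flops / seconds / 1e12


shapes = parse_shapes(SHAPES_CSV)
```

Each benchmark time reported by the tuner for a given shape could then be passed through `tflops(shape["flops"], seconds)`, so the tuned configs and the configs copied from the handwritten assembly kernel are compared on the same scale.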