triton
[AMD] Fix shared layout order for batch dimension in pipeline passes
The batch dimension should be the slowest-varying one; other orderings are not supported by the MFMA/WMMA/MMA pipeline.
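
As an illustration only, the reordering described here could be sketched as follows. This is not the code from the patch: the helper name `move_batch_dim_to_slowest` is hypothetical, and the sketch assumes that the layout `order` lists dimensions from fastest-varying to slowest-varying and that the batch dimension is dimension 0 of a rank-3 tensor.

```python
from typing import List, Sequence


def move_batch_dim_to_slowest(order: Sequence[int], batch_dim: int = 0) -> List[int]:
    """Return a copy of `order` with `batch_dim` moved to the last position.

    Assumption: `order` lists dimensions from fastest- to slowest-varying,
    so the last entry is the slowest one. This helper is hypothetical and
    only illustrates the idea; it is not part of Triton.
    """
    rank = len(order)
    # `order` must be a permutation of 0..rank-1.
    if sorted(order) != list(range(rank)):
        raise ValueError(f"order {list(order)} is not a permutation of 0..{rank - 1}")
    # Tensors of rank below 3 have no batch dimension, so leave them unchanged.
    if rank < 3:
        return list(order)
    # Keep the relative order of the non-batch dimensions and append the batch
    # dimension so that it becomes the slowest-varying one.
    non_batch = [d for d in order if d != batch_dim]
    return non_batch + [batch_dim]


if __name__ == "__main__":
    # Batch dimension listed as fastest-varying: it is moved to the end.
    print(move_batch_dim_to_slowest([0, 2, 1]))
    # Batch dimension already slowest: the order is unchanged.
    print(move_batch_dim_to_slowest([2, 1, 0]))
```

The relative order of the two non-batch dimensions is preserved on purpose, so that only the position of the batch dimension changes.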