[XLA:GPU] Support fusion of dynamic-slice into triton gemm.
This change fuses dynamic-slice into Triton GEMM kernels, provided that the slice is taken only along the major-most dimension and every other dimension is left at its full size. Operands of the dynamic slice are not fused any further, so the resulting Triton GEMM fusion must take all operands of the dynamic slice (the sliced operand and its start indices) as parameters.
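The fusion condition above can be expressed as a simple shape check. The sketch below is a hypothetical illustration, not the code in this CL. The helper name `can_fuse_dynamic_slice` is invented for this example. It assumes XLA's `minor_to_major` layout convention, where the last entry names the major-most dimension.

```python
def can_fuse_dynamic_slice(operand_shape, slice_sizes, minor_to_major):
    """Hypothetical check: only the major-most dimension may be sliced.

    Assumes minor_to_major lists dimensions from fastest- to slowest-varying
    (XLA layout convention), so its last entry is the major-most dimension.
    """
    if len(operand_shape) != len(slice_sizes):
        return False
    major_most = minor_to_major[-1]
    for dim, (full_size, size) in enumerate(zip(operand_shape, slice_sizes)):
        # Every dimension other than the major-most must be kept whole.
        if dim != major_most and size != full_size:
            return False
    return True


# Row-major [8, 16] operand: slicing 2 of the 8 rows qualifies...
print(can_fuse_dynamic_slice((8, 16), (2, 16), (1, 0)))  # True
# ...but slicing the minor dimension does not.
print(can_fuse_dynamic_slice((8, 16), (8, 4), (1, 0)))   # False
```

Because the check reads the layout rather than assuming dimension 0, a column-major operand (`minor_to_major = (0, 1)`) would instead permit slicing dimension 1.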
Autotuning handles dynamic-slice without special treatment: by the instruction's semantics, a dynamic-slice never reads out of bounds, because its start offsets are clamped to a valid region.
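A minimal sketch of the clamping behaviour follows, assuming XLA's documented dynamic-slice rule: each start index is clamped to `[0, dim_size - slice_size]`. The function names here are hypothetical and only model the semantics in plain Python. They are not XLA APIs.

```python
def clamp_start_indices(start_indices, operand_shape, slice_sizes):
    """Clamp each start index into [0, dim_size - slice_size]."""
    return [
        min(max(start, 0), dim_size - size)
        for start, dim_size, size in zip(start_indices, operand_shape, slice_sizes)
    ]


def dynamic_slice_rows(matrix, start_row, num_rows):
    """Slice num_rows rows (the major-most dimension) of a list-of-rows matrix."""
    (start,) = clamp_start_indices([start_row], [len(matrix)], [num_rows])
    # After clamping, the window always lies entirely inside the matrix.
    return matrix[start:start + num_rows]


matrix = [[0], [1], [2], [3]]
print(dynamic_slice_rows(matrix, 3, 2))  # [[2], [3]]: start clamped from 3 to 2
print(dynamic_slice_rows(matrix, -5, 2))  # [[0], [1]]: start clamped from -5 to 0
```

Since any start offset is mapped to an in-bounds window, an autotuner can run candidate kernels on arbitrary offset values without risking out-of-bounds reads.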
The original author of this CL is jvstokes; I made some modifications to it.