iree
[LLVMGPU] Embed mma_intrinsic in to_layout and infer contraction's intrinsic from it.
To enable faster flash attention, we'd like to be able to force different vector widths, which means different contractions may need different MMA intrinsics. This PR introduces a way to set the intrinsic for an individual contraction, by embedding mma_intrinsic in to_layout, and to have it preserved until vector distribution.
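The inference step can be illustrated with a small Python sketch. It only models the idea; it does not reproduce IREE's C++/MLIR implementation. The names ToLayout, Contraction and infer_intrinsic are hypothetical stand-ins. Two further behaviors are assumptions of the sketch and are not stated in the PR: falling back to a default intrinsic when no operand carries one, and rejecting operands that disagree. The intrinsic strings are example MFMA names used purely as labels.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToLayout:
    """Hypothetical stand-in for a to_layout op that anchors a value's layout."""
    value: str
    layout: str
    mma_intrinsic: Optional[str] = None  # intrinsic embedded alongside the layout


@dataclass
class Contraction:
    """Hypothetical contraction whose operands each come from a to_layout."""
    lhs: ToLayout
    rhs: ToLayout
    acc: ToLayout


def infer_intrinsic(op: Contraction, default: str) -> str:
    """Pick the contraction's intrinsic from the to_layout ops feeding it.

    Assumption: operands without an annotation are ignored, conflicting
    annotations are an error, and no annotation at all means `default`.
    """
    kinds = {o.mma_intrinsic for o in (op.lhs, op.rhs, op.acc)
             if o.mma_intrinsic is not None}
    if len(kinds) > 1:
        raise ValueError(f"conflicting intrinsics: {sorted(kinds)}")
    return kinds.pop() if kinds else default


# Two contractions in one attention kernel can now end up with different intrinsics.
qk = Contraction(
    lhs=ToLayout("q", "layout_a", "MFMA_F32_16x16x16_F16"),
    rhs=ToLayout("k", "layout_b", "MFMA_F32_16x16x16_F16"),
    acc=ToLayout("s", "layout_c"),
)
pv = Contraction(
    lhs=ToLayout("p", "layout_a"),
    rhs=ToLayout("v", "layout_b"),
    acc=ToLayout("o", "layout_c"),
)
print(infer_intrinsic(qk, default="MFMA_F32_32x32x8_F16"))  # MFMA_F32_16x16x16_F16
print(infer_intrinsic(pv, default="MFMA_F32_32x32x8_F16"))  # MFMA_F32_32x32x8_F16
```

Keeping the intrinsic on the layout anchor, rather than on a single global setting, is what lets each contraction carry its own choice through to vector distribution.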