AMDMIGraphX
[MLIR][Attention] Implement gemm(i8)-dequantizelinear-softmax(fp16)-gemm(fp16) lowering
Problem Description
This ticket is to implement the gemm(i8)-dequantizelinear-softmax(fp16)-gemm(fp16) pattern, lowering a partially-i8 attention kernel in rocMLIR.
Here is one of the example tests we currently have working: https://github.com/ROCm/rocMLIR/blob/develop/mlir/test/fusion/pr-e2e/attention/mixr-attention-first-gemm-i8-f16.mlir
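For reference, a minimal NumPy sketch of the math the fused pattern computes (this is not the rocMLIR implementation; shapes, the dequantization scale, and a zero-point of 0 are all illustrative assumptions):

```python
import numpy as np

# Assumed illustrative shapes: seq x dk query/key, seq x dv value.
rng = np.random.default_rng(0)
seq, dk, dv = 4, 8, 8

q = rng.integers(-128, 127, size=(seq, dk), dtype=np.int8)
k = rng.integers(-128, 127, size=(seq, dk), dtype=np.int8)
v = rng.standard_normal((seq, dv)).astype(np.float16)
scale = np.float16(0.01)  # dequantizelinear scale (assumed; zero-point 0)

# First GEMM in i8, accumulating in i32.
scores_i32 = q.astype(np.int32) @ k.astype(np.int32).T

# dequantizelinear: i32 -> fp16.
scores_f16 = (scores_i32.astype(np.float32) * scale).astype(np.float16)

# Numerically-stable row-wise softmax, result in fp16.
m = scores_f16.max(axis=1, keepdims=True)
e = np.exp((scores_f16 - m).astype(np.float32)).astype(np.float16)
probs = e / e.sum(axis=1, keepdims=True)

# Second GEMM in fp16.
out = probs @ v
print(out.shape, out.dtype)
```

The first GEMM stays in integer arithmetic and only the accumulated scores are dequantized, so the softmax and second GEMM run in fp16, matching the mixed-precision pattern named in the title.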
Operating System
Any
CPU
Any
GPU
AMD Instinct MI300X, AMD Instinct MI250X, AMD Instinct MI250, AMD Instinct MI210
Other
No response
ROCm Version
ROCm 6.0.0
Steps to Reproduce
No response
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
No response