iree
[LLVMGPU] VectorDistribution pipeline for attention
This patch adds support for lowering attention through the VectorDistribution pipeline. It currently has the following limitations, which will be fixed in follow-ups:
- Only 1 subgroup is used
- There are 4 shared memory promotions. Depending on the intrinsic, this can be reduced to 2 or 3.
Depends on https://github.com/iree-org/iree/pull/17744
Please only review the last commit. Other commits are the patches this patch depends on.
I don't have any comments on this. Skimming through, this looks OK to me to land and iterate on.
Closing in favour of https://github.com/iree-org/iree/pull/17773