Kunwar Grover
> Good to have this cleanup, but IIRC @harsh-nod mentioned there are cases where we found regular, non FA faster, so tileAndDecomposeAttention may still be useful there? In those cases,...
Already landed as part of https://github.com/iree-org/iree/commit/dd3f2a392819d121fa5329a1c591be06ae9e887a
I looked at the attention IR; it's going down the memory-bound attention pipeline. The reason is that our attention/MMA heuristics are not the best at checking if the copy from...
> > While VectorDistribute doesn't support multiple dimensions for subgroup dims, can we try to keep the configuration logic similar to TileAndFuse? We plan to soon support that, and it...
Just to note, there is another route that we can take. The reason there are issues is that we are doing fusion greedily. Instead, we could do any analysis to...
This is an easy fix: you can just assign batch dimensions to that additional dimension. I can fix it, but I'm happy to let someone else try fixing it.
Stanley and I talked offline, and this should be a simple codegen change. We can just assign batch dimensions for the layout here: https://github.com/iree-org/iree/blob/main/compiler/src/iree/compiler/Codegen/Dialect/GPU/IR/IREEGPUAttrs.cpp#L1204
Fixed by https://github.com/iree-org/iree/pull/18868
Just to flag: we had some problems with this pass in the GPU pipeline because it drops lowering_config from linalg operations. Maybe it doesn't apply here, but I would still like to...
@jtuyls I added a ci-extra trailer so the torch CI will run on this again, and retriggered the CI. I'd recommend disabling the ukernel flags (if enabled) in the CI...