
[GPU][Codegen] Expand iteration space based on new `expand_dims` attribute

Open efric opened this issue 2 months ago • 2 comments

This patch introduces iteration space expansion for reductions in the VectorDistribute path.

Specifically, we:

  1. Add a new attribute, `expand_dims`, for reductions.
  2. Introduce a new pass, GPUExpandDimensions, which uses `expand_dims` to expand the iteration space of relevant dimensions.
  3. Refactor common functionality shared between GPUExpandDimensions and BlockDynamicDimensions into reusable utilities.
  4. Refactor encoding helpers from EncodingAttrs.cpp into reusable utilities.

This change also enables chain FMA in matvec codegen as we iterate along the K reduction dimension.
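To make the idea concrete, here is a minimal conceptual sketch in plain Python of what expanding the K reduction dimension of a matvec into two dimensions looks like. It is not the code in this PR: the function names and the `inner` split factor are hypothetical, chosen only for illustration, and the sketch assumes the split factor divides K evenly.

```python
def matvec_flat(a, x):
    """Reference matvec: reduce over the single K dimension in order."""
    out = []
    for a_row in a:
        total = 0.0
        for k in range(len(x)):
            total = a_row[k] * x[k] + total
        out.append(total)
    return out


def matvec_expanded(a, x, inner):
    """Same matvec with K expanded into (K // inner, inner).

    `inner` is a hypothetical split factor used only in this sketch.
    """
    k = len(x)
    assert k % inner == 0, "sketch assumes the split factor divides K"
    outer = k // inner
    out = []
    for a_row in a:
        # One accumulator per inner lane. Each accumulator is updated by a
        # chain of multiply-adds as we walk the outer part of K.
        acc = [0.0] * inner
        for ko in range(outer):
            for ki in range(inner):
                idx = ko * inner + ki
                acc[ki] = a_row[idx] * x[idx] + acc[ki]
        # Final reduction across the inner lanes.
        total = 0.0
        for lane in acc:
            total += lane
        out.append(total)
    return out


a = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
x = [1.0, 1.0, 2.0, 2.0]
flat_result = matvec_flat(a, x)
expanded_result = matvec_expanded(a, x, inner=2)
```

With these small integer-valued inputs both versions produce the same result. The per-lane accumulator update in the inner loop is the multiply-add chain along K that the expanded form exposes.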


Performance Summary

IREE benchmark module

  • Only expansion: ~4% improvement
  • Expansion + chain FMA: ~11% improvement

rocprof

  • Only expansion: ~13% worse
  • Expansion + chain FMA: ~9% better

Register usage

  • 10% reduction (60 → 54 registers for matvec dispatches)

Instruction latency (post-reduction loop epilogue)

  • 3.5% improvement (340 → 328 total mean latency)

Notes

  • As a follow-up, we can explore applying iteration space expansion to the reduction in attention.
  • Right now, we only expand one dimension into two, although the implementation supports expansion to N dimensions.
  • Please note that this PR changes the reduction order, so expect some minor changes to the numerics.
  • This does not improve performance by itself and can cause a regression without chain FMA (https://github.com/iree-org/iree/pull/21855).
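The numerics note follows from floating-point addition not being associative. The sketch below is a deliberately contrived illustration in plain Python, not taken from the PR: the helper names and the input values are hypothetical, and the values are chosen to exaggerate an effect that is normally much smaller in practice.

```python
def sum_in_order(values):
    """Reduce over a single dimension, left to right."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_expanded(values, inner):
    """Reduce after expanding the dimension into (len // inner, inner).

    Each inner lane accumulates its own partial sum, and the lanes are
    combined at the end, so additions happen in a different order.
    """
    assert len(values) % inner == 0, "sketch assumes an even split"
    lanes = [0.0] * inner
    for i, v in enumerate(values):
        lanes[i % inner] += v
    total = 0.0
    for lane in lanes:
        total += lane
    return total


# Contrived input: the small terms are absorbed by the large ones when
# summed in order, but survive when they land in their own lane.
values = [1e20, 1.0, -1e20, 1.0]
flat = sum_in_order(values)         # 1.0
expanded = sum_expanded(values, 2)  # 2.0
```

Both results are valid floating-point evaluations of the same sum. They differ only because the additions are grouped differently, which is why tests with tight tolerances may need adjusting.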

Traces for matvec dispatches are attached for all variations (original, only expansion, and expansion + chain FMA).

115_expansion_and_chain.tar.gz 115_nothing.tar.gz 115_only_expansion.tar.gz

Fixes: #22153

efric avatar Oct 17 '25 00:10 efric

I’ve included all changes in this PR for now to show everything together. I can split the refactor into a separate NFC for readability if preferred and folks agree on its destination.

efric avatar Oct 31 '25 20:10 efric

The NFC bits have been factored out into a separate PR to make reviewing easier.

efric avatar Nov 04 '25 00:11 efric

@efric I added a ci-extra trailer to run test_torch. Can you check?

Groverkss avatar Dec 18 '25 17:12 Groverkss