[LLVMGPU] Use forall workgroup distribution in TileAndFuse pipeline

Open Max191 opened this issue 5 months ago • 2 comments

This switches the TileAndFuse pipeline to use scf.forall distribution. Using scf.forall distribution also requires some changes to the pass ordering in the TileAndFuse pipeline, which are also handled by this PR:

  1. The main difference is that PackToIntrinsics happens before workgroup distribution. Otherwise, collapse_shape ops can end up at the end of the workgroup forall, and an extra buffer is created.
  2. Pack decomposition is now staged, with packs/unpacks at the function boundaries being decomposed early before workgroup distribution, and the rest being decomposed after reduction tiling as before. This prevents unpacks from being fused into the workgroup forall and causing the same problem as in (1).
  3. createConcretizeMmaShapes now runs before workgroup tiling as well, so the resulting collapse_shape on the multi_mma op result can be propagated to the function boundary before any tiling. This is also to avoid the same problem as in (1).
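
The ordering requirements in (1) to (3) can be written down as a small set of "runs before" constraints. The sketch below is only an illustration: the stage labels are made-up strings, not IREE pass names or APIs, and the relative order of the three stages that precede workgroup distribution, as well as the placement of reduction tiling after distribution, are assumptions, since only the constraints listed here are stated above.

```python
# Illustrative stage labels only; these are NOT IREE pass or API names.
PACK_TO_INTRINSICS = "pack-to-intrinsics"
DECOMPOSE_BOUNDARY_PACKS = "decompose-boundary-packs"
CONCRETIZE_MMA_SHAPES = "concretize-mma-shapes"
WORKGROUP_DISTRIBUTION = "workgroup-forall-distribution"
REDUCTION_TILING = "reduction-tiling"
DECOMPOSE_REMAINING_PACKS = "decompose-remaining-packs"

# (earlier, later) pairs taken from items 1-3 of the description.
ORDERING_CONSTRAINTS = [
    (PACK_TO_INTRINSICS, WORKGROUP_DISTRIBUTION),        # item 1
    (DECOMPOSE_BOUNDARY_PACKS, WORKGROUP_DISTRIBUTION),  # item 2, early stage
    (REDUCTION_TILING, DECOMPOSE_REMAINING_PACKS),       # item 2, late stage
    (CONCRETIZE_MMA_SHAPES, WORKGROUP_DISTRIBUTION),     # item 3
]


def violated_constraints(order):
    """Return the (earlier, later) pairs that the given stage order breaks."""
    position = {stage: index for index, stage in enumerate(order)}
    return [
        (earlier, later)
        for earlier, later in ORDERING_CONSTRAINTS
        if position[earlier] >= position[later]
    ]


# One order that satisfies the constraints. The order among the first three
# stages, and reduction tiling following distribution, are assumptions.
EXAMPLE_ORDER = [
    PACK_TO_INTRINSICS,
    DECOMPOSE_BOUNDARY_PACKS,
    CONCRETIZE_MMA_SHAPES,
    WORKGROUP_DISTRIBUTION,
    REDUCTION_TILING,
    DECOMPOSE_REMAINING_PACKS,
]

if __name__ == "__main__":
    print(violated_constraints(EXAMPLE_ORDER))
```

Moving `PACK_TO_INTRINSICS` after `WORKGROUP_DISTRIBUTION` in the list makes `violated_constraints` report the pair from item 1, which corresponds to the case where collapse_shape ops end up at the end of the workgroup forall.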

The lowering configs on the MMA path have also changed, since they now need to account for inner tile sizes of packing.

Depends on https://github.com/iree-org/iree/pull/18852

Max191, Sep 20 '24 15:09