[DataTiling][Codegen] Improve layout transformation codegen
This is a tracking issue for increasing layout transformation codegen support and quality. A proposal of the work to be done is currently laid out in an RFC here: https://hackmd.io/@mdawkins/rJ-2UUEAJe
This issue will be filled out with more detailed implementation steps after the RFC has been discussed and the plan for relayout codegen is clear.
Update on layout transformation codegen work:
Some of the work proposed in the RFC above is now complete:
- Add "early bufferization" ops and transformation: https://github.com/iree-org/iree/pull/20619, https://github.com/iree-org/iree/pull/20626, and https://github.com/iree-org/iree/pull/20627
- Add map_scatter op: https://github.com/iree-org/iree/pull/20640, https://github.com/iree-org/iree/pull/20688
- Combine relayout ops into a map_scatter op: https://github.com/iree-org/iree/pull/20655
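For readers unfamiliar with the op, here is a rough Python sketch of the map_scatter semantics as I understand them (the function name, signature, and index-map shape here are illustrative only, not the actual IREE implementation): each input element is mapped through a per-element index transformation to an output position plus a mask that decides whether the element is written, which is what lets a chain of relayout ops collapse into a single op.

```python
import itertools

# Illustrative model only: map_scatter on 2-D nested lists. The real op works
# on tensors/memrefs, with a region computing the index transformation.
def map_scatter(src, dst_shape, index_map, fill=0):
    # Destination modeled as a dict from index tuples to values.
    dst = {idx: fill for idx in itertools.product(*map(range, dst_shape))}
    src_shape = (len(src), len(src[0]))
    for idx in itertools.product(*map(range, src_shape)):
        out_idx, mask = index_map(idx)  # input index -> (output index, mask)
        if mask:
            dst[out_idx] = src[idx[0]][idx[1]]
    return dst

# A transpose is one relayout that folds into this form:
transposed = map_scatter([[1, 2], [3, 4]], (2, 2),
                         lambda idx: ((idx[1], idx[0]), True))
# transposed[(0, 1)] == 3 and transposed[(1, 0)] == 2
```

The point of the combined form is that composing further relayouts (expand/collapse, pack, transpose) only composes the index map, without materializing intermediate tensors.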
The current working prototype is on a branch: https://github.com/iree-org/iree/tree/users/Max191/combine-relayout-ops-e2e-wip
There are a few small patches on the branch that need to land to bring things together e2e. The next steps after landing the work on the above branch are laid out below (ordering is subject to change):
- [x] Support padding in the `CombineLayoutTransformationPass`.
  - https://github.com/iree-org/iree/pull/20797
  - https://github.com/iree-org/iree/pull/20848
- [x] Improve heuristics for pad distribution tiling sizes in `CombineLayoutTransformation`, possibly through `lowering_config` attributes.
- [x] Turn on the `CombineLayoutTransformation` pass by default in the TileAndFuse pipeline.
- [x] Test more complicated data tiling fusions, like `multi_mma -> unset_encoding -> set_encoding`.
- [x] Support the `map_scatter` op on all active LLVMGPU pipelines (`VectorDistribute`, `WarpReduction`(?)).
  - Need to implement vectorization and vector distribution patterns for `VectorDistribute`.
- [ ] Add e2e tests for new data tiling fusion cases.
- [x] Enable map_scatter e2e tests on LLVMGPU
  - https://github.com/iree-org/iree/pull/21034#issuecomment-2954343152
- [x] Implement vectorization for the `map_scatter` op. This will enable vectorized stores when possible, and prevent remaining small private allocations.
  - Prerequisite task is to decide how we want to vectorize the op.
- [x] Support consumer fusions (requires some propagation of either encodings or relayout ops)
  - https://github.com/iree-org/iree/pull/20898
  - https://github.com/iree-org/iree/pull/20901
- [ ] Support consumer fusions for consumers with multiple operands.
  - https://github.com/iree-org/iree/issues/20943
(CC myself for notification)
Reposting from discord: https://discord.com/channels/689900678990135345/1254843174111678555/1370438822374015158
I have a WIP branch with support for padding in CombineLayoutTransformation, and it is able to compile and run chained matmuls with correct numerics using data tiling!
Here's the branch and the chained matmul IR that I'm working with.
The way I'm distributing the padding on the branch is still a bit hacky, so I need to work out a better way of deciding how to distribute, but this first prototype is promising.
After I'm able to land this work, the data tiling fusion work will start to intersect with some of the propagation work, because we need to handle reshapes between dispatches, and consumers on the unset_encoding ops.
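As a side note on how padding fits the map_scatter model, here is my own hedged sketch (not code from the branch, and the helper name is made up): a pad is just a scatter with an identity index map into a larger destination that is pre-filled with the padding value, so the distribution question is mostly about which threads write the fill region versus the copied region.

```python
# Hedged illustration (not the branch's implementation): padding expressed in
# the map_scatter style -- pre-fill the larger destination with the pad value,
# then scatter every source element at its identity index.
def pad_via_scatter(src, pad_after, fill=0):
    rows, cols = len(src), len(src[0])
    dst = [[fill] * (cols + pad_after[1]) for _ in range(rows + pad_after[0])]
    for i in range(rows):
        for j in range(cols):
            dst[i][j] = src[i][j]  # identity index map, mask always true
    return dst

padded = pad_via_scatter([[1, 2]], (1, 1))
# padded == [[1, 2, 0], [0, 0, 0]]
```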
What is the status of removing the `iree-llvmgpu-test-combine-layout-transformation` flag? I got a few questions about that offline, and it'd be good to remove the flag if it is no longer needed.
> What is the status of removing the `iree-llvmgpu-test-combine-layout-transformation` flag? I got a few questions about that offline, and it'd be good to remove the flag if it is no longer needed.
I think the main blocker was removing the WarpReduction pipeline, which was recently done: https://github.com/iree-org/iree/pull/21821
I'll send out a draft PR and see if there are any issues.
Can you update https://github.com/iree-org/iree/issues/20530#issuecomment-2852302024? Either mark the items resolved or note that they are no longer needed.