iree icon indicating copy to clipboard operation
iree copied to clipboard

[Codegen] Account for pre-distributed computation in workgroup tiling

Open Max191 opened this issue 6 months ago • 1 comments

There are optimizations in TileAndDistributeToWorkgroupsUsingForallOpPass that try to fold away unit trip distribution loops, but this is not valid when there are preexisting workgroup forall loops in the IR, because the other forall may ask for multiple workers. This PR prevents the optimizations from fully folding the distribution loop when there is other workgroup distribution present in the dispatch.

Max191 avatar Jun 19 '25 14:06 Max191

You maybe can move that canonicalization outside of the main tiling methods, but more than that I think this is kind of hinting at issues with the loose/implicit link between different workgroup distributions.

Make the workgroup distribution distribute only along x-dimension, i.e. fold all the logical dimensions into a single physical dimension, then you can use delinearize to get back the different dimensions (AFAIK this should be very easy to change cause this is what happens to map a 4D logical distribution onto a 3D distribution. Then you wont care about the unit dims. The x/y/z dimensions have no basis actually in hardware. its all just a list.

MaheshRavishankar avatar Jun 19 '25 20:06 MaheshRavishankar

I think this is kind of hinting at issues with the loose/implicit link between different workgroup distributions.

Yeah, we need to be careful about folding away workgroup loops. Folding away any workgroup loop loses some information about the fact that the contained computation is distributed to a single workgroup. In this case, it is easy to tell that folding the loop will be invalid, but if the order of transformations were reversed (do workgroup tiling and fold the forall loop, and then introduce a separate forall later), then it would be very hard to tell that any of the dispatch is already distributed.

I didn't want to interfere with the current default path too much in this PR, but now that we aren't assuming only a single scf.forall when we resolve workgroup distribution, I think we really shouldn't ever be fully folding away workgroup loops.

Make the workgroup distribution distribute only along x-dimension, i.e. fold all the logical dimensions into a single physical dimension, then you can use delinearize to get back the different dimensions (AFAIK this should be very easy to change cause this is what happens to map a 4D logical distribution onto a 3D distribution. Then you wont care about the unit dims. The x/y/z dimensions have no basis actually in hardware. its all just a list.

This seems like the job of ReconcileTranslationInfo, no? I think the point of folding away the unit dims was to open op additional optimizations by having more static information in the IR. Linearizing and delinearizing like this seems unnecessary at this phase of codegen. We just need to be careful about folding away forall loops in general IMO.

Max191 avatar Jun 23 '25 13:06 Max191

@MaheshRavishankar after rebasing, the new test you added was failing, which I fixed by adjusting the lowering config that had tile sizes larger than the problem size (see latest commit). I assume this was a fine change to make, but I'll wait for your confirmation.

Max191 avatar Jul 07 '25 14:07 Max191