[LinalgExt] Added Decomposition of ExpReductionOp to Linalg
This is part 3 of #21761 (part 2: #22316). In this PR, we:
- Add `AggregateOpInterface` to `ExpReductionOp`, to support decomposition.
- Add `DecomposeExpReductionPass`, to apply the decomposition. `DecomposeExpReductionPass` also calls `DecomposeMultipleResults`, which splits a linalg op with multiple results into multiple single-result linalg ops so that DCE/CSE can clean up more easily (see the sketch after this list).
- Add tests for the decomposition pass.
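A minimal sketch of the `DecomposeMultipleResults` idea (illustrative code only, not the implementation in this PR; it assumes tensor semantics and that every yielded value depends only on the op's inputs and its own init operand):

```cpp
#include "llvm/ADT/SetVector.h"
#include "mlir/Dialect/Linalg/IR/Linalg.h"
#include "mlir/IR/IRMapping.h"
#include "mlir/IR/PatternMatch.h"

using namespace mlir;

// Split an N-result linalg.generic into N single-result generics. Each new
// generic clones only the body ops its own yielded value depends on, so any
// remaining redundant computation is left for DCE/CSE to clean up.
static LogicalResult splitMultiResultGeneric(RewriterBase &rewriter,
                                             linalg::GenericOp op) {
  if (op->getNumResults() < 2)
    return failure();

  Block *body = op.getBody();
  auto yieldOp = cast<linalg::YieldOp>(body->getTerminator());
  unsigned numInputs = op.getNumDpsInputs();
  SmallVector<AffineMap> maps = op.getIndexingMapsArray();

  rewriter.setInsertionPoint(op);
  SmallVector<Value> replacements;
  for (unsigned idx = 0, e = op.getNumDpsInits(); idx < e; ++idx) {
    Value init = op.getDpsInits()[idx];

    // Indexing maps for the new op: all input maps + this output's map.
    SmallVector<AffineMap> newMaps(maps.begin(), maps.begin() + numInputs);
    newMaps.push_back(maps[numInputs + idx]);

    // Collect the body ops that the idx-th yielded value depends on.
    Value yielded = yieldOp->getOperand(idx);
    llvm::SetVector<Operation *> needed;
    SmallVector<Value> worklist = {yielded};
    while (!worklist.empty()) {
      Operation *def = worklist.pop_back_val().getDefiningOp();
      if (!def || def->getBlock() != body || !needed.insert(def))
        continue;
      worklist.append(def->operand_begin(), def->operand_end());
    }

    auto newOp = rewriter.create<linalg::GenericOp>(
        op.getLoc(), TypeRange{init.getType()}, op.getDpsInputs(),
        ValueRange{init}, newMaps, op.getIteratorTypesArray(),
        [&](OpBuilder &b, Location loc, ValueRange args) {
          // Map the original input block args and this output's init arg to
          // the new block args, then clone only the needed body ops.
          IRMapping map;
          for (unsigned i = 0; i < numInputs; ++i)
            map.map(body->getArgument(i), args[i]);
          map.map(body->getArgument(numInputs + idx), args[numInputs]);
          for (Operation &bodyOp : body->without_terminator())
            if (needed.contains(&bodyOp))
              b.clone(bodyOp, map);
          b.create<linalg::YieldOp>(loc, map.lookupOrDefault(yielded));
        });
    replacements.push_back(newOp->getResult(0));
  }
  rewriter.replaceOp(op, replacements);
  return success();
}
```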
I'm quite concerned about the `DecomposeExpReduction` implementation: it seems weird to call the greedy rewriter in the pass, then receive it as a `Builder` in `AggregatedOpInterface::decomposeOp`, only to turn it back into a `Rewriter` to do the rewriting. I would appreciate feedback on whether there is a more proper way to do this.
Also, is there a more idiomatic way to turn `iree_linalg_ext.yield` into a `linalg.yield`?
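For context, here is roughly the flow in question, sketched against upstream linalg's `AggregatedOpInterface::decomposeOperation` (names are illustrative, not necessarily the exact LinalgExt API; the part that feels off in my implementation is that the interface body additionally casts the builder back to a rewriter):

```cpp
#include "mlir/Dialect/Linalg/IR/Linalg.h"
#include "mlir/IR/PatternMatch.h"

using namespace mlir;

// The pattern owns the PatternRewriter. The interface method only receives
// OpBuilder facilities to create the decomposed ops; the replacement of the
// original op stays inside the pattern.
struct DecomposeAggregateOpPattern
    : public OpInterfaceRewritePattern<linalg::AggregatedOpInterface> {
  using OpInterfaceRewritePattern::OpInterfaceRewritePattern;

  LogicalResult matchAndRewrite(linalg::AggregatedOpInterface op,
                                PatternRewriter &rewriter) const override {
    // PatternRewriter is-an OpBuilder, so it can be passed directly.
    FailureOr<SmallVector<Value>> replacements =
        op.decomposeOperation(rewriter);
    if (failed(replacements))
      return rewriter.notifyMatchFailure(op, "failed to decompose");
    rewriter.replaceOp(op, *replacements);
    return success();
  }
};
```

In this setup the interface only ever creates new IR through the `OpBuilder` it is handed, and the pattern that owns the rewriter performs the actual replacement, so no cast back to a rewriter is needed; the pass just adds this pattern to a `RewritePatternSet` and runs the greedy driver.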
It's okay to have a rewriter if it's only mutating the IR that you created; it's not ideal, but it's okay. I think there are better ways to inline the region you are using a rewriter for, though. You can directly clone the region instead of inlining it.
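Something along these lines (a sketch, not tested; it assumes the iree_linalg_ext op's region has a single block whose terminator is `iree_linalg_ext.yield`):

```cpp
#include "llvm/ADT/STLExtras.h"
#include "mlir/Dialect/Linalg/IR/Linalg.h"
#include "mlir/IR/IRMapping.h"

using namespace mlir;

// Clone the ops of `srcBlock` (the iree_linalg_ext op's body) into the body
// of a freshly created linalg.generic, remapping the block arguments and
// replacing the iree_linalg_ext.yield terminator with a linalg.yield.
static void cloneBodyWithLinalgYield(OpBuilder &b, Location loc,
                                     Block &srcBlock, ValueRange newArgs) {
  IRMapping mapping;
  for (auto [oldArg, newArg] :
       llvm::zip_equal(srcBlock.getArguments(), newArgs))
    mapping.map(oldArg, newArg);
  // Clone everything except the terminator.
  for (Operation &op : srcBlock.without_terminator())
    b.clone(op, mapping);
  // Don't clone the iree_linalg_ext.yield; emit a linalg.yield over the
  // remapped yielded values instead.
  SmallVector<Value> yielded;
  for (Value v : srcBlock.getTerminator()->getOperands())
    yielded.push_back(mapping.lookupOrDefault(v));
  b.create<linalg::YieldOp>(loc, yielded);
}
```

This can serve as the body-builder callback when creating the `linalg.generic`, so the `iree_linalg_ext.yield` never has to be converted in place at all.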
I think the problem is that you may "decompose" it too early. It is not common to create linalg ops in such methods. I can't comment much unless I see end-to-end IRs and the plans, e.g., what the flow is for supporting such an op end-to-end.
I may be wrong in the attention context, since we are doing it for other attention ops. Note: that does not imply it is the best route, and it is outside my context.
@hanhanW I've written the `DecomposeAggregatedOps` pass and will write tests in a moment, but perhaps it should be in another PR?
The pass belongs in a separate PR, because it is generally good to avoid large PRs when possible.
Tips: https://google.github.io/eng-practices/review/developer/small-cls.html
Why do we care about DiscardableAttrs after decomposition? It seems like you are relying on this to do further transformations, which seems bad. What information do you need to carry after the decomposition?
From what I understand, wouldn't vectorization need the lowering config as well? It's been a while since I had a look at the full pipeline, so I might be remembering incorrectly.
I see. The lowering config may be incorrect if some dimensions are dropped. I implemented vector size inference, so you don't need to rely on the lowering config to do vectorization.
Actually, I removed the old (and bad) behavior. The lowering config needs to implement the interface method to get the vector sizes. Today, only the CPU one has the implementation.
https://github.com/iree-org/iree/blob/818e59f074ca6f62c9a06dd016a330115e4d837b/compiler/src/iree/compiler/Codegen/Common/GenericVectorization.cpp#L41-L43
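In other words, once every dimension has been tiled to a static size, the vector sizes can be read off the op itself; roughly (a sketch, not the actual IREE helper):

```cpp
#include "mlir/Dialect/Linalg/IR/Linalg.h"
#include "mlir/IR/BuiltinTypes.h"

using namespace mlir;

// Sketch only: after tiling, the loop ranges of the linalg op are static,
// so vectorization sizes can be inferred directly instead of being carried
// in a lowering_config attribute.
static FailureOr<SmallVector<int64_t>> inferVectorSizes(linalg::LinalgOp op) {
  SmallVector<int64_t> sizes;
  llvm::append_range(sizes, op.getStaticLoopRanges());
  // Dynamic loop bounds would still need explicit sizes (or masking).
  if (llvm::any_of(sizes, [](int64_t s) { return ShapedType::isDynamic(s); }))
    return failure();
  return sizes;
}
```

The inferred sizes can then be handed to the vectorizer, so the decomposed ops don't need to carry a lowering config at all.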
My concern is that we can carry the attribute, but we can't mutate it in decomposition. If you rely heavily on it, that is a red flag for me. I think the vector size inference should be good enough in this case, as all the dimensions are expected to be tiled.
https://github.com/iree-org/iree/blob/818e59f074ca6f62c9a06dd016a330115e4d837b/compiler/src/iree/compiler/Codegen/Utils/Utils.h#L335-L348
(I'll do another round of review later.)