Dimension mismatch after the bubble-up-expand-shapes pass
In the MakeSingleDispatchPipeline we have tensor.pad + convs, and in the presence of unit dims we drop the unit dims. But when we bubble up expand shapes, we actually sink a collapse_shape through the pad, and the collapse_shape then stays between the pad and the conv, which is not supported in codegen as a single dispatch.
Link to the full IR dump (gist containing the complete IR referenced here).
For this IR:
%cst = arith.constant 0.000000e+00 : bf16
%cst_0 = arith.constant 0.000000e+00 : f32
%0 = hal.tensor.import wait(%arg2) => %arg0 : !hal.buffer_view -> tensor<16x1x30x576xbf16>
%1 = hal.tensor.import wait(%arg2) => %arg1 : !hal.buffer_view -> tensor<576x1x3x576xbf16>
%collapsed = tensor.collapse_shape %0 [[0, 1], [2], [3]] : tensor<16x1x30x576xbf16> into tensor<16x30x576xbf16>
%padded = tensor.pad %collapsed low[0, 1, 0] high[0, 1, 0] {
^bb0(%arg4: index, %arg5: index, %arg6: index):
tensor.yield %cst : bf16
} : tensor<16x30x576xbf16> to tensor<16x32x576xbf16>
%collapsed_1 = tensor.collapse_shape %1 [[0, 1], [2], [3]] : tensor<576x1x3x576xbf16> into tensor<576x3x576xbf16>
%2 = tensor.empty() : tensor<16x30x576xf32>
%3 = linalg.fill ins(%cst_0 : f32) outs(%2 : tensor<16x30x576xf32>) -> tensor<16x30x576xf32>
%4 = linalg.generic {indexing_maps =
[affine_map<(d0, d1, d2, d3, d4) -> (d0, d1 + d3, d4)>,
affine_map<(d0, d1, d2, d3, d4) -> (d2, d3, d4)>,
affine_map<(d0, d1, d2, d3, d4) -> (d0, d1, d2)>],
iterator_types = ["parallel", "parallel", "parallel", "reduction", "reduction"]} i
ns(%padded, %collapsed_1 : tensor<16x32x576xbf16>, tensor<576x3x576xbf16>) outs(%3 : tensor<16x30x576xf32>)
using the following command:
iree-opt --pass-pipeline="builtin.module(func.func(iree-dispatch-creation-bubble-up-expand-shapes),canonicalize,cse)" test.mlir
Resultant IR:
%padded = tensor.pad %0 low[0, 0, 1, 0] high[0, 0, 1, 0] {
^bb0(%arg4: index, %arg5: index, %arg6: index, %arg7: index):
tensor.yield %cst : bf16
} : tensor<16x1x30x576xbf16> to tensor<16x1x32x576xbf16>
%collapsed = tensor.collapse_shape %padded [[0, 1], [2], [3]] : tensor<16x1x32x576xbf16> into tensor<16x32x576xbf16>
%collapsed_1 = tensor.collapse_shape %1 [[0, 1], [2], [3]] : tensor<576x1x3x576xbf16> into tensor<576x3x576xbf16>
%2 = tensor.empty() : tensor<16x30x576xf32>
%3 = linalg.fill ins(%cst_0 : f32) outs(%2 : tensor<16x30x576xf32>) -> tensor<16x30x576xf32>
%4 = linalg.generic
{indexing_maps = [#map, #map1, #map2],
iterator_types = ["parallel", "parallel", "parallel", "reduction", "reduction"]}
ins(%collapsed, %collapsed_1 : tensor<16x32x576xbf16>, tensor<576x3x576xbf16>) outs(%3 : tensor<16x30x576xf32>) {
In the output IR after the pass, %padded keeps the extra unit dimension, and the collapse_shape now sits between the pad and the conv.
@MaheshRavishankar and @IanWood1, do you have thoughts on how to solve this? I was thinking we could have an option to not sink collapse_shape through pad and use it in MakeSingleDispatch, or should some other pattern be taking care of this?
I see; the reshape can't propagate past the conv due to the non-projected-permutation indexing maps.
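(For illustration only, not part of the original discussion: a minimal standalone sketch against the MLIR C++ API showing that the conv's input map from the IR above is not a projected permutation, while the output map is. This is an assumed way to check the property, not code from IREE.)

// Minimal sketch: the conv input map (d0, d1 + d3, d4) mixes two dims in one
// result expression, so AffineMap::isProjectedPermutation() is false, which
// is why the reshape-propagation patterns stop at the conv.
#include "mlir/IR/AffineExpr.h"
#include "mlir/IR/AffineMap.h"
#include "mlir/IR/MLIRContext.h"
#include "llvm/Support/raw_ostream.h"

int main() {
  mlir::MLIRContext ctx;
  auto d = [&](unsigned i) { return mlir::getAffineDimExpr(i, &ctx); };

  // Conv input map from the IR above: (d0, d1, d2, d3, d4) -> (d0, d1 + d3, d4)
  mlir::AffineMap convInMap = mlir::AffineMap::get(
      /*dimCount=*/5, /*symbolCount=*/0, {d(0), d(1) + d(3), d(4)}, &ctx);
  // Output map: (d0, d1, d2, d3, d4) -> (d0, d1, d2)
  mlir::AffineMap outMap = mlir::AffineMap::get(5, 0, {d(0), d(1), d(2)}, &ctx);

  llvm::outs() << "conv input map is projected permutation: "
               << (convInMap.isProjectedPermutation() ? "true" : "false")
               << "\n";  // false: d1 + d3 mixes two dims
  llvm::outs() << "output map is projected permutation: "
               << (outMap.isProjectedPermutation() ? "true" : "false")
               << "\n";  // true
  return 0;
}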
I was thinking we could have an option to not sink collapse_shape through pad
I think that makes sense. But convs in general might have a problem with reshape propagation.
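For concreteness, here is a rough sketch of how such an option could look, assuming the pad/reshape swap is gated by the same linalg::ControlFusionFn hook as the other reshape-propagation patterns; the helper name is illustrative and the actual wiring inside the IREE pass may differ.

// Rough sketch, not the actual IREE implementation: a control function that
// refuses to fold a reshape into a tensor.pad (i.e. refuses to sink the
// collapse_shape through the pad), so the pad + conv can still form a single
// dispatch.
#include "mlir/Dialect/Linalg/Transforms/Transforms.h"
#include "mlir/Dialect/Tensor/IR/Tensor.h"
#include "mlir/IR/PatternMatch.h"

using namespace mlir;

// Illustrative helper; assumes the pad/reshape swap consults this control fn.
static void addBubbleUpExpandShapePatterns(RewritePatternSet &patterns) {
  linalg::ControlFusionFn control = [](OpOperand *fusedOperand) {
    // If the reshape would be folded into a tensor.pad, skip it.
    if (isa<tensor::PadOp>(fusedOperand->getOwner()))
      return false;
    return true;
  };
  linalg::populateFoldReshapeOpsByExpansionPatterns(patterns, control);
}

With a control function like this, the collapse_shape would stay above the pad, matching the input IR shown earlier.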
Also, I had a PR https://github.com/iree-org/iree/pull/20108 to not propagate these when they are at the edges. I think it just needs a re-review from Mahesh.
I thought #20108 had API dependency concerns from Ben, so we decided not to go through with it? Edit: oh, I see Ben has approved it.
I'll review this on Monday. We should be able to land it overall; I think I just need a bit more clarity on all the checks.
Sounds good! Is it still useful to also have the option to not sink through the pad, since we might not always have it as an edge op, e.g. in backward convolution cases?
Confirmed that after the PR from Ian this issue isn't happening.