Assertion when lowering `AvgPool2d` from Torch IR when `kernel_size` is a tuple of a single int
The ExportedProgram for

```python
import torch

class AvgPool2dFloatModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.ap2d = torch.nn.AvgPool2d(
            kernel_size=6,
        )

    def forward(self, x):
        return self.ap2d(x)
```
produces the call to `AvgPool2d` as `torch.ops.aten.avg_pool2d.default(x, [6, 6], [6, 6])`. This matches the documented behavior of the `kernel_size` parameter (https://pytorch.org/docs/stable/generated/torch.nn.AvgPool2d.html), which states that a single integer value will be used for both the height and width dimensions. As per the documentation, the only other possible value for `kernel_size` is a tuple of 2 integers. However, a tuple with a single element works as well:
```python
class AvgPool2dFloatModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.ap2d = torch.nn.AvgPool2d(
            kernel_size=(6,)
        )

    def forward(self, x):
        return self.ap2d(x)
```
and the ExportedProgram has the call to `AvgPool2d` as `torch.ops.aten.avg_pool2d.default(x, [6], [6])`. Note that the kernel value is not repeated, even though that is what happens when executing the code in Python.
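Both variants run identically in eager mode; only the exported graph differs. Here's a minimal way to check both claims (a sketch, assuming PyTorch 2.x with `torch.export`):

```python
import torch

x = torch.randn(2, 6, 20, 10)
pool_int = torch.nn.AvgPool2d(kernel_size=6)
pool_tup = torch.nn.AvgPool2d(kernel_size=(6,))

# Eager mode expands the single-element tuple, so the results match.
assert torch.equal(pool_int(x), pool_tup(x))

# The exported graphs differ: kernel_size=(6,) is not expanded to [6, 6].
print(torch.export.export(pool_int, (x,)).graph)  # ...avg_pool2d.default(x, [6, 6], [6, 6])
print(torch.export.export(pool_tup, (x,)).graph)  # ...avg_pool2d.default(x, [6], [6])
```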
This ExportedProgram causes an assertion when lowering the resulting Torch IR to TOSA/Linalg/StableHLO, as those lowerings assume that the kernel has 2 elements.
So I think this can be fixed by one of the following approaches:

1. Match the behavior of the ExportedProgram in the second scenario to the first one. I am not familiar with the PyTorch codebase, so I'm not sure where to make the change; if anyone knows where to start looking, I'd appreciate it.
2. Fix the individual lowerings, but that means repeating the same logic in 3 different places.
3. In Torch IR, before any of the lowerings run (possibly when `DecomposeComplexOps` is called), extend the `kernel_size` param of the `torch.aten.avg_pool2d` op to the correct size, so the individual lowerings don't need to be fixed (a rough sketch follows below).
I'm leaning towards 3 (since I don't know how to make 1 work) -- is that the correct approach? If so, which pass would be the correct place to add the logic? AFAICT none of the existing passes do a similar transform where an op is replaced by the same op with different params. Should I add a new pass?
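For concreteness, here's an untested sketch of the kind of rewrite I have in mind for 3 -- repeating a single-element `kernel_size`/`stride` list before the lowerings run (the pattern name and exact structure are mine, not existing torch-mlir code):

```cpp
#include "torch-mlir/Dialect/Torch/IR/TorchOps.h"
#include "torch-mlir/Dialect/Torch/Utils/Utils.h"

using namespace mlir;
using namespace mlir::torch::Torch;

// If `list` is a prim.ListConstruct with a single element, rebuild it with
// the element repeated for both spatial dims; otherwise return it unchanged.
static Value expandSingleElementList(PatternRewriter &rewriter, Location loc,
                                     Value list) {
  SmallVector<Value> elems;
  if (!getListConstructElements(list, elems) || elems.size() != 1)
    return list;
  return rewriter.create<PrimListConstructOp>(
      loc, list.getType(), ValueRange{elems[0], elems[0]});
}

namespace {
// Hypothetical pattern: normalize aten.avg_pool2d so that kernel_size and
// stride always carry 2 elements, matching what the lowerings expect.
struct ExpandAvgPool2dSizeParams : public OpRewritePattern<AtenAvgPool2dOp> {
  using OpRewritePattern::OpRewritePattern;
  LogicalResult matchAndRewrite(AtenAvgPool2dOp op,
                                PatternRewriter &rewriter) const override {
    Location loc = op.getLoc();
    Value kernel = expandSingleElementList(rewriter, loc, op.getKernelSize());
    Value stride = expandSingleElementList(rewriter, loc, op.getStride());
    if (kernel == op.getKernelSize() && stride == op.getStride())
      return rewriter.notifyMatchFailure(op, "params already 2 elements");
    rewriter.replaceOpWithNewOp<AtenAvgPool2dOp>(
        op, op.getType(), op.getSelf(), kernel, stride, op.getPadding(),
        op.getCeilMode(), op.getCountIncludePad(), op.getDivisorOverride());
    return success();
  }
};
} // namespace
```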
@sjarus, @vivekkhandelwal1 -- any thoughts? Thanks!
Hi @vivekkhandelwal1, do you have any thoughts here?
@sahas3 Is this issue still relevant?
Hi @vivekkhandelwal1, just verified that this is still an issue.
Torch IR from the above example:
```mlir
func.func @main(%arg0: !torch.vtensor<[2,6,20,10],f32>) -> !torch.vtensor<[2,6,3,1],f32> attributes {torch.assume_strict_symbolic_shapes} {
  %none = torch.constant.none
  %true = torch.constant.bool true
  %false = torch.constant.bool false
  %int0 = torch.constant.int 0
  %int6 = torch.constant.int 6
  %0 = torch.prim.ListConstruct %int6 : (!torch.int) -> !torch.list<int>
  %1 = torch.prim.ListConstruct %int6 : (!torch.int) -> !torch.list<int>
  %2 = torch.prim.ListConstruct %int0, %int0 : (!torch.int, !torch.int) -> !torch.list<int>
  %3 = torch.aten.avg_pool2d %arg0, %0, %1, %2, %false, %true, %none : !torch.vtensor<[2,6,20,10],f32>, !torch.list<int>, !torch.list<int>, !torch.list<int>, !torch.bool, !torch.bool, !torch.none -> !torch.vtensor<[2,6,3,1],f32>
  return %3 : !torch.vtensor<[2,6,3,1],f32>
}
```
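(For reference, I generated the IR above with the fx importer; a minimal sketch, assuming torch-mlir's Python bindings expose `torch_mlir.fx.export_and_import` and reusing the `AvgPool2dFloatModule` class from the example:)

```python
import torch
from torch_mlir import fx

# Import the kernel_size=(6,) variant of AvgPool2dFloatModule defined above.
module = fx.export_and_import(AvgPool2dFloatModule(), torch.randn(2, 6, 20, 10))
print(module)
```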
Converting to linalg fails:

```
$> torch-mlir-opt --convert-torch-to-linalg <....>
externals/llvm-project/llvm/include/llvm/ADT/SmallVector.h:291: llvm::SmallVectorTemplateCommon::reference llvm::SmallVectorTemplateCommon<mlir::Value>::operator[](llvm::SmallVectorTemplateCommon::size_type) [T = mlir::Value]: Assertion `idx < size()' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.  Program arguments: ./build/bin/torch-mlir-opt --convert-torch-to-linalg
#0 0x00005578036b3967 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (./build/bin/torch-mlir-opt+0x2031967)
#1 0x00005578036b170e llvm::sys::RunSignalHandlers() (./build/bin/torch-mlir-opt+0x202f70e)
#2 0x00005578036b4015 SignalHandler(int, siginfo_t*, void*) Signals.cpp:0:0
#3 0x00007fe70bc5b050 (/lib/x86_64-linux-gnu/libc.so.6+0x3c050)
#4 0x00007fe70bca9eec __pthread_kill_implementation ./nptl/pthread_kill.c:44:76
#5 0x00007fe70bc5afb2 raise ./signal/../sysdeps/posix/raise.c:27:6
#6 0x00007fe70bc45472 abort ./stdlib/abort.c:81:7
#7 0x00007fe70bc45395 _nl_load_domain ./intl/loadmsgcat.c:1177:9
#8 0x00007fe70bc53ec2 (/lib/x86_64-linux-gnu/libc.so.6+0x34ec2)
#9 0x00005578026b9be7 computeOutputTensor(mlir::Operation*, mlir::ConversionPatternRewriter&, mlir::Value, long, bool, llvm::SmallVectorImpl<long>&, llvm::SmallVectorImpl<long>&, llvm::SmallVectorImpl<long>&, llvm::SmallVectorImpl<mlir::Value>&, llvm::SmallVectorImpl<mlir::Value>&, mlir::Value) Pooling.cpp:0:0
#10 0x00005578026ca679 (anonymous namespace)::ConvertAtenAvgPoolOp<mlir::torch::Torch::AtenAvgPool2dOp, mlir::linalg::PoolingNchwSumOp, 2>::matchAndRewrite(mlir::torch::Torch::AtenAvgPool2dOp, mlir::torch::Torch::AtenAvgPool2dOpAdaptor, mlir::ConversionPatternRewriter&) const Pooling.cpp:0:0
#11 0x00005578026cb78b mlir::OpConversionPattern<mlir::torch::Torch::AtenAvgPool2dOp>::matchAndRewrite(mlir::torch::Torch::AtenAvgPool2dOp, mlir::torch::Torch::AtenAvgPool2dOpGenericAdaptor<llvm::ArrayRef<mlir::ValueRange>>, mlir::ConversionPatternRewriter&) const Pooling.cpp:0:0
```
Do you have any suggestions on which of the 3 approaches proposed above would be the best path forward, or do you have a different approach in mind? Thanks!
If you're asking me to choose, 3 seems to be a good option.
Another alternative to option 3 is to add a canonicalization pattern for the AvgPool op that extends the kernel size and other such attributes from 1 element to 2. This seems to be a cleaner approach than introducing a new pass to do the same, I think.
I'm ok with either approach, since it matches documented behavior.
Hi @vivekkhandelwal1, do you have any thoughts here?
Hi @sahas3, sorry for the disagreement, but IMO the 3rd approach seems to be more of a "hack" than a "fix".

I am curious to know what the use cases are where we diverge from the PyTorch-defined kernel values. Since PyTorch expects either an integer or a tuple of integers, anything else would be wrong, and we can't expect it to generate correct IR. Ideally, it should have errored out instead of generating incorrect IR, which is clearly not happening here. If we really want to fix this, then we should make this error out instead of trying to handle a case that is illegitimate as per the PyTorch definition of the op.
Thanks for the feedback @vivekkhandelwal1. I've asked in the PyTorch community (https://github.com/pytorch/pytorch/issues/153149) whether supporting a tuple of a single int is intended behavior and whether the documentation needs to be enhanced to reflect that. Based on the update on that issue, we can proceed with the correct way forward here.
Hi @vivekkhandelwal1, PyTorch devs clarified in https://github.com/pytorch/pytorch/issues/153149 that the behavior of a single-element tuple is the same as a single element for the `kernel_size` param. I also asked how to fix the exported program itself, as suggested by approach 1 above, but haven't heard anything back regarding that.

So, what's your preference between waiting for them to fix the exported program to match what is produced for the single-element case, or fixing it in torch-mlir using approach 3?

Thanks!
Hi @sahas3, as per your comment, does PyTorch consider both cases to be valid and execute them in the same manner?
Yes, both cases are valid and are executed in the same fashion. It's only in the resulting `ExportedProgram` that the single-element tuple value is not yet repeated for `[height, width]`. Here's the screenshot from the proposed doc update:
Okay.

Let's consider your proposed solutions. Ideally, the first one should be done. If not, the 3rd one is not appropriate, because `DecomposeComplexOps` is for decomposing an op into a set of other ops, which we are not doing here. So you have to find a different pass to do this, or go for the 2nd solution.
@vivekkhandelwal1, I agree that `DecomposeComplexOps` is not the correct pass in which to insert the logic to extend the params. I don't think there's an existing pass to do this either, so I'd have to add a new pass. Instead, I think an alternative is to add a canonicalization pattern for the pooling ops in the `torch` dialect to canonicalize them to a common form, roughly as sketched below. What do you think?
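Roughly, the hook would look like this (hypothetical sketch, reusing the `ExpandAvgPool2dSizeParams` pattern from my earlier comment; since torch-mlir generates the op's ODS, the canonicalizer flag would presumably be enabled via `has_canonicalizer` in `torch_ods_gen.py`):

```cpp
// Hypothetical: register the expansion pattern as a canonicalization.
// Requires the op to be emitted with `let hasCanonicalizer = 1;` in ODS.
void AtenAvgPool2dOp::getCanonicalizationPatterns(RewritePatternSet &patterns,
                                                  MLIRContext *context) {
  patterns.add<ExpandAvgPool2dSizeParams>(context);
}
```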
Okay. For now, that seems to be the only option.