
Problem transferring conv2d from Torch to the Torch backend when use_tracing=True is set.

Yepgang opened this issue 2 years ago · 1 comment

I constructed the module, weights, and input for a conv2d layer and attempted to compile it with torch-mlir. There is a problem if I set use_tracing=True. I then debugged it with torch-mlir-opt: it seems torch.tensor_static_info_cast does something that I don't understand. I would like to know why tracing the model fails while scripting succeeds. The code and error are shown below.

code

import torch
import torch.nn as nn
import torch_mlir

class convtest(nn.Module):
    def __init__(self, n_rnn=2, leakyRelu=False):
        super(convtest, self).__init__()
        # in_channels=1, out_channels=64, kernel_size=3, stride=1, padding=1
        self.cnn = nn.Conv2d(1, 64, 3, 1, 1)

    def forward(self, input):
        conv = self.cnn(input)
        return conv

input = torch.ones(1, 1, 32, 100)
model = convtest()

# Overwrite the conv weights and bias with known constants.
weight = torch.ones(64, 1, 3, 3)
bias = torch.zeros(64)
for item in model.modules():
    if isinstance(item, nn.Conv2d):
        item.weight = nn.Parameter(weight)
        item.bias = nn.Parameter(bias)

module = torch_mlir.compile(model, input, output_type=torch_mlir.OutputType.TOSA, use_tracing=True)

print("convert-to-tosa")
asm = module.operation.get_asm(
    large_elements_limit=10, enable_debug_info=True)
filename = "./crnn_tosa.mlir"
with open(filename, 'w') as f:
    f.write(asm)
print("write tosa mlir to %s" % filename)
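For reference, the trace-vs-script difference the question hinges on can be observed in plain PyTorch, without torch-mlir. This is a minimal sketch (assuming only a local PyTorch install, not the reporter's exact setup): torch.jit.trace records the ops executed for one concrete example input, so shapes are baked in from that run, while torch.jit.script compiles the forward() source directly. Both paths compute the same result here, but they produce different graphs, which is why torch_mlir.compile can behave differently depending on use_tracing.

```python
import torch
import torch.nn as nn

class ConvTest(nn.Module):
    def __init__(self):
        super().__init__()
        # Same layer as in the report: 1 -> 64 channels, 3x3 kernel, stride 1, padding 1.
        self.cnn = nn.Conv2d(1, 64, 3, 1, 1)

    def forward(self, input):
        return self.cnn(input)

model = ConvTest().eval()
example = torch.ones(1, 1, 32, 100)

# Tracing records the ops run on this exact input (shapes specialized to it);
# scripting compiles forward() itself and keeps shapes symbolic.
traced = torch.jit.trace(model, example)
scripted = torch.jit.script(model)

# Both agree numerically on the example input; with padding 1 the spatial
# size is preserved, so the output is (1, 64, 32, 100).
out_t = traced(example)
out_s = scripted(example)
assert out_t.shape == (1, 64, 32, 100)
assert torch.allclose(out_t, out_s)
```

The divergence only shows up in the captured graphs (and in the IR a compiler derives from them), not in eager results.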

error

Traceback (most recent call last):
  File "/host/workspace/torch2tosa_examples/crnn2tosa/con2dtest.py", line 32, in <module>
    module = torch_mlir.compile(model, input, output_type=torch_mlir.OutputType.TOSA, use_tracing=True)
  File "/host/workspace/MTensorRT/third_party/torch-mlir/buildpurec/tools/torch-mlir/python_packages/torch_mlir/torch_mlir/__init__.py", line 157, in compile
    run_pipeline_with_repro_report(
  File "/host/workspace/MTensorRT/third_party/torch-mlir/buildpurec/tools/torch-mlir/python_packages/torch_mlir/torch_mlir/compiler_utils.py", line 49, in run_pipeline_with_repro_report
    raise Exception(f"""
Exception: 
Lowering Torch Backend IR -> TOSA Backend IR failed with the following diagnostics:
error: unsupported by backend lowering: tensor with unknown rank or dtype
note: see current operation: %6 = "torch.tensor_static_info_cast"(%arg0) : (!torch.vtensor<[1,1,32,100],f32>) -> !torch.vtensor<*,f32>
note: this is likely due to a missing shape transfer function in shape_lib_gen.py
Error can be reproduced with:
$ torch-mlir-opt -pass-pipeline='torch-backend-to-tosa-backend-pipeline' /tmp/convtest.mlir
Add '-print-ir-after-all -mlir-disable-threading' to get the IR dump for debugging purpose.

torch-mlir-opt debug results

<unknown>:0: error: unsupported by backend lowering: tensor with unknown rank or dtype
<unknown>:0: note: see current operation: %6 = "torch.tensor_static_info_cast"(%arg0) : (!torch.vtensor<[1,1,32,100],f32>) -> !torch.vtensor<*,f32>
<unknown>:0: note: this is likely due to a missing shape transfer function in shape_lib_gen.py
// -----// IR Dump After VerifyInvariantsBeforeBackendLowering Failed //----- //
module attributes {torch.debug_module_name = "convtest"} {
  func.func @forward(%arg0: !torch.vtensor<[1,1,32,100],f32>) -> !torch.vtensor<[64,32,100],f32> {
    %0 = torch.vtensor.literal(dense<1.000000e+00> : tensor<64x1x3x3xf32>) : !torch.vtensor<[64,1,3,3],f32>
    %1 = torch.vtensor.literal(dense<0.000000e+00> : tensor<64xf32>) : !torch.vtensor<[64],f32>
    %int0 = torch.constant.int 0
    %int1 = torch.constant.int 1
    %false = torch.constant.bool false
    %true = torch.constant.bool true
    %2 = torch.tensor_static_info_cast %arg0 : !torch.vtensor<[1,1,32,100],f32> to !torch.vtensor<*,f32>
    %3 = torch.copy.to_tensor %2 : !torch.tensor<*,f32>
    %4 = torch.tensor_static_info_cast %3 : !torch.tensor<*,f32> to !torch.tensor<[1,32,100],f32>
    %5 = torch.copy.to_tensor %1 : !torch.tensor<[64],f32>
    %6 = torch.copy.to_tensor %0 : !torch.tensor<[64,1,3,3],f32>
    %7 = torch.aten.unsqueeze %4, %int0 : !torch.tensor<[1,32,100],f32>, !torch.int -> !torch.tensor<[1,1,32,100],f32>
    %8 = torch.prim.ListConstruct %int1, %int1 : (!torch.int, !torch.int) -> !torch.list<int>
    %9 = torch.prim.ListConstruct %int0, %int0 : (!torch.int, !torch.int) -> !torch.list<int>
    %10 = torch.operator "aten._convolution"(%7, %6, %5, %8, %8, %8, %false, %9, %int1, %false, %false, %true, %true) : (!torch.tensor<[1,1,32,100],f32>, !torch.tensor<[64,1,3,3],f32>, !torch.tensor<[64],f32>, !torch.list<int>, !torch.list<int>, !torch.list<int>, !torch.bool, !torch.list<int>, !torch.int, !torch.bool, !torch.bool, !torch.bool, !torch.bool) -> !torch.tensor<[1,64,32,100],f32>
    %11 = torch.aten.squeeze.dim %10, %int0 : !torch.tensor<[1,64,32,100],f32>, !torch.int -> !torch.tensor<[64,32,100],f32>
    %12 = torch.copy.to_vtensor %11 : !torch.vtensor<[64,32,100],f32>
    return %12 : !torch.vtensor<[64,32,100],f32>
  }
}

— Yepgang, Jun 28 '22

This should be fixed after https://github.com/llvm/torch-mlir/pull/956

— silvasean, Jun 29 '22

Closing, as we support aten._convolution after #956.

— silvasean, Oct 07 '22