torch.onnx.export RuntimeError: Unsupported: ONNX export of Pad in opset 9. The sizes of the padding must be constant.
Environment: RTX 3090, torch 1.7.0+cu110, torchvision 0.8.1, Python 3.8
File:

if __name__ == '__main__':
    # main()
    img_l = torch.randn(1, 3, 512, 256).cuda()
    img_r = torch.randn(1, 3, 512, 256).cuda()
    f = r"../results/pretrained_anynet/checkpoint.tar"
    checkpoint = torch.load(f)
    checkpoint['state_dict'] = proc_nodes_module(checkpoint, 'state_dict')
    model = models.anynet.AnyNet(args).cuda()
    model.load_state_dict(checkpoint['state_dict'])
    model.eval()
    torch.onnx.export(model, (img_l, img_r), "anynet.onnx", verbose=False,
                      input_names=["img_l", "img_r"],
                      output_names=["stage1", "stage2", "stage3"],
                      opset_version=11)
Problem: RuntimeError: Unsupported: ONNX export of Pad in opset 9. The sizes of the padding must be constant. Please try opset version 11. I have already set opset_version=11, but it doesn't work. Does anybody know how to solve this problem? I really need help, please.
If the shape of your dummy input differs from the original input shape (B, F, 368, 1232), I suggest you either set dynamic_axes in the export function or keep the same input shape.
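For reference, a minimal sketch of an export call with dynamic_axes (the axis names and which dimensions are marked dynamic are assumptions for illustration, not taken from the repo):

torch.onnx.export(
    model, (img_l, img_r), "anynet.onnx",
    input_names=["img_l", "img_r"],
    output_names=["stage1", "stage2", "stage3"],
    # mark batch/height/width as dynamic so the graph does not hard-code the dummy shape
    dynamic_axes={
        "img_l": {0: "batch", 2: "height", 3: "width"},
        "img_r": {0: "batch", 2: "height", 3: "width"},
    },
    opset_version=11,
)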
Alright, my problem:
I was trying to export this network as "anynet.onnx" just like you, but I have not succeeded yet. Have you solved this problem?
Hi, @tony-laoshi. I used images from the evaluation set, but a Floating point exception (core dumped) was raised. So I want to ask whether you have finished exporting.
After debugging line by line, I finally found the reason for the error. In line 106 of AnyNet.py, the slice :i selects nothing when i == 0, which results in wrong shape inference during the ONNX export. Below is my gdb screenshot:

This matches the top of the backtrace, ComputeShapeFromReshape, shown above.
My solution is to insert a condition before line 106 of anynet.py so that it only executes when i > 0. You can also check my fork.
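For illustration, a minimal sketch of that guard, assuming line 106 is the :i slice assignment inside the cost-volume loop of _build_volume_2d (the surrounding loop is a reconstruction, not quoted verbatim from the repo):

for i in range(0, maxdisp, stride):
    if i > 0:
        # when i == 0, the :i slice selects zero columns and breaks ONNX shape inference
        cost[:, i // stride, :, :i] = feat_l[:, :, :, :i].abs().sum(1)
    cost[:, i // stride, :, i:] = torch.norm(feat_l[:, :, :, i:] - feat_r[:, :, :, :width - i], 1, 1)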
Thanks a lot, mate!!
Following your instructions, I am able to export the model as well. However, the inference result is not ideal, and I am still looking for the reason. Have you tested inference with the ONNX model? How are the results?
@tony-laoshi I haven't tested the exported ONNX model. In my implementation, I drop the SPN module, which requires CUDA and is incompatible with mobile SoCs. If you compare the original AnyNet with SPN against my implementation without SPN, the results should differ somewhat.
@BangwenHe I performed an inference test with the exported model, using ONNX Runtime as the inference engine. The tests were run in Python and in C++.
The Python results are good, with the disparity accuracy increasing significantly across the three stages. The C++ results are strange and arguably misestimated.
I put the test scripts I used in this folder, which also contains the exported model and the images used for testing.
Here you can see the results of the two tests:
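For anyone reproducing the Python test, the ONNX Runtime call is roughly of this shape (a minimal sketch; the preprocessing and dummy inputs are placeholders, and the real scripts are in the folder linked above):

import numpy as np
import onnxruntime as ort

# Sketch: run the exported AnyNet model with ONNX Runtime in Python.
# Input/output names follow the export call above; random arrays stand in for real rectified images.
session = ort.InferenceSession("anynet.onnx", providers=["CPUExecutionProvider"])
img_l = np.random.rand(1, 3, 512, 256).astype(np.float32)
img_r = np.random.rand(1, 3, 512, 256).astype(np.float32)
stage1, stage2, stage3 = session.run(["stage1", "stage2", "stage3"],
                                     {"img_l": img_l, "img_r": img_r})
print(stage3.shape)  # the last stage is the most refined disparity prediction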
I solved this bug. It was in the post-processing step that converts the output to cv::Mat after getting the mutable data. I found a similar issue at https://github.com/microsoft/onnxruntime/issues/11677#issuecomment-1143048647 and solved the bug by following the answer there. I have also uploaded my updated test demo and test results.
@tony-laoshi Happy to see this problem was solved through your hard work. I think it will be a great help for latecomers facing the same problem.
I have added some new features to AnyNet, such as confidence estimation. The model is now trained and the tests pass. But when I export the model with torch.onnx.export, I hit the Floating point exception (core dumped) problem. How did you debug it? If possible and acceptable, could we communicate by email or WeChat?
Hi, @tony-laoshi. My debugging method is to return the result after every single instruction. For example, in anynet.py:
def forward(self, left, right):
    img_size = left.size()

    feats_l = self.feature_extraction(left)
    feats_r = self.feature_extraction(right)

    pred = []
    for scale in range(len(feats_l)):
        if scale > 0:
            wflow = F.upsample(pred[scale - 1],
                               (feats_l[scale].size(2), feats_l[scale].size(3)),
                               mode='bilinear') * feats_l[scale].size(2) / img_size[2]
            cost = self._build_volume_2d3(feats_l[scale], feats_r[scale],
                                          self.maxdisplist[scale], wflow, stride=1)
        else:
            cost = self._build_volume_2d(feats_l[scale], feats_r[scale],
                                         self.maxdisplist[scale], stride=1)

        cost = torch.unsqueeze(cost, 1)
        cost = self.volume_postprocess[scale](cost)
        cost = cost.squeeze(1)
        if scale == 0:
            pred_low_res = disparityregression2(0, self.maxdisplist[0])(F.softmax(-cost, dim=1))
            pred_low_res = pred_low_res * img_size[2] / pred_low_res.size(2)
            disp_up = F.upsample(pred_low_res, (img_size[2], img_size[3]), mode='bilinear')
            pred.append(disp_up)
        else:
            pred_low_res = disparityregression2(-self.maxdisplist[scale] + 1,
                                                self.maxdisplist[scale],
                                                stride=1)(F.softmax(-cost, dim=1))
            pred_low_res = pred_low_res * img_size[2] / pred_low_res.size(2)
            disp_up = F.upsample(pred_low_res, (img_size[2], img_size[3]), mode='bilinear')
            pred.append(disp_up + pred[scale - 1])

    if self.refine_spn:
        spn_out = self.refine_spn[0](nn.functional.upsample(left, (img_size[2] // 4, img_size[3] // 4), mode='bilinear'))
        G1, G2, G3 = (spn_out[:, :self.spn_init_channels, :, :],
                      spn_out[:, self.spn_init_channels:self.spn_init_channels * 2, :, :],
                      spn_out[:, self.spn_init_channels * 2:, :, :])
        sum_abs = G1.abs() + G2.abs() + G3.abs()
        G1 = torch.div(G1, sum_abs + 1e-8)
        G2 = torch.div(G2, sum_abs + 1e-8)
        G3 = torch.div(G3, sum_abs + 1e-8)
        pred_flow = nn.functional.upsample(pred[-1], (img_size[2] // 4, img_size[3] // 4), mode='bilinear')
        refine_flow = self.spn_layer(self.refine_spn[1](pred_flow), G1, G2, G3)
        refine_flow = self.refine_spn[2](refine_flow)
        pred.append(nn.functional.upsample(refine_flow, (img_size[2], img_size[3]), mode='bilinear'))

    return pred
You just need to return the result right after extracting features from the left and right images, as below:
def forward(self, left, right):
    img_size = left.size()

    feats_l = self.feature_extraction(left)
    feats_r = self.feature_extraction(right)
    return feats_l, feats_r
If no error occurs, you just move on to the next instruction:
def forward(self, left, right):
    img_size = left.size()

    feats_l = self.feature_extraction(left)
    feats_r = self.feature_extraction(right)

    pred = []
    for scale in range(len(feats_l)):
        if scale > 0:
            wflow = F.upsample(pred[scale - 1],
                               (feats_l[scale].size(2), feats_l[scale].size(3)),
                               mode='bilinear') * feats_l[scale].size(2) / img_size[2]
            cost = self._build_volume_2d3(feats_l[scale], feats_r[scale],
                                          self.maxdisplist[scale], wflow, stride=1)
            return cost
        else:
            cost = self._build_volume_2d(feats_l[scale], feats_r[scale],
                                         self.maxdisplist[scale], stride=1)
Finally, you can contact me at my school email, [email protected], or join QQ group 1021964010, where we can talk about ONNX and onnxsim.