Torchvision Faster R-CNN ONNX export with dynamic batch size fails during inference
🐛 Describe the bug
import io

import torch
from torchvision import models

frcnn = models.detection.fasterrcnn_resnet50_fpn_v2(pretrained=True)
x = torch.rand(4, 3, 224, 224)

with io.BytesIO() as f:
    torch.onnx.export(
        frcnn,
        x,
        f,
        export_params=True,
        opset_version=20,
        do_constant_folding=True,
        keep_initializers_as_inputs=None,
        custom_opsets={"moka": 20},
        input_names=["images"],
        output_names=["output"],
        dynamic_axes={
            "images": {0: "batch_size", 2: "height", 3: "width"},
            "output": {0: "batch_size"},
        },
        dynamo=False,
    )
    onnx_model = f.getvalue()
import onnxruntime as ort

providers = ["CUDAExecutionProvider"] if torch.cuda.is_available() else ["CPUExecutionProvider"]
ort_session = ort.InferenceSession(onnx_model, providers=providers)

# use a different batch size (and height) from the sample input x
ort_inputs = {
    ort_session.get_inputs()[0].name: torch.rand(2, 3, 448, 224).numpy(),
}
ort_outputs = ort_session.run(None, ort_inputs)
Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running Split node. Name:'/Split' Status Message: Cannot split using values in 'split' attribute. Axis=0 Input shape={2,3,448,224} NumOutputs=4 Num entries in 'split' (must equal number of outputs) was 4 Sum of sizes in 'split' (must equal size of selected axis) was 4
Above is a minimal example that fails. When images with the same batch size as the sample input are used at inference, it does not fail. What causes the error?
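The error message suggests the split sizes for the '/Split' node were recorded as constants during tracing (4 entries, matching the batch size of the sample input x), so that path only accepts batch 4. A minimal diagnostic sketch to check this, assuming onnx_model still holds the exported bytes from the snippet above:

import onnx
from onnx import numpy_helper

model = onnx.load_model_from_string(onnx_model)  # exported bytes from the repro above
initializers = {init.name: numpy_helper.to_array(init) for init in model.graph.initializer}
for node in model.graph.node:
    if node.op_type == "Split":
        # at opset >= 13 the split sizes arrive as a second input, usually a constant initializer
        sizes = initializers.get(node.input[1]) if len(node.input) > 1 else None
        attrs = [(a.name, onnx.helper.get_attribute_value(a)) for a in node.attribute]
        print(node.name, "attrs:", attrs, "split sizes:", sizes)

If the printed sizes are fixed (e.g. four entries summing to 4), that would point at the tracing-based exporter baking the sample batch size into the graph, rather than at an onnxruntime bug.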
Versions
PyTorch version: 2.6.0+cu126
Is debug build: False
CUDA used to build PyTorch: 12.6
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 11 Pro (10.0.22631 64-bit)
GCC version: Could not collect
Clang version: Could not collect
CMake version: version 3.31.5
Libc version: N/A

Python version: 3.10.11 (tags/v3.10.11:7d4cc5a, Apr 5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.22631-SP0
Is CUDA available: True
CUDA runtime version: 12.8.61
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 4080
Nvidia driver version: 571.96
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Name: 13th Gen Intel(R) Core(TM) i7-13700KF
Manufacturer: GenuineIntel
Family: 198
Architecture: 9
ProcessorType: 3
DeviceID: CPU0
CurrentClockSpeed: 3400
MaxClockSpeed: 3400
L2CacheSize: 24576
L2CacheSpeed: None
Revision: None

Versions of relevant libraries:
[pip3] numpy==2.2.2
[pip3] onnx==1.17.0
[pip3] onnxruntime-gpu==1.20.1
[pip3] onnxscript==0.2.0
[pip3] onnxsim==0.4.36
[pip3] pytorch-lightning==2.5.0.post0
[pip3] torch==2.6.0+cu126
[pip3] torchmetrics==1.6.1
[pip3] torchvision==0.21.0+cu126
[conda] Could not collect
Hi @davidgill97 , sorry, I don't think I'll be able to prioritize ONNX-related issues from now.
I see. I'm willing to look into the problem myself; could you give me some advice?