[ONNXRuntimeError] Arcface model fails during ONNX runtime session run

Open • tmoreau89 opened this issue 4 years ago • 11 comments

Symptom

The ArcFace model from the ONNX Model Zoo fails during an ONNX Runtime session run (tested with onnxruntime 1.0.0).

The following error is printed:

2019-11-20 17:24:29.510729 [E:onnxruntime:, sequential_executor.cc:165 Execute] Non-zero status code returned while running BatchNormalization node. Name:'stage1_unit1_bn1' Status Message: Invalid input scale: NumDimensions() != 3
Traceback (most recent call last):
  File "test.py", line 21, in <module>
    ort_sess.run(None, ort_data)
  File "/Users/moreau/.pyenv/versions/3.7.0/lib/python3.7/site-packages/onnxruntime/capi/session.py", line 136, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Non-zero status code returned while running BatchNormalization node. Name:'stage1_unit1_bn1' Status Message: Invalid input scale: NumDimensions() != 3

Reproducing the error

Environment:

  • ubuntu: 18.04
  • python: 3.7.0
  • onnx: 1.6.0
  • onnxruntime: 1.0.0
  • numpy: 1.17.4

Dockerfile to reproduce the error:

FROM ubuntu:18.04
RUN apt-get update
RUN apt-get install -y python3-dev python3-pip git vim
RUN pip3 install --upgrade pip

RUN pip3 install numpy==1.17.4
RUN pip3 install onnx==1.6.0
RUN pip3 install onnxruntime==1.0.0

Model:

  • URL: https://s3.amazonaws.com/onnx-model-zoo/arcface/resnet100/resnet100.onnx
  • SHA: 66074b860f905295aab5a842be57f37d

Minimal Python script:

import numpy as np
import urllib.request
import onnxruntime
import os

MODEL_FILE = "resnet100.onnx"
URL = "https://s3.amazonaws.com/onnx-model-zoo/arcface/resnet100/resnet100.onnx"

# Download model file
if not os.path.exists(MODEL_FILE):
    urllib.request.urlretrieve(URL, MODEL_FILE)

# Create ONNX runtime session
ort_sess = onnxruntime.InferenceSession(MODEL_FILE)
# Print information on the model inputs
for inp in ort_sess.get_inputs():
    print("Input info: {}".format(inp))
# Random NCHW input; "data" is the model's input name
ort_data = {"data": np.random.rand(1, 3, 112, 112).astype("float32")}
# Run
ort_sess.run(None, ort_data)

tmoreau89 • Nov 21 '19 01:11

@abhinavs95 @jennifererwangg

tmoreau89 • Nov 21 '19 01:11

Corresponding issue in ONNX Runtime: https://github.com/microsoft/onnxruntime/issues/2447

tmoreau89 • Nov 21 '19 01:11

Please see my comment from Oct 10 here: https://github.com/onnx/models/issues/156

hariharans29 • Nov 21 '19 01:11

Thanks @hariharans29. @prasanthpul, are there any plans to fix this model so that it runs out of the box, without relying on the node-attribute rewrite proposed in #2447 (reproduced below)?

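import onnx

# Load the model as downloaded from the model zoo
model = onnx.load(r'arcface_mxnet\resnet100.onnx')

# Force spatial=1 on every BatchNormalization node, so that ONNX Runtime
# treats scale/B/mean/var as per-channel (1-D) tensors
for node in model.graph.node:
    if node.op_type == "BatchNormalization":
        for attr in node.attribute:
            if attr.name == "spatial":
                attr.i = 1

onnx.save(model, r'updated_resnet100.onnx')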

tmoreau89 • Nov 21 '19 01:11

This should be fixed on the MXNet side and the model re-exported.

Alternatively, do you know of a non-MXNet version of this model we could replace it with?

prasanthpul • Nov 21 '19 17:11

@prasanthpul I can update the model, but at this point it's hosted in an Amazon S3 bucket (https://s3.amazonaws.com/onnx-model-zoo/arcface/resnet100/resnet100.onnx), and I'm not sure I have write permission to update it.

tmoreau89 • Nov 21 '19 17:11

Perhaps @abhinavs95 can provide pointers on how to update the model in the bucket.

tmoreau89 • Nov 21 '19 17:11

@tmoreau89 we are actually in the process of moving models to Git LFS and have already done this for new models, so you can simply ignore the model in the bucket and upload new versions to Git LFS (see the BERT model for an example).

Ideally, the fix has already been made on the MXNet side, and this is just a matter of re-exporting with a standard MXNet build. However, if this is going to be a manually corrected model, you'll also need to update the README to note the bug in MXNet and show how the provided model was actually generated.

prasanthpul • Nov 24 '19 04:11

Thanks @prasanthpul. I'll submit the PR when I have the bandwidth sometime this coming week.

tmoreau89 • Nov 25 '19 05:11

Any updates on this? I'm just experiencing the same issue.

georg-jung • Aug 04 '21 16:08

Regarding the "spatial" attribute rewrite script from tmoreau89's comment above: I tried it, but it didn't work for me. I still got this error:

InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Non-zero status code returned while running BatchNormalization node. Name:'BatchNormalization_17' Status Message: Invalid input scale: NumDimensions() != 1

SystemErrorWang • Feb 08 '22 09:02