[ONNXRuntimeError] Arcface model fails during ONNX runtime session run
Symptom
The Arcface model from the ONNX Model Zoo fails when run in an ONNX Runtime inference session (tested with onnxruntime 1.0.0).
The following error message gets printed out:
2019-11-20 17:24:29.510729 [E:onnxruntime:, sequential_executor.cc:165 Execute] Non-zero status code returned while running BatchNormalization node. Name:'stage1_unit1_bn1' Status Message: Invalid input scale: NumDimensions() != 3
Traceback (most recent call last):
File "test.py", line 21, in <module>
ort_sess.run(None, ort_data)
File "/Users/moreau/.pyenv/versions/3.7.0/lib/python3.7/site-packages/onnxruntime/capi/session.py", line 136, in run
return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Non-zero status code returned while running BatchNormalization node. Name:'stage1_unit1_bn1' Status Message: Invalid input scale: NumDimensions() != 3
Reproducing the error
Environment:
- ubuntu: 18.04
- python: 3.7.0
- onnx: 1.6.0
- onnxruntime: 1.0.0
- numpy: 1.17.4
Dockerfile to reproduce error:
FROM ubuntu:18.04
RUN apt-get update
RUN apt-get install -y python3-dev python3-pip git vim
RUN pip3 install --upgrade pip
RUN pip3 install numpy==1.17.4
RUN pip3 install onnx==1.6.0
RUN pip3 install onnxruntime==1.0.0
Model:
- URL: https://s3.amazonaws.com/onnx-model-zoo/arcface/resnet100/resnet100.onnx
- SHA: 66074b860f905295aab5a842be57f37d
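Before reproducing, it may be worth confirming that the downloaded file is the same one described above. The value listed under SHA is 32 hex digits long, which is the length of an MD5 digest rather than a SHA-1 or SHA-256 one, so the sketch below assumes it is MD5. That is an assumption, and the helper names file_md5 and matches_expected are made up for illustration.

```python
import hashlib

# Value listed above under "SHA"; treating it as an MD5 digest is an assumption.
EXPECTED = "66074b860f905295aab5a842be57f37d"


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED):
    """True if the file's MD5 digest equals `expected` (case-insensitive)."""
    return file_md5(path) == expected.lower()
```

Calling matches_expected("resnet100.onnx") after the download should tell you whether you are testing the same file. If it returns False, the hosted model may have changed, or the digest may not be MD5 after all.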
Minimal python script:
import numpy as np
import urllib.request
import onnxruntime
import os

MODEL_FILE = "resnet100.onnx"
URL = "https://s3.amazonaws.com/onnx-model-zoo/arcface/resnet100/resnet100.onnx"

# Download model file
if not os.path.exists(MODEL_FILE):
    urllib.request.urlretrieve(URL, MODEL_FILE)

# Create ONNX runtime session
ort_sess = onnxruntime.InferenceSession(MODEL_FILE)

# Print information on model input
for idx, inp in enumerate(ort_sess.get_inputs()):
    print("Input info: {}".format(inp))

# Init data
ort_data = {"data": np.random.rand(1,3,112,112).astype("float32")}

# Run
ort_sess.run(None, ort_data)
@abhinavs95 @jennifererwangg
Issue reference in OnnxRuntime: https://github.com/microsoft/onnxruntime/issues/2447
Please see my comment on Oct 10 here - https://github.com/onnx/models/issues/156
Thanks @hariharans29. @prasanthpul, are there any plans to fix this model so it runs out of the box, without relying on the node attribute rewrite (the fix proposed in #2447, shown below)?
import onnx

model = onnx.load(r'arcface_mxnet\resnet100.onnx')
for node in model.graph.node:
    if node.op_type == "BatchNormalization":
        for attr in node.attribute:
            if attr.name == "spatial":
                attr.i = 1
onnx.save(model, r'updated_resnet100.onnx')
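A minimal sketch of the same rewrite that also reports how many nodes it touched, which can help tell whether the workaround applied to a given model at all. The names force_spatial and patch_file are hypothetical, and the paths passed to patch_file are placeholders. Only onnx.load and onnx.save are used from the onnx package.

```python
def force_spatial(model, value=1):
    """Set `spatial` on every BatchNormalization node that carries it.

    Returns (patched, missing): how many attributes were rewritten, and how
    many BatchNormalization nodes had no `spatial` attribute at all.
    """
    patched = 0
    missing = 0
    for node in model.graph.node:
        if node.op_type != "BatchNormalization":
            continue
        spatial_attrs = [a for a in node.attribute if a.name == "spatial"]
        if not spatial_attrs:
            missing += 1
        for attr in spatial_attrs:
            if attr.i != value:
                attr.i = value
                patched += 1
    return patched, missing


def patch_file(src, dst):
    """Load `src`, force spatial=1, save to `dst`, and return the counts."""
    import onnx  # imported here so force_spatial works on any loaded model

    model = onnx.load(src)
    counts = force_spatial(model)
    onnx.save(model, dst)
    return counts
```

If patch_file reports zero patched attributes, the saved model is unchanged, so any error that remains at run time would have to come from something other than the spatial attribute.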
This should be fixed on the MXNet side and the model re-exported.
Alternatively, do you know of a non-MXNet version of this model that we can replace it with?
@prasanthpul I can update the model, but at this point it's hosted in an Amazon bucket (https://s3.amazonaws.com/onnx-model-zoo/arcface/resnet100/resnet100.onnx), and I'm not sure I have write permission to update it there.
Perhaps @abhinavs95 can provide pointers on how to update the model in the bucket.
@tmoreau89 we actually are in the process of moving models to Git LFS and have done this for new models. So you can simply ignore the model in the bucket and upload new versions into Git LFS. (See the BERT model for example).
Ideally the fix has already been made on the MXNet side, and this is just a matter of re-exporting with a standard MXNet build. However, if this is going to be a manually corrected model, then you'll also need to update the README to indicate that there is a bug in MXNet and to show how the provided model was actually generated.
Thanks @prasanthpul; I'll submit the PR when I have the bandwidth, sometime this coming week.
Any updates on this? I'm just experiencing the same issue.
Thanks @hariharans29. @prasanthpul, are there any plans to fix this model so it runs out of the box, without relying on the node attribute rewrite (the fix proposed in #2447, shown below)?
import onnx

model = onnx.load(r'arcface_mxnet\resnet100.onnx')
for node in model.graph.node:
    if node.op_type == "BatchNormalization":
        for attr in node.attribute:
            if attr.name == "spatial":
                attr.i = 1
onnx.save(model, r'updated_resnet100.onnx')
I tried this, but it didn't work for me. I still got this error:
InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Non-zero status code returned while running BatchNormalization node. Name:'BatchNormalization_17' Status Message: Invalid input scale: NumDimensions() != 1
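Since this message complains that the scale input is not one-dimensional, one way to narrow it down might be to list the shape of each BatchNormalization scale tensor. The sketch below assumes that the scale (the second input of the node) is stored as a graph initializer. Nodes where that does not hold are reported with None. The function names are hypothetical.

```python
def batchnorm_scale_dims(model):
    """List (node name, scale dims) for every BatchNormalization node.

    Assumes the scale tensor (second node input) is a graph initializer;
    if it is not, the dims entry for that node is None.
    """
    dims_by_name = {init.name: list(init.dims) for init in model.graph.initializer}
    report = []
    for node in model.graph.node:
        if node.op_type == "BatchNormalization" and len(node.input) > 1:
            report.append((node.name, dims_by_name.get(node.input[1])))
    return report


def suspicious(report):
    """Names of nodes whose scale shape is known and not one-dimensional."""
    return [name for name, dims in report if dims is not None and len(dims) != 1]
```

With the onnx package installed, suspicious(batchnorm_scale_dims(onnx.load("updated_resnet100.onnx"))) would list the nodes worth looking at. Posting that list along with the opset version of the model would probably make the report easier to act on.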