
Converting from MXNet to ONNX fails with "Unrecognized attribute: spatial for operator BatchNormalization"

Open dongjunjundong opened this issue 6 years ago • 17 comments

Description

Failed to convert the pretrained model cifar_resnet110_v1 from MXNet format to ONNX.

Environment info (Required)

Using the MXNet model zoo pretrained model cifar_resnet110_v1, MXNet version 1.5.0. The convert call is: `input_shape = [(1,3,32,32)]` followed by `export_model("simple_net-symbol.json", "simple_net-0000.params", input_shape)`.

Error Message:


```
~/ENV/lib/python3.5/site-packages/onnx/checker.py in checker(proto, ctx)
     50                     proto_type.name))
     51             return getattr(C, py_func.__name__)(
---> 52                 proto.SerializeToString(), ctx)
     53         return cast(FuncType, checker)
     54     return decorator
```

ValidationError: Unrecognized attribute: spatial for operator BatchNormalization

```
==> Context: Bad node spec:
input: "cifarresnetv10_conv0_fwd"
input: "cifarresnetv10_batchnorm0_gamma"
input: "cifarresnetv10_batchnorm0_beta"
input: "cifarresnetv10_batchnorm0_running_mean"
input: "cifarresnetv10_batchnorm0_running_var"
output: "cifarresnetv10_batchnorm0_fwd"
name: "cifarresnetv10_batchnorm0_fwd"
op_type: "BatchNormalization"
attribute { name: "epsilon" f: 1e-05 type: FLOAT }
attribute { name: "momentum" f: 0.9 type: FLOAT }
attribute { name: "spatial" i: 0 type: INT }
```

dongjunjundong avatar Apr 02 '19 03:04 dongjunjundong

Hey, this is the MXNet Label Bot. Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it. Here are my recommended labels: ONNX, Bug

mxnet-label-bot avatar Apr 02 '19 03:04 mxnet-label-bot

onnx version 1.4.1

dongjunjundong avatar Apr 02 '19 04:04 dongjunjundong

@dongjunjundong MXNet has support up to ONNX v1.3.0. BatchNormalization (opset 7) had an attribute "spatial", which is being exported from MXNet to ONNX. It looks like this attribute has been dropped in BatchNormalization (opset 9).

Please try using ONNX v1.3.0. Exporting with ONNX 1.3.0 worked for me.

vandanavk avatar Apr 02 '19 15:04 vandanavk

Maybe mxnet should target a specific opset then. Or at the very least have a check in there that errors out with a sane error message when onnx is too new...
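The check suggested here could be as simple as comparing version tuples before export. A minimal sketch, assuming the 1.3.0 ceiling reported in this thread (this is illustrative code, not actual MXNet source; a real exporter would read the installed version from `onnx.__version__`):

```python
# Fail fast with a clear message when the installed onnx is newer than
# the exporter supports. SUPPORTED is the ceiling reported in this thread.
SUPPORTED = (1, 3, 0)

def check_onnx_version(version_string):
    installed = tuple(int(p) for p in version_string.split(".")[:3])
    if installed > SUPPORTED:
        raise RuntimeError(
            "MXNet's ONNX exporter supports onnx <= 1.3.0, found %s; newer "
            "opsets dropped attributes such as BatchNormalization's 'spatial'."
            % version_string)

check_onnx_version("1.3.0")   # passes silently
# check_onnx_version("1.4.1") would raise RuntimeError
```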

mika-fischer avatar Apr 16 '19 14:04 mika-fischer

Same problem here. You can remove the 'spatial' attribute of batchnorm from the Python file named in the error output.
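The same workaround can also be applied after export, by stripping the attribute from the saved graph instead of editing MXNet's source. This is a pure-Python sketch with a stand-in `Node` class so it stays self-contained; with the real onnx package you would iterate `model.graph.node` and rewrite each node's `attribute` list the same way (there the attributes are protobuf messages with a `.name` field rather than plain strings):

```python
# Stand-in for an ONNX graph node (illustrative only).
class Node:
    def __init__(self, op_type, attribute):
        self.op_type = op_type
        self.attribute = attribute

def strip_spatial(nodes):
    # Drop the opset-7-only 'spatial' attribute from every
    # BatchNormalization node so the checker no longer rejects it.
    for node in nodes:
        if node.op_type == "BatchNormalization":
            node.attribute = [a for a in node.attribute if a != "spatial"]
    return nodes

graph = [Node("BatchNormalization", ["epsilon", "momentum", "spatial"]),
         Node("Conv", ["kernel_shape"])]
strip_spatial(graph)
print(graph[0].attribute)  # ['epsilon', 'momentum']
```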

TriLoo avatar Apr 23 '19 05:04 TriLoo

I tried converting from MXNet (1.5.0) to ONNX (1.5.0), and the error is:

```
INFO:root:Converting idx: 3, op: null, name: first-3x3-conv-batchnorm_gamma
INFO:root:Converting idx: 4, op: null, name: first-3x3-conv-batchnorm_beta
INFO:root:Converting idx: 5, op: null, name: first-3x3-conv-batchnorm_moving_mean
Traceback (most recent call last):
  File "/home/deep/workssd/work/pycharm-community-2019.1.1/helpers/pydev/pydevd.py", line 1741, in <module>
    main()
  File "/home/deep/workssd/work/pycharm-community-2019.1.1/helpers/pydev/pydevd.py", line 1735, in main
    globals = debugger.run(setup['file'], None, None, is_module)
  File "/home/deep/workssd/work/pycharm-community-2019.1.1/helpers/pydev/pydevd.py", line 1135, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/home/deep/workssd/work/pycharm-community-2019.1.1/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/home/deep/workssd/arm/tvm_app/tune_relay_mobile_gpu.py", line 484, in <module>
    tune_and_evaluate(tuning_option)
  File "/home/deep/workssd/arm/tvm_app/tune_relay_mobile_gpu.py", line 436, in tune_and_evaluate
    net, params, input_shape, _ = get_network(network, batch_size=1)
  File "/home/deep/workssd/arm/tvm_app/tune_relay_mobile_gpu.py", line 93, in get_network
    return get_network_lpr_mb2(name, batch_size)
  File "/home/deep/workssd/arm/tvm_app/tune_relay_mobile_gpu.py", line 143, in get_network_lpr_mb2
    test_onnx()
  File "/home/deep/workssd/arm/tvm_app/tune_relay_mobile_gpu.py", line 135, in test_onnx
    converted_model_path = onnx_mxnet.export_model(mx_sym, args, [input_shape], np.float32, onnx_file, True)
  File "/home/deep/workssd/mxnet/incubator-mxnet/python/mxnet/contrib/onnx/mx2onnx/export_model.py", line 87, in export_model
    verbose=verbose)
  File "/home/deep/workssd/mxnet/incubator-mxnet/python/mxnet/contrib/onnx/mx2onnx/export_onnx.py", line 234, in create_onnx_graph_proto
    in_shape=in_shape[graph_input_idx],
IndexError: list index out of range
```

I tracked through the code and found that 'batchnorm_moving_mean' fails because it is not in the saved params, so `in_shape[graph_input_idx]` is out of bounds.
Does anyone have a suggestion?

@vandanavk

nopattern avatar Jul 07 '19 06:07 nopattern

The moving_mean is in the aux params. Just merge args and auxs.
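A minimal sketch of that merge. The parameter names and the `0` placeholder values are illustrative; in practice `arg_params` and `aux_params` come from `mx.model.load_checkpoint` as NDArray dicts, and the merged dict is what you hand to the exporter:

```python
# BatchNorm's moving_mean/moving_var live in aux_params, not arg_params,
# so exporting with arg_params alone leaves the exporter short of graph
# inputs and triggers the IndexError above. Merge the two dicts first.
arg_params = {"conv0_weight": 0, "batchnorm0_gamma": 0, "batchnorm0_beta": 0}
aux_params = {"batchnorm0_moving_mean": 0, "batchnorm0_moving_var": 0}

params = {**arg_params, **aux_params}
print(sorted(params))
```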

nopattern avatar Jul 08 '19 12:07 nopattern

@mxnet-label-bot My onnx is 1.5.0 (latest). For batchnorm, I revised the script mxnet/contrib/mx2onnx/_op_translations.py as follows:

1. On line 647: `kernel = eval(attrs["kernel"]) if attrs.get("kernel") else None`. This is needed for global pooling like `x = mx.symbol.Pooling(data=data, pool_type='avg', global_pool=True, name=name+'pool')`.

2. Delete line 359: `spatial=0`. This attribute is not supported for onnx > 1.3.0.

This works for me to translate .params and .json to ONNX, but inference then failed in TensorRT 5.1.5. That is probably a separate problem, though I'm not sure.
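Edit (1) can be exercised on its own. A small sketch, where the `attrs` dicts are illustrative stand-ins for what MXNet passes to the op translation, and `ast.literal_eval` stands in for the `eval` used in the patch:

```python
import ast

def parse_kernel(attrs):
    # With global_pool=True MXNet emits no "kernel" attribute, so an
    # unconditional attrs["kernel"] lookup raises KeyError; guard it.
    return ast.literal_eval(attrs["kernel"]) if attrs.get("kernel") else None

print(parse_kernel({"pool_type": "avg", "global_pool": "True"}))  # None
print(parse_kernel({"pool_type": "max", "kernel": "(2, 2)"}))     # (2, 2)
```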

Lebron8997 avatar Aug 10 '19 05:08 Lebron8997

> @mxnet-label-bot My onnx is 1.5.0 (latest). For batchnorm, I revised the script mxnet/contrib/mx2onnx/_op_translations.py as follows:
>
> 1. On line 647: `kernel = eval(attrs["kernel"]) if attrs.get("kernel") else None`. This is needed for global pooling like `x = mx.symbol.Pooling(data=data, pool_type='avg', global_pool=True, name=name+'pool')`.
>
> 2. Delete line 359: `spatial=0`. This attribute is not supported for onnx > 1.3.0.
>
> This works for me to translate .params and .json to ONNX, but inference then failed in TensorRT 5.1.5. That is probably a separate problem, though I'm not sure.

Did you run into this? Have you solved it?

arsdragonfly avatar Aug 22 '19 06:08 arsdragonfly

The model-symbol.json file I had did not have the spatial attribute, yet conversion still failed. I noticed that the mxnet-onnx converter was adding that attribute itself. Simply comment out line 359 in /Library/Python/3.7/site-packages/mxnet/contrib/onnx/mx2onnx/_op_translations.py on macOS (or /usr/local/lib/python3.6/dist-packages/mxnet/contrib/onnx/mx2onnx/_op_translations.py on Linux) and re-run your mxnet2onnx converter code.

klonikar avatar Mar 25 '20 06:03 klonikar

> @dongjunjundong MXNet has support up to ONNX v1.3.0. BatchNormalization (opset 7) had an attribute "spatial", which is being exported from MXNet to ONNX. It looks like this attribute has been dropped in BatchNormalization (opset 9).
>
> Please try using ONNX v1.3.0. Exporting with ONNX 1.3.0 worked for me.

Can you please tell me how to export with ONNX 1.3, since we are using ONNX from within MXNet?

AIGirl10 avatar Aug 19 '20 03:08 AIGirl10

cc @josephevans who's working on ONNX improvements.

szha avatar Sep 14 '20 16:09 szha

The latest MXNet 1.7.0 still hasn't solved this issue.

Brightchu avatar Mar 15 '21 06:03 Brightchu

Hi @Brightchu, we have been working on ONNX support on the v1.x branch lately. We added support for onnx 1.7 and hundreds of new models. Would you share your specific export use case so that we can prioritize it? BTW, to get your current MXNet's (1.5, 1.6, 1.7, etc.) ONNX support up to date, you can check out this tool: https://github.com/apache/incubator-mxnet/pull/19876

Zha0q1 avatar Mar 15 '21 06:03 Zha0q1

> Hi @Brightchu, we have been working on ONNX support on the v1.x branch lately. We added support for onnx 1.7 and hundreds of new models. Would you share your specific export use case so that we can prioritize it? BTW, to get your current MXNet's (1.5, 1.6, 1.7, etc.) ONNX support up to date, you can check out this tool: #19876

Hi, thanks for your reply. I am converting an SSR-Net from MXNet to ONNX; the MXNet version is 1.7.0 and the ONNX version is 1.8.1. It failed when using `onnx_mxnet.export_model()` to convert the model. I stepped through `export_model()`; in the last step, onnx.checker raises an error indicating that the BatchNormalization layers contain the attribute "spatial", which is not compatible with ONNX. Following @vandanavk's suggestion, downgrading ONNX to 1.3.0 worked for me, and I tested the onnxruntime inference result, which is consistent with the MXNet output. So the problem is at least partially resolved in my case.

Brightchu avatar Mar 15 '21 06:03 Brightchu

@Brightchu Thanks for the message. I checked ONNX's operator docs, and BatchNormalization used to have an attribute called spatial. This attribute is no longer present in the newer opsets/newer ONNX. We have noted this down and will bring our batchnorm conversion up to date with the current ONNX specification.

Our work-in-progress ONNX 1.7 support covers many more CV and NLP models (and more modern models). We are going to publish docs/blog posts very soon. If you are interested in trying it out, please let us know how we can help or debug your use case.

Zha0q1 avatar Mar 15 '21 06:03 Zha0q1

I used `pip install onnx==1.3.0`, and it solved the problem.

guolele1990 avatar May 19 '23 09:05 guolele1990