
onnx_tf.backend.prepare(model) in Tensorflow to ONNX tutorial error: "InvalidArgumentError: Dimensions must be equal"

Open ividal opened this issue 6 years ago • 7 comments

Python version: 3.5.2
onnx==1.2.1
onnx-tf==1.1.2
tensorflow-gpu==1.8.0

Using tutorial as of this commit.

Following the instructions in the tutorial, I've used this script to train. Worked smoothly. I froze the model using:

python3 /path/to/site-packages/tensorflow/python/tools/freeze_graph.py \
    --input_graph=/home/ividal/dev/onnx/tutorials/tutorials/graph.proto \
    --input_checkpoint=/home/ividal/dev/onnx/tutorials/tutorials/ckpt/model.ckpt \
    --output_graph=/tmp/frozen_graph.pb \
    --output_node_names=fc2/add \
    --input_binary=True

This produced the expected /tmp/frozen_graph.pb. The export code in the tutorial then produced the expected mnist.onnx file.

model = onnx.load('mnist.onnx') works, but:

tf_rep = prepare(model) yields:

---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
~/.venvs/onnx/lib/python3.5/site-packages/tensorflow/python/framework/ops.py in _create_c_op(graph, node_def, inputs, control_inputs)
   1566   try:
-> 1567     c_op = c_api.TF_FinishOperation(op_desc)
   1568   except errors.InvalidArgumentError as e:

InvalidArgumentError: Dimensions must be equal, but are 16 and 64 for 'Add_1' (op: 'Add') with input shapes: [?,64,?,16], [1,1,1,64].

From the error message, I gather the expected channels might be switched (?). However, I did not modify the tutorial code, so it shouldn't be that. Any ideas...?
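For readers following along, here is a minimal sketch of the shape check behind this error, in plain Python with no TensorFlow dependency. The helper `broadcast_shape` is hypothetical (it is not part of TensorFlow or onnx-tf), and `None` stands in for the `?` dimensions. The `[1, 64, 1, 1]` bias shape is an illustrative assumption, not a value taken from the tutorial.

```python
def broadcast_shape(a, b):
    """Broadcast two shapes NumPy/TF style; None means an unknown ('?') dim.

    Hypothetical helper for illustration only.
    """
    out = []
    # Compare dimensions pairwise, starting from the trailing axis.
    for da, db in zip(reversed(a), reversed(b)):
        if da == 1:
            out.append(db)
        elif db == 1:
            out.append(da)
        elif da is None or db is None:
            # An unknown dim is assumed compatible with the known one.
            out.append(da if db is None else db)
        elif da == db:
            out.append(da)
        else:
            raise ValueError(
                "Dimensions must be equal, but are %s and %s" % (da, db))
    # Any extra leading dims of the longer shape carry over unchanged.
    longer = a if len(a) > len(b) else b
    out.extend(reversed(longer[:abs(len(a) - len(b))]))
    return list(reversed(out))


activation = [None, 64, None, 16]   # first input shape from the traceback
bias_last = [1, 1, 1, 64]           # second input shape from the traceback

try:
    broadcast_shape(activation, bias_last)
except ValueError as err:
    print(err)  # trailing axes 16 and 64 cannot be broadcast

# Assumption: if the bias had its 64 on axis 1, the add would broadcast.
bias_first = [1, 64, 1, 1]
print(broadcast_shape(activation, bias_first))
```

The sketch only shows why the two shapes from the traceback cannot be added: the bias carries its 64 on the last axis while the other input carries 64 on axis 1. It does not identify which step of the pipeline produced the mismatch.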

Thanks!

ividal avatar May 29 '18 13:05 ividal

Just in case, I repeated everything with Tensorflow 1.5.0, since it's the last version explicitly mentioned in the documentation, but the error is exactly the same.

[Edit] For the sake of completeness, I tried freezing the graph with the bazel-built tool, as the original tutorial suggested. Same results.

bazel build tensorflow/python/tools:freeze_graph
bazel-bin/tensorflow/python/tools/freeze_graph \
    --input_graph=/home/ividal/dev/onnx/tutorials/tutorials/graph.proto \
    --input_checkpoint=/home/ividal/dev/onnx/tutorials/tutorials/ckpt/model.ckpt \
    --output_graph=/tmp/frozen_graph.pb \
    --output_node_names=fc2/add \
    --input_binary=True

ividal avatar May 29 '18 13:05 ividal

I ran into the same problem as you. Does anyone have a solution?

Revo-Future avatar May 30 '18 07:05 Revo-Future

I have a feeling it's this problem: NCHW vs NHWC at the different steps (training vs freezing vs exporting vs loading in onnx). I just don't know exactly where or how to fix it.
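To make the NCHW vs NHWC suspicion concrete, here is a small sketch of how the two layouts relate. The permutation tuples are the standard axis orders passed to a transpose op; the `permute` helper and the example shapes (a 7x7 feature map with 64 channels) are hypothetical and only for illustration.

```python
# Axis permutations between the two common image tensor layouts.
NCHW_TO_NHWC = (0, 2, 3, 1)  # (N, C, H, W) -> (N, H, W, C)
NHWC_TO_NCHW = (0, 3, 1, 2)  # (N, H, W, C) -> (N, C, H, W)


def permute(shape, perm):
    """Return the shape a tensor would have after transposing with perm."""
    return tuple(shape[i] for i in perm)


# Hypothetical activation: batch unknown (None), 64 channels, 7x7 spatial.
nchw_activation = (None, 64, 7, 7)
nhwc_activation = permute(nchw_activation, NCHW_TO_NHWC)
print(nhwc_activation)

# A channels-last bias like the [1, 1, 1, 64] in the traceback,
# re-expressed for a channels-first activation.
nhwc_bias = (1, 1, 1, 64)
print(permute(nhwc_bias, NHWC_TO_NCHW))
```

If one step of the pipeline assumes channels-first and another assumes channels-last without a transpose like this in between, the channel axis of one operand ends up lined up against a spatial axis of the other. Whether that is what happens here is still a guess.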

ividal avatar May 30 '18 13:05 ividal

@ividal Did you find a solution to this issue yet? Please let me know.

knandanan avatar Aug 09 '18 12:08 knandanan

Did anyone find a solution to this issue?

mukul74 avatar Oct 15 '18 14:10 mukul74

Did anyone find a solution to this issue?

AmitAilianiSDC avatar Dec 17 '18 21:12 AmitAilianiSDC

@knandanan No, sorry. I opted to keep onnx and TF separate for this (I stick to TF and a .pb if deployment has to be with TF, e.g. on an Android device).

ividal avatar Dec 17 '18 23:12 ividal