Possible bug in the ONNX exporter logic
Hey everyone,
I recently converted a CNTK 2.7 model to the ONNX format and tried to use it with the ONNX Runtime library, but I encountered an error. It seems that it might be a bug in the CNTK exporter logic.
You can see the conversation with an ONNXRuntime team member at the following link: https://github.com/microsoft/onnxruntime/issues/1133
Is there anything I can do to work around this problem, or is it a bug in the CNTK exporter logic?
Thanks.
These are the CNTK 2.4 model, the CNTK 2.7 "CNTKv2" model, and the CNTK 2.7 "ONNX" model that caused the issue.
Thanks.
Just removing the Sequence dimension will not be enough. I ended up using the following script to fix the exported model; you may need to adjust it depending on the model you have.
import numpy as np
import onnx

onnxfile = "model.onnx"  # path to the exported model (placeholder; adjust as needed)

onnx_model = onnx.load(onnxfile)
graph = onnx_model.graph

# Remove the leading 'Sequence' dimension from every tensor shape
for tensor_info in [graph.value_info, graph.input, graph.output]:
    for value_info in tensor_info:
        dim = value_info.type.tensor_type.shape.dim
        if not len(dim):
            continue
        val = getattr(dim[0], "dim_param", None)
        if val == 'Sequence':
            dim.remove(dim[0])

# Optionally drop all value_info entries to leave shape inference entirely to ORT
#while len(graph.value_info):
#    graph.value_info.remove(graph.value_info[0])

# Fix the Reshape target shape (model-specific: here the Reshape is the fourth node from the end)
node = graph.node[-4]
shape_name = node.input[1]
g = [g for g in graph.initializer if g.name == shape_name][0]
# Alternative: replace the shape initializer outright
#graph.initializer.remove(g)
#new_g = onnx.helper.make_tensor(shape_name, onnx.TensorProto.INT64, [2], np.array([1, 1024], dtype=np.int64))
#graph.initializer.append(new_g)
# Drop the first (sequence) entry of the stored target shape
g.raw_data = np.frombuffer(g.raw_data, dtype=np.int64)[1:].tobytes()
g.dims.pop()
g.dims.append(2)
g = [g for g in graph.input if g.name == shape_name][0]
g.type.tensor_type.shape.dim[0].dim_value = 2

# Fix all Concat nodes: with the sequence axis gone, concatenate along axis 1
for g in [g for g in graph.node if g.op_type == 'Concat']:
    for a in [a for a in g.attribute if a.name == 'axis']:
        a.i = 1

# Fix average pooling: trim 'pads' attributes that have the wrong length
# (we expect exactly one such node)
bad_nodes = []
for g in graph.node:
    for a in [a for a in g.attribute if a.name == 'pads']:
        if len(a.ints) != 4:
            bad_nodes.append(g)
            a.ints.pop()
            a.ints.pop()
assert len(bad_nodes) == 1

# Fix Softmax: point its axis at dimension 1
n = graph.node[-1]
n.attribute[0].i = 1

onnx.save(onnx_model, onnxfile)
This cannot be generalized; for example, I do not know why AveragePooling had wrong pads in my case. So your best bet is to fix the model and try to run it until it works, then verify the outputs.
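By "verify" I mean at least a smoke test along these lines (a minimal sketch, assuming the model has a single input; the random data is just a placeholder, and onnxfile is the same path as above):

import numpy as np
import onnxruntime as ort

# Load the patched model and run one dummy inference to see whether ORT accepts it.
sess = ort.InferenceSession(onnxfile)
inp = sess.get_inputs()[0]
# Replace any symbolic/unknown dimensions with 1 so we can build a dummy batch
shape = [d if isinstance(d, int) else 1 for d in inp.shape]
dummy = np.random.rand(*shape).astype(np.float32)
outputs = sess.run(None, {inp.name: dummy})
print([o.shape for o in outputs])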
As a side note, we appear to have resolved similar issues where models were defined/built in C# but the input variables were not created correctly.
Incorrect:
Variable.InputVariable(channelInputNDShape, f32, name)
Correct:
Variable.InputVariable(channelInputNDShape, f32, name,
dynamicAxes: new[] { Axis.DefaultBatchAxis() })
This then matches the defaults in Python:
def input_variable(shape, dtype=default_override_or(np.float32), needs_gradient=False, is_sparse=False,
dynamic_axes=[Axis.default_batch_axis()], name=''):
Without specifying DefaultBatchAxis(), we got Sequence x 1 in the shapes, which ONNX Runtime could not load.
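If you want to check whether an exported model is affected, one quick way (a sketch; the file name is a placeholder, and 'Sequence' is the dim_param name the CNTK exporter writes, as seen above) is to inspect the graph inputs:

import onnx

model = onnx.load("model.onnx")  # placeholder path
for inp in model.graph.input:
    dims = inp.type.tensor_type.shape.dim
    if len(dims) and dims[0].dim_param == 'Sequence':
        print("input '%s' still has a Sequence axis: %s"
              % (inp.name, [d.dim_param or d.dim_value for d in dims]))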
Is there any progress on this issue? Do you need more models to reproduce the problem?
The thing is, since CNTK is no longer in active development, we are planning to move our existing models to ORT for inference. But we keep hitting issues with the transition. I finally managed to export a model (which was a problem by itself, see #3791), but now I cannot load it because of the following error:
Microsoft.ML.OnnxRuntime.OnnxRuntimeException:
[ErrorCode:Fail] Node (out.x.x.x.x.x.x.x.c) Op (Conv) [ShapeInferenceError] Attribute dilations has incorrect size
What kind of info could I provide to help resolve the problem?
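In the meantime, here is a small sketch I can run against the exported model to dump the attributes the error complains about (the path is a placeholder):

import onnx

model = onnx.load("exported_model.onnx")  # placeholder path
for node in model.graph.node:
    if node.op_type == 'Conv':
        attrs = {a.name: list(a.ints) for a in node.attribute
                 if a.name in ('dilations', 'strides', 'pads', 'kernel_shape')}
        print(node.name, attrs)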