
Possible bug in the ONNX exporter logic

Open adaber opened this issue 6 years ago • 4 comments

Hey everyone,

I recently converted a CNTK 2.7 model to the ONNX format and tried to use it with the ONNX Runtime library, but I encountered an error. It seems that it might be a bug in the CNTK exporter logic.

You can see the conversation with an ONNX Runtime team member at the following link: https://github.com/microsoft/onnxruntime/issues/1133

Is there anything I can do to solve this problem, or is it a bug in the CNTK exporter logic?

Thanks.

adaber avatar May 30 '19 20:05 adaber

These are the CNTK 2.4, CNTK 2.7 "CNTKv2", and CNTK 2.7 "ONNX" models that caused the issue.

TestModels.zip

Thanks.

adaber avatar May 30 '19 20:05 adaber

Just removing the Sequence dimension will not be enough. I ended up using the script below to fix the exported model; you may need to adjust it depending on the model you have.

import onnx
import numpy as np

onnx_model = onnx.load(onnxfile)  # onnxfile: path to the exported model
graph = onnx_model.graph

# Remove the leading 'Sequence' dimension from every tensor shape
for tensor_info in [graph.value_info, graph.input, graph.output]:
    for value_info in tensor_info:
        dim = value_info.type.tensor_type.shape.dim
        if not len(dim):
            continue
        if getattr(dim[0], "dim_param", None) == 'Sequence':
            dim.remove(dim[0])

# Uncomment to drop all value_info and leave shape inference entirely to ORT
#while len(graph.value_info):
#    graph.value_info.remove(graph.value_info[0])

# Fix the Reshape target shape: drop the leading sequence entry
# (the node position and values here are specific to my model)
node = graph.node[-4]
shape_name = node.input[1]
g = [g for g in graph.initializer if g.name == shape_name][0]

# Alternative: replace the shape initializer outright instead of editing it
#graph.initializer.remove(g)
#new_g = onnx.helper.make_tensor(shape_name, onnx.TensorProto.INT64, [2], np.array([1, 1024], dtype=np.int64))
#graph.initializer.append(new_g)
g.raw_data = np.frombuffer(g.raw_data, dtype=np.int64)[1:].tobytes()
g.dims.pop()
g.dims.append(2)
g = [g for g in graph.input if g.name == shape_name][0]
g.type.tensor_type.shape.dim[0].dim_value = 2

# Fix all Concat nodes: concatenate along axis 1 (channels) now that the
# sequence axis is gone
for g in [g for g in graph.node if g.op_type == 'Concat']:
    for a in [a for a in g.attribute if a.name == 'axis']:
        a.i = 1

# Fix average pooling: 2-D pooling expects exactly 4 pad values
bad_nodes = []
for g in graph.node:
    for a in [a for a in g.attribute if a.name == 'pads']:
        if len(a.ints) != 4:
            bad_nodes.append(g)
assert len(bad_nodes) == 1
a = [a for a in bad_nodes[0].attribute if a.name == 'pads'][0]
a.ints.pop()
a.ints.pop()

# Fix Softmax: set the axis to 1 (channels)
n = graph.node[-1]
n.attribute[0].i = 1

onnx.save(onnx_model, onnxfile)

This cannot be generalized; for example, I do not know why AveragePooling had the wrong pads in my case. Your best bet is to fix the model and try to run it until it loads, then verify the outputs.
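That "run until it works, then verify" loop is easy to script: check the patched model with the onnx checker, then try to load and run it in ONNX Runtime. A minimal sketch (the file path and input shape are placeholders for whatever your model expects):

import numpy as np
import onnx
import onnxruntime as ort

model_path = "fixed_model.onnx"  # placeholder path

# Structural validation of the patched graph
onnx.checker.check_model(onnx.load(model_path))

# Load into ORT and confirm the Sequence axis is gone from the input shape
sess = ort.InferenceSession(model_path)
inp = sess.get_inputs()[0]
print(inp.name, inp.shape)

# Run a dummy input through the model; replace the shape with your model's
x = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = sess.run(None, {inp.name: x})
print([o.shape for o in outputs])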

dashesy avatar Oct 25 '19 22:10 dashesy

As a side note, we appear to have resolved similar issues where models were defined/built in C# and the input variables were not created correctly.

Incorrect:

Variable.InputVariable(channelInputNDShape, f32, name)

Correct:

Variable.InputVariable(channelInputNDShape, f32, name, 
    dynamicAxes: new[] { Axis.DefaultBatchAxis() })

This then matches the defaults in Python:

def input_variable(shape, dtype=default_override_or(np.float32), needs_gradient=False, is_sparse=False,
                   dynamic_axes=[Axis.default_batch_axis()], name=''):

Without specifying DefaultBatchAxis(), we got Sequence x 1 shapes, which ONNX Runtime could not load.
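If you are unsure whether an exported model is affected, a quick way to check (a sketch using the onnx package; the path is a placeholder) is to scan the graph inputs for a symbolic Sequence dimension:

import onnx

model = onnx.load("model.onnx")  # placeholder path
for value_info in model.graph.input:
    dims = [d.dim_param or d.dim_value
            for d in value_info.type.tensor_type.shape.dim]
    print(value_info.name, dims)
    if dims and dims[0] == 'Sequence':
        print("  -> leading Sequence axis; ORT will likely refuse to load this")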

nietras avatar Dec 04 '19 12:12 nietras

Is there any progress on this issue? Do you need more models to reproduce the problem?

The thing is, as CNTK is no longer in active development, we are planning to move our existing models to ORT for inference, but we keep hitting issues with the transition. I finally managed to export a model (which was a problem in itself, see #3791), but now I cannot load it because of the following error:

Microsoft.ML.OnnxRuntime.OnnxRuntimeException:
[ErrorCode:Fail] Node (out.x.x.x.x.x.x.x.c) Op (Conv) [ShapeInferenceError] Attribute dilations has incorrect size
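
For what it is worth, a workaround in the spirit of the pads fix above might be to trim the dilations attribute by hand. This is an untested sketch; it assumes a 2-D Conv whose trailing dilation entries are the spurious ones, which may not hold for this model:

import onnx

model = onnx.load("model.onnx")  # placeholder path
for node in model.graph.node:
    if node.op_type != 'Conv':
        continue
    for attr in node.attribute:
        # A 2-D Conv expects one dilation per spatial axis, i.e. 2 values
        if attr.name == 'dilations' and len(attr.ints) != 2:
            del attr.ints[2:]  # keep the first two entries
onnx.save(model, "model.onnx")

A workaround like this is fragile, though.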

What kind of information could I provide to help resolve the problem?

mikhail-barg avatar Feb 22 '20 11:02 mikhail-barg