Conversion error for a model with the `transpose` operation
## 🐞 Describing the bug

When converting a model containing the `transpose` operation, an error may occur (`ValueError: axes don't match array`), preventing the conversion from completing.
## Stack Trace

<details>
<summary>Click to expand complete stack trace</summary>

```
Traceback (most recent call last):
  File "test_coreml_2.py", line 18, in <module>
    mlmodel = ct.convert(model, inputs=[ct.TensorType(shape=(1, 1, 2))])
  File "/coremltools_venv/lib/python3.7/site-packages/coremltools/converters/_converters_entry.py", line 363, in convert
    debug=debug,
  File "/coremltools_venv/lib/python3.7/site-packages/coremltools/converters/mil/converter.py", line 183, in mil_convert
    return _mil_convert(model, convert_from, convert_to, ConverterRegistry, MLModel, compute_units, **kwargs)
  File "/coremltools_venv/lib/python3.7/site-packages/coremltools/converters/mil/converter.py", line 215, in _mil_convert
    **kwargs
  File "/coremltools_venv/lib/python3.7/site-packages/coremltools/converters/mil/converter.py", line 273, in mil_convert_to_proto
    prog = frontend_converter(model, **kwargs)
  File "/coremltools_venv/lib/python3.7/site-packages/coremltools/converters/mil/converter.py", line 95, in __call__
    return tf2_loader.load()
  File "/coremltools_venv/lib/python3.7/site-packages/coremltools/converters/mil/frontend/tensorflow/load.py", line 84, in load
    program = self._program_from_tf_ssa()
  File "/coremltools_venv/lib/python3.7/site-packages/coremltools/converters/mil/frontend/tensorflow2/load.py", line 200, in _program_from_tf_ssa
    return converter.convert()
  File "/coremltools_venv/lib/python3.7/site-packages/coremltools/converters/mil/frontend/tensorflow/converter.py", line 401, in convert
    self.convert_main_graph(prog, graph)
  File "/coremltools_venv/lib/python3.7/site-packages/coremltools/converters/mil/frontend/tensorflow/converter.py", line 330, in convert_main_graph
    outputs = convert_graph(self.context, graph, self.outputs)
  File "/coremltools_venv/lib/python3.7/site-packages/coremltools/converters/mil/frontend/tensorflow/convert_utils.py", line 189, in convert_graph
    add_op(context, node)
  File "/coremltools_venv/lib/python3.7/site-packages/coremltools/converters/mil/frontend/tensorflow/ops.py", line 1842, in Transpose
    x = mb.transpose(x=x, perm=perm, name=node.name)
  File "/coremltools_venv/lib/python3.7/site-packages/coremltools/converters/mil/mil/ops/registry.py", line 63, in add_op
    return cls._add_op(op_cls, **kwargs)
  File "/coremltools_venv/lib/python3.7/site-packages/coremltools/converters/mil/mil/builder.py", line 191, in _add_op
    new_op.type_value_inference()
  File "/coremltools_venv/lib/python3.7/site-packages/coremltools/converters/mil/mil/operation.py", line 243, in type_value_inference
    output_vals = self._auto_val(output_types)
  File "/coremltools_venv/lib/python3.7/site-packages/coremltools/converters/mil/mil/operation.py", line 330, in _auto_val
    vals = self.value_inference()
  File "/coremltools_venv/lib/python3.7/site-packages/coremltools/converters/mil/mil/operation.py", line 109, in wrapper
    return func(self)
  File "/coremltools_venv/lib/python3.7/site-packages/coremltools/converters/mil/mil/ops/defs/tensor_transformation.py", line 886, in value_inference
    return np.transpose(self.x.val, axes=self.perm.val)
  File "<__array_function__ internals>", line 6, in transpose
  File "/coremltools_venv/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 660, in transpose
    return _wrapfunc(a, 'transpose', axes)
  File "/coremltools_venv/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 54, in _wrapfunc
    return _wrapit(obj, method, *args, **kwds)
  File "/coremltools_venv/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 43, in _wrapit
    result = getattr(asarray(obj), method)(*args, **kwds)
ValueError: axes don't match array
```

</details>
## To Reproduce

Sample code with a minimal model causing this error:

```python
import tensorflow as tf
import coremltools as ct


class CustomTranspose(tf.keras.layers.Layer):
    def __init__(self, **kwargs):
        super(CustomTranspose, self).__init__(**kwargs)

    def call(self, inputs):
        # inputs shape should be: (B, 1, 2)
        mat_concat = tf.concat([inputs, inputs], axis=1)      # [B, 2, 2]
        mat_trans = tf.transpose(mat_concat, perm=[0, 2, 1])  # [B, 2, 2]
        return mat_trans


inputs = tf.keras.Input(shape=(1, 2))
outputs = CustomTranspose()(inputs)
model = tf.keras.Model(inputs=inputs, outputs=outputs)
mlmodel = ct.convert(model, inputs=[ct.TensorType(shape=(1, 1, 2))])
```
## System environment

- coremltools version: 5.2, 6.0b1 (all versions are affected)
- OS: tested on Ubuntu 18, but all platforms are affected
- Other relevant version information: tested with Python 3.7 and TensorFlow 2.6/2.9, but all versions are affected
## Additional context

### Debugging results

I debugged the code and believe I found the cause. The problem lies in the implementation of the `value_inference` method:

```python
@precondition(allow=VALUE | SYMBOL)
def value_inference(self):
    return np.transpose(self.x.val, axes=self.perm.val)
```

The `transpose` operation's decorator is `@precondition(allow=VALUE | SYMBOL)`, yet the body dereferences `self.x.val` directly, without any check for a `None`/symbolic value. As a result, when the input is symbolic, `np.transpose` is called on `None`, which causes the error described in this issue.
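The failure mode can be reproduced in isolation with plain NumPy (a minimal sketch, not coremltools code; here `x_val = None` merely stands in for the `self.x.val` of a symbolic input):

```python
import numpy as np

x_val = None      # stands in for self.x.val of a symbolic input
perm = [0, 2, 1]  # stands in for self.perm.val

# np.asarray(None) produces a 0-d object array, whose rank (0) cannot
# match the 3-element perm -- yielding the exact error from the trace.
try:
    np.transpose(x_val, axes=perm)
except ValueError as err:
    print(err)  # -> axes don't match array
```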
### Potential fix

I am not very familiar with the coremltools internals, but it seems to me that a correct implementation would look like this:

```python
@precondition(allow=VALUE | SYMBOL)
def value_inference(self):
    if self.perm.val is None:
        # only allow x to be symbolic. perm cannot.
        return None
    return np.transpose(self.x.sym_val, axes=self.perm.val)
```

Another option to get rid of this error would be to remove `SYMBOL` from the decorator:

```python
@precondition(allow=VALUE)
def value_inference(self):
    return np.transpose(self.x.val, axes=self.perm.val)
```
Or, alternatively, add explicit checks for `None`:

```python
@precondition(allow=VALUE | SYMBOL)
def value_inference(self):
    if self.x.val is None:
        return None
    if self.perm.val is None:
        return None
    return np.transpose(self.x.val, axes=self.perm.val)
```
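To make the behavioral difference concrete, here is a self-contained toy comparison (hypothetical helper functions for illustration only, not the coremltools implementation): the guarded variant simply degrades to `None` instead of crashing when a concrete value is missing, and agrees with the unguarded one on concrete inputs.

```python
import numpy as np

def value_inference_unguarded(x_val, perm_val):
    # Mirrors the current buggy body: crashes when x_val is None.
    return np.transpose(x_val, axes=perm_val)

def value_inference_guarded(x_val, perm_val):
    # Mirrors the proposed fix: bail out instead of crashing.
    if x_val is None or perm_val is None:
        return None
    return np.transpose(x_val, axes=perm_val)

perm = [0, 2, 1]

# Concrete input: both variants produce the same result.
x = np.arange(4).reshape(1, 2, 2)
assert (value_inference_guarded(x, perm) == value_inference_unguarded(x, perm)).all()

# "Symbolic" input (no concrete value): guarded returns None, unguarded raises.
assert value_inference_guarded(None, perm) is None
try:
    value_inference_unguarded(None, perm)
except ValueError:
    print("unguarded variant raises ValueError")  # the bug in this issue
```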
If any of the maintainers agree with my interpretation and with one of the solutions above, I can prepare a PR with this change (but I'd like to discuss the topic first and make sure my solution is correct).
### Other operations affected by a similar problem?

I looked through the `value_inference` implementations of other operations, and it seems a similar problem may also exist with the `flatten2d` operation: it too has `allow=SYMBOL` in its precondition decorator while referring directly to `self.x.val` in the implementation. I think it's worth checking.
---

Thanks for the detailed write-up. I agree this is a problem.

Do any of your proposed solutions allow you to not only convert the model but also get predictions from the mlmodel?
---

Sure, of course. I came across this error while converting a larger model with an image input; the code in the "To Reproduce" section is just a minimal example that triggers the same problem.

After applying any of the fixes I proposed, my model converts and works: the predictions from the generated mlmodel match the predictions of the TensorFlow model before conversion.

In fact, the `value_inference` method is optional (per the documentation: "Optional Python implementation of the op") and has no direct effect on the generated mlmodel. The implementation bug simply caused the conversion to crash in a specific case (when the input value is symbolic).

A solution that returns `None`, or one with `allow=VALUE` only, is essentially the same as not implementing this optional method for symbolic inputs. That's why I prefer the first solution, which also returns the correct result for symbolic input.
---

Hi @andrusza2 - I have discussed this issue with my team.

We think the best fix is to change:

```python
@precondition(allow=VALUE | SYMBOL)
def value_inference(self):
```

to:

```python
@precondition(allow=VALUE)
def value_inference(self):
```

Could you put up a pull request for that change? Please also add your reproduction example as a unit test.

Regarding this potentially also being an issue with `flatten2d`: I think it should be fine. We should always know the shape and axis values, in which case it shouldn't be an issue.
---

Hi @TobyRoseman, I made a PR (#1563) with this change; please take a look.

Just out of curiosity: why is the version that disallows symbolic values preferred?

As for `flatten2d`: shape and axis shouldn't be a problem. What worries me is the direct reference to `self.x.val` in the return statement:

```python
return self.x.val.reshape(dim_pre_axis, dim_post_axis)
```

It still seems to me that, in theory, the same problem with a symbolic value could occur as with `transpose` (`self.x.val` being `None` in that case). Theoretically, because I think `flatten2d` is currently not used in any conversion, so I cannot give a working example of the problem. Please take another look if you can 🙂
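The concern can be illustrated in isolation (a sketch, not coremltools code): if `self.x.val` were `None` in a hypothetical symbolic case, the direct attribute access in that return statement would fail immediately.

```python
x_val = None  # stands in for self.x.val of a hypothetical symbolic input

try:
    x_val.reshape(2, 2)  # what flatten2d's return statement would attempt
except AttributeError as err:
    print(err)  # -> 'NoneType' object has no attribute 'reshape'
```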
---

Fixed by #1563.