tensorflow-onnx
Cannot export Keras TextVectorization to ONNX
Describe the bug
A simple model with a TextVectorization layer cannot be converted to ONNX.
System information
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 20.04
Tf2onnx version: 1.9.3
Tensorflow Version: 2.8.0
Python version: 3.7.11
To Reproduce
import tensorflow as tf
corpus = [
    'This is the first document.',
    'This document is the second document.',
    'And this is the third one.',
    'Is this the first document?',
]
text_dataset = tf.data.Dataset.from_tensor_slices(corpus)
vectorize_layer = tf.keras.layers.TextVectorization()
vectorize_layer.adapt(text_dataset)
model = tf.keras.models.Sequential()
model.add(tf.keras.Input(shape=(1,), dtype=tf.string))
model.add(vectorize_layer)
input_data = [["This document is boring"], ["The first document is this"]]
model.predict(input_data)
import tf2onnx
spec = [tf.TensorSpec(shape=[None, 1], dtype=tf.string, name="input")]
extra_opset = [tf2onnx.utils.make_opsetid(tf2onnx.constants.CONTRIB_OPS_DOMAIN, 1)]
model_proto, external_tensor_storage = tf2onnx.convert.from_keras(
    model,
    input_signature=spec, opset=13, custom_ops=None,
    custom_op_handlers=None, custom_rewriter=None,
    inputs_as_nchw=None, extra_opset=extra_opset, shape_override=None,
    target=None, large_model=False, output_path='test.onnx')
output_names = [n.name for n in model_proto.graph.output]
import onnxruntime as rt
from onnxruntime_extensions import get_library_path
so = rt.SessionOptions()
so.register_custom_ops_library(get_library_path())
providers = ['CPUExecutionProvider']
m = rt.InferenceSession('test.onnx', so, providers=providers)
onnx_pred = m.run(output_names, {"input": input_data})
print(onnx_pred)
Running the above snippet produces the following output:
Cannot infer shape for sequential/text_vectorization/string_lookup/None_Lookup/LookupTableFindV2: sequential/text_vectorization/string_lookup/None_Lookup/LookupTableFindV2:0
Cannot infer shape for sequential/text_vectorization/string_lookup/SelectV2: sequential/text_vectorization/string_lookup/SelectV2:0
Cannot infer shape for sequential/text_vectorization/string_lookup/Identity: sequential/text_vectorization/string_lookup/Identity:0
ONNX Failed to infer shapes and dtypes for [Compress__95, type: Compress]
Traceback (most recent call last):
File "/home/lucas/miniconda3/lib/python3.9/site-packages/tf2onnx/schemas.py", line 154, in infer_onnx_shape_dtype
inferred_model = shape_inference.infer_shapes(model_proto, strict_mode=True)
File "/home/lucas/miniconda3/lib/python3.9/site-packages/onnx/shape_inference.py", line 41, in infer_shapes
inferred_model_str = C.infer_shapes(model_str, check_type, strict_mode, data_prop)
onnx.onnx_cpp2py_export.shape_inference.InferenceError: [ShapeInferenceError] Shape inference error(s): (op_type:Compress, node name: Compress__95): [ShapeInferenceError] Indices tensor must have rank >= 1
And the ONNX prediction is wrong:
[array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 0, 0, 0],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1]], dtype=int64)]
Expected behavior
The model should convert without error, and the prediction should match TensorFlow's:
array([[2, 5, 4, 1, 0],
[3, 6, 5, 4, 2]])
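To see where the expected values come from, here is a minimal pure-Python sketch of the layer's default behavior as I understand it: lowercase, strip punctuation, split on whitespace, then map tokens to indices, with 0 reserved for padding and 1 for out-of-vocabulary tokens. The tie-breaking rule for equally frequent tokens (descending lexicographic order) is an assumption, chosen because it reproduces the TensorFlow output above. This is not the Keras implementation, and the helper names are made up for illustration.

```python
import string
from collections import Counter

corpus = [
    "This is the first document.",
    "This document is the second document.",
    "And this is the third one.",
    "Is this the first document?",
]

def tokenize(text):
    # Lowercase, drop ASCII punctuation, split on whitespace.
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table).split()

def build_vocab(docs):
    counts = Counter(tok for doc in docs for tok in tokenize(doc))
    # Assumption: most frequent first, ties in descending lexicographic order.
    ordered = sorted(counts, key=lambda tok: (counts[tok], tok), reverse=True)
    # Index 0 is padding and index 1 is the out-of-vocabulary token.
    return {tok: i + 2 for i, tok in enumerate(ordered)}

def vectorize(texts, vocab):
    rows = [[vocab.get(tok, 1) for tok in tokenize(t)] for t in texts]
    width = max(len(r) for r in rows)
    # Pad shorter rows with 0 so the result is rectangular.
    return [r + [0] * (width - len(r)) for r in rows]

vocab = build_vocab(corpus)
result = vectorize(["This document is boring", "The first document is this"], vocab)
print(result)  # [[2, 5, 4, 1, 0], [3, 6, 5, 4, 2]]
```

"boring" is not in the corpus, so it maps to 1, and the shorter row is padded with a trailing 0.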
The error comes from the RaggedTensorToTensor conversion: https://github.com/onnx/tensorflow-onnx/blob/6003d4c012cb9ce5ac87e40ca4e50d974ca80329/tf2onnx/onnx_opset/tensor.py#L2568
More investigation is needed to understand how it works.
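For readers unfamiliar with the operation, the sketch below shows what a ragged-to-dense conversion is expected to compute in the simplest case: a flat list of values plus row boundaries becomes a rectangular, padded array. It is a plain-Python illustration with a hypothetical function name, not the tf2onnx handler linked above, and it covers only a single ragged dimension described by row splits.

```python
def ragged_to_dense(values, row_splits, default_value=0):
    # row_splits[i]:row_splits[i + 1] delimits the values belonging to row i.
    rows = [values[start:end] for start, end in zip(row_splits[:-1], row_splits[1:])]
    width = max((len(row) for row in rows), default=0)
    # Pad every row on the right up to the longest row.
    return [row + [default_value] * (width - len(row)) for row in rows]

values = [2, 5, 4, 1, 3, 6, 5, 4, 2]   # token ids for both sentences, concatenated
row_splits = [0, 4, 9]                 # first row has 4 tokens, second has 5
dense = ragged_to_dense(values, row_splits)
print(dense)  # [[2, 5, 4, 1, 0], [3, 6, 5, 4, 2]]
```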
Thanks for your reply Huang.
Based on your comment, I tried to skip the RaggedTensorToTensor conversion by passing ragged=True to the TextVectorization constructor. However, I now get a different error when calling tf2onnx.convert.from_keras:
Traceback (most recent call last):
File "/home/lucas/asco/asco.py", line 42, in <module>
model_proto, external_tensor_storage = tf2onnx.convert.from_keras(model,
File "/home/lucas/miniconda3/lib/python3.9/site-packages/tf2onnx/convert.py", line 466, in from_keras
tensors_to_rename = tensor_names_from_structed(concrete_func, input_names, output_names)
File "/home/lucas/miniconda3/lib/python3.9/site-packages/tf2onnx/convert.py", line 310, in tensor_names_from_structed
tensors_to_rename[v.name] = k
AttributeError: 'RaggedTensor' object has no attribute 'name'
Any idea what might be causing this?
A RaggedTensor has no name attribute. Try replacing lines 307-308 of tf2onnx/convert.py with the code below:
outputs = [tensor.name for tensor in concrete_func.outputs if tensor.dtype != tf.dtypes.resource]
structured_outputs = sorted(concrete_func.structured_outputs.keys())
tensors_to_rename.update(zip(outputs, structured_outputs))
Then the conversion will run. However, other shape-related issues will arise...
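The renaming logic in that patch can be sketched with stand-in objects, without TensorFlow. The assumption here, based on my understanding of composite tensors, is that a ragged result appears in the flat output list as several component tensors (values and row splits), while the structured outputs hold a single key. The class and tensor names below are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class FakeTensor:
    # Stand-in for a flat graph tensor; only the fields the patch reads.
    name: str
    dtype: str

# Hypothetical flat outputs for one ragged result.
flat_outputs = [
    FakeTensor("Identity:0", "int64"),    # ragged values
    FakeTensor("Identity_1:0", "int64"),  # ragged row splits
]
structured_outputs = {"text_vectorization": "<RaggedTensor placeholder>"}

tensors_to_rename = {}
outputs = [t.name for t in flat_outputs if t.dtype != "resource"]
keys = sorted(structured_outputs.keys())
# zip stops at the shorter sequence, so only the first flat tensor is paired.
tensors_to_rename.update(zip(outputs, keys))
print(tensors_to_rename)  # {'Identity:0': 'text_vectorization'}
```

Under these assumptions only one component gets the output name and the row splits are left unpaired, which may be related to the remaining shape problems.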
Indeed, I applied your proposed fix but still get a wrong model:
Cannot infer shape for sequential/text_vectorization/string_lookup/None_Lookup/LookupTableFindV2: sequential/text_vectorization/string_lookup/None_Lookup/LookupTableFindV2:0
Cannot infer shape for sequential/text_vectorization/string_lookup/SelectV2: sequential/text_vectorization/string_lookup/SelectV2:0
Cannot infer shape for sequential/text_vectorization/string_lookup/Identity: sequential/text_vectorization/string_lookup/Identity:0
Cannot infer shape for Identity: Identity:0
2022-03-25 18:05:46.556352100 [W:onnxruntime:, execution_frame.cc:811 VerifyOutputSizes] Expected shape from model of {} does not match actual shape of {49} for output text_vectorization
[array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1], dtype=int64)]
Below is the generated model:
Actually, I found that the predictions are wrong because of a faulty implementation of the StringSplit operator in onnxruntime-extensions: https://github.com/microsoft/onnxruntime-extensions/issues/216
This means that the warnings and errors about shape inference can be ignored.
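One observation is consistent with this diagnosis. The wrong dense output has 23 non-padding entries in the first row and 26 in the second, and the ragged variant returned 49 values. Those numbers match the character counts of the two input strings, so the split appears to yield one token per character, each of which falls outside the vocabulary and maps to 1. The sketch below only checks that arithmetic. It does not call onnxruntime-extensions, and the per-character explanation is an inference from the numbers, not something confirmed in the linked issue.

```python
# The two inputs after lowercasing (neither contains punctuation).
inputs = ["this document is boring", "the first document is this"]

def split_on_whitespace(text):
    # What the layer is expected to do: one token per word.
    return text.split()

def split_into_characters(text):
    # What the faulty output is consistent with: one token per character.
    return list(text)

word_lengths = [len(split_on_whitespace(t)) for t in inputs]
char_lengths = [len(split_into_characters(t)) for t in inputs]

print(word_lengths)       # [4, 5]   -> matches the TensorFlow row lengths
print(char_lengths)       # [23, 26] -> matches the ONNX row lengths
print(sum(char_lengths))  # 49       -> matches the ragged output size
```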