tensorflow-onnx UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 1024: ordinal not in range(128)

Describe the bug

I'm seeing a UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 1024: ordinal not in range(128). The code in tf_utils.py (https://github.com/onnx/tensorflow-onnx/blob/main/tf2onnx/tf_utils.py#L57) seems to treat this error as expected, but the fallback to np.vectorize(lambda x: x.decode('UTF-8')) also fails with a similar error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 1024: invalid start byte
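Both messages point at the same byte, which suggests the underlying data is not text at all rather than text in a different encoding. A minimal standard-library sketch of that behaviour follows; the `try_decode` helper and the payload are hypothetical and only mimic the shape of the reported error (a 0xff byte at offset 1024), they are not taken from tf2onnx.

```python
def try_decode(raw, encodings=("ascii", "utf-8")):
    """Map each encoding to None on success, or (offset, reason) on failure."""
    results = {}
    for enc in encodings:
        try:
            raw.decode(enc)
            results[enc] = None
        except UnicodeDecodeError as exc:
            # exc.start is the offset of the first byte that could not be decoded.
            results[enc] = (exc.start, exc.reason)
    return results


# Hypothetical payload: 1024 ASCII bytes followed by a single 0xff byte.
payload = b"a" * 1024 + b"\xff"
report = try_decode(payload)
for enc, failure in report.items():
    print(enc, failure)
```

Since 0xff never appears in valid UTF-8, a UTF-8 fallback cannot succeed on such a value, so a useful next step is to find which tensor or asset in the model carries the binary bytes.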

Urgency N/A

System information

OS Platform and Distribution (e.g., Linux Ubuntu 18.04*): GCP VM
TensorFlow Version: 2.9.00
Python version: 3.7.12
ONNX version (if applicable, e.g. 1.11*): onnx-1.14.1 (installed via pip install git+https://github.com/onnx/tensorflow-onnx)
ONNXRuntime version (if applicable, e.g. 1.11*): onnxruntime-1.14.1

To Reproduce

The model is a custom DCN v2 model built using libraries from the TensorFlow ecosystem: TFRS (recommender systems), TFR (ranking), TF Text, TF IO, and TF Transform. The model is saved using tf.saved_model.save(..).

Screenshots Screen Shot 2023-10-25 at 2 00 33 PM

Additional context

I found that a whole set of ops in the model don't seem to be present in the supported list of ops. But based on the troubleshooting guide, the error I'm seeing here looks different from the one mentioned in the guide. Is it possible that the decode errors are due to the unsupported ops?

Missing ops:
- Bucketize
- AssignVariableOp
- InitializeTableFromTextFileV2
- LookupTableImportV2
- MergeV2Checkpoints
- ReadVariableOp
- ResourceGather
- RestoreV2
- SaveV2
- ShardedFilename
- StatefulPartitionedCall
- StaticRegexFullMatch
- VarHandleOp
- TFText>WhitespaceTokenizeWithOffsetsV2
- VarIsInitializedOp

Supported via ai.onnx.contrib:
- StaticRegexReplace
- StringJoin
- StringSplitV2
- StringToHashBucketFast
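The split above can be reproduced mechanically once the op types in the model and the supported lists are known. The sketch below is only an illustration: `classify_ops` is a hypothetical helper, the `supported` set is a placeholder rather than the real tf2onnx support list, and `model_ops` is a small sample with `MatMul` added as an assumed example of a supported op.

```python
def classify_ops(model_ops, supported, contrib_supported):
    """Group op types into supported, contrib-only, and missing."""
    report = {"supported": [], "contrib": [], "missing": []}
    for op in sorted(set(model_ops)):
        if op in supported:
            report["supported"].append(op)
        elif op in contrib_supported:
            report["contrib"].append(op)
        else:
            report["missing"].append(op)
    return report


# Placeholder inputs: NOT the actual support lists.
supported = {"MatMul"}
contrib_supported = {"StringJoin", "StringSplitV2"}
model_ops = ["MatMul", "Bucketize", "StringJoin", "RestoreV2", "StringSplitV2"]

report = classify_ops(model_ops, supported, contrib_supported)
for group, ops in report.items():
    print(group, ops)
```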

Not sure if this is relevant, but I previously found that the conversion doesn't proceed unless tensorflow_text is explicitly imported. To do that, I use a custom script (shared below) which invokes tf2onnx.convert.main().

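import tensorflow as tf
# Imported only for their side effects: loading these packages makes their
# custom ops (e.g. the TF Text tokenizers) known to TensorFlow before conversion.
import tensorflow_text as tf_text  # noqa: F401
import tensorflow_transform as tft  # noqa: F401

import tf2onnx.convert

print("Done importing custom tf modules...")
print("Invoking tf2onnx.convert.main()...")
# main() parses its options from the command line, so this script takes the
# same arguments as `python -m tf2onnx.convert`.
tf2onnx.convert.main()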

I've tried switching my numpy version to 1.20, as suggested in one of the GitHub issues, but that didn't help either. Would appreciate your help with this!

Oct 25 '23 21:10 mon95

What's your version of the tensorflow-text package? Is it a 2.9.x version? If not, could you please install a 2.9.x version, and also upgrade Python to at least 3.8?

Thanks.

Oct 27 '23 10:10 fatcat-z

Thank you for looking into this!

tensorflow_text is version 2.9.0 and tensorflow_transform is version 1.10.0. I also tried upgrading Python (now 3.9.18) and I still get the same error.
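For reference, version details like these can be collected in one step with the standard library's importlib.metadata (Python 3.8+). This is a minimal sketch; `installed_versions` is a hypothetical helper and the package list is an assumption based on the libraries named in this thread.

```python
import platform
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Assumed list of pip distribution names relevant to this issue.
PACKAGES = [
    "tensorflow", "tensorflow-text", "tensorflow-transform",
    "tf2onnx", "onnx", "onnxruntime", "numpy",
]

print("python", platform.python_version())
for name, version in installed_versions(PACKAGES).items():
    print(name, version or "not installed")
```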

Any chance that the tensorflow_transform usage is causing issues? The model contains transform layers which depend on the tensorflow_transform package.

Oct 30 '23 20:10 mon95
