tensorflow-onnx

UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 1024: ordinal not in range(128)

Open · mon95 opened this issue 2 years ago · 2 comments

Describe the bug

I'm seeing a UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 1024: ordinal not in range(128). The code in tf_utils.py (https://github.com/onnx/tensorflow-onnx/blob/main/tf2onnx/tf_utils.py#L57) appears to anticipate this error and catch it, but the fallback, np.vectorize(lambda x: x.decode('UTF-8')), also fails with a similar error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 1024: invalid start byte
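Byte 0xff lies outside the ASCII range (0 to 127) and is never a valid byte in UTF-8, so any value containing it fails both decodes. The minimal sketch below reproduces the two messages with plain Python bytes. `decode_with_fallback`, `catch_decode_error`, and `payload` are hypothetical names, and the function only mirrors the ASCII-then-UTF-8 pattern described above; it is not tf2onnx's actual implementation.

```python
from typing import Optional


def decode_with_fallback(data: bytes) -> str:
    """Try a strict ASCII decode first, then fall back to UTF-8.

    Illustrative only: this mirrors the pattern described in the issue,
    not the exact code in tf2onnx/tf_utils.py.
    """
    try:
        return data.decode("ascii")  # fast path: fails on any byte >= 0x80
    except UnicodeDecodeError:
        return data.decode("utf-8")  # fallback: still fails on 0xff


def catch_decode_error(data: bytes) -> Optional[UnicodeDecodeError]:
    """Return the error raised by decode_with_fallback, or None on success."""
    try:
        decode_with_fallback(data)
    except UnicodeDecodeError as err:
        return err
    return None


# Hypothetical payload: 1024 ASCII bytes followed by 0xff, standing in for a
# string value in the SavedModel that holds binary data rather than text.
payload = b"a" * 1024 + b"\xff"

err = catch_decode_error(payload)
print(err)              # the UTF-8 fallback error
print(err.__context__)  # the ASCII error it was raised while handling
```

Running it prints the UTF-8 message first and then the ASCII message it is chained to. Because the fallback raises while the ASCII error is still being handled, a full traceback shows both errors, in the same way as reported above.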

Urgency

N/A

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 18.04*): GCP VM
  • TensorFlow Version: 2.9.0
  • Python version: 3.7.12
  • ONNX version (if applicable, e.g. 1.11*): onnx 1.14.1 (installed alongside tf2onnx via pip install git+https://github.com/onnx/tensorflow-onnx)
  • ONNXRuntime version (if applicable, e.g. 1.11*): onnxruntime-1.14.1

To Reproduce

The model is a custom DCN v2 model built with libraries from the TensorFlow ecosystem: TFRS (TensorFlow Recommenders), TFR (TensorFlow Ranking), TF Text, TF IO, and TF Transform. The model is saved with tf.saved_model.save(...).

Screenshots

[Screenshot: Screen Shot 2023-10-25 at 2 00 33 PM]

Additional context

  1. I found that many ops in the model don't appear in tf2onnx's list of supported ops. However, the error I'm seeing looks different from the one described in the troubleshooting guide. Is it possible that the decode errors are caused by these unsupported ops?
  • Missing ops:
    • Bucketize
    • AssignVariableOp
    • InitializeTableFromTextFileV2
    • LookupTableImportV2
    • MergeV2Checkpoints
    • ReadVariableOp
    • ResourceGather
    • RestoreV2
    • SaveV2
    • ShardedFilename
    • StatefulPartitionedCall
    • StaticRegexFullMatch
    • VarHandleOp
    • TFText>WhitespaceTokenizeWithOffsetsV2
    • VarIsInitializedOp  
  • Supported via ai.onnx.contrib:
    • StaticRegexReplace
    • StringJoin
    • StringSplitV2
    • StringToHashBucketFast
  2. Not sure if this is relevant: I previously found that the conversion doesn't proceed unless tensorflow_text is imported explicitly. To do this, I use a custom script (shared below) that invokes tf2onnx.convert.main().
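import tensorflow as tf
# Importing tensorflow_text registers its custom ops (e.g. WhitespaceTokenizeWithOffsetsV2)
# so the SavedModel can be loaded.
import tensorflow_text as tf_text
# The model's transform layers depend on tensorflow_transform.
import tensorflow_transform as tft

import tf2onnx.convert

print("Done importing custom tf modules...")
print("Invoking tf2onnx.convert.main()...")
# main() parses sys.argv, so this script accepts the same flags as `python -m tf2onnx.convert`.
tf2onnx.convert.main()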

I've also tried switching numpy to version 1.20, as suggested in another GitHub issue, but that doesn't help either. I'd appreciate your help with this!

mon95 · Oct 25 '23 21:10
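One way to narrow down whether the decode failure comes from a specific string value rather than from the unsupported ops is to check which candidate values are valid UTF-8. The sketch below assumes the values have already been pulled out of the model as a list of `bytes` by some other means (extracting them from the SavedModel is outside its scope); `find_non_utf8` and `samples` are hypothetical.

```python
from typing import Iterable, List, Tuple


def find_non_utf8(values: Iterable[bytes]) -> List[Tuple[int, int, str]]:
    """Return (index, byte offset, reason) for each value that is not valid UTF-8.

    Hypothetical helper: `values` is assumed to be a list of raw byte strings
    obtained from the model separately.
    """
    bad = []
    for i, value in enumerate(values):
        try:
            value.decode("utf-8")
        except UnicodeDecodeError as err:
            # err.start is the offset of the first offending byte.
            bad.append((i, err.start, err.reason))
    return bad


# Hypothetical sample: two text values and one binary blob.
samples = [b"user_id", "caf\u00e9".encode("utf-8"), b"\x00" * 4 + b"\xff\xfe"]
print(find_non_utf8(samples))  # [(2, 4, 'invalid start byte')]
```

Reporting the byte offset alongside the index makes it easier to match a hit against the "position 1024" in the original error.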

Which version of the tensorflow-text package are you using? It should be a 2.9.x release to match TensorFlow 2.9. If it isn't, could you please install a 2.9.x version? Could you also upgrade Python to at least 3.8?

Thanks.

fatcat-z · Oct 27 '23 10:10
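The version check suggested above can be scripted. This is a minimal sketch, assuming the rule of interest is that tensorflow-text shares TensorFlow's major.minor release (2.9.x with TensorFlow 2.9). The helper names are hypothetical, and `importlib.metadata` requires Python 3.8 or later.

```python
from importlib import metadata
from typing import Optional, Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Return the (major, minor) part of a version string such as '2.9.0'."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def text_matches_tf(tf_version: str, text_version: str) -> bool:
    """True if tensorflow-text is on the same major.minor release as TensorFlow."""
    return major_minor(tf_version) == major_minor(text_version)


if __name__ == "__main__":
    tf_v = installed_version("tensorflow")
    # Try both spellings in case the lookup does not normalize '-' and '_'.
    text_v = installed_version("tensorflow-text") or installed_version("tensorflow_text")
    if tf_v is None or text_v is None:
        print("tensorflow or tensorflow-text is not installed")
    else:
        print(tf_v, text_v, "match" if text_matches_tf(tf_v, text_v) else "MISMATCH")
```

Comparing only major.minor keeps patch releases such as 2.9.0 and 2.9.1 from being flagged as a mismatch.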

Thank you for looking into this!

tensorflow_text is version 2.9.0 and tensorflow_transform is version 1.10.0. I also tried upgrading Python (now 3.9.18), and I still get the same error.

Is there any chance that the tensorflow_transform usage is causing the issue? The model contains transform layers that depend on the tensorflow_transform package.

mon95 · Oct 30 '23 20:10