Add support for unicode string inputs to Workflow Transform in Triton

Open • oliverholworthy opened this issue 1 year ago • 1 comment

Add support for unicode string inputs to Workflow Transform in Triton.

  • Adds a test for running a Workflow with non-ASCII characters in string inputs.

We currently get the following error from the .astype("str") call if we pass string inputs containing non-ASCII characters.

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 0: ordinal not in range(128)

This is because when we pass a string like "椅子" to a Triton model, the tensor is received as UTF-8 encoded bytes in an object array: np.array([b'\xe6\xa4\x85\xe5\xad\x90'], dtype=object). Calling .astype(str) on this array tries to decode those bytes with the ASCII codec, which raises the UnicodeDecodeError above.
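A minimal sketch that reproduces the situation without a Triton server, assuming (as described above) that the string arrives as UTF-8 bytes inside an object-dtype array. The variable names are illustrative only.

```python
import numpy as np

# Simulate what the Triton model receives for the input "椅子":
# the string arrives UTF-8 encoded, wrapped in an object-dtype array.
text = "椅子"
out = np.array([text.encode("utf-8")], dtype=object)
print(out)

# The first byte is 0xe6, which is outside the ASCII range (0-127).
# This matches "can't decode byte 0xe6 in position 0" in the error message.
first_byte = out[0][0]
print(hex(first_byte), first_byte > 127)

# On the NumPy version described in this issue, this cast raises
# UnicodeDecodeError because the bytes are decoded as ASCII.
try:
    out.astype(str)
except UnicodeDecodeError as exc:
    print("astype(str) failed:", exc)
```

The bytes b'\xe6\xa4\x85' and b'\xe5\xad\x90' are the UTF-8 encodings of 椅 (U+6905) and 子 (U+5B50), so the data is intact; only the decoding step is wrong.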

We can coerce an array of byte strings to Unicode strings with np.char.decode(out.astype(bytes)), where out = np.array([b'\xe6\xa4\x85\xe5\xad\x90'], dtype=object). ~However, it appears we can safely remove the line that performs the coercion. (At least, removing it doesn't appear to break any existing tests.)~
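If the coercion is kept rather than removed, one way to apply np.char.decode only when the input actually holds byte strings is sketched here. coerce_to_unicode is a hypothetical helper name, not part of NVTabular or Triton, and it assumes the byte strings are UTF-8 encoded, as in the "椅子" example.

```python
import numpy as np


def coerce_to_unicode(values: np.ndarray) -> np.ndarray:
    """Hypothetical helper: return a unicode ("U") array from Triton-style inputs.

    Assumes any byte strings are UTF-8 encoded.
    """
    # Fixed-width byte-string arrays (dtype "S") can be decoded directly.
    if values.dtype.kind == "S":
        return np.char.decode(values, "utf-8")
    # Object arrays holding only bytes objects (how Triton delivers strings here):
    # convert to a fixed-width bytes array first, then decode as UTF-8.
    if (
        values.dtype == object
        and values.size > 0
        and all(isinstance(v, bytes) for v in values.ravel())
    ):
        return np.char.decode(values.astype(bytes), "utf-8")
    # Anything else (e.g. arrays that are already unicode) falls back to astype(str).
    return values.astype(str)


out = np.array([b"\xe6\xa4\x85\xe5\xad\x90"], dtype=object)
print(coerce_to_unicode(out))
```

Passing "utf-8" explicitly, instead of relying on np.char.decode's default encoding, makes the assumption about the wire format visible at the call site.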

oliverholworthy, May 10 '23 14:05