Add support for unicode string inputs to Workflow Transform in Triton
- Adds a test for running a Workflow with non-ASCII characters in string inputs.
We currently get the following error from the `.astype("str")` call if we pass string inputs with non-ASCII characters:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 0: ordinal not in range(128)
This is because when we pass a string like "椅子" to a Triton model, that tensor is received as `np.array([b'\xe6\xa4\x85\xe5\xad\x90'], dtype=object)`. If you try to do `.astype(str)` on this, it raises this `UnicodeDecodeError`.
We can coerce an array of byte strings to unicode strings with `np.char.decode(out.astype(bytes))`, where `out = np.array([b'\xe6\xa4\x85\xe5\xad\x90'], dtype=object)`. ~However, it appears we can safely remove the line that is performing the coercion. (It doesn't appear to break any existing tests at least.)~
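
The coercion described above can be reproduced outside Triton with plain NumPy. This is only a minimal sketch, not the code in this PR: the helper name `decode_string_tensor` is hypothetical, and passing `encoding="utf-8"` explicitly is an assumption that the incoming bytes are UTF-8 encoded.

```python
import numpy as np

# The tensor as described above: UTF-8 bytes stored in an object array.
out = np.array(["椅子".encode("utf-8")], dtype=object)

# The failing path. The description reports a UnicodeDecodeError here;
# the try/except keeps the sketch runnable either way.
try:
    out.astype(str)
except UnicodeDecodeError as exc:
    print(f"astype(str) failed: {exc}")


def decode_string_tensor(arr: np.ndarray) -> np.ndarray:
    """Hypothetical helper: decode an array of byte strings to unicode.

    Casts to a fixed-width bytes dtype first, then decodes each element
    as UTF-8 (assumed encoding) instead of relying on an implicit
    ASCII decode.
    """
    return np.char.decode(arr.astype(bytes), encoding="utf-8")


decoded = decode_string_tensor(out)
print(decoded, decoded.dtype)
```

The result is a unicode (`U`) array holding the original string, so downstream code that expects `str` values can consume it without the implicit ASCII decode.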