systems Add support for unicode string inputs to Workflow Transform in Triton

Open oliverholworthy opened this issue 2 years ago • 1 comments

Add support for unicode string inputs to Workflow Transform in Triton.

Adds a test for running a Workflow with non-ASCII characters in string inputs.

We currently get the following error from the .astype("str") call if we pass string inputs with non-ascii characters.

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 0: ordinal not in range(128)

This is because when we pass a string like "椅子" to a Triton model, the tensor is received as UTF-8 encoded bytes: np.array([b'\xe6\xa4\x85\xe5\xad\x90'], dtype=object). Calling .astype(str) on this array tries to decode the bytes as ASCII, which raises the UnicodeDecodeError above.
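To make the failure mode concrete, here is a minimal pure-Python sketch (no Triton or NumPy involved) of what happens to the bytes. The variable names are illustrative only and are not taken from the PR.

```python
# The string as a client would send it.
text = "椅子"

# UTF-8 encoding produces the same bytes that show up in the received tensor.
raw = text.encode("utf-8")
assert raw == b"\xe6\xa4\x85\xe5\xad\x90"

# Decoding those bytes as ASCII fails on the first byte (0xe6 > 127).
try:
    raw.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = str(exc)

# Decoding as UTF-8 recovers the original string.
round_tripped = raw.decode("utf-8")
```

The ASCII decode is the step that fails; decoding with the encoding that was used to produce the bytes avoids the error.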

We can coerce an array of byte strings to unicode strings with np.char.decode(out.astype(bytes)), where out = np.array([b'\xe6\xa4\x85\xe5\xad\x90'], dtype=object). ~However, it appears we can safely remove the line that is performing the coercion. (It doesn't appear to break any existing tests at least.)~
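A minimal sketch of that coercion, assuming the input is an object array of UTF-8 encoded byte strings as described above. The helper name decode_byte_strings is hypothetical and is not the function used in the PR; only np.char.decode and astype(bytes) come from the description.

```python
import numpy as np


def decode_byte_strings(arr, encoding="utf-8"):
    """Hypothetical helper: return a unicode array from an array of byte strings.

    Assumes every element is a bytes object (or the array is already unicode).
    """
    if arr.dtype.kind == "U":
        # Already a unicode array; nothing to decode.
        return arr
    # Convert the object array to a fixed-width bytes array, then decode
    # each element with the given encoding.
    return np.char.decode(arr.astype(bytes), encoding=encoding)


out = np.array([b"\xe6\xa4\x85\xe5\xad\x90"], dtype=object)
decoded = decode_byte_strings(out)
```

Passing the encoding explicitly is a design choice in this sketch; it makes the UTF-8 assumption visible instead of relying on the default.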

May 10 '23 14:05 oliverholworthy

Documentation preview

https://nvidia-merlin.github.io/systems/review/pr-345

May 10 '23 15:05 github-actions[bot]
