Non-ASCII characters in POST request

Open uditpython opened this issue 5 years ago • 5 comments

Hi,

Can you help me with MMS? I need to send non-ASCII characters in a POST request. How can I do that? Right now it gives ? in place of the non-ASCII characters.

uditpython avatar Oct 15 '19 07:10 uditpython

Hi @uditpython : Could you explain your question further? What are you trying to send back in the response and what is the error that you are hitting?

vdantu avatar Oct 15 '19 15:10 vdantu

Adding @alexwong

vdantu avatar Oct 15 '19 15:10 vdantu

@vdantu not sure if this falls under this issue, but I found my MMS custom service having issues when I was posting a url with Unicode characters, for example: http://example.test/你好

I investigated a bit and found this was due to the print statements causing an error. When checking sys.stdout.encoding, I see that it is "cp1252" rather than the expected "utf-8". If I run my Python executable directly, I see the stdout encoding as "utf-8". Note that we see this Unicode issue on both Linux and Windows.

Any idea why the stdout encoding is getting changed? A workaround is to reconfigure encoding in the handler.

import sys
# TextIOWrapper.reconfigure() requires Python 3.7 or later
sys.stdout.reconfigure(encoding='utf-8')
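
A slightly more defensive version of that workaround might look like the sketch below. The helper name force_utf8 is made up for illustration and is not part of MMS; the sketch assumes Python 3.7+ and leaves any stream it does not recognise untouched.

```python
import io
import sys


def force_utf8(stream):
    """Switch a text stream to UTF-8 if it is using another encoding.

    Hypothetical helper for a custom handler; not an MMS API.
    """
    current = (getattr(stream, "encoding", None) or "").lower()
    if current.replace("-", "").replace("_", "") == "utf8":
        return stream  # already UTF-8, nothing to do
    if hasattr(stream, "reconfigure"):  # io.TextIOWrapper, Python 3.7+
        stream.reconfigure(encoding="utf-8")
    return stream  # streams without reconfigure() are returned unchanged


# Apply once, e.g. at the top of the handler module.
sys.stdout = force_utf8(sys.stdout)
sys.stderr = force_utf8(sys.stderr)
```

Checking the encoding first avoids touching a stream that is already UTF-8, and the hasattr guard keeps the handler from failing if stdout has been replaced by an object that has no reconfigure() method.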

mikeobr avatar Dec 17 '19 15:12 mikeobr

@mikeobr
Thanks for the investigation.

Your proposal seems like a good improvement, and MMS should force sys.stdout to use utf-8, since the Java end expects utf-8. cp1252 is the Windows default encoding: on Windows the Java process by default runs with file.encoding=cp1252, which you can verify by printing the Java system property. The Python worker is started by Java and inherits its I/O, which is why you see this issue.
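
To see how a child Python process picks its stdout encoding when its output is piped to a parent, something like the sketch below can help. It uses the standard PYTHONIOENCODING environment variable to pin the encoding from outside the process. The function name is made up for illustration, and whether the MMS frontend passes its environment through to the worker is an assumption that is not confirmed in this thread.

```python
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding=None):
    """Start a child Python with piped stdout and report its sys.stdout.encoding.

    Illustrative only: the parent here is Python, standing in for the Java frontend.
    """
    env = dict(os.environ)
    env.pop("PYTHONIOENCODING", None)  # start from a clean setting
    if io_encoding is not None:
        env["PYTHONIOENCODING"] = io_encoding  # pin the child's stdio encoding
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("ascii").strip()


if __name__ == "__main__":
    print("inherited default:", child_stdout_encoding())
    print("pinned:", child_stdout_encoding("utf-8"))
```

Without the variable, the child falls back to whatever the platform and locale dictate, which is how a worker can end up with an encoding other than utf-8.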

But it doesn't seem related to the original issue. The POST data received in the worker is in binary form: the data type is bytes, and it has to be decoded before it can be treated as str. It is the ModelService's responsibility to decode it properly. Only the ModelService knows which encoding should be used to decode the POST data (the HTTP headers might also provide that information, but the service code should still handle the decoding based on the header).
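
As a rough sketch of what that decoding could look like in service code, assuming the handler has the raw body as bytes and the Content-Type header value as a string (how a custom service actually obtains the header from MMS is not shown here, and decode_body is a made-up name):

```python
from email.message import Message


def decode_body(body, content_type=None, default="utf-8"):
    """Decode raw POST bytes using the charset from Content-Type, if any.

    Hypothetical helper; falls back to `default` when no charset is declared.
    """
    msg = Message()
    if content_type:
        msg["Content-Type"] = content_type
    # get_content_charset() returns the lowercased charset parameter,
    # or the fallback value when the header or parameter is missing.
    charset = msg.get_content_charset(default)
    return body.decode(charset)


if __name__ == "__main__":
    raw = "你好".encode("utf-8")
    print(decode_body(raw, "text/plain; charset=utf-8"))
```

An unknown charset name raises LookupError and bytes that do not match the declared charset raise UnicodeDecodeError, so a real service would probably want to catch both and return a client error.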

frankfliu avatar Dec 17 '19 18:12 frankfliu

@frankfliu, thanks, that explains why the default encoding is different. I have logged this as a different issue: https://github.com/awslabs/multi-model-server/issues/883

mikeobr avatar Dec 17 '19 20:12 mikeobr