tornado
tornado copied to clipboard
UnicodeDecodeError in curl_httpclient's `_curl_debug()`
We ran into an issue in SaltStack with curl_httpclient
when using a proxy and trying to download a binary file:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/salt/ext/tornado/curl_httpclient.py", line 497, in _curl_debug
debug_msg = native_str(debug_msg)
File "/usr/lib/python3/dist-packages/salt/ext/tornado/escape.py", line 219, in to_unicode
return value.decode("utf-8")
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf4 in position 1: invalid continuation byte
It seems, the fix for this issue is quite similar to the one for #1608 in d7d9c467cda38f4c9352172ba7411edc29a85196, but I'm not sure whether this would have also to be extended to the other cases a few lines below as 27a6103148a78100fc479f5dfc65065ef8e6ea72 changed the usage of native_str()
.
This hotfix (applied to the Tornado bundled with Salt) made it at least work for me and it seems .decode('latin1')
isn't used in current master as well.
diff --git a/usr/lib/python3/dist-packages/salt/ext/tornado/curl_httpclient.py.orig b/usr/lib/python3/dist-packages/salt/ext/tornado/curl_httpclient.py
index 8652343..6ef3349 100644
--- a/usr/lib/python3/dist-packages/salt/ext/tornado/curl_httpclient.py.orig
+++ b/usr/lib/python3/dist-packages/salt/ext/tornado/curl_httpclient.py
@@ -494,7 +494,7 @@ class CurlAsyncHTTPClient(AsyncHTTPClient):
def _curl_debug(self, debug_type, debug_msg):
debug_types = ('I', '<', '>', '<', '>')
- debug_msg = native_str(debug_msg)
+ debug_msg = native_str(debug_msg.decode('latin1'))
if debug_type == 0:
curl_log.debug('%s', debug_msg.strip())
elif debug_type in (1, 2):
Hmm, I can't find any documentation of what character encoding libcurl uses for its debug messages. It looks like it must not be utf-8, though, which is what native_str
hard-codes. So replacing native_str(debug_msg)
with debug_msg.decode('latin1')
is probably the right thing to do (using both native_str
and decode
was for python 2/3 compatibility and now only one is needed at a time).
However, if the message is not in fact latin1 text (say it's a binary blob, or some other encoding entirely), this could just print out garbage. It may be better to catch the UnicodeDecodeError and if we see one, to log it with %r
instead of %s
as we do a few lines below.
When you added the decode('latin1')
did the message print legibly or did you get garbage? (byte 0xf4
is ô
in latin1, is that what you expected in the message?)