tornado icon indicating copy to clipboard operation
tornado copied to clipboard

UnicodeDecodeError in curl_httpclient's `_curl_debug()`

Open eliasp opened this issue 2 years ago • 1 comments

We ran into an issue in SaltStack with curl_httpclient when using a proxy and trying to download a binary file:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/salt/ext/tornado/curl_httpclient.py", line 497, in _curl_debug
    debug_msg = native_str(debug_msg)
  File "/usr/lib/python3/dist-packages/salt/ext/tornado/escape.py", line 219, in to_unicode
    return value.decode("utf-8")
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf4 in position 1: invalid continuation byte

It seems, the fix for this issue is quite similar to the one for #1608 in d7d9c467cda38f4c9352172ba7411edc29a85196, but I'm not sure whether this would have also to be extended to the other cases a few lines below as 27a6103148a78100fc479f5dfc65065ef8e6ea72 changed the usage of native_str().

This hotfix (applied to the Tornado bundled with Salt) made it at least work for me and it seems .decode('latin1') isn't used in current master as well.

diff --git a/usr/lib/python3/dist-packages/salt/ext/tornado/curl_httpclient.py.orig b/usr/lib/python3/dist-packages/salt/ext/tornado/curl_httpclient.py
index 8652343..6ef3349 100644
--- a/usr/lib/python3/dist-packages/salt/ext/tornado/curl_httpclient.py.orig
+++ b/usr/lib/python3/dist-packages/salt/ext/tornado/curl_httpclient.py
@@ -494,7 +494,7 @@ class CurlAsyncHTTPClient(AsyncHTTPClient):

     def _curl_debug(self, debug_type, debug_msg):
         debug_types = ('I', '<', '>', '<', '>')
-        debug_msg = native_str(debug_msg)
+        debug_msg = native_str(debug_msg.decode('latin1'))
         if debug_type == 0:
             curl_log.debug('%s', debug_msg.strip())
         elif debug_type in (1, 2):

eliasp avatar Sep 04 '22 14:09 eliasp

Hmm, I can't find any documentation of what character encoding libcurl uses for its debug messages. It looks like it must not be utf-8, though, which is what native_str hard-codes. So replacing native_str(debug_msg) with debug_msg.decode('latin1') is probably the right thing to do (using both native_str and decode was for python 2/3 compatibility and now only one is needed at a time).

However, if the message is not in fact latin1 text (say it's a binary blob, or some other encoding entirely), this could just print out garbage. It may be better to catch the UnicodeDecodeError and if we see one, to log it with %r instead of %s as we do a few lines below.

When you added the decode('latin1') did the message print legibly or did you get garbage? (byte 0xf4 is ô in latin1, is that what you expected in the message?)

bdarnell avatar Sep 27 '22 01:09 bdarnell