requests 2.32.3 & urllib3 1.26.18 issue with unicode put
I'm building requests 2.32.3 in Fedora Linux and I have a problem with test_unicode_header_name - the test hangs.
It's reproducible: with urllib3 2.0.2 or newer the code works fine, but with urllib3 1.26.18 it hangs waiting for a response.
Expected Result
Older urllib3 is still allowed (urllib3>=1.21.1,<3) so it should work.
Actual Result
The call to requests.put hangs and, if killed, the traceback is:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/lbalhar/Software/requests/src/requests/api.py", line 130, in put
return request("put", url, data=data, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/lbalhar/Software/requests/src/requests/api.py", line 59, in request
return session.request(method=method, url=url, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/lbalhar/Software/requests/src/requests/sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/lbalhar/Software/requests/src/requests/sessions.py", line 703, in send
r = adapter.send(request, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/lbalhar/Software/requests/src/requests/adapters.py", line 667, in send
resp = conn.urlopen(
^^^^^^^^^^^^^
File "/home/lbalhar/.virtualenvs/requests/lib/python3.12/site-packages/urllib3/connectionpool.py", line 715, in urlopen
httplib_response = self._make_request(
^^^^^^^^^^^^^^^^^^^
File "/home/lbalhar/.virtualenvs/requests/lib/python3.12/site-packages/urllib3/connectionpool.py", line 467, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "/home/lbalhar/.virtualenvs/requests/lib/python3.12/site-packages/urllib3/connectionpool.py", line 462, in _make_request
httplib_response = conn.getresponse()
^^^^^^^^^^^^^^^^^^
File "/usr/lib64/python3.12/http/client.py", line 1428, in getresponse
response.begin()
File "/usr/lib64/python3.12/http/client.py", line 331, in begin
version, status, reason = self._read_status()
^^^^^^^^^^^^^^^^^^^
File "/usr/lib64/python3.12/http/client.py", line 292, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib64/python3.12/socket.py", line 707, in readinto
return self._sock.recv_into(b)
^^^^^^^^^^^^^^^^^^^^^^^
KeyboardInterrupt
Reproduction Steps
Start an httpbin instance, install urllib3<2, and then run:
import requests
requests.put("http://127.0.0.1:8080/put", headers={"Content-Type": "application/octet-stream"}, data="\xff")
System Information
$ python -m requests.help
{
"chardet": {
"version": null
},
"charset_normalizer": {
"version": "3.3.2"
},
"cryptography": {
"version": ""
},
"idna": {
"version": "3.7"
},
"implementation": {
"name": "CPython",
"version": "3.12.3"
},
"platform": {
"release": "6.8.10-300.fc40.x86_64",
"system": "Linux"
},
"pyOpenSSL": {
"openssl_version": "",
"version": null
},
"requests": {
"version": "2.32.3"
},
"system_ssl": {
"version": "30200010"
},
"urllib3": {
"version": "1.26.18"
},
"using_charset_normalizer": true,
"using_pyopenssl": false
}
Thanks for reporting this, @frenzymadness! I'd thought we had a standalone GHA to still test on "urllib3<2" but that's for a separate project. I'll work on getting that added to ensure we don't have regressions.
We'll need to take a closer look at what's happening but I have a feeling this may be a byproduct of #6589. I'm wondering if we're sending a Content-Length 1 byte longer than what we're actually emitting. I was surprised when that issue was opened we hadn't had this problem before but there may be some subtle variance between the two major versions that was overlooked.
I took the code from test_content_length_for_string_data_counts_bytes and it seems to work fine:
>>> import requests
>>> data = "This is a string containing multi-byte UTF-8 ☃"
>>> length = str(len(data.encode("utf-8")))
>>> req = requests.Request("POST", "http://foo.bar/post", data=data)
>>> p = req.prepare()
>>> p.headers["Content-Length"]
'51'
>>> length
'51'
And for the data from test_unicode_header_name:
>>> data = "\xff"
>>> length = str(len(data.encode("utf-8")))
>>> req = requests.Request("POST", "http://foo.bar/post", data=data)
>>> p = req.prepare()
>>> p.headers["Content-Length"]
'2'
>>> length
'2'
Hi all,
I was facing a similar issue to the one @frenzymadness reported, and I can confirm that it is caused by #6589. I'm not sure whether I should continue the conversation here or at #6589, but I'll start off here.
Intro
First off, I want to mention that when you send the request
requests.put('https://httpbin.org/put', headers={'Content-Type': 'application/octet-stream'}, data='\xff')
it doesn't actually hang: it is waiting for a response from the server, and after a while the call fails with
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
A similar thing occurred with the server I was communicating with, except that it actually sent a response, something like
400: Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed.
I was sending a str with non-ASCII characters (same as @frenzymadness) and it turned out that the request had an incorrect Content-Length header, something like 545 instead of the correct 543.
Issue
The issue is that if you pass a str as the data of a request, super_len encodes the string as utf-8 when calculating the Content-Length, but later, when the body is actually encoded in python3.9/http/client.py on line 1330, latin-1 (the same as iso-8859-1) is used instead.
So in the case of our simple example where we send '\xff' we have the following
>>> a = '\xff'
>>> len(a)
1
>>> len(a.encode('utf-8'))
2
>>> len(a.encode('latin-1'))
1
So we would be setting the Content-Length to 2 when we would actually be sending 1 byte of data.
What I find interesting is that I don't think the tests created in #6589 serve a real purpose, since if you sent a request like the following, your code would fail, and it wouldn't matter that the Content-Length is 'correct':
>>> requests.put('https://httpbin.org/put', data='👍👎')
Traceback (most recent call last):
...
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-1: Body ('👍👎') is not valid Latin-1. Use body.encode('utf-8') if you want to send it encoded in UTF-8.
So the 'workaround' mentioned in #6586 is actually the way a request like this should be sent.
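To make that concrete, here is a minimal example of the explicit-encode form; because the body is already bytes, super_len and the socket layer count the same thing:
>>> data = '👍👎'.encode('utf-8')
>>> len(data)
8
>>> requests.put('https://httpbin.org/put', data=data)
<Response [200]>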
So with all that being said, I'd think the fix would be to just revert the commit that introduced this change.
What do you think?
Just hit this myself when trying to release urllib3 1.26.19. urllib3 2.x made a change where string bodies are encoded as UTF-8 instead of Latin-1. It was an accidental change, and I've started working on fixing/documenting it in https://github.com/urllib3/urllib3/issues/3053 and https://github.com/urllib3/urllib3/pull/3063 but ultimately dropped the ball, sorry.
Then, https://github.com/psf/requests/pull/6589 adapted requests to work with urllib3 2.x by encoding to UTF-8 to compute the Content-Length. This means that with \xff, requests now sets Content-Length to 2, but urllib3 1.26.x only sends one byte, which is why the test hangs. Since we're not planning to revert to Latin-1 in urllib3, the fix would be for requests to explicitly encode string bodies to UTF-8 (or not try to guess the Content-Length at all, I suppose). If it does, it would be nice to avoid encoding twice, which is what happens today.
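As a rough sketch of the "encode once" idea (hypothetical code, not the current requests implementation), the point is to derive Content-Length from the exact bytes that will be sent:

def prepare_body_bytes(body):
    # Hypothetical helper, not actual requests code: encode str bodies
    # to UTF-8 once, so Content-Length is computed from the exact bytes
    # that go on the wire, regardless of urllib3 major version.
    if isinstance(body, str):
        body = body.encode("utf-8")
    return body, len(body)

body, length = prepare_body_bytes("\xff")
assert (body, length) == (b"\xc3\xbf", 2)  # 2 bytes advertised, 2 bytes sent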
I'm kind of tempted to just add something in compat to flag the major version of urllib3, like PY2/PY3 in six. I don't think we can remove the Content-Length computation since that would be a fairly breaking change for the library. Requests is still straddling the line between the old internet, where everything was latin-1 (and only latin-1), and today, where most things expect utf-8.
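For illustration, that flag could be as small as the following (hypothetical name, following the PY2/PY3 pattern from six):

import urllib3

# Hypothetical flag for requests.compat: True when running against
# urllib3 1.x, where str bodies end up latin-1-encoded by http.client.
HAS_LEGACY_URLLIB3 = urllib3.__version__.startswith("1.")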
I think we'll likely want to keep the change in some form. I don't know if any of the other maintainers have other thoughts on solutions.