requests 2.32.3 & urllib3 1.26.18 issue with unicode put
I'm building requests 2.32.3 in Fedora Linux and I have a problem with test_unicode_header_name - the test hangs.
It's reproducible: with urllib3 2.0.2 or newer the code works fine, but with urllib3 1.26.18 it hangs waiting for a response.
Expected Result
Older urllib3 is still allowed (urllib3>=1.21.1,<3) so it should work.
Actual Result
The call to requests.put hangs and, if killed, the traceback is:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/lbalhar/Software/requests/src/requests/api.py", line 130, in put
return request("put", url, data=data, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/lbalhar/Software/requests/src/requests/api.py", line 59, in request
return session.request(method=method, url=url, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/lbalhar/Software/requests/src/requests/sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/lbalhar/Software/requests/src/requests/sessions.py", line 703, in send
r = adapter.send(request, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/lbalhar/Software/requests/src/requests/adapters.py", line 667, in send
resp = conn.urlopen(
^^^^^^^^^^^^^
File "/home/lbalhar/.virtualenvs/requests/lib/python3.12/site-packages/urllib3/connectionpool.py", line 715, in urlopen
httplib_response = self._make_request(
^^^^^^^^^^^^^^^^^^^
File "/home/lbalhar/.virtualenvs/requests/lib/python3.12/site-packages/urllib3/connectionpool.py", line 467, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "/home/lbalhar/.virtualenvs/requests/lib/python3.12/site-packages/urllib3/connectionpool.py", line 462, in _make_request
httplib_response = conn.getresponse()
^^^^^^^^^^^^^^^^^^
File "/usr/lib64/python3.12/http/client.py", line 1428, in getresponse
response.begin()
File "/usr/lib64/python3.12/http/client.py", line 331, in begin
version, status, reason = self._read_status()
^^^^^^^^^^^^^^^^^^^
File "/usr/lib64/python3.12/http/client.py", line 292, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib64/python3.12/socket.py", line 707, in readinto
return self._sock.recv_into(b)
^^^^^^^^^^^^^^^^^^^^^^^
KeyboardInterrupt
Reproduction Steps
Start an httpbin instance, install urllib3<2, and then run:
import requests
requests.put("http://127.0.0.1:8080/put", headers={"Content-Type": "application/octet-stream"}, data="\xff")
System Information
$ python -m requests.help
{
"chardet": {
"version": null
},
"charset_normalizer": {
"version": "3.3.2"
},
"cryptography": {
"version": ""
},
"idna": {
"version": "3.7"
},
"implementation": {
"name": "CPython",
"version": "3.12.3"
},
"platform": {
"release": "6.8.10-300.fc40.x86_64",
"system": "Linux"
},
"pyOpenSSL": {
"openssl_version": "",
"version": null
},
"requests": {
"version": "2.32.3"
},
"system_ssl": {
"version": "30200010"
},
"urllib3": {
"version": "1.26.18"
},
"using_charset_normalizer": true,
"using_pyopenssl": false
}
Thanks for reporting this, @frenzymadness! I'd thought we had a standalone GHA to still test on "urllib3<2" but that's for a separate project. I'll work on getting that added to ensure we don't have regressions.
We'll need to take a closer look at what's happening but I have a feeling this may be a byproduct of #6589. I'm wondering if we're sending a Content-Length 1 byte longer than what we're actually emitting. I was surprised when that issue was opened we hadn't had this problem before but there may be some subtle variance between the two major versions that was overlooked.
I took the code from test_content_length_for_string_data_counts_bytes and it seems to work fine:
>>> import requests
>>> data = "This is a string containing multi-byte UTF-8 ☃"
>>> length = str(len(data.encode("utf-8")))
>>> req = requests.Request("POST", "http://foo.bar/post", data=data)
>>> p = req.prepare()
>>> p.headers["Content-Length"]
'51'
>>> length
'51'
And for the data from test_unicode_header_name:
>>> data = "\xff"
>>> length = str(len(data.encode("utf-8")))
>>> req = requests.Request("POST", "http://foo.bar/post", data=data)
>>> p = req.prepare()
>>> p.headers["Content-Length"]
'2'
>>> length
'2'
Hi all,
I was facing a similar issue to the one @frenzymadness reported, and I can confirm that it is caused by #6589. I'm not sure whether I should continue the conversation here or at #6589, but I'll start off here.
Intro
First off, I want to mention that when you send the request
requests.put('https://httpbin.org/put', headers={'Content-Type': 'application/octet-stream'}, data='\xff')
it doesn't actually hang: it is waiting for a response from the server, and after a while the call fails with
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
A similar thing occurred with the server I was communicating with, except that it actually sent a response, something like
400: Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed.
I was sending a str with non-ASCII characters (same as @frenzymadness) and it turned out that the request had an incorrect Content-Length header, something like 545 instead of the correct 543.
Issue
The issue is that if you pass a str as the data of a request, super_len encodes the string as utf-8 when calculating the Content-Length, but later, when the body is actually encoded in python3.9/http/client.py on line 1330, latin-1 (the same as iso-8859-1) is used instead.
So in the case of our simple example where we send '\xff' we have the following
>>> a = '\xff'
>>> len(a)
1
>>> len(a.encode('utf-8'))
2
>>> len(a.encode('latin-1'))
1
So we would be setting the Content-Length to 2 when we would actually be sending 1 byte of data.
What I find interesting is that I don't think the tests created in #6589 serve a real purpose, since if you sent a request like the following, your code would fail, and it wouldn't matter that the Content-Length is 'correct':
>>> requests.put('https://httpbin.org/put', data='👍👎')
Traceback (most recent call last):
...
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-1: Body ('👍👎') is not valid Latin-1. Use body.encode('utf-8') if you want to send it encoded in UTF-8.
So the 'workaround' mentioned in #6586 is actually the way a request like this should be sent.
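To make that concrete, here is a minimal example of the explicit-encode form; because the body is already bytes, super_len and the socket layer count the same thing:
>>> data = '👍👎'.encode('utf-8')
>>> len(data)
8
>>> requests.put('https://httpbin.org/put', data=data)
<Response [200]>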
So with all that being said, I'd think the fix would be to just revert the commit that introduced this change.
What do you think?
Just hit this myself when trying to release urllib3 1.26.19. urllib3 2.x made a change where string bodies are encoded as UTF-8 instead of Latin-1. It was an accidental change, and I've started working on fixing/documenting it in https://github.com/urllib3/urllib3/issues/3053 and https://github.com/urllib3/urllib3/pull/3063 but ultimately dropped the ball, sorry.
Then, https://github.com/psf/requests/pull/6589 adapted requests to work with urllib3 2.x by encoding to UTF-8 to compute the Content-Length. This means that with \xff, requests now sets Content-Length to 2, but urllib3 1.26.x only sends one byte, which is why the test hangs. Since we're not planning to revert to Latin-1 in urllib3, the fix would be for requests to explicitly encode string bodies to UTF-8 (or not try to guess the Content-Length at all, I suppose). If it does, it would be nice to avoid encoding twice, which is what happens today.
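As a rough sketch of the "encode once" idea (hypothetical code, not the current requests implementation), the point is to derive Content-Length from the exact bytes that will be sent:

def prepare_body_bytes(body):
    # Hypothetical helper, not actual requests code: encode str bodies
    # to UTF-8 once, so Content-Length is computed from the exact bytes
    # that go on the wire, regardless of urllib3 major version.
    if isinstance(body, str):
        body = body.encode("utf-8")
    return body, len(body)

body, length = prepare_body_bytes("\xff")
assert (body, length) == (b"\xc3\xbf", 2)  # 2 bytes advertised, 2 bytes sent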
I'm kind of tempted to just add something in compat to flag the major version of urllib3, like PY2/PY3 in six. I don't think we can remove the Content-Length computation since that would be a fairly breaking change for the library. Requests is still straddling the line between the old internet, where everything was latin-1 (and only latin-1), and today, where most things expect utf-8.
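For illustration, that flag could be as small as the following (hypothetical name, following the PY2/PY3 pattern from six):

import urllib3

# Hypothetical flag for requests.compat: True when running against
# urllib3 1.x, where str bodies end up latin-1-encoded by http.client.
HAS_LEGACY_URLLIB3 = urllib3.__version__.startswith("1.")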
I think we'll likely want to keep the change in some form. I don't know if any of the other maintainers have other thoughts on solutions.