httpx
Mention header name in error message for invalid header encodings
Discussed in https://github.com/encode/httpx/discussions/3399
Originally posted by RobertCraigie on November 12, 2024:
This openai-python user ran into a confusing error when passing a non-ASCII header value. Would it be possible to mention the header name in the error message?
Minimal repro
```python
import httpx

httpx.Headers({"auth": "здравейздравейздравейздравей"})
```

```
Traceback (most recent call last):
  File "script.py", line 3, in <module>
    httpx.Headers({"auth": "здравейздравейздравейздравей"})
  File ".venv/lib/python3.9/site-packages/httpx/_models.py", line 74, in __init__
    self._list = [
  File ".venv/lib/python3.9/site-packages/httpx/_models.py", line 78, in <listcomp>
    normalize_header_value(v, encoding),
  File ".venv/lib/python3.9/site-packages/httpx/_utils.py", line 53, in normalize_header_value
    return value.encode(encoding or "ascii")
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-27: ordinal not in range(128)
```
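To make the request concrete, here is a minimal sketch of what naming the header in the error could look like. The helper `encode_header_value` and its `name` parameter are hypothetical and only illustrate the idea; this is not httpx's actual implementation, just one way the `value.encode(encoding or "ascii")` call from the traceback might be wrapped.

```python
from typing import Optional, Union


def encode_header_value(
    name: str, value: Union[str, bytes], encoding: Optional[str] = None
) -> bytes:
    """Encode a header value, naming the header if encoding fails.

    Hypothetical helper for illustration; not part of httpx.
    """
    if isinstance(value, bytes):
        # Bytes are passed through untouched.
        return value
    try:
        return value.encode(encoding or "ascii")
    except UnicodeEncodeError as exc:
        # Re-raise the same exception type with the header name appended to
        # the reason, so code catching UnicodeEncodeError keeps working.
        raise UnicodeEncodeError(
            exc.encoding,
            exc.object,
            exc.start,
            exc.end,
            f"{exc.reason} (while encoding header {name!r})",
        ) from None


try:
    encode_header_value("auth", "здравей")
except UnicodeEncodeError as exc:
    print(exc)  # the message now mentions the 'auth' header
```

Keeping the exception type unchanged is a deliberate choice in this sketch: existing `except UnicodeEncodeError` handlers would continue to work, and only the message gains the header name.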
Would you be able to review what range of characters the h11 package uses for valid HTTP headers?
(Because it's correct, and because it's what we use for the default underlying transport so we may as well be consistent at the higher level abstraction.)
From https://github.com/python-hyper/h11/blob/master/h11/_headers.py
```python
# Facts
# -----
#
# Headers are:
#   keys: case-insensitive ascii
#   values: mixture of ascii and raw bytes
#
# "Historically, HTTP has allowed field content with text in the ISO-8859-1
# charset [ISO-8859-1], supporting other charsets only through use of
# [RFC2047] encoding. In practice, most HTTP header field values use only a
# subset of the US-ASCII charset [USASCII]. Newly defined header fields SHOULD
# limit their field values to US-ASCII octets. A recipient SHOULD treat other
# octets in field content (obs-text) as opaque data."
# And it deprecates all non-ascii values
```
So h11 is essentially quoting the HTTP/1.1 spec directly here, and thus the choice of ASCII as the default encoding makes sense.
In the main, these sorts of situations are going to happen when using authentication headers, which are often obtained via some sort of "secret management" process that includes encryption/decryption and/or base64 encoding/decoding along the way before such values get injected into actual code. This leaves the door open for upstream human errors to propagate down into this level while not being "obvious" due to the opaque nature of it all.
While the example above is contrived, using the Cyrillic alphabet, the real source of the error was something like a bad copy/paste of the correct value.
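For this kind of copy/paste problem, a quick check on the caller's side can help locate the offending character before the value ever reaches the client. The following is a minimal sketch using only the standard library; `find_non_ascii` is a hypothetical helper name, and the token below is made up for illustration.

```python
import unicodedata
from typing import List, Tuple


def find_non_ascii(value: str) -> List[Tuple[int, str, str]]:
    """Return (index, code point, Unicode name) for each non-ASCII character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(value)
        if ord(ch) > 127
    ]


# Made-up token with a no-break space pasted in place of a normal space.
token = "Bearer\u00a0abc123"
for index, codepoint, name in find_non_ascii(token):
    print(index, codepoint, name)  # prints: 6 U+00A0 NO-BREAK SPACE
```

Reporting positions and code points rather than the value itself means the output can be logged without exposing the secret.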