httpx: Mention header name in error message for invalid header encodings


Discussed in https://github.com/encode/httpx/discussions/3399

Originally posted by RobertCraigie, November 12, 2024:

This openai-python user ran into a confusing error when passing a non-ASCII header value. Would it be possible to mention the header name in the error message?

Minimal repro

```python
import httpx

httpx.Headers({"auth": "здравейздравейздравейздравей"})
```

```
Traceback (most recent call last):
  File "script.py", line 3, in <module>
    httpx.Headers({"auth": "здравейздравейздравейздравей"})
  File ".venv/lib/python3.9/site-packages/httpx/_models.py", line 74, in __init__
    self._list = [
  File ".venv/lib/python3.9/site-packages/httpx/_models.py", line 78, in <listcomp>
    normalize_header_value(v, encoding),
  File ".venv/lib/python3.9/site-packages/httpx/_utils.py", line 53, in normalize_header_value
    return value.encode(encoding or "ascii")
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-27: ordinal not in range(128)
```
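One way the requested behaviour could look, as a sketch only: `encode_header` below is a hypothetical stand-in for httpx's internal `normalize_header_value` (seen in the traceback), not its actual code, and it assumes the caller can pass the header name in alongside the value. It catches the `UnicodeEncodeError` and re-raises the same exception type with the header name in the reason.

```python
from typing import Optional, Union


def encode_header(
    name: str, value: Union[str, bytes], encoding: Optional[str] = None
) -> bytes:
    """Hypothetical helper, not httpx's implementation.

    Encodes a header value (ASCII by default, as in the traceback above) and,
    on failure, re-raises with the header *name* in the error message.
    """
    if isinstance(value, bytes):
        # Already-encoded values are passed through unchanged.
        return value
    try:
        return value.encode(encoding or "ascii")
    except UnicodeEncodeError as exc:
        # Same exception type and positions, so existing handlers keep working;
        # only the human-readable reason changes. The value itself is not
        # included, because header values are often secrets.
        raise UnicodeEncodeError(
            exc.encoding,
            exc.object,
            exc.start,
            exc.end,
            f"header {name!r} contains characters not encodable as {exc.encoding!r}",
        ) from exc
```

With this sketch, the repro would report something like `'ascii' codec can't encode characters in position 0-27: header 'auth' contains characters not encodable as 'ascii'`, which points straight at the offending header without leaking its contents.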

Nov 12 '24 12:11 RobertCraigie

Would you be able to review what range of characters the h11 package uses for valid HTTP headers? (Because it's correct, and because it's what we use for the default underlying transport so we may as well be consistent at the higher level abstraction.)

Nov 12 '24 14:11 lovelydinosaur

> Would you be able to review what range of characters the h11 package uses for valid HTTP headers? (Because it's correct, and because it's what we use for the default underlying transport so we may as well be consistent at the higher level abstraction.)

From https://github.com/python-hyper/h11/blob/master/h11/_headers.py

```python
# Facts
# -----
#
# Headers are:
#   keys: case-insensitive ascii
#   values: mixture of ascii and raw bytes
#
# "Historically, HTTP has allowed field content with text in the ISO-8859-1
# charset [ISO-8859-1], supporting other charsets only through use of
# [RFC2047] encoding.  In practice, most HTTP header field values use only a
# subset of the US-ASCII charset [USASCII]. Newly defined header fields SHOULD
# limit their field values to US-ASCII octets.  A recipient SHOULD treat other
# octets in field content (obs-text) as opaque data."
# And it deprecates all non-ascii values
```

So it's essentially quoting the HTTP/1.1 spec (RFC 7230, section 3.2.4) directly, and thus the choice of ASCII encoding makes sense.
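To make the quoted rule concrete, here is a small sketch (not h11's actual code) that classifies an already-encoded header value against the RFC 7230 `field-value` grammar, ignoring obsolete line folding: visible ASCII characters (VCHAR, 0x21 to 0x7E) separated by spaces or tabs, with bytes 0x80 to 0xFF tolerated only as legacy `obs-text`. The function name `classify_header_value` is hypothetical.

```python
import re

# RFC 7230, section 3.2 (obs-fold ignored):
#   field-content = field-vchar [ 1*( SP / HTAB ) field-vchar ]
#   field-vchar   = VCHAR / obs-text
# VCHAR is 0x21-0x7E; obs-text is 0x80-0xFF. An empty value is allowed.
_STRICT_VALUE = re.compile(rb"(?:[\x21-\x7e]+(?:[ \t]+[\x21-\x7e]+)*)?")
_LEGACY_VALUE = re.compile(
    rb"(?:[\x21-\x7e\x80-\xff]+(?:[ \t]+[\x21-\x7e\x80-\xff]+)*)?"
)


def classify_header_value(value: bytes) -> str:
    """Return 'ascii', 'obs-text', or 'invalid' for an encoded header value."""
    if _STRICT_VALUE.fullmatch(value):
        return "ascii"  # what newly defined header fields SHOULD use
    if _LEGACY_VALUE.fullmatch(value):
        return "obs-text"  # permitted, but recipients SHOULD treat it as opaque
    return "invalid"  # e.g. control characters such as CR, LF or NUL
```

The UTF-8 bytes of the Cyrillic value from the repro all fall in 0x80 to 0xFF, so at best they would be accepted as opaque `obs-text`; encoding as ASCII by default rejects them before they ever reach the wire.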

Nov 25 '24 16:11 jasonkaedingrhino

In the main, these situations are going to happen with authentication headers, whose values are often obtained through some sort of "secret management" process involving encryption/decryption and/or base64 encoding/decoding before they get injected into actual code. That leaves the door open for upstream human errors to propagate down to this level without being obvious, because the values are opaque by design.

While the example above is very contrived (it uses the Cyrillic alphabet), the real source of the error was closer to a bad copy/paste of the correct value.
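For this copy/paste scenario, a caller-side check can make the problem visible without exposing the secret. The sketch below uses a hypothetical helper, `describe_non_ascii` (not part of httpx), that reports the header name, the position, and the Unicode name of each non-ASCII character, which is usually enough to spot pasted debris such as non-breaking or zero-width spaces.

```python
import unicodedata
from typing import Dict, List


def describe_non_ascii(headers: Dict[str, str]) -> List[str]:
    """Hypothetical pre-flight check: list non-ASCII characters per header.

    Only the header name, character index and Unicode character name are
    reported, never the value itself, so the output is safe to log.
    """
    problems = []
    for name, value in headers.items():
        for index, char in enumerate(value):
            if ord(char) > 0x7F:
                char_name = unicodedata.name(char, "UNKNOWN")
                problems.append(
                    f"{name}: position {index}: U+{ord(char):04X} {char_name}"
                )
    return problems
```

For example, `describe_non_ascii({"Authorization": "Bearer abc\u00a0def"})` returns `["Authorization: position 10: U+00A0 NO-BREAK SPACE"]`, pointing at the stray character without printing the token.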

Nov 25 '24 16:11 jasonkaedingrhino