httpx

Change default encoding to utf-8 in `normalize_header_key` and `normalize_header_value` functions

Open ZM25XC opened this issue 7 months ago • 5 comments

Description

I have encountered encoding errors (UnicodeEncodeError) with some requests whose headers contain non-ASCII characters, because header strings are encoded as ASCII by default. Changing the default encoding to UTF-8 resolves these errors. I propose updating the `normalize_header_key` and `normalize_header_value` functions in `_utils.py` to use UTF-8 as the default encoding.

Steps to Reproduce

  1. Call normalize_header_key or normalize_header_value with a non-ASCII string and no encoding specified.
  2. Observe the encoding failure (UnicodeEncodeError) with the ASCII default.
  3. Change the encoding to UTF-8 and observe that the error is resolved.

Example Code

from httpx._utils import normalize_header_key  # private helper in _utils.py

header_key_unicode = "内容类型"
try:
    normalize_header_key(header_key_unicode, lower=True)
except UnicodeEncodeError as exc:
    # The ASCII default cannot encode non-ASCII characters.
    print(f"ASCII default failed: {exc}")

normalized_key_unicode_utf8 = normalize_header_key(header_key_unicode, lower=True, encoding="utf-8")
print(normalized_key_unicode_utf8)  # Works correctly with UTF-8 encoding.

Proposed Solution

Modify the _utils.py file to use UTF-8 as the default encoding:

from __future__ import annotations  # allows `str | bytes` annotations on Python < 3.10

def normalize_header_key(
    value: str | bytes,
    lower: bool,
    encoding: str | None = None,
) -> bytes:
    """
    Coerce str/bytes into a strictly byte-wise HTTP header key.
    """
    if isinstance(value, bytes):
        bytes_value = value
    else:
        bytes_value = value.encode(encoding or "utf-8")

    return bytes_value.lower() if lower else bytes_value

def normalize_header_value(value: str | bytes, encoding: str | None = None) -> bytes:
    """
    Coerce str/bytes into a strictly byte-wise HTTP header value.
    """
    if isinstance(value, bytes):
        return value
    return value.encode(encoding or "utf-8")
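
A minimal sketch of how the proposed defaults would behave. The two functions are repeated in condensed form so the snippet runs on its own; the signatures are spelled with `typing` only for compatibility with older Python versions, and the sample header names and values are illustrative assumptions, not taken from a real request.

```python
from typing import Optional, Union


def normalize_header_key(
    value: Union[str, bytes], lower: bool, encoding: Optional[str] = None
) -> bytes:
    # Condensed copy of the proposed function: UTF-8 when no encoding is given.
    bytes_value = value if isinstance(value, bytes) else value.encode(encoding or "utf-8")
    return bytes_value.lower() if lower else bytes_value


def normalize_header_value(
    value: Union[str, bytes], encoding: Optional[str] = None
) -> bytes:
    # Condensed copy of the proposed function: bytes pass through untouched.
    if isinstance(value, bytes):
        return value
    return value.encode(encoding or "utf-8")


# ASCII-only keys behave the same as before.
ascii_key = normalize_header_key("Content-Type", lower=True)
# Non-ASCII keys no longer raise; bytes.lower() only affects ASCII letters.
unicode_key = normalize_header_key("内容类型", lower=True)
# Bytes input is never re-encoded.
raw_key = normalize_header_key(b"X-Raw", lower=False)
# An explicit encoding still takes precedence over the default.
latin1_value = normalize_header_value("café", encoding="latin-1")
utf8_value = normalize_header_value("café")

print(ascii_key, unicode_key, raw_key, latin1_value, utf8_value)
```

Callers that pass an explicit `encoding` are unaffected by the change; only calls that rely on the default see different behavior, and only for non-ASCII input.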

Rationale

Using UTF-8 as the default encoding lets the functions accept non-ASCII input without raising a UnicodeEncodeError. ASCII covers only 128 characters, while UTF-8 can encode any Unicode character. Because UTF-8 is a superset of ASCII, ASCII-only header keys and values produce exactly the same bytes as before.
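
The compatibility argument can be checked with plain `str.encode`, independent of httpx. This is a small sketch; the header strings are illustrative examples only.

```python
ascii_header = "Content-Type"
unicode_header = "内容类型"

# ASCII-only text encodes to identical bytes under ASCII and UTF-8.
same_bytes = ascii_header.encode("ascii") == ascii_header.encode("utf-8")

# Non-ASCII text cannot be encoded as ASCII at all.
try:
    unicode_header.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# UTF-8 encodes it; each of these four characters takes three bytes.
utf8_bytes = unicode_header.encode("utf-8")

print(same_bytes, ascii_failed, len(utf8_bytes))
```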

ZM25XC, Jul 11 '24 14:07