h11
Update header value validation to match WHAT-WG fetch spec
Header values are a mess. Supposedly they're defined by RFC 7230, but in fact it has a bug and its definition is obviously wrong. And, in practice, implementations are substantially more lax than RFC 7230, even after you fix the obvious bug.
In #57/#68, we adjusted our validation rule to allow more characters, based on some intuition and a small amount of new data (e.g. we allow \x01, which is used by Google Analytics cookies, but still disallow \x00).
But, it turns out that the WHAT-WG fetch spec has an actual precise definition for header values: https://fetch.spec.whatwg.org/#concept-header-value
Weird that it's here instead of in some HTTP spec, but I'll take it.
I think there are two differences between what h11 does currently and the WHAT-WG spec:
- We disallow vertical tab (\v) and form-feed (\f), which are obscure line-breaking whitespace characters. They only disallow \r and \n.
- They allow empty header values; we don't. (Mentioned in #96.)
We should probably switch to matching the WHAT-WG behavior exactly.
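To make the target behavior concrete, here is a rough sketch of a validator following the rules discussed above. It is not h11's actual code: the function name `is_whatwg_header_value` is made up for illustration, and the leading/trailing whitespace check is my reading of the fetch spec's definition rather than something stated in this issue.

```python
import re

# Bytes rejected anywhere in a value: NUL (which we already disallow),
# plus CR and LF. Note that \v (0x0b) and \f (0x0c) are NOT in this set.
_FORBIDDEN = re.compile(rb"[\x00\r\n]")


def is_whatwg_header_value(value: bytes) -> bool:
    """Hypothetical helper: True if `value` looks like a valid header value."""
    # Assumption (from the fetch spec text): no leading or trailing
    # space/tab. Slicing keeps this safe for the empty byte string.
    if value[:1] in (b" ", b"\t") or value[-1:] in (b" ", b"\t"):
        return False
    # Empty values fall through to here and are accepted.
    return _FORBIDDEN.search(value) is None
```

With this sketch, an empty value and a value containing \v or \f are both accepted, while anything containing \r, \n, or \x00 is rejected.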
@njsmith Out of curiosity, what exactly is the bug in the RFC 7230 definition, and why is the definition obviously wrong?
The spec accidentally disallows any header value that contains a single-character word inside it. For example, "this is not a valid header" would be an illegal header value, because the word "a" is only one character long.
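One way to see this is to transcribe the RFC 7230 grammar literally into a regex. As written in the RFC, field-content is a field-vchar optionally followed by whitespace and exactly one more field-vchar, and field-value is a repetition of field-content. The sketch below is my own transcription (the name `RFC7230_FIELD_VALUE` is made up, and obs-fold is left out for simplicity), so treat it as an illustration rather than a reference implementation.

```python
import re

# field-vchar   = VCHAR / obs-text           -> [\x21-\x7e\x80-\xff]
# field-content = field-vchar [ 1*( SP / HTAB ) field-vchar ]
# field-value   = *( field-content )         (obs-fold omitted here)
RFC7230_FIELD_VALUE = re.compile(
    rb"(?:[\x21-\x7e\x80-\xff](?:[ \t]+[\x21-\x7e\x80-\xff])?)*"
)

# The whitespace before "a" has to be consumed as "t a", which uses up the
# "a", so nothing is left to start the chunk that would cover " v".
print(RFC7230_FIELD_VALUE.fullmatch(b"this is not a valid header"))

# With no single-character word in the middle, the value can be chunked
# as t, h, i, "s i", "s v", a, l, i, d.
print(RFC7230_FIELD_VALUE.fullmatch(b"this is valid"))
```

The first call finds no match and the second one matches, even though both values are perfectly ordinary in practice.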
RFC7230 is obsolete; the specification you want is here.
Regarding single word field values -- how do you come to that conclusion?