Update header value validation to match WHAT-WG fetch spec

Open njsmith opened this issue 5 years ago • 3 comments

Header values are a mess. Supposedly they're defined by RFC 7230, but in fact it has a bug and its definition is obviously wrong. And, in practice, implementations are substantially more lax than RFC 7230, even after you fix the obvious bug.

In #57/#68, we adjusted our validation rule to allow more characters, based on some intuition and a small amount of new data (e.g. we allow \x01, which is used by google analytics cookies, but still disallow \x00).

But, it turns out that the WHAT-WG fetch spec has an actual precise definition for header values: https://fetch.spec.whatwg.org/#concept-header-value

Weird that it's here instead of in some HTTP spec, but I'll take it.

I think there are two differences between what h11 does currently and the WHAT-WG spec:

  • We disallow vertical tab (\v) and form-feed (\f), which are obscure line-breaking whitespace characters. Of the line-breaking characters, they only disallow \r and \n.
  • They allow empty header values; we don't. (Mentioned in #96.)

We should probably switch to matching the WHAT-WG behavior exactly.
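
For reference, here is a minimal sketch of the WHAT-WG rule as I read it: no leading or trailing space or tab, and no NUL, \r, or \n anywhere in the value. The function name is hypothetical and this is not h11's actual validation code; it only illustrates that \v, \f, \x01, and the empty value would all pass under that rule.

```python
def is_whatwg_header_value(value: bytes) -> bool:
    """Hypothetical helper: check a header value against the WHAT-WG fetch rule.

    Assumed reading of the spec: the value has no leading or trailing
    HTTP tab/space bytes, and contains no NUL, LF, or CR bytes.
    """
    # Slicing (instead of indexing) keeps this safe for the empty value,
    # which the rule allows.
    if value[:1] in (b" ", b"\t") or value[-1:] in (b" ", b"\t"):
        return False
    # Reject the three forbidden bytes anywhere in the value.
    return not any(bad in value for bad in (b"\x00", b"\n", b"\r"))
```

Under this sketch, is_whatwg_header_value(b"") and is_whatwg_header_value(b"a\x0bb\x0cc") both return True, while any value containing \r or \n is rejected.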

njsmith, Jan 16 '20 21:01

@njsmith Out of curiosity, what exactly is the bug in the RFC 7230 definition, and why is the definition obviously wrong?

SyntaxColoring, Feb 17 '20 23:02

The spec accidentally disallows any header value that contains a single-character word inside it. For example, "this is not a valid header" would be an illegal header value, because the word "a" is only one character long.
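
To make that concrete, here is a rough sketch that transcribes the RFC 7230 section 3.2 grammar literally into a Python regex. It assumes the rules read field-content = field-vchar [ 1*( SP / HTAB ) field-vchar ] and field-value = *( field-content / obs-fold ), with obs-fold ignored; that is my reading of the RFC, so check it against the text. The names are made up for illustration.

```python
import re

# field-vchar = VCHAR / obs-text (visible ASCII plus bytes 0x80-0xFF)
_FIELD_VCHAR = rb"[\x21-\x7e\x80-\xff]"
# field-content = field-vchar [ 1*( SP / HTAB ) field-vchar ]
_FIELD_CONTENT = _FIELD_VCHAR + rb"(?:[ \t]+" + _FIELD_VCHAR + rb")?"
# field-value = *( field-content ), with obs-fold left out of this sketch
_FIELD_VALUE = re.compile(rb"(?:" + _FIELD_CONTENT + rb")*")


def matches_rfc7230_literal(value: bytes) -> bool:
    """Hypothetical helper: does `value` match the grammar exactly as written?"""
    return _FIELD_VALUE.fullmatch(value) is not None
```

With this transcription, matches_rfc7230_literal(b"this is valid") is True but matches_rfc7230_literal(b"this is not a valid header") is False. Every run of whitespace has to sit in the middle of one field-content, flanked by two characters that belong to that same field-content, and the lone "a" cannot be both the end of one and the start of the next.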

njsmith, Feb 18 '20 00:02

RFC7230 is obsolete; the specification you want is here.

Regarding single word field values -- how do you come to that conclusion?

mnot, Jul 31 '23 22:07