h11 icon indicating copy to clipboard operation
h11 copied to clipboard

Tiny maybe-bug in obsolete line folding handling

Open njsmith opened this issue 7 years ago • 5 comments

I was looking at https://github.com/shazow/urllib3/pull/1318, and realized I wasn't quite sure what the right rules were for handling whitespace in headers using the "obsolete line-folding rule". Specifically, if we have a header like Name: value1${WHITESPACE1}\r\n${WHITESPACE2}value2 then what is the header's logical value?

Currently h11 treats it as: value1${WHITESPACE1} value2

That PR makes urllib3 treat it as: value1 value2

RFC 7230 says:

     header-field   = field-name ":" OWS field-value OWS

     field-name     = token
     field-value    = *( field-content / obs-fold )
     field-content  = field-vchar [ 1*( SP / HTAB ) field-vchar ]
     field-vchar    = VCHAR / obs-text

     obs-fold       = CRLF 1*( SP / HTAB )

(And then in the text it specifies that obs-fold tokens should be replaced by one or more SP.)

As written, this actually says that ${WHITESPACE1} being non-empty is an error, and only Name: value1\r\n${WHITESPACE2}value2 is legal (and it becomes Name: value1 value2). So h11 doesn't quite follow that.

The reason h11 doesn't quite follow it though is that the spec's ABNF is buggy – its field-content rule is just wrong: https://www.rfc-editor.org/errata/eid4189

The suggestion in that errata is:

     field-content  = field-vchar [ 1*( SP / HTAB / field-vchar ) field-vchar ]
     obs-fold       = OWS CRLF 1*( SP / HTAB )

This matches the urllib3 interpretation, but not the current h11 interpretation.

Of course, the comment on that errata is "this is not right", but OTOH "The [fix] is the best approach for now, and a better fix will be developed", someday, maybe.

So I guess ideally h11 should switch to strip trailing whitespace from header lines that are followed by obs-fold lines.

This is the opposite of a high-priority bug.

njsmith avatar Mar 25 '18 02:03 njsmith

A somewhat related wget bug: https://git.savannah.gnu.org/cgit/wget.git/commit/?id=1fc9c95ec144499e69dc8ec76dbe07799d7d82cd

pquentin avatar May 14 '18 05:05 pquentin

Fortunately, that wget bug is very different in practice: if I'm reading this right they were interpreting the header as value1${WHITESPACE1}\r\n${WHITESPACE2}value2 which is, uh.... very very definitely not right, given that it's supposed to be impossible for \r and \n to appear in a header value. h11 doesn't do that.

njsmith avatar May 15 '18 04:05 njsmith

Right! I just thought it was an interesting client-side header-parsing security bug, but maybe this issue was not the right place to say that.

pquentin avatar May 15 '18 10:05 pquentin

Oh, sure, it's a good reminder of how big problems can arise from small parsing in obscure things like the handling of whitespace in obsolete line folding :-). I think h11 is pretty careful about these and that the specific issue discussed in this bug is not security relevant, but some day I'm sure I'll be wrong so it's good to check :-)

njsmith avatar May 16 '18 04:05 njsmith

The obsolete RFC 2616 says:

All linear white space, including folding, has the same semantics as SP. A recipient MAY replace any linear white space with a single SP before interpreting the field value or forwarding the message downstream.

Which seems consistent with urllib3's interpretation. Though it also seems to bless h11's interpretation.

njsmith avatar May 28 '19 03:05 njsmith