Tiny maybe-bug in obsolete line folding handling
I was looking at https://github.com/shazow/urllib3/pull/1318, and realized I wasn't quite sure what the right rules were for handling whitespace in headers using the "obsolete line-folding rule". Specifically, if we have a header like Name: value1${WHITESPACE1}\r\n${WHITESPACE2}value2 then what is the header's logical value?
Currently h11 treats it as: value1${WHITESPACE1} value2
That PR makes urllib3 treat it as: value1 value2
RFC 7230 says:
header-field = field-name ":" OWS field-value OWS
field-name = token
field-value = *( field-content / obs-fold )
field-content = field-vchar [ 1*( SP / HTAB ) field-vchar ]
field-vchar = VCHAR / obs-text
obs-fold = CRLF 1*( SP / HTAB )
(And then in the text it specifies that obs-fold tokens should be replaced by one or more SP.)
As written, this actually says that ${WHITESPACE1} being non-empty is an error, and only Name: value1\r\n${WHITESPACE2}value2 is legal (and it becomes Name: value1 value2). So h11 doesn't quite follow that.
The reason h11 doesn't quite follow it though is that the spec's ABNF is buggy – its field-content rule is just wrong: https://www.rfc-editor.org/errata/eid4189
The suggestion in that errata is:
field-content = field-vchar [ 1*( SP / HTAB / field-vchar ) field-vchar ]
obs-fold = OWS CRLF 1*( SP / HTAB )
This matches the urllib3 interpretation, but not the current h11 interpretation.
Of course, the comment on that errata is "this is not right", but OTOH "The [fix] is the best approach for now, and a better fix will be developed", someday, maybe.
So I guess ideally h11 should switch to strip trailing whitespace from header lines that are followed by obs-fold lines.
This is the opposite of a high-priority bug.
A somewhat related wget bug: https://git.savannah.gnu.org/cgit/wget.git/commit/?id=1fc9c95ec144499e69dc8ec76dbe07799d7d82cd
Fortunately, that wget bug is very different in practice: if I'm reading this right they were interpreting the header as value1${WHITESPACE1}\r\n${WHITESPACE2}value2 which is, uh.... very very definitely not right, given that it's supposed to be impossible for \r and \n to appear in a header value. h11 doesn't do that.
Right! I just thought it was an interesting client-side header-parsing security bug, but maybe this issue was not the right place to say that.
Oh, sure, it's a good reminder of how big problems can arise from small parsing in obscure things like the handling of whitespace in obsolete line folding :-). I think h11 is pretty careful about these and that the specific issue discussed in this bug is not security relevant, but some day I'm sure I'll be wrong so it's good to check :-)
The obsolete RFC 2616 says:
All linear white space, including folding, has the same semantics as SP. A recipient MAY replace any linear white space with a single SP before interpreting the field value or forwarding the message downstream.
Which seems consistent with urllib3's interpretation. Though it also seems to bless h11's interpretation.