Incorrect handling of folded header values?
In one of the existing llhttp tests (shown at the bottom), it shows llhttp reporting some of the folded whitespace characters as being part of the header value. According to RFC 7230 that shouldn't be the case as we have this ABNF:
header-field = field-name ":" OWS field-value OWS field-name = token field-value = *( field-content / obs-fold ) field-content = field-vchar [ 1*( SP / HTAB ) field-vchar ] field-vchar = VCHAR / obs-text obs-fold = CRLF 1*( SP / HTAB ) ; obsolete line folding ; see Section 3.2.4
So in this particular test, the header value for the Line1 header should be more like
abcdefghijklmno qrs
instead of
abc\tdef ghi\t\tjkl mno \t \tqrs
(although the space between o and q shouldn't technically be permitted according to the ABNF, but that's a separate matter).
Do we know if anyone is relying on the current behavior?
Line folding in header value with LF
GET / HTTP/1.1\n\
Line1: abc\n\
\tdef\n\
ghi\n\
\t\tjkl\n\
mno \n\
\t \tqrs\n\
Line2: \t line2\t\n\
Line3:\n\
line3\n\
Line4: \n\
\n\
Connection:\n\
close\n\
\n
off=0 message begin
off=4 len=1 span[url]="/"
off=15 len=5 span[header_field]="Line1"
off=24 len=3 span[header_value]="abc"
off=28 len=4 span[header_value]="\tdef"
off=33 len=4 span[header_value]=" ghi"
off=38 len=5 span[header_value]="\t\tjkl"
off=44 len=6 span[header_value]=" mno "
off=51 len=6 span[header_value]="\t \tqrs"
off=58 len=5 span[header_field]="Line2"
off=67 len=6 span[header_value]="line2\t"
off=74 len=5 span[header_field]="Line3"
off=82 len=5 span[header_value]="line3"
off=88 len=5 span[header_field]="Line4"
off=98 len=0 span[header_value]=""
off=98 len=10 span[header_field]="Connection"
off=111 len=5 span[header_value]="close"
off=118 headers complete method=1 v=1/1 flags=2 content_length=0
off=118 message complete
Good point. This behavior was ported from http-parser, and I had no idea whether it was still required or not. cc @nodejs/http
I'm considering fixing this. In the case of \t \tqrs\n\r, shouldn't OBS eat tab-space-tab, and not leave the space?
A field value might be preceded and/or followed by optional
whitespace (OWS); a single SP preceding the field-value is preferred
for consistent readability by humans. The field value does not
include any leading or trailing whitespace: OWS occurring before the
first non-whitespace octet of the field value or after the last
non-whitespace octet of the field value ought to be excluded by
parsers when extracting the field value from a header field.
@pallas I think this has been addressed, am I right?
This has been solved in 4b9b57d9a62ae6bc6f31a8a485ca58a9f090493f