llhttp icon indicating copy to clipboard operation
llhttp copied to clipboard

Incorrect handling of folded header values?

Open mscdex opened this issue 6 years ago • 3 comments

In one of the existing llhttp tests (shown at the bottom), it shows llhttp reporting some of the folded whitespace characters as being part of the header value. According to RFC 7230 that shouldn't be the case as we have this ABNF:

 header-field   = field-name ":" OWS field-value OWS

 field-name     = token
 field-value    = *( field-content / obs-fold )
 field-content  = field-vchar [ 1*( SP / HTAB ) field-vchar ]
 field-vchar    = VCHAR / obs-text

 obs-fold       = CRLF 1*( SP / HTAB )
                ; obsolete line folding
                ; see Section 3.2.4

So in this particular test, the header value for the Line1 header should be more like

abcdefghijklmno qrs

instead of

abc\tdef ghi\t\tjkl  mno \t \tqrs

(although the space between o and q shouldn't technically be permitted according to the ABNF, but that's a separate matter).

Do we know if anyone is relying on the current behavior?


Line folding in header value with LF

GET / HTTP/1.1\n\
Line1:   abc\n\
\tdef\n\
 ghi\n\
\t\tjkl\n\
  mno \n\
\t \tqrs\n\
Line2: \t line2\t\n\
Line3:\n\
 line3\n\
Line4: \n\
 \n\
Connection:\n\
 close\n\
\n
off=0 message begin
off=4 len=1 span[url]="/"
off=15 len=5 span[header_field]="Line1"
off=24 len=3 span[header_value]="abc"
off=28 len=4 span[header_value]="\tdef"
off=33 len=4 span[header_value]=" ghi"
off=38 len=5 span[header_value]="\t\tjkl"
off=44 len=6 span[header_value]="  mno "
off=51 len=6 span[header_value]="\t \tqrs"
off=58 len=5 span[header_field]="Line2"
off=67 len=6 span[header_value]="line2\t"
off=74 len=5 span[header_field]="Line3"
off=82 len=5 span[header_value]="line3"
off=88 len=5 span[header_field]="Line4"
off=98 len=0 span[header_value]=""
off=98 len=10 span[header_field]="Connection"
off=111 len=5 span[header_value]="close"
off=118 headers complete method=1 v=1/1 flags=2 content_length=0
off=118 message complete

mscdex avatar Nov 16 '19 21:11 mscdex

Good point. This behavior was ported from http-parser, and I had no idea whether it was still required or not. cc @nodejs/http

indutny avatar Nov 18 '19 18:11 indutny

I'm considering fixing this. In the case of \t \tqrs\n\r, shouldn't OBS eat tab-space-tab, and not leave the space?

pallas avatar Jan 13 '21 19:01 pallas

   A field value might be preceded and/or followed by optional
   whitespace (OWS); a single SP preceding the field-value is preferred
   for consistent readability by humans.  The field value does not
   include any leading or trailing whitespace: OWS occurring before the
   first non-whitespace octet of the field value or after the last
   non-whitespace octet of the field value ought to be excluded by
   parsers when extracting the field value from a header field.

pallas avatar Jan 13 '21 19:01 pallas

@pallas I think this has been addressed, am I right?

ShogunPanda avatar Aug 22 '22 13:08 ShogunPanda

This has been solved in 4b9b57d9a62ae6bc6f31a8a485ca58a9f090493f

ShogunPanda avatar Aug 23 '22 10:08 ShogunPanda