weblink BREAKING: Overhaul HTTP headers (make case-insensitive, allow for multiples)

According to spec, HTTP header field names should be treated in a case-insensitive way.

However, so far this was not the case with Weblink. This meant a bunch of things were silently very broken. For example, a Node.js fetch() client could not send POST request bodies: the server was expecting a Content-Length header, but got content-length instead.

This implementation introduces a new abstract HeaderName (and a companion HeaderValue). When implicitly assigning a String to a HeaderName, it simply gets converted to lowercase (plus some ASCII validation). This way it can be used as a key even in default standard library collections and the comparisons would be case-insensitive.

Of course, this means the original string casing is lost – but given that HTTP/2 (not supported currently) forces sent header names to be lowercase, I deemed it an acceptable loss.

Also according to spec, multiple header values with the same name can coexist. They can be either set as separate key-value pairs (i.e. Set-Cookie) or as a comma-separated list (e.g. Accept-Language, If-Match).

Request did not support that usecase. Response did, but made it difficult to query. Both classes now use a specialized collection, a HeaderMap, that allows for multiple values to be associated with one key.

Sep 26 '24 01:09 Frixuu

This PR has ballooned more than I expected, so it's time for me to give you a quick status update.

What should work (as I understand it):

[x] Incoming and outgoing headers' names are compared in a case-insensitive way.
[x] Incoming and outgoing headers' names are checked for non-ASCII bytes.
[x] Incoming and outgoing headers' values are checked for control characters.
[x] Incoming headers which are known to have list-type values separate them.
[x] Outgoing headers which are known to have list-type values merge them together.
[x] Outgoing headers which are known to be unique are checked for the existence of multiple values.

What's not done yet, but I'm not sure if I want to cram it into the same patch:

[ ] Incoming header values with "quoted text" do not get recognized and unquoted
[ ] Incoming header values with line continuations ("LWS" in RFC 2616) do not get recognized as a single header.
[ ] Incoming data is not treated as ISO-8859-1, but always as UTF8 (both headers and payload, no matter the MIME type)

Sep 27 '24 17:09 Frixuu

This PR has ballooned more than I expected, so it's time for me to give you a quick status update.

What should work (as I understand it):

[x] Incoming and outgoing headers' names are compared in a case-insensitive way.

[x] Incoming and outgoing headers' names are checked for non-ASCII bytes.

[x] Incoming and outgoing headers' values are checked for control characters.

[x] Incoming headers which are known to have list-type values separate them.

[x] Outgoing headers which are known to have list-type values merge them together.

[x] Outgoing headers which are known to be unique are checked for the existence of multiple values.

What's not done yet, but I'm not sure if I want to cram it into the same patch:

[ ] Incoming header values with "quoted text" do not get recognized and unquoted

[ ] Incoming header values with line continuations ("LWS" in RFC 2616) do not get recognized as a single header.

[ ] Incoming data is not treated as ISO-8859-1, but always as UTF8 (both headers and payload, no matter the MIME type)

I'd say split it up, you can group it, back I think we can merge this in, is there any other things you'd like to include in it?

Sep 29 '24 16:09 PXshadow

Nah; if there was something I desperately needed to include, I'd set the PR as draft :sweat_smile: How would you like to proceed?

Sep 29 '24 17:09 Frixuu

I'll take a look today and most likely merge it.

Oct 02 '24 16:10 PXshadow