BREAKING: Overhaul HTTP headers (make case-insensitive, allow for multiples)
According to spec, HTTP header field names should be treated in a case-insensitive way.
However, so far this was not the case with Weblink. This meant a bunch of things were silently very broken. For example, a Node.js fetch() client could not send POST request bodies: the server was expecting a Content-Length header, but got content-length instead.
This implementation introduces a new abstract HeaderName (and a companion HeaderValue). When implicitly assigning a String to a HeaderName, it simply gets converted to lowercase (plus some ASCII validation). This way it can be used as a key even in default standard library collections and the comparisons would be case-insensitive.
Of course, this means the original string casing is lost – but given that HTTP/2 (not supported currently) forces sent header names to be lowercase, I deemed it an acceptable loss.
Also according to spec, multiple header values with the same name can coexist. They can be either set as separate key-value pairs (i.e. Set-Cookie) or as a comma-separated list (e.g. Accept-Language, If-Match).
Request did not support that usecase. Response did, but made it difficult to query. Both classes now use a specialized collection, a HeaderMap, that allows for multiple values to be associated with one key.
This PR has ballooned more than I expected, so it's time for me to give you a quick status update.
What should work (as I understand it):
- [x] Incoming and outgoing headers' names are compared in a case-insensitive way.
- [x] Incoming and outgoing headers' names are checked for non-ASCII bytes.
- [x] Incoming and outgoing headers' values are checked for control characters.
- [x] Incoming headers which are known to have list-type values separate them.
- [x] Outgoing headers which are known to have list-type values merge them together.
- [x] Outgoing headers which are known to be unique are checked for the existence of multiple values.
What's not done yet, but I'm not sure if I want to cram it into the same patch:
- [ ] Incoming header values with "quoted text" do not get recognized and unquoted
- [ ] Incoming header values with line continuations ("LWS" in RFC 2616) do not get recognized as a single header.
- [ ] Incoming data is not treated as ISO-8859-1, but always as UTF8 (both headers and payload, no matter the MIME type)
This PR has ballooned more than I expected, so it's time for me to give you a quick status update.
What should work (as I understand it):
- [x] Incoming and outgoing headers' names are compared in a case-insensitive way.
- [x] Incoming and outgoing headers' names are checked for non-ASCII bytes.
- [x] Incoming and outgoing headers' values are checked for control characters.
- [x] Incoming headers which are known to have list-type values separate them.
- [x] Outgoing headers which are known to have list-type values merge them together.
- [x] Outgoing headers which are known to be unique are checked for the existence of multiple values.
What's not done yet, but I'm not sure if I want to cram it into the same patch:
- [ ] Incoming header values with "quoted text" do not get recognized and unquoted
- [ ] Incoming header values with line continuations ("LWS" in RFC 2616) do not get recognized as a single header.
- [ ] Incoming data is not treated as ISO-8859-1, but always as UTF8 (both headers and payload, no matter the MIME type)
I'd say split it up, you can group it, back I think we can merge this in, is there any other things you'd like to include in it?
Nah; if there was something I desperately needed to include, I'd set the PR as draft :sweat_smile: How would you like to proceed?
I'll take a look today and most likely merge it.