Cookie::normalize[_attribute](): add tests and 2 bugfixes
Tests: new Cookie\NormalizeTest class
... with dedicated tests for the Cookie::normalize() and Cookie::normalize_attribute() methods.
Cookie::normalize_attribute(): bug fix - harden code against incorrect input types
As attributes can be manually set, they can be set to invalid/unsupported input types.
This commit adds type hardening to the Cookie::normalize_attribute() method and adds a set of tests to safeguard this.
Nearly all of those tests would error out without the added type hardening.
While this could be considered a breaking change, IMO it is a bug fix as the method did not comply with the RFC specifications.
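To illustrate the kind of type hardening meant here, below is a minimal Python sketch. It is not the actual PHP implementation. The function shape, the `reference_time` parameter, the two attributes handled and the choice to return `None` ("no usable value") for unsupported input types are all assumptions made for illustration. The Max-Age and Domain handling loosely follows RFC 6265, sections 5.2.2 and 5.2.3.

```python
import re


def normalize_attribute(name: str, value: object, reference_time: int) -> object:
    """Hypothetical, simplified attribute normalizer illustrating type hardening.

    Values of an unsupported type are mapped to None instead of being passed
    on to string functions that would error out on them.
    """
    key = name.lower()

    if key == "max-age":
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, int) and not isinstance(value, bool):
            delta = value
        elif isinstance(value, str) and re.fullmatch(r"-?[0-9]+", value):
            delta = int(value)
        else:
            # Unsupported type or malformed string: ignore the attribute.
            return None
        # RFC 6265, 5.2.2: a non-positive delta means "earliest representable time".
        return 0 if delta <= 0 else reference_time + delta

    if key == "domain":
        # Only non-empty strings are usable; anything else is ignored.
        if not isinstance(value, str) or value == "":
            return None
        # Drop a single leading "." (lowercasing is not handled in this sketch).
        return value[1:] if value.startswith(".") else value

    # Other attributes are passed through unchanged in this sketch.
    return value
```

With this approach, a call such as `normalize_attribute("max-age", ["60"], 1000)` returns `None` rather than raising an error.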
Cookie::normalize_attribute(): bug fix - domain should always be lowercase
RFC 6265, section 5.2.3 reads:
If the attribute-name case-insensitively matches the string "Domain", the user agent MUST process the cookie-av as follows.
If the attribute-value is empty, the behavior is undefined. However, the user agent SHOULD ignore the cookie-av entirely.
If the first character of the attribute-value string is %x2E ("."): Let cookie-domain be the attribute-value without the leading %x2E (".") character.
Otherwise: Let cookie-domain be the entire attribute-value.
Convert the cookie-domain to lower case.
Append an attribute to the cookie-attribute-list with an attribute-name of Domain and an attribute-value of cookie-domain.
Ref: https://datatracker.ietf.org/doc/html/rfc6265#section-5.2.3
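The quoted steps translate almost one-to-one into code. The following Python sketch uses a hypothetical helper, `process_domain_av()`, which is not part of the library. It assumes the caller has already matched the attribute-name case-insensitively against "Domain", and it only covers the steps quoted above.

```python
from typing import Optional, Tuple


def process_domain_av(attribute_value: str) -> Optional[Tuple[str, str]]:
    """Sketch of RFC 6265, section 5.2.3 for a single Domain cookie-av.

    Returns the (attribute-name, attribute-value) pair to append to the
    cookie-attribute-list, or None when the cookie-av should be ignored.
    """
    # Empty value: the behavior is undefined, but the RFC says SHOULD ignore it.
    if attribute_value == "":
        return None
    # Drop one leading %x2E ("."); otherwise keep the entire attribute-value.
    if attribute_value[0] == ".":
        cookie_domain = attribute_value[1:]
    else:
        cookie_domain = attribute_value
    # Convert to lower case. Note: Python's str.lower() is Unicode-aware,
    # whereas PHP's strtolower() works byte by byte.
    cookie_domain = cookie_domain.lower()
    return ("Domain", cookie_domain)
```

For example, `process_domain_av(".Example.COM")` yields `("Domain", "example.com")`.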
Based on this, domains should be lowercased. Until now, this was not handled in the Cookie::normalize_attribute() method.
I've now implemented this using strtolower(), though this will not lowercase Unicode domain names correctly, as strtolower() is not multibyte-aware.
👉 Open question: Should the IdnaEncoder::to_ascii() method be applied to the domain prior to lowercasing?
Yes, I think this needs to be done to have the right casing approach.
While looking more closely at the IdnaEncoder, I noticed that the individual character mappings are missing, so it might not always produce the correct result. We'll need to investigate how it compares to strtolower(). I'd be surprised if that were the case, but it could actually be that the conceptually wrong strtolower() produces better results in the more common cases. That's something to check first.
I've updated the PR to normalize Unicode domains before running strtolower() and added some extra tests to confirm this.
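The "IDNA first, then lowercase" order can be sketched as follows. This Python sketch uses Python's built-in `idna` codec (IDNA 2003, RFC 3490) purely as a stand-in for IdnaEncoder::to_ascii(); it is not the Requests implementation. The helpers `ascii_lower()` and `normalize_domain()` are hypothetical, and `ascii_lower()` only approximates an ASCII-only, byte-wise lowercasing like strtolower() applied to UTF-8 input.

```python
from typing import Optional


def ascii_lower(s: str) -> str:
    # Lowercase only A-Z and leave every other character untouched,
    # approximating a byte-wise, non-multibyte-aware lowercasing.
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


def normalize_domain(domain: str) -> Optional[str]:
    # Step 1: convert to the ASCII-compatible (punycode) form first.
    try:
        ascii_domain = domain.encode("idna").decode("ascii")
    except UnicodeError:
        # E.g. empty labels, or labels that cannot be IDNA-encoded.
        return None
    # Step 2: the result is pure ASCII, so ASCII lowercasing is now sufficient.
    return ascii_lower(ascii_domain)
```

Lowercasing alone leaves non-ASCII characters alone (`ascii_lower("BÜCHER.Example")` gives `"bÜcher.example"`), while `normalize_domain("BÜCHER.Example")` gives `"xn--bcher-kva.example"`. The codec returns all-ASCII labels unchanged and only case-maps labels containing non-ASCII characters (via nameprep), which is why the lowercasing step is still needed after the IDNA conversion.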
Addressing the missing character mappings in the IdnaEncoder is outside the scope of this PR.