http-extensions
http-extensions copied to clipboard
Parser for Domain attributes
If a resource on 127.0.0.1 sets a cookie with a Domain attribute set to 127.1, does that work? (Spoiler: it works if it was passed through the host parser, which seems like a prerequisite to determine whether it's an IP address or not as needed by Domain Matching.)
(I raised this in #1707 but it seems that's exclusively used for the non-ASCII bytes discussion.)
Good question, what should happen?
Section 5.5 Step 9, sub-step 1 states
9. If the domain-attribute is non-empty:
1. If the canonicalized request-host does not domain-match the
domain-attribute:
1. Ignore the cookie entirely and abort these steps.
It looks like "domain-match" is doing the heavy lifting in that alogithm.
Section 5.1.3 Domain Matching has the following algorithm
A string domain-matches a given domain string if at least one of the
following conditions hold:
* The domain string and the string are identical. (Note that both
the domain string and the string will have been canonicalized to
lower case at this point.)
* All of the following conditions hold:
- The domain string is a suffix of the string.
- The last character of the string that is not included in the
domain string is a %x2E (".") character.
- The string is a host name (i.e., not an IP address).
Because this is an IP address, only the top condition is applicable meaning that since a cookie with a Domain=127.1 wouldn't be identical to the string (the request-host) the implementor should ignore the cookie entirely.
That's straightforward enough, assuming we agree on my interpretation here.
does that work?
Also a good and (unfortunately) different question.
Well for Chrome 98 it seems we do set the cookie, oops.
This feels like it's an implementation bug, as the spec's instructions seem reasonable, but do you have a different take Anne?
I think using string equality with the Domain attribute value string (post dot removal and lowercasing) is fine. It's a bit stricter, doesn't perpetuate the weird spelling of IP addresses, and is less likely to run into the kind of problems people were concerned with around UTF-8 bytes.
I was concerned with how Domain Matching knows something is an IP address, but I suppose that "The string is a host name (i.e., not an IP address)." is not something it needs to know about the string, but about the domain string it presumably has previously parsed and thus knows about. That might be worth clarifying there to avoid giving the impression that the string needs to be parsed.
Are you suggesting that
- The string is a host name (i.e., not an IP address).
should be modified to say
- The domain string is a host name (i.e., not an IP address).?
Maybe that's good enough. Ideally the spec would just reuse the URL Standard data types, but that's probably not attainable.
Hmm, that change on its own might cause issues
Let's pretend we modified the algorithm to only look at the domain string
A string domain-matches a given domain string if at least one of the
following conditions hold:
* The domain string and the string are identical. (Note that both
the domain string and the string will have been canonicalized to
lower case at this point.)
* All of the following conditions hold:
- The domain string is a suffix of the string.
- The last character of the string that is not included in the
domain string is a %x2E (".") character.
- The domain string is a host name (i.e., not an IP address).
If a response from https://127.0.0.1 sets a cookie foo=bar; Domain=.0.1 then I think we'd check the domain value 0.1, see it doesn't string match the string 127.0.0.1 and then try the next condition. I'm unclear if 0.1 could be considered a valid host name (rfc1123 section 2.1 implies it might be?), but if it is then we could pass this test and allow the cookie to be set.
Maybe we could get away with a warning that no parsing of the input string should occur aside from checking if it's an IP address. But then we've worked our way back to your original point of "I was concerned with how Domain Matching knows something is an IP address"
I was under the mistaken impression that string and not domain string was obtained from the cookie Domain attribute. So the algorithm works as-is for IPv4 (in the strict manner, as discussed), except that you need to know out-of-band whether string is an IP address. (That would be helped by using URL Standard types and making string a proper host, but I digress.)
Now having said that, I'm not sure it works as well for IPv6. There's multiple ways to spell an IPv6 address and if you don't canonicalize (by parsing) you essentially have to spell them the way the URL Standard will canonicalize them.
So you cannot write [0::1] in the Domain attribute and expect it to work. You'd have to write [::1]. That seems broken? (Again, this example will likely work in Chrome per your earlier comments.)
I don't think that the definition permits IPv6 addresses, so that might be moot. Adding brackets to the grammar might cause a new kind of problem.
The domain-value is a subdomain as defined by [RFC1034], Section 3.5, and as enhanced by [RFC1123], Section 2.1.
(Curiously, I went to RFC 1035 Section 3.5 by accident, which talks about in-addr.arpa. It's not that.)
Good point, but those requirements have no bearing on user agents. The sections are quite explicit about that.
Step 8 of the host parsing algorithm in the WhatWG spec has a very simple rule for distinguish IP addresses. It says "If asciiDomain ends in a number, then return the result of IPv4 parsing asciiDomain." We should either reference that or (given how simple it is) copy the "ends in a number" bit.
I believe that /\.(?:[0-9]+|0x[0-9a-f]+)$/ covers it for those who like to complicate their existing problems.
It does seem worth explicitly clarifying how IPv6 interacts here as well. (ie, is it ever allowed?)
If we explicitly support legacy IPv4 do we also need to support IPv6 somehow per https://www.iab.org/2016/11/07/iab-statement-on-ipv6/ ?
Yeah, I think that maybe IPv6 is permitted, but I can't be sure. IPv6 would be identified by the leading '['