http-extensions icon indicating copy to clipboard operation
http-extensions copied to clipboard

Parser for Domain attributes

Open annevk opened this issue 3 years ago • 11 comments

If a resource on 127.0.0.1 sets a cookie with a Domain attribute set to 127.1, does that work? (Spoiler: it works if it was passed through the host parser, which seems like a prerequisite to determine whether it's an IP address or not as needed by Domain Matching.)

(I raised this in #1707 but it seems that's exclusively used for the non-ASCII bytes discussion.)

annevk avatar Feb 07 '22 10:02 annevk

Good question, what should happen?

Section 5.5 Step 9, sub-step 1 states

 9.   If the domain-attribute is non-empty:
        1.  If the canonicalized request-host does not domain-match the
            domain-attribute:
               1.  Ignore the cookie entirely and abort these steps.

It looks like "domain-match" is doing the heavy lifting in that alogithm.

Section 5.1.3 Domain Matching has the following algorithm

A string domain-matches a given domain string if at least one of the
   following conditions hold:

   *  The domain string and the string are identical.  (Note that both
      the domain string and the string will have been canonicalized to
      lower case at this point.)

   *  All of the following conditions hold:

      -  The domain string is a suffix of the string.

      -  The last character of the string that is not included in the
         domain string is a %x2E (".") character.

      -  The string is a host name (i.e., not an IP address).

Because this is an IP address, only the top condition is applicable meaning that since a cookie with a Domain=127.1 wouldn't be identical to the string (the request-host) the implementor should ignore the cookie entirely.

That's straightforward enough, assuming we agree on my interpretation here.

does that work?

Also a good and (unfortunately) different question.

Well for Chrome 98 it seems we do set the cookie, oops.

This feels like it's an implementation bug, as the spec's instructions seem reasonable, but do you have a different take Anne?

sbingler avatar Feb 07 '22 16:02 sbingler

I think using string equality with the Domain attribute value string (post dot removal and lowercasing) is fine. It's a bit stricter, doesn't perpetuate the weird spelling of IP addresses, and is less likely to run into the kind of problems people were concerned with around UTF-8 bytes.

I was concerned with how Domain Matching knows something is an IP address, but I suppose that "The string is a host name (i.e., not an IP address)." is not something it needs to know about the string, but about the domain string it presumably has previously parsed and thus knows about. That might be worth clarifying there to avoid giving the impression that the string needs to be parsed.

annevk avatar Feb 07 '22 17:02 annevk

Are you suggesting that - The string is a host name (i.e., not an IP address). should be modified to say - The domain string is a host name (i.e., not an IP address).?

sbingler avatar Feb 07 '22 23:02 sbingler

Maybe that's good enough. Ideally the spec would just reuse the URL Standard data types, but that's probably not attainable.

annevk avatar Feb 08 '22 07:02 annevk

Hmm, that change on its own might cause issues

Let's pretend we modified the algorithm to only look at the domain string

A string domain-matches a given domain string if at least one of the
   following conditions hold:

   *  The domain string and the string are identical.  (Note that both
      the domain string and the string will have been canonicalized to
      lower case at this point.)

   *  All of the following conditions hold:

      -  The domain string is a suffix of the string.

      -  The last character of the string that is not included in the
         domain string is a %x2E (".") character.

      -  The domain string is a host name (i.e., not an IP address).

If a response from https://127.0.0.1 sets a cookie foo=bar; Domain=.0.1 then I think we'd check the domain value 0.1, see it doesn't string match the string 127.0.0.1 and then try the next condition. I'm unclear if 0.1 could be considered a valid host name (rfc1123 section 2.1 implies it might be?), but if it is then we could pass this test and allow the cookie to be set.

Maybe we could get away with a warning that no parsing of the input string should occur aside from checking if it's an IP address. But then we've worked our way back to your original point of "I was concerned with how Domain Matching knows something is an IP address"

sbingler avatar Feb 08 '22 20:02 sbingler

I was under the mistaken impression that string and not domain string was obtained from the cookie Domain attribute. So the algorithm works as-is for IPv4 (in the strict manner, as discussed), except that you need to know out-of-band whether string is an IP address. (That would be helped by using URL Standard types and making string a proper host, but I digress.)


Now having said that, I'm not sure it works as well for IPv6. There's multiple ways to spell an IPv6 address and if you don't canonicalize (by parsing) you essentially have to spell them the way the URL Standard will canonicalize them.

So you cannot write [0::1] in the Domain attribute and expect it to work. You'd have to write [::1]. That seems broken? (Again, this example will likely work in Chrome per your earlier comments.)

annevk avatar Feb 09 '22 08:02 annevk

I don't think that the definition permits IPv6 addresses, so that might be moot. Adding brackets to the grammar might cause a new kind of problem.

The domain-value is a subdomain as defined by [RFC1034], Section 3.5, and as enhanced by [RFC1123], Section 2.1.

(Curiously, I went to RFC 1035 Section 3.5 by accident, which talks about in-addr.arpa. It's not that.)

martinthomson avatar Feb 10 '22 03:02 martinthomson

Good point, but those requirements have no bearing on user agents. The sections are quite explicit about that.

annevk avatar Feb 10 '22 07:02 annevk

Step 8 of the host parsing algorithm in the WhatWG spec has a very simple rule for distinguish IP addresses. It says "If asciiDomain ends in a number, then return the result of IPv4 parsing asciiDomain." We should either reference that or (given how simple it is) copy the "ends in a number" bit.

I believe that /\.(?:[0-9]+|0x[0-9a-f]+)$/ covers it for those who like to complicate their existing problems.

martinthomson avatar Jul 28 '22 17:07 martinthomson

It does seem worth explicitly clarifying how IPv6 interacts here as well. (ie, is it ever allowed?)

If we explicitly support legacy IPv4 do we also need to support IPv6 somehow per https://www.iab.org/2016/11/07/iab-statement-on-ipv6/ ?

enygren avatar Jul 28 '22 18:07 enygren

Yeah, I think that maybe IPv6 is permitted, but I can't be sure. IPv6 would be identified by the leading '['

martinthomson avatar Jul 28 '22 18:07 martinthomson