
Public suffix list is used as though it were designed to be exhaustive, but it's not

Open • gissuebot opened this issue 10 years ago • 5 comments

Original issue created by [email protected] on 2013-12-18 at 05:30 PM


""" The issue appears to be in the API of InternetDomainName.findPublicSuffix() - https://github.com/google/guava/blob/ab29b173055a1ff647516848b176265fc6792ba0/guava/src/com/google/common/net/InternetDomainName.java#L167

Specifically, this class appears to disregard Step 2 of "The Algorithm" described at http://publicsuffix.org/list/, namely: "If no rules match, the prevailing rule is *".

In this model, any domain not on the list is assumed to be registerable at the second level. For example, "au" is not included in the PSL. This should cause "foo.au" to fail to match any rule and thus fall under the default wildcard rule. Under that rule the public suffix is "au", so a name directly beneath it (CSIRO's, for instance) is treated as registerable.

This is especially important given the many new registries that ICANN is approving: no decision has been made to add them to the PSL automatically, so I fear Java applications may run into problems validating these domains.

If the goal is to ensure a name is "valid" (that is, assigned/approved by ICANN), IANA publishes a data file, updated twice daily, at http://data.iana.org/TLD/tlds-alpha-by-domain.txt that lists all IANA-assigned TLDs. It may make sense to incorporate this data into the PSL trie to get proper "fail open" behaviour.

...

For plausibility checks, the IANA list is certainly a much better resource. For security checks, the PSL is the best source of data.

...

The point of the PSL is not to replace the IANA list but to further reduce scope of registerable labels.

There would be no benefit to the PSL's including the full IANA list, and real performance harm, since step 2 of the algorithm implicitly covers these domains. """
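Step 2 of the algorithm quoted above is what makes the PSL non-exhaustive: a name whose TLD has no rule still gets a public suffix (its rightmost label) instead of none. Below is a minimal Java sketch of the publicsuffix.org matching steps, assuming a toy in-memory rule set. `PslSketch` and its methods are hypothetical illustrations, not Guava's `InternetDomainName` API, and the rules in `main` are a toy subset rather than the real list.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

/**
 * Toy sketch of the publicsuffix.org matching algorithm, including step 2
 * ("If no rules match, the prevailing rule is *"). Rules are plain strings
 * such as "com", "*.ck" or "!www.ck". Hypothetical helper, not Guava's API.
 */
final class PslSketch {

  /** Returns the public suffix of {@code domain} under {@code rules}. */
  static String publicSuffix(String domain, Set<String> rules) {
    String[] labels = domain.toLowerCase(Locale.ROOT).split("\\.");
    String[] prevailing = null; // labels of the prevailing rule, if any rule matched
    boolean prevailingIsException = false;

    for (String rule : rules) {
      boolean exception = rule.startsWith("!");
      String[] ruleLabels = (exception ? rule.substring(1) : rule).split("\\.");
      if (!matches(labels, ruleLabels)) {
        continue;
      }
      if (exception && !prevailingIsException) {
        // Step 3: a matching exception rule always prevails.
        prevailing = ruleLabels;
        prevailingIsException = true;
      } else if (!exception && !prevailingIsException
          && (prevailing == null || ruleLabels.length > prevailing.length)) {
        // Step 4: otherwise the rule with the most labels prevails.
        prevailing = ruleLabels;
      }
    }

    int suffixLabels;
    if (prevailing == null) {
      suffixLabels = 1; // Step 2: implicit "*" rule, so the rightmost label is a public suffix.
    } else if (prevailingIsException) {
      suffixLabels = prevailing.length - 1; // Step 5: drop the exception's leftmost label.
    } else {
      suffixLabels = prevailing.length;
    }
    return String.join(".", Arrays.copyOfRange(labels, labels.length - suffixLabels, labels.length));
  }

  /** Step 7: the public suffix plus one more label, or empty if there is no such label. */
  static Optional<String> registrableDomain(String domain, Set<String> rules) {
    String lower = domain.toLowerCase(Locale.ROOT);
    String suffix = publicSuffix(lower, rules);
    if (lower.equals(suffix)) {
      return Optional.empty(); // the domain is itself a public suffix
    }
    String rest = lower.substring(0, lower.length() - suffix.length() - 1);
    return Optional.of(rest.substring(rest.lastIndexOf('.') + 1) + "." + suffix);
  }

  /** A rule matches if its labels, read right to left, equal the domain's labels or are "*". */
  private static boolean matches(String[] labels, String[] ruleLabels) {
    if (ruleLabels.length > labels.length) {
      return false;
    }
    for (int i = 1; i <= ruleLabels.length; i++) {
      String r = ruleLabels[ruleLabels.length - i];
      if (!r.equals("*") && !r.equals(labels[labels.length - i])) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    Set<String> rules = Set.of("com", "uk", "co.uk", "*.ck", "!www.ck"); // toy rule set
    System.out.println(publicSuffix("bbc.co.uk", rules)); // co.uk
    System.out.println(publicSuffix("foo.au", rules)); // au, via the implicit "*" rule
    System.out.println(registrableDomain("www.foo.au", rules).orElse("(none)")); // foo.au
    System.out.println(publicSuffix("www.ck", rules)); // ck, via the exception rule
  }
}
```

With these toy rules, "foo.au" still gets "au" as its public suffix even though no rule mentions "au". An implementation that treats the list as exhaustive would instead report no public suffix at all, which is the behavior this issue is about.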

What would change in InternetDomainName? I would want to talk more to the original bug reporter and to others, but here are some guesses:

  • topPrivateDomain() would remain
  • isTopPrivateDomain() would remain (though we might take the opportunity to look at users and see whether it's worth having when it's so easy to roll your own)
  • hasPublicSuffix() would be replaced by hasTld()
  • publicSuffix() could remain, but from a quick survey of users, I get the impression that most either want tld() or could get by with topPrivateDomain() just as easily
  • isPublicSuffix() would be removed... or replaced by isTld(), but I've always been fuzzy on how the original method was to be used, and it's easy to roll your own
  • isUnderPublicSuffix() would be removed... or replaced by isUnderTld(), but that seems to have all the concerns of isPublicSuffix() and more, since a domain can be under a TLD but not a public suffix
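To make the guessed replacements concrete, here is a minimal sketch of what `tld()`, `hasTld()`, `isTld()` and `isUnderTld()` might look like if backed by IANA's tlds-alpha-by-domain.txt, mentioned in the quoted report. Everything here is hypothetical: `TldSketch` is not a Guava class, the method names are only the guesses from the list above, and the parser assumes the file's layout of `#` header lines followed by one upper-case TLD per line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashSet;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

/**
 * Hypothetical helpers mirroring the guessed tld()/hasTld()/isTld()/isUnderTld().
 * Not part of Guava. The TLD set is assumed to come from IANA's
 * tlds-alpha-by-domain.txt: "#" header lines, then one upper-case TLD per line.
 */
final class TldSketch {
  private final Set<String> tlds;

  private TldSketch(Set<String> tlds) {
    this.tlds = tlds;
  }

  /** Parses the assumed IANA list format, skipping "#" lines and blanks, lower-casing entries. */
  static TldSketch fromIanaList(Reader in) {
    Set<String> tlds = new HashSet<>();
    try (BufferedReader reader = new BufferedReader(in)) {
      String line;
      while ((line = reader.readLine()) != null) {
        line = line.trim();
        if (!line.isEmpty() && !line.startsWith("#")) {
          tlds.add(line.toLowerCase(Locale.ROOT));
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return new TldSketch(tlds);
  }

  /** The rightmost label of {@code domain}, if it is a known TLD. */
  Optional<String> tld(String domain) {
    String lower = domain.toLowerCase(Locale.ROOT);
    String last = lower.substring(lower.lastIndexOf('.') + 1);
    return tlds.contains(last) ? Optional.of(last) : Optional.empty();
  }

  boolean hasTld(String domain) {
    return tld(domain).isPresent();
  }

  /** True only for the TLD itself, e.g. "com". */
  boolean isTld(String domain) {
    return tlds.contains(domain.toLowerCase(Locale.ROOT));
  }

  /** True for names with at least one label below a known TLD, e.g. "example.com". */
  boolean isUnderTld(String domain) {
    return hasTld(domain) && !isTld(domain);
  }

  public static void main(String[] args) {
    // Stand-in for the downloaded file; a real caller would read it from disk or the network.
    TldSketch t = fromIanaList(new StringReader("# header\nCOM\nAU\n"));
    System.out.println(t.tld("www.Example.COM")); // Optional[com]
    System.out.println(t.hasTld("foo.invalid")); // false
    System.out.println(t.isUnderTld("foo.au")); // true
  }
}
```

This only answers "is the rightmost label a delegated TLD?". Unlike `topPrivateDomain()`, it says nothing about where registrations actually happen (for example under "co.uk"), which is why `topPrivateDomain()` would still need the PSL.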

gissuebot • Oct 31 '14 17:10

Original comment posted by [email protected] on 2013-12-18 at 06:12 PM


(No comment entered for this change.)


CC: [email protected]

gissuebot • Nov 01 '14 01:11

We got a report internally (just days after I opened this bug) that the PSL's list of TLDs was changing to include every TLD in the IANA Root Zone Database. That appears to be the case, or close to it (perhaps the PSL just lags a little?):

$ wget http://www.iana.org/domains/root/db
...
$ wget http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1
...

$ comm -1 -3 <(sed -e 's#//.*##' -e 's/.*[.]//' effective_tld_names.dat?raw=1 | sort -u) <(egrep -o '/domains/root/db/\w+.html' db | egrep -o '\w+[.]' | tr -d . | sort)
bl
bq
eh
mf
movie
plus
ss
tech
tickets
um
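The same comparison can be sketched in Java, which makes the two normalization steps explicit. This is a hypothetical helper (`TldDiff` is not part of Guava). It assumes IANA's plain-text tlds-alpha-by-domain.txt as input rather than the HTML root-zone page fetched above, and it converts Unicode PSL entries with `java.net.IDN.toASCII` on the assumption that the PSL may spell internationalized TLDs in Unicode while IANA lists them in their `xn--` form.

```java
import java.net.IDN;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

/**
 * Java counterpart of the comm(1) pipeline: TLDs in the IANA list that are not
 * the rightmost label of any PSL rule. Inputs are the two files' lines.
 */
final class TldDiff {

  /** Rightmost label of every PSL rule, after stripping "//" comments (as the sed step does). */
  static Set<String> pslTlds(List<String> pslLines) {
    Set<String> tlds = new TreeSet<>();
    for (String line : pslLines) {
      int comment = line.indexOf("//");
      String rule = (comment >= 0 ? line.substring(0, comment) : line).trim();
      if (rule.isEmpty()) {
        continue;
      }
      String last = rule.substring(rule.lastIndexOf('.') + 1);
      // Normalize Unicode labels to their ASCII (xn--) form so they compare with IANA's list.
      tlds.add(IDN.toASCII(last).toLowerCase(Locale.ROOT));
    }
    return tlds;
  }

  /** Like "comm -1 -3": TLDs present only in the IANA list, sorted. */
  static Set<String> missingFromPsl(List<String> ianaLines, List<String> pslLines) {
    Set<String> psl = pslTlds(pslLines);
    Set<String> missing = new TreeSet<>();
    for (String line : ianaLines) {
      String tld = line.trim().toLowerCase(Locale.ROOT);
      if (!tld.isEmpty() && !tld.startsWith("#") && !psl.contains(tld)) {
        missing.add(tld);
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    // Toy excerpts; real inputs would be the full downloaded files.
    List<String> psl = List.of("// comment", "com", "co.uk", "*.ck", "!www.ck", "\u0440\u0444");
    List<String> iana = List.of("# header", "COM", "UK", "CK", "XN--P1AI", "TECH");
    System.out.println(missingFromPsl(iana, psl)); // [tech]
  }
}
```

Run against the full files, the returned set plays the role of the `comm` output; the toy inputs in `main` only illustrate the mechanics.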

There is also the question of whether proposed names should be accepted. I think "proposed names" may be those listed at http://icannwiki.com/All_New_gTLD_Applications (excluding any marked WITHDRAWN?), but I need to look into this.

cpovirk • Mar 25 '15 20:03

Semi-related: if we ever decide to redesign the public-suffix support in InternetDomainName more thoroughly, we should look at the API used by https://github.com/whois-server-list/public-suffix-list

cpovirk • Mar 25 '15 20:03

Here's what I've learned:

There are successively more restrictive checks that we could offer for a TLD:

cpovirk • Jun 19 '15 17:06

Any updates on this issue? Thank you.

muhammadismailkhan0009 • Feb 19 '23 14:02