URI.js

Incorrect conversion of IDNA hostname

Open hsalkaline opened this issue 10 years ago • 5 comments

URI.js Punycode support works incorrectly.

For example: URI('http://www.Äffchen.com/').normalizeHostname().hostname() == "www.xn--ffchen-vna.com"

The correct conversion (as done by the browser) is the following: var a = document.createElement('a'); a.href = 'http://www.Äffchen.com/'; a.hostname == "www.xn--ffchen-9ta.com"

hsalkaline avatar Jul 08 '14 10:07 hsalkaline

Why would the correct conversion be www.xn--ffchen-9ta.com? phlyLabs Punycode also outputs www.xn--ffchen-vna.com. I don't think the lower case conversion is necessary.

ooxi avatar Jul 08 '14 13:07 ooxi

I wonder what @mathiasbynens thinks about this

rodneyrehm avatar Jul 08 '14 14:07 rodneyrehm

After some searching I found the following (please correct me if I misunderstood something):

  • www.xn--ffchen-9ta.com is a correct conversion according to IDNA-2003
  • www.xn--ffchen-vna.com is a correct conversion according to IDNA-2008
  • IDNA-2008 is not fully backward compatible with IDNA-2003
  • punycode.js, bundled with URI.js, implements IDNA-2008

Does punycode.js allow choosing how the domain is converted? And if it does, shouldn't URI.js expose this in its API?

hsalkaline avatar Jul 08 '14 14:07 hsalkaline

See https://github.com/mathiasbynens/todo/issues/9. This is not something that belongs in Punycode.js as it’s not part of Punycode. It’s part of the preprocessing that happens before the domain name is Punycoded.

As per @annevk’s http://annevankesteren.nl/2014/06/url-unicode, http://unicode.org/reports/tr46/ should be used. It’s compatible with IDNA2003, but uses IDNA2008 data.

mathiasbynens avatar Jul 08 '14 14:07 mathiasbynens

Note that TR46 should be used with the settings specified in the URL Standard (http://url.spec.whatwg.org/). We want a particular flavor of TR46.

annevk avatar Jul 11 '14 09:07 annevk
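
For reference, the browser-style conversion discussed in this thread can be reproduced in any runtime that implements the WHATWG URL Standard. This is a minimal sketch, assuming Node.js or a browser with the global URL class. The helper name hostnameToAscii is hypothetical and is not part of URI.js or Punycode.js.

```javascript
// Hypothetical helper (not part of URI.js or Punycode.js):
// lets the platform's WHATWG URL parser do the preprocessing
// (case mapping, normalization) before the label is Punycoded.
function hostnameToAscii(hostname) {
  return new URL('http://' + hostname + '/').hostname;
}

const upper = hostnameToAscii('www.Äffchen.com');
const lower = hostnameToAscii('www.äffchen.com');

console.log(upper);           // www.xn--ffchen-9ta.com
console.log(upper === lower); // true: the host is lowercased before encoding
```

The difference from the `xn--ffchen-vna` result reported above comes from that preprocessing step: the URL parser maps `Ä` to `ä` before encoding, whereas encoding the label as-is keeps the uppercase code point.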