URI.js
Emoji is incorrectly encoded in punycode
new URI("https://🤦♂️.xpaw.me").normalize().hostname()
> "xn--1ug66vku9rd58h.xpaw.me"
Unicode inspector: https://apps.timwhitlock.info/unicode/inspect?s=%F0%9F%A4%A6%E2%80%8D%E2%99%82%EF%B8%8F
However, Chrome and https://www.punycoder.com/ encode it as https://xn--g5hz781o.xpaw.me/
What's happening here?
Chrome and Edge drop ZERO WIDTH JOINER and VARIATION SELECTOR-16 before Punycode encoding, which ends up as xn--g5hz781o.
Firefox keeps ZWJ and drops only VARIATION SELECTOR-16, which ends up as xn--1ug66v4685b (decoding that label yields U+1F926 U+200D U+2642).
According to https://tools.ietf.org/html/rfc5894#section-7.2.2, dropping ZWJ is correct; however, it says nothing about variation selectors.
Unfortunately I have no idea how emojis in domains should behave.
We could try updating punycode to 1.4.1; we're currently using 1.4.0. Unfortunately, 2.0.0 seems to have dropped legacy browser support.
It basically seems that IDNA rules should be followed before the domain is turned into punycode - https://unicode.org/reports/tr46/
I have a test page on https://xn--g5hz781o.xpaw.me/ which I did to test various browsers.
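To make the "map first, then encode" idea concrete, here is a rough sketch, not a proposed patch. `stripIgnored`, `punycodeEncode` and `toAsciiLabel` are hypothetical names, the encoder is a bare-bones RFC 3492 implementation written only to keep the example self-contained, and the mapping step covers just the variation selectors (plus, optionally, ZWJ) rather than the full UTS #46 table.

```javascript
// Hypothetical pre-processing step: drop VARIATION SELECTOR-1..16 and,
// optionally, ZERO WIDTH JOINER. Real UTS #46 mapping covers far more.
function stripIgnored(label, dropJoiner = false) {
  const cleaned = label.replace(/[\uFE00-\uFE0F]/g, "");
  return dropJoiner ? cleaned.replace(/\u200D/g, "") : cleaned;
}

// Bias adaptation from RFC 3492 (base 36, tmin 1, tmax 26, skew 38, damp 700).
function adapt(delta, numPoints, firstTime) {
  delta = Math.floor(delta / (firstTime ? 700 : 2));
  delta += Math.floor(delta / numPoints);
  let k = 0;
  while (delta > 455) {
    delta = Math.floor(delta / 35);
    k += 36;
  }
  return k + Math.floor((36 * delta) / (delta + 38));
}

// Digits 0..25 map to a..z, digits 26..35 map to 0..9.
function digitToChar(d) {
  return String.fromCharCode(d < 26 ? 97 + d : 22 + d);
}

// Minimal Punycode encoder for one label (no "xn--" prefix added here).
function punycodeEncode(label) {
  const input = Array.from(label, (ch) => ch.codePointAt(0));
  const basic = input.filter((c) => c < 0x80);
  let output = basic.map((c) => String.fromCharCode(c)).join("");
  const b = basic.length;
  let h = b;
  if (b > 0) output += "-";
  let n = 0x80;
  let delta = 0;
  let bias = 72;
  while (h < input.length) {
    // Next code point to handle: the smallest one not yet encoded.
    const m = Math.min(...input.filter((c) => c >= n));
    delta += (m - n) * (h + 1);
    n = m;
    for (const c of input) {
      if (c < n) delta++;
      if (c === n) {
        // Emit delta as a variable-length integer.
        let q = delta;
        for (let k = 36; ; k += 36) {
          const t = k <= bias ? 1 : k >= bias + 26 ? 26 : k - bias;
          if (q < t) break;
          output += digitToChar(t + ((q - t) % (36 - t)));
          q = Math.floor((q - t) / (36 - t));
        }
        output += digitToChar(q);
        bias = adapt(delta, h + 1, h === b);
        delta = 0;
        h++;
      }
    }
    delta++;
    n++;
  }
  return output;
}

// Map first, then encode; pure-ASCII labels pass through unchanged.
function toAsciiLabel(label, dropJoiner = false) {
  const mapped = stripIgnored(label, dropJoiner);
  return /^[\x00-\x7F]*$/.test(mapped) ? mapped : "xn--" + punycodeEncode(mapped);
}

const label = "\u{1F926}\u200D\u2642\uFE0F"; // facepalm, ZWJ, male sign, VS16
console.log("xn--" + punycodeEncode(label)); // xn--1ug66vku9rd58h (no mapping)
console.log(toAsciiLabel(label));            // xn--1ug66v4685b (VS16 dropped)
console.log(toAsciiLabel(label, true));      // xn--g5hz781o (VS16 and ZWJ dropped)
```

The three outputs correspond to the URI.js, Firefox and Chrome/Edge results mentioned in this thread, which suggests the difference between them is only in which code points get removed before encoding.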
punycode.js doesn't seem to implement the UTS #46 mapping, sadly:
- https://github.com/mathiasbynens/todo/issues/9
- https://github.com/bestiejs/punycode.js/issues/12
There is https://github.com/jcranmer/idna-uts46 which could probably solve the problem here, but that library is crazy big.
Maybe @mathiasbynens has thoughts on this?
Was there a conclusion regarding whether or not variation selectors should be dropped?
For the record, the latest idnaMappingTable (Unicode v15) seems to say the variation selectors should be ignored/dropped:
FE00..FE0F ; ignored # 3.2 VARIATION SELECTOR-1..VARIATION SELECTOR-16
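For anyone who wants to check other code points against the table, a small sketch of reading such a line follows. `parseMappingLine` and `statusOf` are hypothetical helper names, and the parser assumes only the `range ; status # comment` layout shown above; it ignores any further fields the real file carries.

```javascript
// Parse one line of the mapping table into { start, end, status }.
// Returns null for blank or comment-only lines.
function parseMappingLine(line) {
  const data = line.split("#")[0]; // drop the trailing comment
  const fields = data.split(";").map((f) => f.trim());
  if (fields.length < 2 || fields[0] === "") return null;
  const [first, last = first] = fields[0].split(".."); // single code point or range
  return {
    start: parseInt(first, 16),
    end: parseInt(last, 16),
    status: fields[1],
  };
}

// Look up the status of a code point in a list of parsed entries.
function statusOf(codePoint, entries) {
  const hit = entries.find((e) => codePoint >= e.start && codePoint <= e.end);
  return hit ? hit.status : undefined;
}

const entries = [
  "FE00..FE0F ; ignored # 3.2 VARIATION SELECTOR-1..VARIATION SELECTOR-16",
]
  .map(parseMappingLine)
  .filter(Boolean);

console.log(statusOf(0xfe0f, entries)); // ignored
```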