Emoji is incorrectly encoded in punycode

Open xPaw opened this issue 6 years ago • 6 comments

new URI("https://🤦‍♂️.xpaw.me").normalize().hostname()
> "xn--1ug66vku9rd58h.xpaw.me"

Unicode inspector: https://apps.timwhitlock.info/unicode/inspect?s=%F0%9F%A4%A6%E2%80%8D%E2%99%82%EF%B8%8F

However Chrome and https://www.punycoder.com/ encode it as https://xn--g5hz781o.xpaw.me/

What's happening here?

Apr 21 '18 07:04 xPaw

Chrome and Edge drop ZERO WIDTH JOINER and VARIATION SELECTOR-16 from the punycode which ends up as xn--g5hz781o.

Firefox only drops ZWJ which ends up xn--1ug66v4685b.

Looking at this: https://tools.ietf.org/html/rfc5894#section-7.2.2 dropping ZWJ is correct, however there's no word about variation selectors.
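
To make the invisible characters visible, here is a small sketch that lists the code points of the label. The string is written with escapes and matches the sequence in the Unicode inspector link above; the variable names are only illustrative.

```javascript
// The label from the issue, written with escapes so the invisible
// code points show up in the source.
const label = "\u{1F926}\u200D\u2642\uFE0F";

// Array.from walks the string by code point, not by UTF-16 code unit.
const codePoints = Array.from(label, (ch) =>
  "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
);

// U+1F926 FACE PALM, U+200D ZWJ, U+2642 MALE SIGN, U+FE0F VS-16
console.log(codePoints.join(" ")); // U+1F926 U+200D U+2642 U+FE0F
```

The browsers differ only in which of U+200D and U+FE0F they remove before encoding.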

Apr 21 '18 10:04 xPaw

Unfortunately I have no idea how emojis in domains should behave.

We could try updating punycode from 1.4.0, which we currently use, to 1.4.1. Unfortunately, 2.0.0 seems to have dropped legacy browser support.

Apr 21 '18 13:04 rodneyrehm

It seems that the IDNA rules in UTS #46 (https://unicode.org/reports/tr46/) should be applied before the domain is converted to punycode.

I have a test page at https://xn--g5hz781o.xpaw.me/ that I made to test various browsers.

Sadly, punycode.js doesn't seem to implement it:

  • https://github.com/mathiasbynens/todo/issues/9
  • https://github.com/bestiejs/punycode.js/issues/12

There is https://github.com/jcranmer/idna-uts46 which could probably solve the problem here, but that library is crazy big.
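
As a rough illustration of the "map first, then encode" order, here is a minimal sketch. It assumes only the two cases discussed in this thread (ZWJ and the variation selectors) are dropped, which is far less than full UTS #46 processing. `stripIgnored`, `toAsciiLabel` and `punycodeEncode` are hypothetical names, and the encoder is a small hand-written version of the RFC 3492 algorithm, not punycode.js or URI.js code.

```javascript
// Punycode parameters from RFC 3492, section 5.
const BASE = 36, TMIN = 1, TMAX = 26, SKEW = 38, DAMP = 700;
const INITIAL_BIAS = 72, INITIAL_N = 128;

// Digits 0..25 map to "a".."z", digits 26..35 map to "0".."9".
const digitToChar = (d) => String.fromCharCode(d < 26 ? 97 + d : 22 + d);

// Bias adaptation, RFC 3492 section 6.1.
function adapt(delta, numPoints, firstTime) {
  delta = Math.floor(delta / (firstTime ? DAMP : 2));
  delta += Math.floor(delta / numPoints);
  let k = 0;
  while (delta > ((BASE - TMIN) * TMAX) / 2) {
    delta = Math.floor(delta / (BASE - TMIN));
    k += BASE;
  }
  return k + Math.floor(((BASE - TMIN + 1) * delta) / (delta + SKEW));
}

// Encoding procedure, RFC 3492 section 6.3 (no overflow checks).
function punycodeEncode(input) {
  const cps = Array.from(input, (ch) => ch.codePointAt(0));
  const basic = cps.filter((cp) => cp < 0x80);
  let output = String.fromCharCode(...basic);
  let handled = basic.length;
  if (handled > 0) output += "-";
  let n = INITIAL_N, delta = 0, bias = INITIAL_BIAS;
  while (handled < cps.length) {
    const m = Math.min(...cps.filter((cp) => cp >= n));
    delta += (m - n) * (handled + 1);
    n = m;
    for (const cp of cps) {
      if (cp < n) delta++;
      if (cp === n) {
        // Emit delta as a variable-length integer.
        let q = delta;
        for (let k = BASE; ; k += BASE) {
          const t = k <= bias ? TMIN : k >= bias + TMAX ? TMAX : k - bias;
          if (q < t) break;
          output += digitToChar(t + ((q - t) % (BASE - t)));
          q = Math.floor((q - t) / (BASE - t));
        }
        output += digitToChar(q);
        bias = adapt(delta, handled + 1, handled === basic.length);
        delta = 0;
        handled++;
      }
    }
    delta++;
    n++;
  }
  return output;
}

// Assumption: drop only ZWJ (U+200D) and variation selectors (U+FE00..U+FE0F).
const stripIgnored = (label) => label.replace(/[\u200D\uFE00-\uFE0F]/g, "");

// Map first, then encode; pure ASCII labels pass through unchanged.
function toAsciiLabel(label) {
  const mapped = stripIgnored(label);
  return /^[\x00-\x7F]*$/.test(mapped) ? mapped : "xn--" + punycodeEncode(mapped);
}

console.log(toAsciiLabel("\u{1F926}\u200D\u2642\uFE0F")); // xn--g5hz781o
```

With both characters stripped, the result matches what Chrome and punycoder.com produce for this label.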

Apr 21 '18 13:04 xPaw

Maybe @mathiasbynens has thoughts on this?

Apr 21 '18 14:04 rodneyrehm

> Chrome and Edge drop ZERO WIDTH JOINER and VARIATION SELECTOR-16 from the punycode which ends up as xn--g5hz781o.
>
> Firefox only drops ZWJ which ends up xn--1ug66v4685b.
>
> Looking at this: https://tools.ietf.org/html/rfc5894#section-7.2.2 dropping ZWJ is correct, however there's no word about variation selectors.

Was there a conclusion regarding whether or not variation selectors should be dropped?

Nov 15 '21 05:11 n4ru

For the record, the latest idnaMappingTable (Unicode v15) seems to say the variation selectors should be ignored/dropped:

FE00..FE0F    ; ignored                                # 3.2  VARIATION SELECTOR-1..VARIATION SELECTOR-16
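
For anyone who wants to check a code point against that row, here is a small sketch that parses it. It assumes the semicolon-separated layout visible in the quoted line (range, status, optional mapping, then a `#` comment); `parseMappingRow` and `isIgnored` are hypothetical helper names.

```javascript
// Parse one data row of the mapping table into a range and a status.
function parseMappingRow(row) {
  const [data] = row.split("#"); // drop the trailing comment
  const [range, status, mapping = ""] = data.split(";").map((f) => f.trim());
  const [start, end = start] = range.split(".."); // single code point: end = start
  return {
    start: parseInt(start, 16),
    end: parseInt(end, 16),
    status,
    mapping,
  };
}

const row =
  "FE00..FE0F    ; ignored                                # 3.2  VARIATION SELECTOR-1..VARIATION SELECTOR-16";
const entry = parseMappingRow(row);

// True when the code point falls inside a range marked "ignored".
const isIgnored = (cp) =>
  entry.status === "ignored" && cp >= entry.start && cp <= entry.end;

console.log(entry.status, isIgnored(0xfe0f)); // ignored true
```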

Dec 06 '22 15:12 jarthod