IDNA Support
Hello! I'd like to report an inconsistency between how this library handles IDNA and how the standard(s) say it should be handled, specifically around whether emoji are allowed. The original IDNA2008 specification does not explicitly disallow emoji, since Unicode did not yet include the modern set of emoji we see now. However, Section 2.1 of the original specification states:
These rules identify characters commonly used in mnemonics and often informally described as "language characters". In general, only code points assigned to this category are suitable for use in IDN.
There have also been subsequent updates to the standard to address this for modern Unicode versions, namely RFC 9233, which disallows emoji up to Unicode 12, and a newer draft which disallows them up to Unicode 15.
To follow up on this: I did some investigation and found that the list swift-url uses is the mapping table from UTS#46, which extends IDNA2008. Looking at Table 2b on this page from Unicode, they provide markings for compatibility with IDNA2008 in the third field of the mapping txt file. That same page also states:
Implementations may be more strict than the default settings for UTS46. In particular, an implementation conformant to IDNA2008 would disallow the input for lines marked with NV8. Implementations need only record that there is an error: they need not reproduce the precise Status codes (after removing any ignored Status values).
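To make the NV8 idea concrete, here is a minimal sketch in Python (not swift-url code) of how an implementation could read the IDNA2008 status column and optionally reject NV8 code points. The names `MappingEntry`, `parse_mapping_line`, `is_allowed`, and `strict_idna2008` are hypothetical, and the sample lines are hand-written in the style of the mapping file rather than copied from it. The sketch assumes the semicolon-separated layout described in Table 2b, with the IDNA2008 status in field 3 (counting from 0).

```python
from typing import List, NamedTuple, Optional


class MappingEntry(NamedTuple):
    first: int                      # first code point in the range
    last: int                       # last code point in the range (inclusive)
    status: str                     # e.g. "valid", "mapped", "disallowed"
    idna2008_status: Optional[str]  # e.g. "NV8", or None when the field is absent


def parse_mapping_line(line: str) -> Optional[MappingEntry]:
    """Parse one data line; return None for blank or comment-only lines."""
    data = line.split("#", 1)[0].strip()  # drop the trailing comment
    if not data:
        return None
    fields = [f.strip() for f in data.split(";")]
    first_hex, _, last_hex = fields[0].partition("..")
    first = int(first_hex, 16)
    last = int(last_hex, 16) if last_hex else first
    # Field 3 (when present and non-empty) carries the IDNA2008 status.
    idna2008_status = fields[3] if len(fields) > 3 and fields[3] else None
    return MappingEntry(first, last, fields[1], idna2008_status)


def is_allowed(code_point: int, entries: List[MappingEntry],
               strict_idna2008: bool = False) -> bool:
    """Check validity only; mapping, deviation, and ignored handling are omitted."""
    for entry in entries:
        if entry.first <= code_point <= entry.last:
            if entry.status != "valid":
                return False
            # In strict mode, also reject lines marked NV8.
            if strict_idna2008 and entry.idna2008_status == "NV8":
                return False
            return True
    return False  # code point not covered by the (sample) table


# Hand-written sample lines, assumed to match the file's layout.
SAMPLE_LINES = [
    "# comment-only line",
    "0061..007A    ; valid                                  # sample: a..z",
    "2615          ; valid                  ;      ; NV8    # sample: HOT BEVERAGE",
]
ENTRIES = [e for e in map(parse_mapping_line, SAMPLE_LINES) if e is not None]
```

With these sample entries, `is_allowed(0x2615, ENTRIES)` accepts the code point under the default settings, while `is_allowed(0x2615, ENTRIES, strict_idna2008=True)` rejects it. As the quoted text says, a strict implementation only needs to record that there is an error.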
Yes, this library implements IDNA as described by the WHATWG URL standard: https://url.spec.whatwg.org/#idna
In particular, see this note:
Note: This document and the web platform at large use Unicode IDNA Compatibility Processing and not IDNA2008. For instance, ☕.example becomes xn--53h.example and not failure
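The conversion in that note can be reproduced with a small Python sketch (again, not swift-url code). It only performs the Punycode step from RFC 3492 using Python's built-in `punycode` codec and deliberately skips the UTS46 mapping and validation steps, so it illustrates the encoding rather than full ToASCII processing. The helper names `to_ascii_label` and `to_ascii_host` are hypothetical.

```python
def to_ascii_label(label: str) -> str:
    """Punycode-encode a single label if it contains non-ASCII characters."""
    if label.isascii():
        return label  # ASCII labels pass through unchanged in this sketch
    # Python's built-in codec implements the RFC 3492 encoding;
    # the "xn--" ACE prefix has to be added by hand.
    return "xn--" + label.encode("punycode").decode("ascii")


def to_ascii_host(host: str) -> str:
    """Encode each dot-separated label; no UTS46 mapping or validation is applied."""
    return ".".join(to_ascii_label(label) for label in host.split("."))


print(to_ascii_host("☕.example"))  # xn--53h.example
```

Under strict IDNA2008 rules the same input would instead be rejected before reaching the encoding step.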
That said, I believe the UTS46 processing algorithm (and of course the data tables) has been updated since the last release of this library, so we’re currently out of date and that should be fixed.
If you would like strict IDNA2008 conformance, could I ask about your use case? I’m quite interested to learn who would find that useful if we were to add support somehow.
Is there any possibility you would add an argument to respect the NV8 markers, for optional compatibility with IDNA2008?