IDNA proposal
Mentioning a number of people that I suspect have IDNA domains or knowledge about how IDNA works (based on skimming bugs and PRs)
@adamus1red @dkim1970 @flz @juliusrickert @killerbees19 @Kusado @louis-lau @masterzen @mderriey @pmoroney @tresni @Yannik
I'd love to receive feedback about this proposal!
@tlimoncelli
I think your example for the ascii==unicode case might be wrong:
#7: + CREATE foo.example.com MX 10 xn--p1ai.com (рф.com) (ttl=14400)
Was this what you intended?
I noticed you created both CREATE and MODIFY examples for ascii (unicode) and unicode (ascii), but only MODIFY examples for ascii and unicode. How is that to be understood?
Ah, good point!
The first one is where the label is ascii==unicode but the target is ascii!=unicode. I'll update the comment.
Thanks for finding that!
Tom
I noticed you created both CREATE and MODIFY examples for ascii (unicode) and unicode (ascii), but only MODIFY examples for ascii and unicode. How is that to be understood?
I've added more examples. I don't think I've covered every combination, but my goal is to show typical examples not every possible example.
I've also added examples where we use {} and ⟬⟭ and ❮❯. I think using unicode chars to highlight unicode domains would be cool (maybe too clever?).
While I'm not against having the ASCII and UTF on the output lines, I do worry it might make the output too busy.
Wouldn't simply using the .Name value be better, since it should then pretty much match what is in the dnscontrol configuration?
LGTM
@adamus1red wrote:
While I'm not against having the ASCII and UTF on the output lines, I do worry it might make the output too busy. Wouldn't simply using the .Name value be better, since it should then match what is in the dnscontrol configuration?
That's an interesting point! I guess my thought is that showing both versions helps with debugging.
@tlimoncelli maybe a compromise would be if the output was the same as what the DNS provider or Registrar used.
I know I've had issues where the DNS provider uses UTF but the registrar uses ASCII. E.g., Namecheap uses ASCII, so registrar output for Namecheap would use ASCII punycode, while the DNS is on Cloudflare, which uses UTF, so that output would use UTF.
The only IDNA domain I have is for fun, so I don't have a strong preference. I'll give my input nonetheless :). If you want to show both, I think I like B better, as it feels more consistent to me. Anything not in brackets will always be ASCII that way.
I'd probably go with showing what the original user input was, with a flag to only show ASCII if needed. It's less information to parse, and the user should be familiar with it as that's the way it's listed in their config. I could see points being made for showing both, but I've always liked things more distraction free and less dense.
I think the Unicode brackets are a little too clever, perhaps even a little confusing ;)
First of all, improving IDNA handling would be a great improvement to dnscontrol.
Regarding output, the one thing I definitely do not like is having the ascii output come first, because it is the one least likely to be understood/mentally associated with the relevant domain.
I think simply using the original user input has merit, pairing that with a toggle to additionally show ascii seems fine to me.
However, I also wouldn't mind the unicode (ascii) output.
I'm seconding this suggestion: by displaying the "human readable" format, I think the barrier to using IDNs with dnscontrol gets lowered, because the IDNA (punycode) format is not human readable, especially for languages written in non-Latin scripts.
This is excellent feedback! It's getting me excited!
Question: In what situations would people want to see something besides the .Name (the user input) version?
What about if the registrar or dns provider use something different than the .Name value, then include the version they are using in brackets?
Personally, I think whatever the dns provider does isn't relevant to the cli output. Behind the scenes at every provider, it's all punycode anyway.
Agreed, having the display format handled outside of the provider is to be preferred IMO.
I don't have experience with IDNA at all. My $0.02: I do agree that showing the ascii version is useful only for debugging; what users want to see is whether their unicode domain (or however it was entered in dnsconfig.js) is being processed, and how.
Hi folks!
2 ideas:
Support multiple formats?
There's been a lot of discussion about ascii (unicode) vs unicode (ascii). It might be possible to add a command line flag that selects the format. No promises, but it might be possible. In that case, I'd recommend the default be userinput, with a debugging flag that shows unicode (ascii) or userinput (ascii).
I'll know more if this is possible when I start coding.
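To make the flag idea concrete, here is a minimal sketch of how the three formats could be selected. Everything in it is hypothetical: the `DisplayMode` type, its constants, and `FormatName` are illustration names, not dnscontrol code, and the rule of printing a single spelling when both are identical is an assumption based on the ascii==unicode case discussed earlier.

```go
package main

import "fmt"

// DisplayMode is a hypothetical setting; the thread only says a flag might be possible.
type DisplayMode int

const (
	ModeUserInput      DisplayMode = iota // default: the name as written in dnsconfig.js
	ModeUnicodeASCII                      // debugging: "unicode (ascii)"
	ModeUserInputASCII                    // debugging: "userinput (ascii)"
)

// FormatName renders one name for CLI output. raw, unicode and ascii are three
// spellings of the same name; how they are computed is out of scope here.
func FormatName(mode DisplayMode, raw, unicode, ascii string) string {
	primary := raw
	if mode == ModeUnicodeASCII {
		primary = unicode
	}
	// Default mode, or nothing extra to show: print a single spelling.
	if mode == ModeUserInput || primary == ascii {
		return primary
	}
	return fmt.Sprintf("%s (%s)", primary, ascii)
}

func main() {
	fmt.Println(FormatName(ModeUserInput, "рф.com", "рф.com", "xn--p1ai.com"))
	fmt.Println(FormatName(ModeUnicodeASCII, "xn--p1ai.com", "рф.com", "xn--p1ai.com"))
	fmt.Println(FormatName(ModeUserInputASCII, "foo.example.com", "foo.example.com", "foo.example.com"))
}
```

Keeping the formatting in one function like this would also keep the display decision outside the providers, as suggested above.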
An idea that would break less existing code
Existing code expects .Name to be ASCII (the current code runs dc.Punycode() for all providers, which rewrites .Name to be ASCII). Rather than require every use of .Name to change to .NameASCII, maybe the names should be: .Name (ASCII, to be compatible with old code), .NameORIG (how the user input the string), .NameUNICODE, and .NameDisplay.
Also, an IDN isn't the same IDN if we compare .de and .com. Some TLD providers support different IDNA standards (IDNA2003 vs. IDNA2008, UTS46). As a result, translating the same IDN can produce a different punycode variant depending on the TLD.
Let me provide some example in here from the HEXONET Provider's ConvertIDN API Command:
[COMMAND]
COMMAND = ConvertIDN
DOMAIN0 = ärzte.com
DOMAIN1 = ärzte.de
EOF
[RESPONSE]
CODE = 200
DESCRIPTION = Command completed successfully
PROPERTY[ACE][0] = xn--rzte-koa.com
PROPERTY[ACE][1] = xn--rzte-koa.de
PROPERTY[IDN][0] = ärzte.com
PROPERTY[IDN][1] = ärzte.de
EOF
No big difference in here. But let us pick one with german special characters:
[COMMAND]
COMMAND = ConvertIDN
DOMAIN0 = fußball.com
DOMAIN1 = fußball.de
EOF
[RESPONSE]
CODE = 200
DESCRIPTION = Command completed successfully
PROPERTY[ACE][0] = fussball.com
PROPERTY[ACE][1] = xn--fuball-cta.de
PROPERTY[IDN][0] = fussball.com
PROPERTY[IDN][1] = fußball.de
EOF
Let us set aside that .com handles this differently, and take the punycode variant returned for .de and use it as a .com domain name: xn--fuball-cta.com. While that IDN translation is technically correct, it won't work with the TLD provider, because that provider supports a different IDNA standard. I therefore strongly think this needs to be considered in an IDNA proposal. The ConvertIDN API command above maps each name to a variant that works for its TLD.
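To illustrate where the two results come from, here is a small self-contained sketch of the Punycode encoding step (RFC 3492) in plain Go. It is not the HEXONET implementation and not what dnscontrol uses; it performs no IDNA validation and omits overflow checks. The `mapEszett` helper is a hypothetical, single-character stand-in for the IDNA2003-style mapping of ß to ss; real mapping tables are far larger.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	base, tmin, tmax, skew, damp = 36, 1, 26, 38, 700
	initialBias, initialN        = 72, 128
)

// adapt is the bias adaptation function from RFC 3492, section 6.1.
func adapt(delta, numPoints int, first bool) int {
	if first {
		delta /= damp
	} else {
		delta /= 2
	}
	delta += delta / numPoints
	k := 0
	for delta > ((base-tmin)*tmax)/2 {
		delta /= base - tmin
		k += base
	}
	return k + (base-tmin+1)*delta/(delta+skew)
}

// digit maps 0..25 to 'a'..'z' and 26..35 to '0'..'9'.
func digit(d int) byte {
	if d < 26 {
		return byte('a' + d)
	}
	return byte('0' + d - 26)
}

// encodeLabel returns the "xn--" form of one label; ASCII labels pass through.
func encodeLabel(label string) string {
	runes := []rune(label)
	var out []byte
	for _, r := range runes {
		if r < 0x80 {
			out = append(out, byte(r))
		}
	}
	b := len(out)
	if b == len(runes) {
		return label
	}
	if b > 0 {
		out = append(out, '-')
	}
	n, delta, bias := initialN, 0, initialBias
	for h := b; h < len(runes); {
		m := 0x110000 // smallest code point >= n still to be encoded
		for _, r := range runes {
			if int(r) >= n && int(r) < m {
				m = int(r)
			}
		}
		delta += (m - n) * (h + 1)
		n = m
		for _, r := range runes {
			if int(r) < n {
				delta++
			}
			if int(r) == n {
				q := delta
				for k := base; ; k += base {
					t := k - bias
					if t < tmin {
						t = tmin
					} else if t > tmax {
						t = tmax
					}
					if q < t {
						break
					}
					out = append(out, digit(t+(q-t)%(base-t)))
					q = (q - t) / (base - t)
				}
				out = append(out, digit(q))
				bias = adapt(delta, h+1, h == b)
				delta = 0
				h++
			}
		}
		delta++
		n++
	}
	return "xn--" + string(out)
}

// ToPunycode encodes every label of a domain name.
func ToPunycode(domain string) string {
	labels := strings.Split(domain, ".")
	for i, l := range labels {
		labels[i] = encodeLabel(l)
	}
	return strings.Join(labels, ".")
}

// mapEszett is a hypothetical stand-in for IDNA2003-style mapping (ß becomes ss).
func mapEszett(domain string) string { return strings.ReplaceAll(domain, "ß", "ss") }

func main() {
	fmt.Println(ToPunycode("fußball.de"))             // ß kept, then encoded
	fmt.Println(ToPunycode(mapEszett("fußball.com"))) // ß mapped first, no encoding needed
}
```

Encoding ß directly gives the xn--fuball-cta form shown for .de, while mapping it first gives the plain fussball form shown for .com, so the same user input yields two different ASCII names.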
Hmm... Then again, DNSControl runs on "existing" data configured by the user. The input should therefore be considered "correct" (anything else would be very stupid), which probably makes this particular discussion superfluous... Or should DNSControl exit with an error if a potential IDN precheck fails?
Hmm, second thought... The DNS/domain provider should ultimately be capable of handling that on its own (returning an error message stating that the provided domain/DNS zone name is invalid), so you don't have to worry about any of that. Sorry for bringing this up :-)
Thanks for all the feedback!
After writing a bunch of code to implement this for zone names (but not yet the labels of individual records), it's quite clear that NameDisplay is not needed. There's only one place that uses it, and we can just generate the right format at that time.
DomainConfig now stores:
.Name: I would call this .NameASCII as recommended, but that would break a lot of code. Maybe some day we'll have "a great renaming"?
.NameRaw: The domain name as the user input it in dnsconfig.js (passed through ToLower).
.NameUnicode: The domain passed through strings.ToLower then idna.ToUnicode().
I don't know if the way I call ToLower will break things. Actually, now that I write this I think it will. Maybe I should do this instead:
.Name: call idna.ToASCII() then strings.ToLower()
.NameRaw: The domain name as the user input it in dnsconfig.js, with no changes (so far no code uses this. Maybe that's a sign?)
.NameUnicode: call idna.ToASCII() then strings.ToLower() then idna.ToUnicode()
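The second ordering can be sketched as follows. This is not the code from the PR: the `DomainNames` type, `NewDomainNames`, and the `Converter` type are hypothetical, and the two stub converters only know the single name used in the demo. The converters are injected so the sketch runs with the standard library alone; `idna.ToASCII` and `idna.ToUnicode` from golang.org/x/net/idna have the same `func(string) (string, error)` shape and could be passed instead. One plausible benefit of lowercasing after the ASCII conversion is that `strings.ToLower` then only ever sees ASCII.

```go
package main

import (
	"fmt"
	"strings"
)

// DomainNames mirrors the three fields described above (hypothetical type).
type DomainNames struct {
	Name        string // ASCII form, kept under the old field name for compatibility
	NameRaw     string // exactly what the user wrote in dnsconfig.js
	NameUnicode string // Unicode form derived from Name
}

// Converter stands in for an IDNA conversion such as idna.ToASCII.
type Converter func(string) (string, error)

// NewDomainNames applies: ToASCII, then ToLower, then ToUnicode.
func NewDomainNames(input string, toASCII, toUnicode Converter) (DomainNames, error) {
	ascii, err := toASCII(input)
	if err != nil {
		return DomainNames{}, fmt.Errorf("to ASCII %q: %w", input, err)
	}
	ascii = strings.ToLower(ascii) // safe: the string is ASCII-only at this point
	uni, err := toUnicode(ascii)
	if err != nil {
		return DomainNames{}, fmt.Errorf("to Unicode %q: %w", ascii, err)
	}
	return DomainNames{Name: ascii, NameRaw: input, NameUnicode: uni}, nil
}

// Stub converters for the demo only; they handle exactly one label.
func stubToASCII(s string) (string, error) {
	return strings.ReplaceAll(s, "рф", "xn--p1ai"), nil
}

func stubToUnicode(s string) (string, error) {
	return strings.ReplaceAll(s, "xn--p1ai", "рф"), nil
}

func main() {
	d, err := NewDomainNames("рф.COM", stubToASCII, stubToUnicode)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("Name=%s NameRaw=%s NameUnicode=%s\n", d.Name, d.NameRaw, d.NameUnicode)
}
```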
https://github.com/StackExchange/dnscontrol/pull/3879 fixes the problems and, thanks to all of your suggestions, works a lot better.