IDNA: avoid defining valid domain string in terms of the parser
This is basically something we need to raise again with the IDNA folks as their document does not really address it. This used to be tracked by https://www.w3.org/Bugs/Public/show_bug.cgi?id=25334.
As part of fixing this we should make it clear that valid domain strings are at least ASCII case-insensitive.
Maybe I'm wrong but aren't valid domains defined in the RFCs below?
- https://www.ietf.org/rfc/rfc1034.txt
- https://www.ietf.org/rfc/rfc1123.txt
The first one says:
<domain> ::= <subdomain> | " "
<subdomain> ::= <label> | <subdomain> "." <label>
<label> ::= <letter> [ [ <ldh-str> ] <let-dig> ]
<ldh-str> ::= <let-dig-hyp> | <let-dig-hyp> <ldh-str>
<let-dig-hyp> ::= <let-dig> | "-"
<let-dig> ::= <letter> | <digit>
The second one relaxes this so that a label can also start with a digit.
It's not entirely wrong, but those definitions don't account for IDNA, and they also don't seem to account for ASCII code points that happen to work in practice, such as _. What we want is a definition that does account for both, and for which applying the host parser defined in the URL Standard to any string that matches it never returns failure.
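
To make the gap concrete, here is a minimal sketch in Python of the RFC 1034 grammar quoted above, with the RFC 1123 leading-digit relaxation as an option. The names (`matches_subdomain`, `allow_leading_digit`) are made up for illustration and come from neither the RFCs nor the URL Standard. The sketch only covers `<subdomain>`: it ignores the `" "` alternative of `<domain>` and any length limits.

```python
import re

# Character classes from the quoted grammar (ASCII only).
LET_DIG = r"[A-Za-z0-9]"       # <let-dig>
LET_DIG_HYP = r"[A-Za-z0-9-]"  # <let-dig-hyp>

# RFC 1034: <label> ::= <letter> [ [ <ldh-str> ] <let-dig> ]
LABEL_1034 = rf"[A-Za-z](?:{LET_DIG_HYP}*{LET_DIG})?"

# RFC 1123 relaxation: the first character may be a letter or a digit.
LABEL_1123 = rf"{LET_DIG}(?:{LET_DIG_HYP}*{LET_DIG})?"


def matches_subdomain(s: str, allow_leading_digit: bool = False) -> bool:
    """Return True if s matches <subdomain>: one or more labels joined by '.'."""
    label = LABEL_1123 if allow_leading_digit else LABEL_1034
    return re.fullmatch(rf"{label}(?:\.{label})*", s) is not None


if __name__ == "__main__":
    candidates = ["example.com", "3com.com", "foo_bar.example", "bücher.example"]
    for name in candidates:
        print(name, matches_subdomain(name), matches_subdomain(name, True))
```

Under either variant, `foo_bar.example` and `bücher.example` are rejected, which is exactly the mismatch described above: the grammar is narrower than what works in practice.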