std.Io.net: make HostName.validate RFC 1123-compliant
The implementation of HostName.validate was too generous. It considered strings like ".example.com", "exa..mple.com", and "-example.com" to be valid hostnames, which is incorrect according to RFC 1123 (the currently accepted standard).
Thanks, all these new unit tests are going to be really handy when porting this logic over to #25592 :slightly_smiling_face:
There are domains with underscores in the wild.
; <<>> DiG 9.18.38 <<>> nomini02_but_w.lpmediastorage.com @8.8.8.8
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 20951
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;nomini02_but_w.lpmediastorage.com. IN A
;; ANSWER SECTION:
nomini02_but_w.lpmediastorage.com. 300 IN A 185.207.196.137
nomini02_but_w.lpmediastorage.com. 300 IN A 185.207.196.148
;; Query time: 12 msec
;; SERVER: 8.8.8.8#53(8.8.8.8) (UDP)
;; WHEN: Mon Oct 27 18:04:28 CET 2025
;; MSG SIZE rcvd: 94
There are domains with underscores in the wild.
Based on my understanding of the relevant RFCs, that isn't standards-compliant and would challenge our definition of "valid" without loosening our interpretation.
Some DNS record types also use underscores by convention, like _dmarc.example.com, but we might also consider those outside the scope of a "valid" hostname for the purposes of this function.
In theory only the "hostname" can't contain underscores, everything else can and this includes not only SRV or TXT but also CNAMEs, which can point to A/AAAA records later on.
In practice authoritative DNS servers (knot, nsd, and afaik windows one, whatever it's called) supports underscores, BIND hides it behind an option (at least older versions), resolvers/cache servers don't care at all. Clients support it out of the box without fuss, it's a case of "de facto standard" and going against it will break some user code.
I think if we can define and document the rules for allowing _ characters, that seems reasonable to me. For a function like this that returns a "valid" boolean response (versus a parser function that can be more nuanced), I think we just need to be clear about our definition of "valid".
The existing isValidHostName implementation also doesn't allow _ characters.
I re-implemented this as HostName.validate now that std.Io has landed. The code and tests are the same as the original version. A few notes:
- I'm now using
HostName.max_len, which is 255. My understanding is that 255 is the maximum length of a DNS record, and 253 is the maximum hostname length after you account for the record's length byte and terminator byte. I think 253 is therefore more correct, but we could also discuss that separately because it's used in multiple places. - Now that we have
ValidateError, we could add some additional validation errors likeLabelTooLongorInvalidCharacter, but I'm not sure if callers would benefit from that level of detail. - Per the above discussion, we could also support
_characters, which are valid DNS values but not entirely valid public hostnames. To implement that support, we would replace all of the calls tostd.ascii.isAlphanumericwith a local function that also includes_in the character set.