util/dnsname: labels are allowed to start with a digit
What is the issue?
DNS labels are permitted to start with a digit in dnsname.ValidLabel. RFC 1035 section 2.3.1 says that labels must start with a letter.
Steps to reproduce
Add a subtest for TestValidHostname with a label that starts with a digit.
The history of domain name validation is long and sordid, and tldr it's all a mess and you have to be very careful if you decide to start rejecting labels.
The original DNS spec says there are no invalid bytes in a label: every byte value is valid.
Later on, hostnames were specified to follow the LDH rule: ASCII letters, digits and hyphens only, case-insensitive matching, and labels cannot begin or end with a hyphen. Note, however, that leading digits are permitted. The only digit-related constraint is that top-level domains cannot consist entirely of digits, to disambiguate them from IP addresses.
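To make the LDH rule concrete, here is a minimal sketch of a label check that accepts leading digits. The helper names isLDHLabel and isAllDigits are hypothetical and are not the dnsname.ValidLabel implementation. The 63-byte length cap comes from the DNS wire format rather than from the LDH rule itself.

```go
package main

import "fmt"

// isLDHLabel reports whether label follows the letters-digits-hyphen rule:
// 1 to 63 bytes, each an ASCII letter, digit, or hyphen, with no hyphen at
// either end. A leading digit is accepted.
// Hypothetical helper for illustration only, not dnsname.ValidLabel.
func isLDHLabel(label string) bool {
	if len(label) == 0 || len(label) > 63 {
		return false
	}
	if label[0] == '-' || label[len(label)-1] == '-' {
		return false
	}
	for i := 0; i < len(label); i++ {
		c := label[i]
		switch {
		case c >= 'a' && c <= 'z', c >= 'A' && c <= 'Z', c >= '0' && c <= '9', c == '-':
			// allowed byte
		default:
			return false
		}
	}
	return true
}

// isAllDigits reports whether label is non-empty and made only of ASCII
// digits. Under the hostname rules this is the case to reject for a
// top-level label, so that names cannot be confused with IP addresses.
func isAllDigits(label string) bool {
	if label == "" {
		return false
	}
	for i := 0; i < len(label); i++ {
		if label[i] < '0' || label[i] > '9' {
			return false
		}
	}
	return true
}

func main() {
	for _, l := range []string{"3com", "a-b", "-bad", "bad-", "_srv"} {
		fmt.Printf("%q ldh=%v allDigits=%v\n", l, isLDHLabel(l), isAllDigits(l))
	}
}
```

Note that the all-digits check only matters for the rightmost label of a name. A label like "3com" passes both the LDH check and the top-level check.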
Even later, SRV records were introduced in which labels can contain underscores. This makes them invalid hostnames, but valid according to the original DNS spec which has no opinion on the contents of labels.
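As a rough illustration of what relaxing the hostname rule for SRV-style names could look like, the sketch below accepts underscores in addition to letters, digits and hyphens. The function name isRelaxedLabel is hypothetical, and allowing an underscore at any position is an assumption made for brevity. It does not describe what the dnsname package actually does.

```go
package main

import "fmt"

// isRelaxedLabel is a hypothetical check: the LDH rule plus '_', so that
// SRV-style labels such as "_sip" or "_tcp" pass. Allowing '_' at any
// position is an assumption of this sketch, not a statement about any RFC
// or about the dnsname package.
func isRelaxedLabel(label string) bool {
	if len(label) == 0 || len(label) > 63 {
		return false
	}
	if label[0] == '-' || label[len(label)-1] == '-' {
		return false
	}
	for i := 0; i < len(label); i++ {
		c := label[i]
		switch {
		case c >= 'a' && c <= 'z', c >= 'A' && c <= 'Z', c >= '0' && c <= '9':
			// letters and digits, including a leading digit
		case c == '-', c == '_':
			// hyphen (not at the ends, checked above) and underscore
		default:
			return false
		}
	}
	return true
}

func main() {
	for _, l := range []string{"_sip", "_tcp", "1host", "a b", "-x"} {
		fmt.Printf("%q relaxed=%v\n", l, isRelaxedLabel(l))
	}
}
```

A name like _sip._tcp.example.com passes this check label by label, even though it is not a valid hostname under the stricter LDH rule.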
Then there's IDNA, which reserves a bunch of other names for punycoding, and complicates the question of valid labels even more by distinguishing "valid for registration", "valid for lookup" and "valid on the wire". If you want to get even deeper, ICANN also publishes a list of forbidden labels for registration, which defines yet another subset of labels that are only valid in certain contexts (e.g. all registries must reject registrations for the otherwise registrable label redcross, as part of enforcing the additional protocols of the Geneva Conventions).
After digging quite deep, my conclusion was: in Tailscale, we care about DNS names primarily in quad-100, which is a DNS forwarder. It's close enough to the wire that it should embrace something closer to the original "anything goes" definition of valid labels. The alternative is to do exhaustive research into a dozen different DNS RFCs, nail down precisely what is allowed in the contexts we care about, and enforce it precisely with no mistakes.
Precise enforcement would include teaching this library about IDNA A-labels and U-labels at least, and accounting for SRV-type records, and the old LDH rule (which IDNA modifies so really it's IDNA again), and IP address disambiguation, and also whatever browsers choose to do because in practice that's what people will consider broken if it doesn't work.
But either way, DNS labels can definitely have leading digits :)
I should add: arguably the dnsname library right now is not right either, in that it's still applying LDH-like validation to labels, albeit relaxed to support at least _srv SRV style names as well (I know because I broke clients years ago by disallowing it and had to roll it back, oops). At a glance I think it's properly enforcing the state of the world as it was between SRV records coming into existence, and IDNA rationalizing DNS for arbitrary writing systems.
In practice that's worked well enough given people aren't complaining, but it's probably not capital-C Correct either. But to implement it correctly would require effectively going off and reading all DNS RFCs that have been written and synthesizing a description of what labels are valid in the contexts we care about. I would love to go do that, I already know far too much about it from hacking on Unicode and the public suffix list, but it's one of those things that's never quite the current priority :(
As discussed in the issue, starting with a digit is fine. Instead, we're updating the documentation to be clear about what should and should not be allowed.