psl
Domain name validation is not correct according to RFC 2181
psl checks each domain label against the regular expression /^[a-z0-9-]+$/ and returns 'LABEL_INVALID_CHARS' when a label does not match. According to RFC 2181, section 11 (Name Syntax), this is wrong:
The DNS itself places only one restriction on the particular labels that can be used to identify resource records. That one restriction relates to the length of the label and the full name....Implementations of the DNS protocols must not place any restrictions on the labels that can be used
The LABEL_INVALID_CHARS validation should be removed.
"_" should be allowed.
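To make the difference concrete, here is a minimal sketch contrasting the two kinds of check. The helper names `isHostnameStyleLabel`, `isRfc2181Label` and `isRfc2181Name` are hypothetical and are not part of psl. The limits (1 to 63 octets per label, at most 255 octets for the full name) come from RFC 2181 section 11; measuring them on the dotted text form is a simplifying assumption.

```javascript
// Hypothetical helpers for illustration only; they are not part of psl's API.

// Hostname-style check, mirroring the regex quoted in the issue:
// only lowercase ASCII letters, digits and hyphens are accepted.
function isHostnameStyleLabel(label) {
  return /^[a-z0-9-]+$/.test(label);
}

// RFC 2181, section 11: the only restrictions are on length.
// A label must be 1 to 63 octets long; its content is not restricted.
function isRfc2181Label(label) {
  const octets = new TextEncoder().encode(label).length;
  return octets >= 1 && octets <= 63;
}

// The full name is limited to 255 octets including separators.
// Simplification (assumption): lengths are measured on the dotted text
// form, and a single trailing dot (the root) is ignored.
function isRfc2181Name(name) {
  const trimmed = name.endsWith(".") ? name.slice(0, -1) : name;
  if (new TextEncoder().encode(trimmed).length > 255) return false;
  return trimmed.split(".").every(isRfc2181Label);
}

console.log(isHostnameStyleLabel("_dmarc"));      // false
console.log(isRfc2181Label("_dmarc"));            // true
console.log(isRfc2181Name("_dmarc.example.com")); // true
```

An underscore label such as `_dmarc` is rejected by the hostname-style regex but is a perfectly legal DNS label under RFC 2181.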
Yup, this thing is not working properly :T
code: "LABEL_INVALID_CHARS" message: "Domain name label can only contain alphanumeric characters or dashes." input: "https://theintercept.com/2019/06/02/samuelpinheiroentrevista/"
🤷🏼‍♂️
Whoops, my bad, I didn't realize I had to remove the protocol.
Hi @aviv1ron1, many thanks for reporting this and apologies for the delay in getting back to you... :see_no_evil:
I think you are right with regard to the formal definition of domain name labels as described in the RFC. It was my bad to base the implementation on Wikipedia's description of hostnames... :cold_sweat:
Hostnames impose restrictions on the characters allowed in the corresponding domain name. A valid hostname is also a valid domain name, but a valid domain name may not necessarily be valid as a hostname. Source: https://en.wikipedia.org/wiki/Domain_name
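A hostname check, which is roughly what the current regex enforces, can instead be written as a separate, explicit function. The sketch below assumes the common RFC 952 / RFC 1123 "letters, digits, hyphen" rule (case-insensitive, no leading or trailing hyphen, 1 to 63 characters per label) and the conventional 253-character limit on the textual form; `isValidHostname` is a hypothetical name, not a psl function.

```javascript
// Hypothetical hostname validator (not psl's API), assuming the common
// RFC 952 / RFC 1123 LDH rule: each label is 1-63 ASCII letters, digits
// or hyphens, and may not start or end with a hyphen.
const HOSTNAME_LABEL = /^(?!-)[a-z0-9-]{1,63}(?<!-)$/i;

function isValidHostname(name) {
  // Ignore a single trailing dot (fully qualified form).
  const trimmed = name.endsWith(".") ? name.slice(0, -1) : name;
  // Conventional limit of 253 characters for the dotted text form.
  if (trimmed.length === 0 || trimmed.length > 253) return false;
  return trimmed.split(".").every((label) => HOSTNAME_LABEL.test(label));
}

// Every valid hostname is a valid domain name, but not the other way round:
// "_dmarc.example.com" is a legal DNS name yet fails the hostname rule.
console.log(isValidHostname("www.example.com"));    // true
console.log(isValidHostname("_dmarc.example.com")); // false
console.log(isValidHostname("-bad.example.com"));   // false
```

Keeping this as its own function leaves domain-name validation free to follow RFC 2181, while callers that really need hostnames can opt into the stricter rule.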
I would like to correct this, but at the same time I don't want to introduce breaking changes in the short term. I will soon start working on a rewrite of this module (v2, with breaking changes), and will probably remove the regex validation when validating domains and offer a different, more explicit mechanism for validating hostnames. I will update this issue as soon as I start work on v2.
@DaveRingelnatz: considering @aviv1ron1's point, I also think you are right. Please stay tuned for v2, which will hopefully solve these issues.
@ploissken: Please note that you are expected to pass a domain name (i.e. theintercept.com) and not a full URL (like https://theintercept.com/2019/06/02/samuelpinheiroentrevista/).
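If you start from a full URL, one way to get the domain is to let the WHATWG `URL` class (built into Node.js and browsers) extract the hostname first. The `domainFromInput` helper below is hypothetical. Treating only inputs that contain "://" as URLs is a deliberate simplification, because a bare string such as `example.com:80` would otherwise be parsed as a URL whose scheme is `example.com:`. The resulting domain is what you would then pass to `psl.parse`.

```javascript
// Hypothetical helper: turn user input into something psl expects (a bare domain).
function domainFromInput(input) {
  // Assumption: only inputs with an explicit scheme are treated as URLs.
  if (input.includes("://")) {
    // URL.hostname drops the protocol, port, path, query and fragment,
    // and lowercases ASCII host names.
    return new URL(input).hostname;
  }
  // Otherwise assume the caller already passed a bare domain.
  return input;
}

const domain = domainFromInput(
  "https://theintercept.com/2019/06/02/samuelpinheiroentrevista/"
);
console.log(domain); // theintercept.com

// With psl installed, the bare domain can then be parsed:
// const psl = require("psl");
// const parsed = psl.parse(domain);
```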