PyDomainExtractor
IP addresses are parsed incorrectly
How to reproduce:
Call extract_from_url with http://127.0.0.1 as input.
Result: {subdomain: 127.0.0, domain: 1}
Expected behavior: throw an Invalid Domain error.
Technically this is a valid domain, so I'm not sure what to do here. Validating the domain at this point feels out of place, and ensuring that the domain is not an IP address is going to be hard. I think we should tolerate such cases.
It does not parse IP addresses.
tldextract can do this for you.
According to IETF RFC 3696, top-level domain names cannot be all-numeric. In the case of http://127.0.0.1, 1 is not a valid TLD, so 127.0.0.1 cannot be a fully qualified domain name (FQDN).
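
For illustration, here is a minimal sketch of how a caller could screen out such inputs before calling extract_from_url, using only the Python standard library. The helper names host_is_ip_address and has_all_numeric_tld are hypothetical and are not part of PyDomainExtractor; this is one possible pre-check, not how the library itself behaves.

```python
import ipaddress
from urllib.parse import urlsplit


def host_is_ip_address(url: str) -> bool:
    """Return True if the URL's host is an IPv4 or IPv6 literal.

    Hypothetical helper, not part of PyDomainExtractor.
    """
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return False
    return True


def has_all_numeric_tld(url: str) -> bool:
    """Return True if the last label of the host consists only of ASCII digits.

    Hypothetical helper applying the RFC 3696 rule that a TLD
    cannot be all-numeric.
    """
    host = urlsplit(url).hostname or ""
    last_label = host.rstrip(".").rsplit(".", 1)[-1]
    return last_label.isascii() and last_label.isdigit()
```

With these helpers, a caller could reject http://127.0.0.1 up front, since both checks return True for it, while http://example.com passes both. The second check is broader than the first: it also flags hosts such as example.123 that are not IP addresses but still end in a numeric label.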