linkify-it
Links with "_" in the domain name are not regarded as links
What is the issue?
Links with "_" in the domain name, e.g.:
- https://api_stage.dzcode.io
- https://api_stage.dz_code.io
are not regarded as links, which is not correct; see https://stackoverflow.com/a/2183140/8113942
The same goes for fuzzy links, e.g.:
- api_stage.dzcode.io
- api_stage.dz_code.io
As far as I've been able to research, api_stage.dzcode.io is an alias for api-stage.dzcode.io, and dz_code.io simply isn't a thing.
Please provide an example of widely used domains with underscores in them.
Underscores in domain names are very rare because:
- not permitted by some RFCs (see discussion here)
- issuance of SSL certificates for domains with underscores is being phased out
- you can't register a second-level domain name with underscores (I've just tried)
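To make the first point concrete, here is a minimal sketch of the letter-digit-hyphen hostname rule described in RFC 952 and RFC 1123. `isLdhHostname` and `LDH_LABEL` are hypothetical names made up for this illustration; they are not part of linkify-it, and the check ignores internationalized names and total-length limits.

```javascript
// Sketch only: hypothetical helper, not linkify-it code.
// A label is 1-63 characters of letters, digits, or hyphens,
// and may not start or end with a hyphen.
const LDH_LABEL = /^[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?$/i;

function isLdhHostname(host) {
  // Every dot-separated label must satisfy the LDH rule.
  return host.split('.').every((label) => LDH_LABEL.test(label));
}

console.log(isLdhHostname('api-stage.dzcode.io'));  // true
console.log(isLdhHostname('api_stage.dzcode.io'));  // false
console.log(isLdhHostname('api_stage.dz_code.io')); // false
```

Note that this covers hostnames only; DNS itself can carry other characters in names, which is why the linked discussion is not clear-cut.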
Linkify-it isn't meant to find every single link (which is impossible), so we have to restrict ourselves to the most common cases. I'm not sure domains with underscores are worth supporting, especially given the potential for false positives once underscores are allowed in fuzzy links.
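The kind of false positive in question can be sketched with a deliberately simplified fuzzy matcher. Everything here is an assumption for illustration: `fuzzyHostPattern` and the four-entry `TLDS` sample are hypothetical, and the pattern is far simpler than linkify-it's real one, so it says nothing definitive about what the library itself would link.

```javascript
// Sketch only: a toy schemeless-host matcher, NOT linkify-it's real pattern.
// TLDS is a tiny hypothetical sample; "py" and "sh" are real country-code TLDs.
const TLDS = 'com|io|py|sh';

function fuzzyHostPattern(allowUnderscore) {
  const inner = allowUnderscore ? '[a-z0-9_-]' : '[a-z0-9-]';
  const label = '[a-z0-9](?:' + inner + '{0,61}[a-z0-9])?';
  return new RegExp('^(?:' + label + '\\.)+(?:' + TLDS + ')$', 'i');
}

const samples = ['api_stage.dzcode.io', 'my_module.py', 'deploy_all.sh'];
for (const s of samples) {
  // Columns: text, match without "_", match with "_"
  console.log(s, fuzzyHostPattern(false).test(s), fuzzyHostPattern(true).test(s));
}
// api_stage.dzcode.io false true
// my_module.py false true
// deploy_all.sh false true
```

In this toy model, allowing "_" picks up the intended host but also snake_case file names whose extension happens to be a TLD.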
Can we get this resolved already? It seems like we are discussing whether this is a valid case or not, but it's obvious that there are cases like this around the web. This library has 100% test coverage, so it's safe to add this change without worrying it would break something. We've heard "false-positive potential" mentioned before, but which exact cases could be false positives?
Another option that gets suggested is to use onCompile to override the src_domain regexp. However, since most of the regexps depend on one another, this simple change needs to be applied like this:
LinkifyIt.prototype.onCompile = function onCompile() {
  const re = this.re;
  const text_separators = '[><\uff5c]';

  // The actual change: allow "_" alongside "-" inside a domain label.
  re.src_domain =
    '(?:' +
      re.src_xn +
      '|' +
      '(?:' + re.src_pseudo_letter + ')' +
      '|' +
      '(?:' + re.src_pseudo_letter + '(?:-|_|' + re.src_pseudo_letter + '){0,61}' + re.src_pseudo_letter + ')' +
    ')';

  // Everything below is rebuilt only because it embeds src_domain,
  // directly or through another pattern string.
  re.src_host =
    '(?:' +
      '(?:(?:(?:' + re.src_domain + ')\\.)*' + re.src_domain/* _root */ + ')' +
    ')';

  re.tpl_host_fuzzy =
    '(?:' +
      re.src_ip4 +
      '|' +
      '(?:(?:(?:' + re.src_domain + ')\\.)+(?:%TLDS%))' +
    ')';

  re.src_host_strict =
    re.src_host + re.src_host_terminator;

  re.tpl_host_fuzzy_strict =
    re.tpl_host_fuzzy + re.src_host_terminator;

  re.src_host_port_strict =
    re.src_host + re.src_port + re.src_host_terminator;

  re.tpl_host_port_fuzzy_strict =
    re.tpl_host_fuzzy + re.src_port + re.src_host_terminator;

  re.tpl_email_fuzzy =
    '(^|' + text_separators + '|"|\\(|' + re.src_ZCc + ')' +
    '(' + re.src_email_name + '@' + re.tpl_host_fuzzy_strict + ')';

  re.tpl_link_fuzzy =
    '(^|(?![.:/\\-_@])(?:[$+<=>^`|\uff5c]|' + re.src_ZPCc + '))' +
    '((?![$+<=>^`|\uff5c])' + re.tpl_host_port_fuzzy_strict + re.src_path + ')';

  // Note: tpl_host_port_no_ip_fuzzy_strict is not rebuilt in this snippet,
  // so it keeps whatever value the library computed before the override.
  re.tpl_link_no_ip_fuzzy =
    '(^|(?![.:/\\-_@])(?:[$+<=>^`|\uff5c]|' + re.src_ZPCc + '))' +
    '((?![$+<=>^`|\uff5c])' + re.tpl_host_port_no_ip_fuzzy_strict + re.src_path + ')';
};
I don't think that's maintainable in our codebase.
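The reason a one-line change turns into a long override can be shown with a much smaller model. The sketch below assumes two pattern strings whose names mirror the snippet above but whose bodies are hypothetical stand-ins (`src_letter` and `matchesHost` are invented for this example). Because each string is concatenated once, changing `src_domain` afterwards does not reach strings that already embedded the old value.

```javascript
// Sketch only: simplified stand-ins, not linkify-it's real pattern bodies.
const re = {};
re.src_letter = '[a-z0-9]';
re.src_domain =
  '(?:' + re.src_letter + '(?:(?:-|' + re.src_letter + '){0,61}' + re.src_letter + ')?)';
re.src_host = '(?:(?:' + re.src_domain + '\\.)*' + re.src_domain + ')';

// Compiles whatever string src_host currently holds.
const matchesHost = (s) => new RegExp('^' + re.src_host + '$', 'i').test(s);

const before = matchesHost('api_stage.dzcode.io');

// Step 1: override only src_domain. src_host still contains the old text.
re.src_domain =
  '(?:' + re.src_letter + '(?:(?:-|_|' + re.src_letter + '){0,61}' + re.src_letter + ')?)';
const afterDomainOnly = matchesHost('api_stage.dzcode.io');

// Step 2: rebuild the dependent string so it picks up the new src_domain.
re.src_host = '(?:(?:' + re.src_domain + '\\.)*' + re.src_domain + ')';
const afterRebuild = matchesHost('api_stage.dzcode.io');

console.log(before, afterDomainOnly, afterRebuild); // false false true
```

With two strings this is trivial; with a dozen interdependent ones, every dependent has to be copied and rebuilt on the consumer side.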
I actually see a couple of options here:
- Merge https://github.com/markdown-it/linkify-it/pull/96 which adds test coverage for these cases and fixes the issue.
- Make this library extendable/configurable in a better way, one that doesn't require consumers to carry half of the regexp codebase, while maintaining backwards compatibility.
Please make some kind of decision, as doing nothing and ignoring OS community issues for years is not a valid solution.