url-regex

Increasing TLD list accuracy with `(?![a-z])`

Open Kamil93 opened this issue 6 years ago • 5 comments

Thanks to that little piece of regexp code, strings like:

I'am Damian.Gathering is my favorite skill

won't be matched
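To illustrate the effect, here is a minimal sketch. It assumes a heavily simplified host pattern and a four-entry TLD list (`tlds`, `loose`, and `strict` are hypothetical names for this example; the real url-regex pattern and TLD list are much larger):

```javascript
// Hypothetical, simplified pattern: not the actual url-regex source.
const tlds = ['com', 'ga', 'is', 'istanbul'];
const tldGroup = tlds.join('|');

// Without the lookahead, a TLD may match as a prefix of a longer word.
const loose = new RegExp(`[a-z0-9-]+\\.(?:${tldGroup})`, 'gi');
// With (?![a-z]), the TLD must not be followed by another letter.
const strict = new RegExp(`[a-z0-9-]+\\.(?:${tldGroup})(?![a-z])`, 'gi');

const text = "I'am Damian.Gathering is my favorite skill";

const looseMatches = text.match(loose) || [];
const strictMatches = text.match(strict) || [];

console.log(looseMatches);  // → [ 'Damian.Ga' ]  (false positive on the "ga" TLD)
console.log(strictMatches); // → []
```

The same lookahead also makes `foo.istanbul` match in full in this sketch, because the shorter alternative `is` is rejected when a letter follows it.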

Kamil93 avatar Oct 24 '19 04:10 Kamil93

It would be great to see this merged. It fixes issue #57, which gives false positives when matching URLs.

schinkowitch avatar Feb 13 '20 13:02 schinkowitch

This cannot be merged, at least on the Node.js side, as it causes RE2 to error with 'invalid perl operator: (?!'. If you're not using RE2, then you are subject to CVE-2020-7661.

Since it can't be merged on the Node side, it would be super inconsistent with Browser usage.

niftylettuce avatar Aug 15 '20 06:08 niftylettuce

See my new package at https://github.com/niftylettuce/url-regex-safe if you want to submit a PR that solves this differently.

niftylettuce avatar Aug 15 '20 08:08 niftylettuce

Do either of you have time to work on implementing an alternative that doesn't use a negative lookahead/lookbehind approach? I would award a bounty if you submitted a patch to https://github.com/niftylettuce/url-regex-safe. It'd be great, as the current regex is parsing out stuff like foo.is from foo.istanbul.
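One possible lookahead-free direction, sketched here purely as an assumption and not taken from any patch in this thread: let the regex greedily capture the whole dotted host, then check the final label against the TLD list in plain code. `TLDS`, `candidate`, and `findHosts` are hypothetical names, and the TLD set is a tiny stand-in for a real list:

```javascript
// Hypothetical sketch: uses no lookahead or lookbehind in the pattern.
const TLDS = new Set(['com', 'ga', 'is', 'istanbul']);

// Greedy labels separated by dots; each label consumes as many characters as it can,
// so "foo.istanbul" is captured whole rather than being cut at "foo.is".
const candidate = /[a-z0-9-]+(?:\.[a-z0-9-]+)+/gi;

function findHosts(text) {
  const hosts = [];
  for (const [match] of text.matchAll(candidate)) {
    // Validate the last label outside the regex instead of with (?![a-z]).
    const lastLabel = match.slice(match.lastIndexOf('.') + 1).toLowerCase();
    if (TLDS.has(lastLabel)) hosts.push(match);
  }
  return hosts;
}

console.log(findHosts('visit foo.istanbul today'));
// → [ 'foo.istanbul' ]
console.log(findHosts("I'am Damian.Gathering is my favorite skill"));
// → []
```

The trade-off is that the TLD check moves out of the pattern, so this only helps callers that can post-filter matches rather than those who need a single standalone regex.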

niftylettuce avatar Aug 27 '20 06:08 niftylettuce

I actually found a solution, see https://github.com/spamscanner/spamscanner/commit/0f57896351c778fd06df9a505d033cafab806d25 and the related tests I wrote (and you might want to read the comments as well).

niftylettuce avatar Aug 27 '20 07:08 niftylettuce