linkifyjs: URL detection results of the find method changed in v3?

When using the find method to detect URLs, I found that the detection results are different in v3 when there are no spaces before or after the URL.

v2

// URL
foo http://example.com bar
foo http://example.combar
foohttp://example.com bar
foohttp://example.combar
テストhttp://example.comテスト

v3

// URL
foo http://example.com bar
foo http://example.combar
foohttp://example.com bar

// Not URL
foohttp://example.combar
テストhttp://example.comテスト

Is this expected behavior? If so, I would like to see the following behavior instead, even if it's only for multi-byte characters, because we often write URLs this way in Japanese, without surrounding spaces.

// URL
foo http://example.com bar
foo http://example.combar
foohttp://example.com bar
テストhttp://example.comテスト

// Not URL
foohttp://example.combar

ref: #315

Oct 06 '21 06:10 sunadoi

Hi @sunadoi, thanks for reporting.

The reasons for this regression in v3 are a bit complex and relate to the extended parsing I added to support Internationalized Domain Names (IDN). The parser now recognizes テスト as a word, whereas in v2 those characters were treated as unknown symbols. The parser is greedy (it tries to identify the longest possible tokens without backtracking), and since there is no delimiting whitespace, it treats テストhttp as a single word and the rest as an invalid URL.
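To make the greedy behavior above concrete, here is a toy sketch. It is not linkifyjs's actual scanner: `greedyTokens`, `asciiWord`, and `unicodeWord` are hypothetical names invented for illustration, and the only assumption is that a "word" is a maximal run of word characters. Widening the definition of a word character from ASCII to any Unicode letter or digit is enough to glue テスト onto http.

```javascript
// Toy scanner, NOT linkifyjs internals: emits maximal runs of "word"
// characters as one token and every other character as its own token.
function greedyTokens(text, isWordChar) {
  const tokens = [];
  let word = '';
  for (const ch of text) {
    if (isWordChar(ch)) {
      word += ch; // greedy: keep extending the current word, never backtrack
      continue;
    }
    if (word) {
      tokens.push(word);
      word = '';
    }
    tokens.push(ch); // non-word character becomes a single symbol token
  }
  if (word) tokens.push(word);
  return tokens;
}

// Assumed stand-ins for the two behaviors described in this thread.
const asciiWord = (ch) => /[a-z0-9]/i.test(ch);          // v2-like
const unicodeWord = (ch) => /[\p{L}\p{N}]/u.test(ch);    // v3-like

const input = 'テストhttp://example.com';
const v2Like = greedyTokens(input, asciiWord);
// ['テ', 'ス', 'ト', 'http', ':', '/', '/', 'example', '.', 'com']
const v3Like = greedyTokens(input, unicodeWord);
// ['テストhttp', ':', '/', '/', 'example', '.', 'com']
```

In the v3-like run the scheme is no longer a separate http token, so nothing downstream can recognize the text as starting a URL.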

I believe I can fix this by making a distinction in the parser between ASCII words and non-ASCII words. Unfortunately, because of ambiguity in these types of examples, the best I can get with this plugin will be the following (I used {{}} to mark which portions of text will be identified as links):


foo {{http://example.com}} bar
foo {{http://example.combar}}
foohttp://{{example.com}} bar
テスト{{http://example.comテスト}}

I hope that works for you, because unfortunately I cannot think of a good strategy to cover all edge cases like this.
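As a rough illustration of the ASCII versus non-ASCII distinction mentioned above, the sketch below ends a word run whenever the character class changes. This is only an assumption about what such a split could look like, not the code that shipped in linkifyjs; `classify` and `splitByClass` are hypothetical helpers, and deciding which tokens end up inside a link is left out entirely.

```javascript
// Hypothetical sketch, not linkifyjs's real scanner: a word run ends
// whenever the character class switches between ASCII and non-ASCII.
function classify(ch) {
  if (/[a-z0-9]/i.test(ch)) return 'ascii';
  if (/[\p{L}\p{N}]/u.test(ch)) return 'nonAscii';
  return 'symbol';
}

function splitByClass(text) {
  const tokens = [];
  for (const ch of text) {
    const type = classify(ch);
    const last = tokens[tokens.length - 1];
    if (last && type !== 'symbol' && last.type === type) {
      last.value += ch; // same class: extend the current run
    } else {
      tokens.push({ type, value: ch }); // class changed or symbol: new token
    }
  }
  return tokens;
}

const tokens = splitByClass('テストhttp://example.comテスト');
const values = tokens.map((t) => t.value);
// ['テスト', 'http', ':', '/', '/', 'example', '.', 'com', 'テスト']
```

With the runs separated like this, http shows up as its own ASCII token directly after テスト, which is what lets a later stage treat it as the start of a URL.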

Oct 07 '21 02:10 nfrasser

@nfrasser

Thank you for your kind explanation. The fix you suggested works for me. I’ll be happy to see it😄

Oct 08 '21 05:10 sunadoi

Fixed in the latest v4 release.

Sep 19 '22 01:09 nfrasser