Support for non-English characters

Open kemege opened this issue 9 years ago • 2 comments

In JavaScript regular expressions, \w matches only [A-Za-z0-9_], so the plugin does not recognize tags that contain non-English characters, such as #测试 or #テスト.

Perhaps \w+ should be replaced by something like (?:\w|[^\u0000-\u007F])+ or [^\u0000-\u0029\u0040\u005b-\u0060\u007b-\u007f], as suggested in a Stack Overflow answer?
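To make the difference concrete, here is a minimal sketch comparing the two patterns. The helper `firstTag` and the surrounding `#(...)` wrapper are hypothetical, written only for this illustration; they are not the plugin's actual code.

```javascript
// ASCII-only pattern: \w is [A-Za-z0-9_] in JavaScript.
const asciiOnly = /#(\w+)/;

// Proposed alternative: also accept any character outside the ASCII range.
const withNonAscii = /#((?:\w|[^\u0000-\u007F])+)/;

// Hypothetical helper: return the first tag body found in the text, or null.
function firstTag(re, text) {
  const m = re.exec(text);
  return m ? m[1] : null;
}

console.log(firstTag(asciiOnly, '#test'));         // 'test'
console.log(firstTag(asciiOnly, '#测试'));          // null
console.log(firstTag(withNonAscii, '#测试'));       // '测试'
console.log(firstTag(withNonAscii, 'a #テスト b')); // 'テスト'
```

Note that [^\u0000-\u007F] also accepts non-ASCII punctuation and whitespace such as the ideographic space U+3000, so it is more permissive than a "letters only" rule.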

kemege avatar Jul 11 '16 08:07 kemege

You can already set the accepted characters yourself. See https://github.com/svbergerem/markdown-it-hashtag#advanced and https://github.com/svbergerem/markdown-it-hashtag/blob/master/test/hashtag.js#L23-L27 for some examples. I'll think about changing the default and will keep this issue open until I have made my decision.
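A rough sketch of what such a configuration could look like. It assumes the options are named `hashtagRegExp` and `preceding` and take pattern strings, as I recall from the README's advanced section; please check the linked README before relying on these names. The `probe` regex is a hypothetical stand-in for checking the pattern and is not how the plugin necessarily combines the options.

```javascript
// Assumed option names (verify against the README's "advanced" section).
const hashtagOptions = {
  // Pattern string for the tag body; backslashes are doubled because
  // this is a string literal, not a regex literal.
  hashtagRegExp: '(?:\\w|[^\\u0000-\\u007F])+',
  // Pattern string for what may come directly before the '#'.
  preceding: '^|\\s',
};

// Hypothetical stand-in to check the pattern without loading the plugin.
const probe = new RegExp(
  '(?:' + hashtagOptions.preceding + ')#(' + hashtagOptions.hashtagRegExp + ')'
);

console.log(probe.exec('see #テスト')[1]); // 'テスト'
console.log(probe.exec('foo#bar'));        // null

// With markdown-it and markdown-it-hashtag installed, the options would
// be passed roughly like this:
// const md = require('markdown-it')()
//   .use(require('markdown-it-hashtag'), hashtagOptions);
```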

svbergerem avatar Jul 11 '16 10:07 svbergerem

For reference, Unicode has a definition for hashtag identifiers here: https://unicode.org/reports/tr31/#hashtag_identifiers
It's not easy to read, but I think it includes most Unicode characters.
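As a simplified approximation only, not the full definition from that report, a tag body can be matched with the `XID_Continue` Unicode property, which JavaScript supports in regular expressions with the `u` flag. The name `unicodeTag` is made up for this sketch.

```javascript
// Simplified approximation: '#' followed by one or more XID_Continue
// characters. The full definition in the Unicode report covers more
// than this.
const unicodeTag = /#(\p{XID_Continue}+)/u;

console.log(unicodeTag.exec('#测试')[1]);             // '测试'
console.log(unicodeTag.exec('hello #テスト world')[1]); // 'テスト'
console.log(unicodeTag.exec('#snake_case!')[1]);      // 'snake_case'
console.log(unicodeTag.exec('# not a tag'));          // null
```

Unlike [^\u0000-\u007F], this excludes non-ASCII punctuation and whitespace, at the cost of requiring an engine with Unicode property escape support.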

Powersource avatar Aug 02 '19 12:08 Powersource