markdown-it-hashtag
Support for non-English characters
In JavaScript regular expressions, \w only matches [A-Za-z0-9_].
So it doesn't work for tags that contain non-English characters, such as #测试 or #テスト.
Perhaps \w+ should be replaced by something like (?:\w|[^\u0000-\u007F])+ or [^\u0000-\u0029\u0040\u005b-\u0060\u007b-\u007f], as suggested in a Stack Overflow answer?
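A quick way to see the problem, and what the first proposed alternative changes, is to run both patterns over the examples from this issue. This is a minimal standalone sketch using plain JavaScript regular expressions, not code from markdown-it-hashtag; `firstTag` is a hypothetical helper.

```javascript
// ASCII-only: \w is [A-Za-z0-9_] in JavaScript.
const asciiOnly = /#(\w+)/;
// Alternative from the issue: a word character or any non-ASCII UTF-16 code unit.
const wordOrNonAscii = /#((?:\w|[^\u0000-\u007F])+)/;

// Hypothetical helper: returns the first tag body in `text`, or null if none.
function firstTag(re, text) {
  const m = re.exec(text);
  return m ? m[1] : null;
}

for (const text of ["#hello", "#测试", "#テスト"]) {
  console.log(text, firstTag(asciiOnly, text), firstTag(wordOrNonAscii, text));
}

// Trade-off: non-ASCII punctuation such as "。" (U+3002) is also accepted,
// so the tag runs on into the following sentence.
console.log(firstTag(wordOrNonAscii, "#测试。结束")); // 测试。结束
```

Matching any non-ASCII code unit is simple, but it also swallows non-ASCII punctuation, which can matter for CJK text where a tag may be followed directly by full-width punctuation.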
You are already able to set the accepted characters yourself. See https://github.com/svbergerem/markdown-it-hashtag#advanced and https://github.com/svbergerem/markdown-it-hashtag/blob/master/test/hashtag.js#L23-L27 for some examples. I'll consider changing the default and will keep this issue open until I have made my decision.
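Since the accepted characters are configurable, the idea can be sketched without the plugin: keep the matching logic fixed and pass the tag pattern in as a string. `makeHashtagMatcher` and its parameters are hypothetical and only illustrate the approach; the plugin's actual option names and defaults are the ones documented in the README linked above. The `\p{L}` and `\p{N}` classes are Unicode property escapes, which require the `u` flag (ES2018).

```javascript
// Hypothetical helper, NOT the markdown-it-hashtag API: builds a hashtag matcher
// from configurable patterns for the preceding context and the tag body.
function makeHashtagMatcher(tagPattern, precedingPattern = "^|\\s") {
  // Group 1 is the preceding context, group 2 the tag body (without '#').
  const re = new RegExp(`(${precedingPattern})#(${tagPattern})`, "gu");
  return (text) => Array.from(text.matchAll(re), (m) => m[2]);
}

const asciiTags = makeHashtagMatcher("\\w+");
// Letters and digits from any script, plus '_'.
const unicodeTags = makeHashtagMatcher("[\\p{L}\\p{N}_]+");

const sample = "#hello #测试 #テスト";
console.log(asciiTags(sample));   // only "hello"
console.log(unicodeTags(sample)); // "hello", "测试", "テスト"
```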
For reference, Unicode defines hashtag identifiers in UAX #31: https://unicode.org/reports/tr31/#hashtag_identifiers
It's not easy to read, but I think it includes most Unicode characters.
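As a rough approximation of the Unicode approach, JavaScript's Unicode property escapes can select characters by XID_Continue, the property that Unicode's default identifiers are built on. This is a simplified sketch under that assumption, not an implementation of the hashtag grammar in UAX #31, which has further rules; `extractTags` is hypothetical.

```javascript
// Rough approximation only: the tag body is a run of characters with the
// Unicode XID_Continue property (letters, digits, combining marks, '_').
// This is NOT the full hashtag grammar from UAX #31.
const xidTag = /(?:^|\s)#(\p{XID_Continue}+)/gu;

// Hypothetical helper: returns every tag body found in `text`.
function extractTags(text) {
  return Array.from(text.matchAll(xidTag), (m) => m[1]);
}

const tags = extractTags("#hello #测试 #テスト #café #tag。next");
console.log(tags); // hello, 测试, テスト, café, tag
```

Unlike the [^\u0000-\u007F] approach, this stops at "。", because ideographic punctuation does not have the XID_Continue property.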