emoji-tree
Errors in regexes
Hi,
I was looking for some tools to work with emoji when I found this repo. Unless I'm mistaken, there are two errors in the regexes in lib/emojiRegex.js:
- According to Wikipedia, "The Basic Latin block contains twelve emoji: U+0023, U+002A and U+0030–U+0039." Hence, your keycap definition (line 5) should read:
  const keycap = '[\\u0023\\u002a\\u0030-\\u0039]\\ufe0f?\\u20e3';
- Your enclosedIdeographicSupplement array (lines 27-33) is malformed: there are 3 superfluous opening brackets, on lines 28, 31 and 32. It should read:
const enclosedIdeographicSupplement = [
'\\ud83c[\\ude01-\\ude02]',
'\\ud83c\\ude1a',
'\\ud83c\\ude2f',
'\\ud83c[\\ude32-\\ude3a]',
'\\ud83c[\\ude50-\\ude51]',
];
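To sanity-check the corrected array, here is a minimal sketch. It assumes the fragments are OR-ed together into one anchored pattern, which may not be how lib/emojiRegex.js actually assembles them; supplementRegex and isEnclosedIdeographicSupplement are hypothetical names used only for this check.

```javascript
// Corrected fragments, copied from the proposal above.
const enclosedIdeographicSupplement = [
  '\\ud83c[\\ude01-\\ude02]',
  '\\ud83c\\ude1a',
  '\\ud83c\\ude2f',
  '\\ud83c[\\ude32-\\ude3a]',
  '\\ud83c[\\ude50-\\ude51]',
];

// Assumption: the fragments are alternatives of a single pattern.
// No `u` flag is used, so the pattern matches UTF-16 code units directly.
const supplementRegex = new RegExp(
  `^(?:${enclosedIdeographicSupplement.join('|')})$`
);

// Hypothetical helper, not part of the library.
function isEnclosedIdeographicSupplement(text) {
  return supplementRegex.test(text);
}

console.log(isEnclosedIdeographicSupplement('\ud83c\ude01')); // U+1F201 -> true
console.log(isEnclosedIdeographicSupplement('\ud83c\ude1a')); // U+1F21A -> true
console.log(isEnclosedIdeographicSupplement('\ud83c\ude50')); // U+1F250 -> true
console.log(isEnclosedIdeographicSupplement('\ud83c\ude3b')); // U+1F23B -> false
```

With the stray opening brackets left in, the character classes would not describe these ranges, so a check like this is a quick way to confirm the fix.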
Maybe I should fork the repo and submit a PR but I'm too lazy for that right now :D
Anyway, thanks for the lib... and congrats on the excellent article on Medium! ;)
Hi @Septh! Thanks for contributing. I also came here through the Medium article and noticed this issue has been open for a long time. Let me assess it as objectively as I can, in the hope of helping to close it soon.
I did some research. Your first point concerns potential false positives such as \u0024\u20e3. The current regex checks a blanket range, \\u0023-\\u0039, which also covers non-emoji characters such as U+0024. However, a combination like \u0024\u20e3 would render as a broken character anyway.
The current regex would be incorrect if such characters existed, but they don't, so this is not an "error" per se, contrary to what the title of this issue states. Strictly speaking, it's more of an imprecision in the regex.
However, computer science is a field of precision, so I would vote for the regex to be updated as per @Septh's example.
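To make the difference concrete, here is a rough sketch comparing the two character classes. The blanket pattern is reconstructed from the range mentioned above and may not match the library's source exactly; blanketKeycap, strictKeycap and describe are hypothetical names.

```javascript
// Assumption: the current pattern uses the blanket range U+0023-U+0039.
const blanketKeycap = new RegExp('^[\\u0023-\\u0039]\\ufe0f?\\u20e3$');

// Proposed pattern: only #, * and the digits 0-9 may start a keycap.
const strictKeycap = new RegExp(
  '^[\\u0023\\u002a\\u0030-\\u0039]\\ufe0f?\\u20e3$'
);

// Hypothetical helper reporting how each pattern treats the input.
function describe(text) {
  return { blanket: blanketKeycap.test(text), strict: strictKeycap.test(text) };
}

console.log(describe('#\ufe0f\u20e3')); // { blanket: true, strict: true }
console.log(describe('*\u20e3'));       // { blanket: true, strict: true }
console.log(describe('7\u20e3'));       // { blanket: true, strict: true }
console.log(describe('\u0024\u20e3'));  // { blanket: true, strict: false }
```

Only the last input differs: the blanket range accepts the dollar sign followed by the combining keycap, while the stricter class rejects it.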
The second point looks legit to me: there are indeed rogue brackets.
@Septh, please find the inner strength and open a PR. It's getting weird now: we are in limbo, since you provided a correct recipe but there's no way to attribute it to you unless you open a PR. What do you think? Do a quick fork...
Sorry I missed this! Please feel free to open a PR, otherwise I'll try and get this updated this week