None of the regexes match emoji, and only emoji
A regex that matches emoji would be a really useful thing to have in the JS ecosystem! Unfortunately, between Emojibase and emoji-regex, I still haven't seen a package that actually does this. In the case of Emojibase:
- emojibase-regex matches some textual characters such as '↔'.
- emojibase-regex/emoji doesn't match emoji without U+FE0F, such as '✨'.
- emojibase-regex/emoji-loose matches some textual characters without U+FE0E, such as '↔'.
- And the rest of the provided regexes are obviously not intended to be used for matching emoji.
What's missing is a regex that matches exactly those character sequences that are presented to users as emoji. Some characters are defined in Unicode to default to emoji presentation (see the Emoji_Presentation section), while others require U+FE0F to change their presentation mode. A correct implementation would account for both of these facts, and use a negative lookahead to avoid matching characters with U+FE0E.
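To make the idea concrete, here is a minimal sketch of those rules using Unicode property escapes in u mode. It is an illustration, not something Emojibase or emoji-regex ships, and the name emojiPresented is made up for this example. It only looks at a single code point plus an optional variation selector, so flags, keycaps, skin-tone modifiers, and ZWJ sequences are not handled properly.

```javascript
// Sketch only: one code point plus an optional variation selector.
// First alternative: characters that default to emoji presentation,
// unless they are followed by U+FE0E (text presentation selector).
// Second alternative: any emoji character explicitly followed by U+FE0F.
const emojiPresented = /\p{Emoji_Presentation}(?!\uFE0E)\uFE0F?|\p{Emoji}\uFE0F/gu;

const sample = "a ✨ b ↔ c ↔\uFE0F d ✨\uFE0E e";
const matches = sample.match(emojiPresented);

console.log(matches); // ["✨", "↔\uFE0F"]
```

The bare '↔' and the '✨' followed by U+FE0E are both skipped, which is the behavior the three Emojibase regexes above don't give you in one pattern.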
I'll be honest, it's been so long since I've worked on this emoji stuff that I've forgotten a lot of how it works. I always have to re-learn the codebase each time I update it. So I'm sure there are bugs everywhere.
With that said, I am tinkering with the regexes here: https://github.com/milesj/emojibase/pull/175
So after looking at this post and the code again, the behavior described is accurate. The regexes work that way by design.
I also use regexgen (https://github.com/devongovett/regexgen) to generate the regex pattern, and it does not support negative lookaheads. I'm not aware of another library to handle this and I'm definitely not going to write it from scratch.
There is a regex using unicode properties, but I haven't tested it in years: https://emojibase.dev/docs/regex#unicode-property-support
Been thinking about this more, and I think we could solve this by using functions, like isEmojiPresentation and isTextPresentation, instead of relying purely on RegExp instances. With functions we could run the necessary checks to ensure it's exactly what you want.
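A rough sketch of what such functions could look like, purely as an assumption about the shape of the API: nothing like this exists in Emojibase today, and presentationOf is a hypothetical helper. It expects a string holding one code point with an optional variation selector, and treats anything else (including multi-code-point sequences) as "not an emoji".

```javascript
const VS15 = "\uFE0E"; // text presentation selector
const VS16 = "\uFE0F"; // emoji presentation selector

const isEmojiChar = /^\p{Emoji}$/u;
const defaultsToEmoji = /^\p{Emoji_Presentation}$/u;

// Hypothetical helper: returns "emoji", "text", or null when the input
// is not a single emoji character with an optional variation selector.
function presentationOf(str) {
  const [base, selector, ...rest] = Array.from(str);
  if (base === undefined || rest.length > 0 || !isEmojiChar.test(base)) {
    return null;
  }
  if (selector === VS16) return "emoji";
  if (selector === VS15) return "text";
  if (selector !== undefined) return null;
  // No selector: fall back to the character's default presentation.
  return defaultsToEmoji.test(base) ? "emoji" : "text";
}

function isEmojiPresentation(str) {
  return presentationOf(str) === "emoji";
}

function isTextPresentation(str) {
  return presentationOf(str) === "text";
}

console.log(isEmojiPresentation("✨"));       // true
console.log(isEmojiPresentation("↔"));        // false
console.log(isEmojiPresentation("↔\uFE0F"));  // true
console.log(isTextPresentation("✨\uFE0E"));  // true
```

Because these are plain functions, the U+FE0E check doesn't need a negative lookahead, so it sidesteps the regexgen limitation. Note that \p{Emoji} also covers digits, '#', and '*', so "1" counts as text presentation here.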
Re: the Unicode properties approach, I was happy to discover that the new RegExp v mode makes writing an emoji regex by hand pretty easy, and this is what I've ended up going for.
/\p{RGI_Emoji}(?!\uFE0E)(?:(?<!\uFE0F)\uFE0F)?/v
All major browsers support it, though only as of late 2023. You can get a version that kinda sorta works while only using u mode if you replace \p{RGI_Emoji} with this regex, but it's not going to do well with flags and ZWJ sequences unless you teach it exactly what the valid sequences are.
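For anyone who wants to try it, here is a small usage sketch. The pattern is the one above; findEmoji and buildEmojiRegex are names made up for this example. It builds the regex through the RegExp constructor rather than a literal so that engines without v mode hit a catchable error instead of failing to parse the file.

```javascript
// Same pattern as above, kept as a string so it can be compiled lazily.
const source = String.raw`\p{RGI_Emoji}(?!\uFE0E)(?:(?<!\uFE0F)\uFE0F)?`;

// Returns a global v-mode regex, or null if the engine lacks the v flag.
function buildEmojiRegex() {
  try {
    return new RegExp(source, "gv");
  } catch {
    return null;
  }
}

// Returns an array of emoji found in text, or null when v mode is unavailable.
function findEmoji(text) {
  const re = buildEmojiRegex();
  if (re === null) return null;
  return text.match(re) || [];
}

// On an engine with v mode this should log the sparkles, the arrow with
// U+FE0F, and the flag, while skipping the bare arrow and the sparkles
// followed by U+FE0E.
console.log(findEmoji("✨ ↔ ↔\uFE0F ✨\uFE0E 🇯🇵"));
```

As far as I can tell, the trailing optional group is there to swallow a redundant U+FE0F after characters that already default to emoji presentation, and the lookbehind stops it from consuming a second U+FE0F after a sequence that already ends in one.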
Nice, good to know! Been waiting years for all those to become available.