element-web "regexp is fair" turned into "rege😝air"

Steps to reproduce

Enable "Automatically replace plain text Emoji" preference
Select the composition text box
Type "regexp is fair"

Outcome

What did you expect?

"regexp is fair" or at least "rege😝 is fair" but arguably it shouldn't replace when there are letters before the matching "plaintext emoji".

What happened instead?

"rege😝air"

Operating system

elementary OS

Browser information

Firefox Nightly 95.0a1

URL for webapp

https://one.humanoids.be/apps/riotchat/riot/

Homeserver

humanoids.be

Will you send logs?

No

Oct 22 '21 08:10 freaktechnik

From the cases linked to this issue, a simple check like the one suggested here would be good: if there are characters before the presumed emoji, don't convert it.

May 11 '22 21:05 turt2live

Six (and a half) months and nothing...

Clearly a bug that is being neglected for ethnocentric reasons. No doubt it would already be fixed if it affected common suffixes in the most used languages among Matrix/Element owners and users.

Jul 05 '22 16:07 e-maya

it also affects the element developers like myself, however on the larger scale of important bugs to fix it's just not at the same level as app reliability and performance, sorry :(

the issue is flagged as Help Wanted to attract contributions, however.

Jul 05 '22 16:07 turt2live

No doubt it would already be fixed if it affected common suffixes in the most used languages among Matrix/Element owners and users.

Our triage is based on occurrence. If it only affects few users then yes it'll get triaged lower. But great news, the project is open source, you or anyone can fix this issue regardless of its triage.

Jul 05 '22 16:07 t3chguy

Our triage is based on occurrence. If it only affects few users then yes it'll get triaged lower. But great news, the project is open source, you or anyone can fix this issue regardless of its triage.

Occurrence must not be a factor when it comes two types of issues: security, and data loss.

This bug is a form of data loss.

Aug 01 '22 14:08 thany

Hello, I confirm that this issue is still present in desktop and browser application.

General steps to reproduce

Enable "Automatically replace plain text Emoji" preference
Select the composition text box
Type: any character (at least 1)
Type: a text transformable to Emoji (eg. xp or :3)
Type: 1 space, flowed by at least 5 characters (alphanumeric or specials)

This will result in:

the text emoji from (4) will be replaced by an emoji
the next space + all the first 4 characters following it will be removed, the 5th and above will remain unchanged

eg: "winxp 12345" => "win😝5"

@turt2live; My first guess is adding some regex validations to the function transforming Text to Emoji will fix this issue. And I'll be happy to work on that, if you can tell me form where i can start, or where that function is located.

Aug 12 '22 02:08 4015-al

Don't use regexes to replace emoji. There be dragons.

Aug 16 '22 11:08 thany

it will not be to replace emojis, but to determine when/if a text need to be transformed to emoji or not, then call the current replaceTextWithEmoji function, with the correct input. and/or keep the removable 5 characters (after each replace), then concatenate them back.

Anyways, I didn't check the code base yet. So I don't know about this feature implementation, or dragons existence; they may be tamable than expected ^^

Aug 16 '22 20:08 4015-al

This is happening because of below corner case we look at last 8 characters to check if there is an emoticon so in case of string regexp 12345 everything before xp gets trimmed and when matching remaining string xp 12345 it looks like xp is an emoticon so it is replaced.

I would Love to submit my first PR on element on this issue, can someone please assign this issue to me.

Feb 06 '23 05:02 adarsh-sgh

@adarsh-sgh Thank you for your interest in this issue. We will normally not assign the issue to an external contributor until they have provided at least a draft PR which is taking the right direction. For further details, see our guidance on assigning issues.

If you have any technical questions about this issue, you can ask for help in #element-dev:matrix.org

Feb 06 '23 10:02 t3chguy

Now please put a (bigger) task on your storyboard to remove regexes to identify emoticons. Problems like this is one of the many reasons why a tokeniser is a way more reliable way to parse a string, than a regex.

Or did you think Markdown does its thing by applying a million regexes? Of course not 😀

Feb 10 '23 10:02 thany

element-web element-web copied to clipboard

"regexp is fair" turned into "rege😝air"

Steps to reproduce

Outcome

What did you expect?

What happened instead?

Operating system

Browser information

URL for webapp

Homeserver

Will you send logs?

General steps to reproduce

element-web
element-web copied to clipboard