element-web icon indicating copy to clipboard operation
element-web copied to clipboard

"regexp is fair" turned into "regešŸ˜air"

Open freaktechnik opened this issue 3 years ago ā€¢ 6 comments

Steps to reproduce

  1. Enable "Automatically replace plain text Emoji" preference
  2. Select the composition text box
  3. Type "regexp is fair"

Outcome

What did you expect?

"regexp is fair" or at least "regešŸ˜ is fair" but arguably it shouldn't replace when there are letters before the matching "plaintext emoji".

What happened instead?

"regešŸ˜air"

Operating system

elementary OS

Browser information

Firefox Nightly 95.0a1

URL for webapp

https://one.humanoids.be/apps/riotchat/riot/

Homeserver

humanoids.be

Will you send logs?

No

freaktechnik avatar Oct 22 '21 08:10 freaktechnik

From the cases linked to this issue, a simple check like the one suggested here would be good: if there are characters before the presumed emoji, don't convert it.

turt2live avatar May 11 '22 21:05 turt2live

Six (and a half) months and nothing...

Clearly a bug that is being neglected for ethnocentric reasons. No doubt it would already be fixed if it affected common suffixes in the most used languages among Matrix/Element owners and users.

e-maya avatar Jul 05 '22 16:07 e-maya

it also affects the element developers like myself, however on the larger scale of important bugs to fix it's just not at the same level as app reliability and performance, sorry :(

the issue is flagged as Help Wanted to attract contributions, however.

turt2live avatar Jul 05 '22 16:07 turt2live

No doubt it would already be fixed if it affected common suffixes in the most used languages among Matrix/Element owners and users.

Our triage is based on occurrence. If it only affects few users then yes it'll get triaged lower. But great news, the project is open source, you or anyone can fix this issue regardless of its triage.

t3chguy avatar Jul 05 '22 16:07 t3chguy

Our triage is based on occurrence. If it only affects few users then yes it'll get triaged lower. But great news, the project is open source, you or anyone can fix this issue regardless of its triage.

Occurrence must not be a factor when it comes two types of issues: security, and data loss.

This bug is a form of data loss.

thany avatar Aug 01 '22 14:08 thany

Hello, I confirm that this issue is still present in desktop and browser application.

General steps to reproduce

  1. Enable "Automatically replace plain text Emoji" preference
  2. Select the composition text box
  3. Type: any character (at least 1)
  4. Type: a text transformable to Emoji (eg. xp or :3)
  5. Type: 1 space, flowed by at least 5 characters (alphanumeric or specials)

This will result in:

  • the text emoji from (4) will be replaced by an emoji
  • the next space + all the first 4 characters following it will be removed, the 5th and above will remain unchanged

eg: "winxp 12345" => "winšŸ˜5"


@turt2live; My first guess is adding some regex validations to the function transforming Text to Emoji will fix this issue. And I'll be happy to work on that, if you can tell me form where i can start, or where that function is located.

4015-al avatar Aug 12 '22 02:08 4015-al

Don't use regexes to replace emoji. There be dragons.

thany avatar Aug 16 '22 11:08 thany

it will not be to replace emojis, but to determine when/if a text need to be transformed to emoji or not, then call the current replaceTextWithEmoji function, with the correct input. and/or keep the removable 5 characters (after each replace), then concatenate them back.

Anyways, I didn't check the code base yet. So I don't know about this feature implementation, or dragons existence; they may be tamable than expected ^^

4015-al avatar Aug 16 '22 20:08 4015-al

This is happening because of below corner case we look at last 8 characters to check if there is an emoticon so in case of string regexp 12345 everything before xp gets trimmed and when matching remaining string xp 12345 it looks like xp is an emoticon so it is replaced.

I would Love to submit my first PR on element on this issue, can someone please assign this issue to me.

adarsh-sgh avatar Feb 06 '23 05:02 adarsh-sgh

@adarsh-sgh Thank you for your interest in this issue. We will normally not assign the issue to an external contributor until they have provided at least a draft PR which is taking the right direction. For further details, see our guidance on assigning issues.

If you have any technical questions about this issue, you can ask for help in #element-dev:matrix.org

t3chguy avatar Feb 06 '23 10:02 t3chguy

Now please put a (bigger) task on your storyboard to remove regexes to identify emoticons. Problems like this is one of the many reasons why a tokeniser is a way more reliable way to parse a string, than a regex.

Or did you think Markdown does its thing by applying a million regexes? Of course not šŸ˜€

thany avatar Feb 10 '23 10:02 thany