element-web
element-web copied to clipboard
"regexp is fair" turned into "regešair"
Steps to reproduce
- Enable "Automatically replace plain text Emoji" preference
- Select the composition text box
- Type "regexp is fair"
Outcome
What did you expect?
"regexp is fair" or at least "regeš is fair" but arguably it shouldn't replace when there are letters before the matching "plaintext emoji".
What happened instead?
"regešair"
Operating system
elementary OS
Browser information
Firefox Nightly 95.0a1
URL for webapp
https://one.humanoids.be/apps/riotchat/riot/
Homeserver
humanoids.be
Will you send logs?
No
From the cases linked to this issue, a simple check like the one suggested here would be good: if there are characters before the presumed emoji, don't convert it.
Six (and a half) months and nothing...
Clearly a bug that is being neglected for ethnocentric reasons. No doubt it would already be fixed if it affected common suffixes in the most used languages among Matrix/Element owners and users.
it also affects the element developers like myself, however on the larger scale of important bugs to fix it's just not at the same level as app reliability and performance, sorry :(
the issue is flagged as Help Wanted to attract contributions, however.
No doubt it would already be fixed if it affected common suffixes in the most used languages among Matrix/Element owners and users.
Our triage is based on occurrence. If it only affects few users then yes it'll get triaged lower. But great news, the project is open source, you or anyone can fix this issue regardless of its triage.
Our triage is based on occurrence. If it only affects few users then yes it'll get triaged lower. But great news, the project is open source, you or anyone can fix this issue regardless of its triage.
Occurrence must not be a factor when it comes two types of issues: security, and data loss.
This bug is a form of data loss.
Hello, I confirm that this issue is still present in desktop and browser application.
General steps to reproduce
- Enable "Automatically replace plain text Emoji" preference
- Select the composition text box
- Type: any character (at least 1)
- Type: a text transformable to Emoji (eg. xp or :3)
- Type: 1 space, flowed by at least 5 characters (alphanumeric or specials)
This will result in:
- the text emoji from (4) will be replaced by an emoji
- the next space + all the first 4 characters following it will be removed, the 5th and above will remain unchanged
eg: "winxp 12345" => "winš5"
@turt2live; My first guess is adding some regex validations to the function transforming Text to Emoji will fix this issue. And I'll be happy to work on that, if you can tell me form where i can start, or where that function is located.
Don't use regexes to replace emoji. There be dragons.
it will not be to replace emojis, but to determine when/if a text need to be transformed to emoji or not, then call the current replaceTextWithEmoji
function, with the correct input. and/or keep the removable 5 characters (after each replace), then concatenate them back.
Anyways, I didn't check the code base yet. So I don't know about this feature implementation, or dragons existence; they may be tamable than expected ^^
This is happening because of below corner case
we look at last 8 characters to check if there is an emoticon
so in case of string regexp 12345
everything before xp gets trimmed and when matching remaining string xp 12345
it looks like xp is an emoticon so it is replaced.
I would Love to submit my first PR on element on this issue, can someone please assign this issue to me.
@adarsh-sgh Thank you for your interest in this issue. We will normally not assign the issue to an external contributor until they have provided at least a draft PR which is taking the right direction. For further details, see our guidance on assigning issues.
If you have any technical questions about this issue, you can ask for help in #element-dev:matrix.org
Now please put a (bigger) task on your storyboard to remove regexes to identify emoticons. Problems like this is one of the many reasons why a tokeniser is a way more reliable way to parse a string, than a regex.
Or did you think Markdown does its thing by applying a million regexes? Of course not š