
Combining diacritics get stripped

Open ralesk opened this issue 6 years ago • 29 comments

Telegram desktop strips (some?) combining diacritics entirely, making it hard to send, for example, complex IPA across.

This is just marginally related to #2651, which was about the text rendering. It may also result in issues when communicating file names from Macs which use decomposed characters at least in the case of accented Latin letters.
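For illustration, here is a minimal Python sketch of what "decomposed" means in this context. It is not part of the original report: it uses only the standard unicodedata module, and the helper name codepoints is made up for this example. It shows é splitting into a base letter plus a combining accent under NFD, and that o followed by U+033F has no precomposed form to fall back on.

```python
import unicodedata

def codepoints(text):
    """List the code points of text as U+XXXX strings (hypothetical helper)."""
    return [f"U+{ord(ch):04X}" for ch in text]

precomposed = "\u00e9"                                   # é as one code point
decomposed = unicodedata.normalize("NFD", precomposed)   # e + combining acute

print(codepoints(precomposed))   # ['U+00E9']
print(codepoints(decomposed))    # ['U+0065', 'U+0301']

# o + COMBINING DOUBLE OVERLINE stays two code points even after NFC,
# because Unicode defines no precomposed character for this pair.
double_overline = unicodedata.normalize("NFC", "o\u033f")
print(codepoints(double_overline))   # ['U+006F', 'U+033F']
```

So any program that drops or replaces the combining mark changes the text itself, not just how it looks.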

Steps to reproduce

  1. Try to type/paste/etc. anything that's a letter + a combining accent, for example: o̿ (o with double overline above)
  2. Send the message
  3. Notice how:
     a. the message does not seem to contain the diacritic
     b. upon editing the diacritic is missing

Expected behaviour

The message should not be altered and the result should be o̿.

Actual behaviour

The message is altered and the result is o instead.

Configuration

Operating system: Linux, Fedora 29, MATE desktop
Version of Telegram Desktop: 1.7.14

ralesk avatar Aug 06 '19 15:08 ralesk

https://github.com/telegramdesktop/tdesktop/issues/1041

Aokromes avatar Aug 30 '19 15:08 Aokromes

No, this is not a keyboard input issue. Not related to #1041.

ralesk avatar Sep 02 '19 14:09 ralesk

"When I normally write in all the rest of the programs, if I hit ' and e, I see é.

However, in Telegram app it appears the e without the tilde."

Aokromes avatar Sep 02 '19 19:09 Aokromes

That issue is about keyboard input, and in particular compose key (the X11 way of having multiple keystrokes result in a single letter) and/or a dead key (another way of having you press a sequence of keys to end up with a single letter) not being honoured by the input widget in Telegram and/or Telegram's Qt.

This issue is about character sequences (as opposed to keypress sequences), where you have literal characters in the paste buffer and Telegram or Telegram's Qt stripping so-called combining characters, which do not appear in the other issue whatsoever.
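To make the distinction concrete, here is a small Python sketch (an illustration only, not anything Telegram or Qt runs). It shows that the pasted text really is a sequence of two characters, and uses the standard unicodedata.combining() call to pick out the combining ones. The helper name combining_marks is made up for this example.

```python
import unicodedata

def combining_marks(text):
    """Return (index, code point, name) for every combining character in text."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(text)
        if unicodedata.combining(ch) != 0
    ]

sample = "o\u033f"   # 'o' followed by a combining mark, as in the report

print(len(sample))               # 2
print(combining_marks(sample))   # [(1, 'U+033F', 'COMBINING DOUBLE OVERLINE')]
```

No keyboard handling is involved at any point: the two characters are already in the string before it reaches the input widget.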

ralesk avatar Sep 03 '19 11:09 ralesk

So?

ralesk avatar Sep 19 '19 11:09 ralesk

Some combining diacritics get stripped. What, why, how. (Probably a Qt issue?)

o̿wo̿ gets stripped and rendered as o w o (note the space)
uvͮu doesn't get stripped and is rendered as is

Anyway, let's look at the entire combining range for shits and giggles:

use strict;
use warnings;

# Print every code point in U+0300..U+036F between "a" and "x", four per row.
binmode STDOUT, ":encoding(UTF-8)";
for (0x0300 .. 0x036f) {
   printf "U+%04x    a%sx    ", $_, chr($_);
   print "\n" if $_ % 4 == 3;
}

This renders perfectly (as far as the fonts allow) in Discord:

image image

And there are multiple things that happen in Telegram:

  • a + double grave gets replaced by a precomposed character that doesn't exist in the Telegram font and is displayed as a box
  • three combining characters get replaced by a space(?!) once sent, and thus the message is mangled (it is not a display issue, and shouldn't be an input issue, at least as far as pasting goes, everything should be possible to be pasted; definitely not a "keyboard input" issue)

image

Note how these are still good (except for the a + double grave) in the input box before sending... and they're mangled after sending (including when trying to edit again):

image

Here's it with fixed width so it's easier to spot (with fewer spaces):

image

I wonder what is so special about code points U+030A, U+0333 and U+033F that Telegram or Qt mangles them. I wonder if there are any more Unicode characters out there that get this treatment.

ralesk avatar Oct 10 '19 12:10 ralesk

P.S. Considering that Konsole (a Qt/KDE terminal app) doesn't mess it up, and neither do Clementine or Gwenview, maybe it's not a Qt issue after all...

ralesk avatar Oct 10 '19 12:10 ralesk

They're not getting stripped, just rendered as a whitespace. You can successfully copy the incorrectly rendered text and paste it in another application, retaining all the "stripped" diacritics.

eternal-sorrow avatar Jan 21 '20 22:01 eternal-sorrow

I have just copied that message to here in this Github entry box and the accent is not present, whereas copying it from Discord (where it doesn't get mangled) works. So no, it's not a display issue, the character is getting replaced by a whitespace.
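One way to settle whether a character was replaced or merely drawn badly is to compare code points rather than rendered glyphs. The following is a minimal Python sketch under stated assumptions: changed_codepoints is a hypothetical helper, and the received value is an illustrative string matching what this thread reports, not captured output.

```python
def changed_codepoints(sent, received):
    """Compare two strings position by position (hypothetical helper).

    Assumes both strings have the same length, i.e. characters were
    replaced one for one rather than removed.
    """
    if len(sent) != len(received):
        raise ValueError("lengths differ; characters were added or removed")
    return [
        (i, f"U+{ord(a):04X}", f"U+{ord(b):04X}")
        for i, (a, b) in enumerate(zip(sent, received))
        if a != b
    ]

sent = "a\u030ax"    # a + COMBINING RING ABOVE + x, as in the test script above
received = "a x"     # illustrative: what the thread reports coming back

print(changed_codepoints(sent, received))   # [(1, 'U+030A', 'U+0020')]
```

If the copied text were only rendered incorrectly, the list would come back empty.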

ralesk avatar Jan 22 '20 13:01 ralesk

Of course since @Aokromes has mistakenly closed it and still hasn't reopened it, it has even less of a chance ever getting noticed, not that anything ever gets noticed here anyway.

ralesk avatar Jan 22 '20 13:01 ralesk

"I wonder what is so special about code points U+030A, U+0333 and U+033F that Telegram or Qt mangles them."

These code points are present in the IsReplacedBySpace method: https://github.com/desktop-app/lib_ui/blob/d4c99701b5210a2db83b1c0f13da1a62f48dfb80/ui/text/text.cpp#L3444-L3457
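As a rough model of the effect described in this thread, here is a Python sketch. It is not a port of the linked C++ function: the set below holds only the three combining marks named in this issue (the linked function may list other code points as well), and the name replace_by_space is made up for this example.

```python
# Only the three combining marks named in this thread. The linked function
# may list other code points as well; they are not reproduced here.
REPLACED_BY_SPACE = {0x030A, 0x0333, 0x033F}

def replace_by_space(text):
    """Swap each listed code point for a space, one for one (illustrative)."""
    return "".join(" " if ord(ch) in REPLACED_BY_SPACE else ch for ch in text)

print(repr(replace_by_space("a\u030ax")))    # 'a x'
print(repr(replace_by_space("uv\u036eu")))   # unchanged: U+036E is not in the set
```

This matches the symptoms reported earlier: a listed mark turns into a space, while other combining marks pass through untouched.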

I found this ticket purely by accident.

ilya-fedin avatar Nov 05 '20 23:11 ilya-fedin

Thank you! Feels good to be proven right.

ralesk avatar Nov 06 '20 16:11 ralesk

Hey there!

This issue was inactive for a long time and will be automatically closed in 30 days if there isn't any further activity. We therefore assume that the user has lost interest or resolved the problem on their own.

Don't worry though; if this is an error, let us know with a comment and we'll be happy to reopen the issue.

Thanks!

stale[bot] avatar May 06 '21 02:05 stale[bot]

The issue is still present.

eternal-sorrow avatar May 06 '21 05:05 eternal-sorrow

Hey there!

This issue was inactive for a long time and will be automatically closed in 30 days if there isn't any further activity. We therefore assume that the user has lost interest or resolved the problem on their own.

Don't worry though; if this is an error, let us know with a comment and we'll be happy to reopen the issue.

Thanks!

stale[bot] avatar Nov 02 '21 06:11 stale[bot]

Still having this issue

eternal-sorrow avatar Nov 02 '21 08:11 eternal-sorrow

Hey there!

This issue was inactive for a long time and will be automatically closed in 30 days if there isn't any further activity. We therefore assume that the user has lost interest or resolved the problem on their own.

Don't worry though; if this is an error, let us know with a comment and we'll be happy to reopen the issue.

Thanks!

stale[bot] avatar May 02 '22 08:05 stale[bot]

The issue is still there.

eternal-sorrow avatar May 02 '22 10:05 eternal-sorrow

Hey there!

This issue was inactive for a long time and will be automatically closed in 30 days if there isn't any further activity. We therefore assume that the user has lost interest or resolved the problem on their own.

Don't worry though; if this is an error, let us know with a comment and we'll be happy to reopen the issue.

Thanks!

github-actions[bot] avatar Oct 30 '22 02:10 github-actions[bot]

Nothing changed.

eternal-sorrow avatar Oct 30 '22 02:10 eternal-sorrow

My favourite bit about this — besides that automatic closing of issues shouldn't be a thing — is that git blame just says "initial commit" and nobody knows why on Earth those codepoints are even in that list of bad codepoints. That function makes so little sense...

ralesk avatar Nov 02 '22 12:11 ralesk

@ralesk some of those functions are there to make sure the custom widgets don't render incorrectly because of some nasty character; others replace characters the same way the server does, so that tdesktop has valid offsets without re-downloading the sent message. It's unlikely those replacements will ever be revisited, given that everyone is afraid to touch that part of the tdesktop code (the chance of big regressions is too high). You can treat this issue as an architectural one that will likely persist for the whole lifetime of tdesktop.
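One way to read the remark about offsets, sketched in Python. This is an interpretation for illustration only: it does not model tdesktop's code or real Telegram message entities, and apply_entity and the (offset, length) pair are hypothetical. The point is that a one-for-one replacement keeps every later offset valid, while removing a character shifts them.

```python
def apply_entity(text, offset, length):
    """Return the substring an (offset, length) entity points at."""
    return text[offset:offset + length]

original = "a\u033f bold"   # 'bold' starts at offset 3
entity = (3, 4)             # hypothetical (offset, length) pair

replaced = original.replace("\u033f", " ")   # one for one: length unchanged
stripped = original.replace("\u033f", "")    # removal: later text shifts left

print(apply_entity(replaced, *entity))   # bold
print(apply_entity(stripped, *entity))   # old
```

If the client and the server transformed the text differently, formatting ranges computed on one side would point at the wrong characters on the other.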

ilya-fedin avatar Nov 02 '22 14:11 ilya-fedin

Hey there!

This issue was inactive for a long time and will be automatically closed in 30 days if there isn't any further activity. We therefore assume that the user has lost interest or resolved the problem on their own.

Don't worry though; if this is an error, let us know with a comment and we'll be happy to reopen the issue.

Thanks!

github-actions[bot] avatar May 02 '23 01:05 github-actions[bot]

+

Neurotoxin001 avatar May 08 '23 10:05 Neurotoxin001

@ilya-fedin I don't think #8140 is related; diacritics aren't getting stripped there, just badly displayed by Qt (and/or the font).

ralesk avatar May 15 '23 11:05 ralesk

I remember I checked the codepoints between the characters and it was using the ones that are in the lib_ui blacklist

ilya-fedin avatar May 15 '23 11:05 ilya-fedin

Hey there!

This issue was inactive for a long time and will be automatically closed in 30 days if there isn't any further activity. We therefore assume that the user has lost interest or resolved the problem on their own.

Don't worry though; if this is an error, let us know with a comment and we'll be happy to reopen the issue.

Thanks!

github-actions[bot] avatar Nov 12 '23 01:11 github-actions[bot]

It looks like the server is stripping them: when I send them from one Android phone, they don't show up in the message received on another Android phone.

'Te̊st' 'Te̳st' 'Te̿st'

john-preston avatar May 28 '24 07:05 john-preston

Screenshot_20240528_113832_Telegram

john-preston avatar May 28 '24 07:05 john-preston