Poor handling of UTF-8 characters

Open kseistrup opened this issue 10 years ago • 8 comments

Toxic seems to be unable to handle certain UTF-8 characters, specifically those encoded as 4 bytes.

While I can enter a 3-byte UTF-8 character like the horizontal ellipsis (U+2026 HORIZONTAL ELLIPSIS):

/status away "zZzZ…"

I cannot enter a 4-byte character like ‘😴’ (U+1F634 SLEEPING FACE); it simply doesn't appear after the first double quotation mark:

/status away "

It is also impossible to paste said character onto toxic's command line.

The terminal in which toxic runs shows both characters as expected.
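
For anyone trying to reproduce this, the byte lengths of the two characters can be checked independently of toxic. The snippet below is only a sketch for illustration; `utf8_len` is a hypothetical helper name and has nothing to do with toxic's code.

```python
# Illustration only: utf8_len is a hypothetical helper, not part of toxic.
def utf8_len(ch: str) -> int:
    """Return the number of bytes the character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


if __name__ == "__main__":
    # U+2026 HORIZONTAL ELLIPSIS and U+1F634 SLEEPING FACE
    for ch in ("\u2026", "\U0001F634"):
        encoded = ch.encode("utf-8")
        print(f"U+{ord(ch):04X}: {utf8_len(ch)} bytes ({encoded.hex(' ')})")
```

The ellipsis lies in the Basic Multilingual Plane and takes 3 bytes; the sleeping face lies outside it and takes 4.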

kseistrup avatar Apr 15 '15 15:04 kseistrup

This is an issue with ncurses unicode support. It uses an old standard which doesn't support certain characters such as the new emojis.

JFreegman avatar Apr 15 '15 20:04 JFreegman

On Wed 15 Apr 2015 13:51 -0700, JFreegman wrote:

> This is an issue with ncurses unicode support. It uses an old standard which doesn't support certain characters such as the new emojis.

This is not entirely true. With other ncurses programs you can still input the characters, but in my experience (bash, vim, etc.) they are only displayed as a nondescript rectangle.

(zsh displays the character as "<0001f634>" with reverse colour)

I'm curious how kseistrup is able to display the characters in the terminal correctly though. Maybe I haven't been able to see them because of a bad font set-up.

louipc avatar Apr 16 '15 00:04 louipc

I'm using “Droid Sans Mono Regular” as font in the terminal emulator.

kseistrup avatar Apr 16 '15 15:04 kseistrup

On Thu 16 Apr 2015 08:46 -0700, Klaus Alexander Seistrup wrote:

> I'm using “Droid Sans Mono Regular” as font in the terminal emulator.

I actually tried a VTE-based terminal and was able to see the character correctly in some ncurses-based programs. Some display it and accept it as input and some don't, though, so I wonder if the ones that work are doing something extra.

In any case I think this is a feature that we want in toxic.

louipc avatar Apr 16 '15 16:04 louipc

I agree, toxic ought to be able to handle the full range of UTF-8 chars.

kseistrup avatar Apr 16 '15 16:04 kseistrup

To extend this, if someone sends me a 4-byte UTF-8 character (e.g., "🐼", the panda face), I can see it, but I cannot type it to send to my contacts.

While it is true that ncurses struggles with these things (mainly because of out-of-date wcwidth() and wcswidth() functions), you can get around it by using LD_PRELOAD to load a library with more modern versions. However, even with that workaround (which is why I can see the ones my contacts send me), I still cannot type them with XCompose.
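
To make the width problem concrete: a curses program has to know how many terminal cells each character occupies, and a stale width table gets that wrong for newer characters. The sketch below approximates such a lookup using Python's bundled Unicode data. It is an illustration under stated assumptions, not what ncurses or libc actually does; `cell_width` is a hypothetical name, and the rule (wide or fullwidth means 2 cells, combining marks 0, everything else 1) is a simplification.

```python
import unicodedata


# Hypothetical helper: a simplified stand-in for a wcwidth()-style lookup.
def cell_width(ch: str) -> int:
    """Approximate how many terminal cells a single character occupies."""
    if unicodedata.combining(ch):
        return 0  # combining marks are drawn on top of the previous cell
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # wide and fullwidth characters take two cells
    return 1


if __name__ == "__main__":
    for ch in ("a", "\u2026", "\u4e2d", "\U0001F634"):
        print(f"U+{ord(ch):04X} -> {cell_width(ch)} cell(s)")
```

A table that predates the emoji's wide classification would report a different width for U+1F634, which is the kind of mismatch that throws off cursor positioning.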

HalosGhost avatar Jun 22 '15 00:06 HalosGhost

Yes, Chinese text comes out garbled too. (Original: 是啊,中文就是乱码呢)

xoomp avatar Mar 21 '19 10:03 xoomp

> 是啊,中文就是乱码呢

Yes, Chinese is garbled.

kseistrup avatar Mar 21 '19 11:03 kseistrup

Is this still an issue? I can use those emojis and Chinese characters just fine on st.

Pigpog avatar Mar 04 '23 20:03 Pigpog

I'm not sure. It's been quite a while since I used toxic. I'll see if I have a moment to test in the next week. Though, I also wouldn't be offended if this is closed as out-of-date.

HalosGhost avatar Mar 05 '23 21:03 HalosGhost

Buggy unicode support is a well-known issue and will probably never be fixed unless someone else wants to put in the time. I personally don't have time to mess around with it for longer than I have already.

JFreegman avatar Jan 25 '24 21:01 JFreegman