Problems with emoji

Open sorawee opened this issue 1 year ago • 12 comments

#lang racket/gui
(define s "🏝🎟")
(define t (new text%))
(define f (new frame% [label ""][width 300] [height 300]))
(define ec (new editor-canvas% [parent f] [editor t]))
(send t insert s)
(send f show #t)

results in two empty spaces. It should show the two emojis.

Weirdly:

#lang racket/gui

(define a "🏝")
(define b "🏝\uFE0F") ; U+1F3DD followed by the variation selector U+FE0F
(define c (string (integer->char 127965)))

(define s (string-append a b c "\n"
                         a c b "\n"
                         b a c "\n"
                         b c a "\n"
                         c a b "\n"
                         c b a))

(char->integer (string-ref a 0))
(char->integer (string-ref b 0))
(char->integer (string-ref c 0))

(define t (new text%))
(define f (new frame% [label ""][width 300] [height 300]))
(define ec (new editor-canvas% [parent f] [editor t]))
(send t insert s)
(send f show #t)

shows that a, b, and c all start with code point 127965 (U+1F3DD), but only b is displayed. a and c are displayed as blank spaces.

sorawee, Oct 18 '22 22:10

Ah, string->bytes/utf-8 shows:

a, c: #"\360\237\217\235" b: #"\360\237\217\235\357\270\217"
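
The byte dump above can be cross-checked at the code-point level. The following is a minimal sketch, assuming the strings are rebuilt from explicit escapes so that the invisible selector is visible in the source; the helper name `code-points` is made up for illustration.

```racket
#lang racket/base

;; Rebuild the test strings from escapes instead of pasted glyphs.
(define a "\U1F3DD")        ; desert island, no selector
(define b "\U1F3DD\uFE0F")  ; desert island + variation selector U+FE0F

;; Hypothetical helper: list the code points of a string.
(define (code-points s)
  (map char->integer (string->list s)))

(code-points a)                         ; '(127965)
(code-points b)                         ; '(127965 65039)
(bytes-length (string->bytes/utf-8 a))  ; 4
(bytes-length (string->bytes/utf-8 b))  ; 7
```

The three extra bytes in b are the UTF-8 encoding of U+FE0F (65039).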

sorawee, Oct 18 '22 22:10

The issue is that U+1F3DD (for example) by itself is not listed as an emoji in https://unicode.org/Public/emoji/14.0/emoji-sequences.txt. The sequence U+1F3DD U+FE0F is listed there, and that sequence does render as an emoji.

Is the drawing library using a wrong/incomplete definition of "emoji"?

mflatt, Oct 18 '22 23:10

My understanding, based on Googling, is that U+FE0F is a variation selector, and a Unicode sequence that ends with a variation selector is a variant form. However, a base form (which doesn't have a variation selector) is still perfectly valid.

sorawee, Oct 19 '22 00:10

https://unicode.org/Public/emoji/14.0/emoji-test.txt has these two lines:

1F3DD FE0F                                             ; fully-qualified     # 🏝️ E0.7 desert island
1F3DD                                                  ; unqualified         # 🏝 E0.7 desert island
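
For experimenting with these statuses, a data line in the format quoted above can be split into its code points and its qualification status. This is only a sketch: `parse-emoji-test-line` is a hypothetical helper, not part of racket/draw, and it assumes the `code points ; status # comment` layout shown in the two lines above.

```racket
#lang racket/base
(require racket/string)

;; Hypothetical helper: parse one line of emoji-test.txt.
;; Returns (cons code-points status), or #f for comment and blank lines.
(define (parse-emoji-test-line line)
  (define m
    (regexp-match #px"^\\s*([0-9A-Fa-f ]+?)\\s*;\\s*([a-z-]+)" line))
  (and m
       (cons (map (lambda (hex) (string->number hex 16))
                  (string-split (cadr m)))
             (caddr m))))

(parse-emoji-test-line
 "1F3DD FE0F    ; fully-qualified     # E0.7 desert island")
;; '((127965 65039) . "fully-qualified")

(parse-emoji-test-line
 "1F3DD         ; unqualified         # E0.7 desert island")
;; '((127965) . "unqualified")
```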

sorawee, Oct 19 '22 00:10

More issues:

Based on https://unicode.org/Public/emoji/14.0/emoji-zwj-sequences.txt

👨‍👦 should be rendered as it appears here:

1F468 200D 1F466                            ; RGI_Emoji_ZWJ_Sequence  ; family: man, boy                                               # E4.0   [1] (👨‍👦)

Instead, the drawing library shows one character that occupies two spaces. The first space has a part of the emoji, and the second space is missing. But the correct rendering should occupy only one space, with the two emojis stacked on top of each other.
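
To make the structure of that sequence concrete, here is a small sketch that builds it from escapes and splits it at the zero-width joiner (U+200D). The helper `zwj-components` is made up for illustration and says nothing about how the drawing library segments text internally.

```racket
#lang racket/base
(require racket/string)

(define zwj "\u200D")                    ; zero-width joiner
(define family "\U1F468\u200D\U1F466")   ; man, ZWJ, boy

;; Hypothetical helper: code points of each emoji joined by ZWJ.
(define (zwj-components s)
  (for/list ([part (in-list (string-split s zwj))])
    (map char->integer (string->list part))))

(string-length family)    ; 3 code points, but meant to render as one glyph
(zwj-components family)   ; '((128104) (128102))
```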

sorawee, Oct 19 '22 00:10

https://www.unicode.org/reports/tr51/tr51-21.html is the actual documentation. There, it links to these:

[emoji-data] The associated data files for emoji characters. For the 14.0 versions, see:

  • https://www.unicode.org/Public/14.0.0/ucd/emoji/emoji-data.txt
  • https://www.unicode.org/Public/14.0.0/ucd/emoji/emoji-variation-sequences.txt
  • https://www.unicode.org/Public/emoji/14.0/emoji-sequences.txt
  • https://www.unicode.org/Public/emoji/14.0/emoji-zwj-sequences.txt
  • https://www.unicode.org/Public/emoji/14.0/emoji-test.txt

sorawee, Oct 19 '22 01:10

The "emoji-test.txt" file also has

00A9 FE0F                                              ; fully-qualified     # ©️ E0.6 copyright
00A9                                                   ; unqualified         # © E0.6 copyright

This kind of example is why the drawing library currently only uses emoji rendering for qualified sequences.

When I open "emoji-test.txt" in programs that render emoji, they show the © in the comment differently. I'm not sure how they make the choice. The standard seems to say that implementations can pick the rendering of unqualified emoji. Maybe the implementations that I tried pick plain for Latin-1 characters and emoji rendering otherwise? Have you found any other guidance along these lines?

mflatt, Oct 19 '22 01:10

As a data point: my email client (Gmail on Android) renders the two copyright symbols the same, but my browser (Firefox on Android) renders the first one larger.

samth, Oct 19 '22 01:10

I've pushed a repair for 👨‍👦.

Besides unqualified 🏝, I see that there are ZWJ sequences in "emoji-test.txt" that omit U+FE0F in the middle (i.e., minimally qualified). Rendering again seems to be left up to applications: most programs handle those as a single emoji, but Emacs 28 draws the individual elements. I think the right choice is probably to use emoji rendering when U+FE0F is omitted after a non-Latin-1 character.

mflatt, Oct 19 '22 02:10

I think the right choice is probably to use emoji rendering when U+FE0F is omitted after a non-Latin-1 character.

No, that doesn't seem right after all. It uses emoji rendering for U+203C ‼, and other programs don't.

The Emoji_Presentation property sure sounds like the distinction I'm looking for, but U+1F3DD 🏝 doesn't have that property.

mflatt, Oct 19 '22 03:10

I notice that 1F3DD has the Extended_Pictographic property, but 1F3DD FE0F doesn't. Can this be used for the detection somehow?

I'm not sure how they make the choice.

It appears that there are "text style" and "emoji style". For 00A9 (the copyright symbol), they look different.

  • https://www.emojiall.com/en/code/00A9 (base)
  • https://www.emojiall.com/en/code/00A9-FE0E (text)
  • https://www.emojiall.com/en/code/00A9-FE0F (emoji)

For 1F3DD, however, the text style and emoji style look the same.
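
The three forms in the list above differ only in the trailing variation selector: U+FE0E requests text presentation and U+FE0F requests emoji presentation. A minimal sketch of building and stripping them follows; the helper names are hypothetical.

```racket
#lang racket/base

(define text-selector  #\uFE0E)  ; VS15, text presentation
(define emoji-selector #\uFE0F)  ; VS16, emoji presentation

;; Hypothetical helper: append a variation selector to a base character.
(define (with-selector base selector)
  (string base selector))

;; Hypothetical helper: drop VS15/VS16 to recover the base form.
(define (strip-selectors s)
  (list->string
   (filter (lambda (c) (not (memv c (list text-selector emoji-selector))))
           (string->list s))))

(define copyright #\u00A9)
(define copyright/text  (with-selector copyright text-selector))
(define copyright/emoji (with-selector copyright emoji-selector))

(map char->integer (string->list copyright/text))   ; '(169 65038)
(map char->integer (string->list copyright/emoji))  ; '(169 65039)
(strip-selectors copyright/emoji)                   ; the bare U+00A9 string
```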

Could it be that for unqualified, Firefox tries the "text style" first?

sorawee, Oct 19 '22 03:10

Properties like Extended_Pictographic are on code points, not sequences, right?

Could it be that for unqualified, Firefox tries the "text style" first?

Yes, I think selection in many applications ends up being based on glyphs available, which is not so easy to do in racket/draw. For now, I may just go with a hack that treats U+1F000 and up as if they had Emoji_Presentation.
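
As a rough illustration of how that stopgap differs from the earlier non-Latin-1 idea, the two rules can be written as predicates on a code point and compared on the characters discussed in this thread. This is a sketch only: the predicate names are invented here, and the actual change in racket/draw may be structured differently.

```racket
#lang racket/base

;; Earlier idea: any character outside Latin-1 gets emoji rendering
;; when U+FE0F is omitted.
(define (non-latin-1? cp) (> cp #xFF))

;; Stopgap: treat U+1F000 and up as if they had Emoji_Presentation.
(define (assume-emoji-presentation? cp) (>= cp #x1F000))

;; Copyright sign, double exclamation mark, desert island.
(for ([cp (in-list (list #xA9 #x203C #x1F3DD))])
  (printf "U+~a  non-Latin-1: ~a  U+1F000 and up: ~a\n"
          (string-upcase (number->string cp 16))
          (non-latin-1? cp)
          (assume-emoji-presentation? cp)))
```

Under these assumptions, the two rules agree on U+00A9 (plain) and U+1F3DD (emoji) and differ only on U+203C, which the threshold rule leaves as plain text.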

mflatt, Oct 19 '22 03:10

This appears to be fixed. Closing.

sorawee, Jul 22 '23 17:07