Special Unicode glyphs

Open Pepsiman-12 opened this issue 5 months ago • 5 comments

Image (WhatsApp screenshot)

What are the Unicode code points of these 2 weird-looking glyphs (both halves of the Arabic lam-alif ligature)? And also, do these glyphs have a stylistic set?

Pepsiman-12, Jun 10 '25 18:06

It would be helpful if you could copy/paste them as actual text here, rather than a screenshot 🙂

DHowett, Jun 10 '25 19:06

The text in the screenshot says لآ لإ لأ لا.

I think these glyphs are rendered in some specific stylistic set, right?
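On the code point question: the text above is most likely plain lam (U+0644) followed by an alef variant, and the "halves" are glyphs the font's Arabic shaping draws for the mandatory lam-alef ligature, not separate code points. The Python sketch below lists the code points, assuming each alef is the precomposed form rather than alef plus a combining mark. It also shows the legacy Presentation Forms-B ligatures, which exist for compatibility and which NFKC normalization folds back into lam + alef.

```python
import unicodedata

# The four lam-alef pairs from the comment, written as escapes.
# Assumption: each alef is precomposed (U+0622/U+0625/U+0623/U+0627),
# not a bare alef followed by a combining madda/hamza.
PAIRS = ["\u0644\u0622", "\u0644\u0625", "\u0644\u0623", "\u0644\u0627"]

# Matching legacy Presentation Forms-B ligatures (isolated forms), same order.
# Fonts normally don't need these; NFKC maps them back to lam + alef.
LIGATURES = ["\uFEF5", "\uFEF9", "\uFEF7", "\uFEFB"]


def describe(text: str) -> list[str]:
    """Return 'U+XXXX NAME' for every code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


if __name__ == "__main__":
    for pair, lig in zip(PAIRS, LIGATURES):
        print(pair, describe(pair))
        print("  compatibility ligature:", describe(lig)[0])
        print("  NFKC of ligature:", describe(unicodedata.normalize("NFKC", lig)))
```

For example, `describe("\u0644\u0627")` returns `["U+0644 ARABIC LETTER LAM", "U+0627 ARABIC LETTER ALEF"]`.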

Pepsiman-12, Jun 11 '25 09:06

@DHowett can correct me on that, but this appears to be the Cascadia Code style for Arabic. The problem is that we assign 1 cell to each grapheme in the lam-alef (لا) glyph cluster. The two halves are supposed to overlap like this (a screenshot from Chromium):

Image

If you compare this with Windows Terminal, you can see where the rendering is coming from:

Image

This is a difficult problem because there's no Unicode specification yet for how to handle the widths of Arabic clusters. I checked the Rust unicode-rs crate (maintained by Manishearth, who works on i18n at Google and is way more knowledgeable than us, I'm sure) and found that it assigns 1 column to the whole glyph cluster.

As such, I'll consider this a bug in our implementation. Let's see if I can do something about it...
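To make the width difference concrete, here is a minimal Python sketch. It compares a naive count of one column per spacing code point with a count that gives a lam immediately followed by an alef variant a single column, which is the behavior the comment above attributes to unicode-rs. The function names, the crude width rule, and the alef set are assumptions for illustration only. This is not Windows Terminal's or unicode-rs's actual code.

```python
import unicodedata

LAM = "\u0644"
# Alef variants assumed to form the mandatory lam-alef ligature.
ALEFS = {"\u0622", "\u0623", "\u0625", "\u0627"}


def codepoint_width(ch: str) -> int:
    """Crude width rule: nonspacing/enclosing marks and format characters take
    0 columns, everything else 1. Ignores East Asian Wide characters, emoji, etc."""
    return 0 if unicodedata.category(ch) in ("Mn", "Me", "Cf") else 1


def naive_width(text: str) -> int:
    """One column per spacing code point, so lam + alef counts as 2 columns."""
    return sum(codepoint_width(ch) for ch in text)


def ligature_aware_width(text: str) -> int:
    """Like naive_width, but a lam immediately followed by an alef variant
    shares one column. Simplification: a mark between them (e.g. a fatha)
    prevents the match here, although a font may still ligate the pair."""
    width = 0
    i = 0
    while i < len(text):
        if text[i] == LAM and i + 1 < len(text) and text[i + 1] in ALEFS:
            width += 1  # the whole ligature occupies a single cell
            i += 2
        else:
            width += codepoint_width(text[i])
            i += 1
    return width


if __name__ == "__main__":
    sample = "\u0644\u0622 \u0644\u0625 \u0644\u0623 \u0644\u0627"  # لآ لإ لأ لا
    print(naive_width(sample), ligature_aware_width(sample))
```

For the sample string, this prints `11 7`: four pairs plus three spaces. The naive count gives each pair 2 columns, and the ligature-aware count gives each pair 1 column.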

lhecker, Jun 11 '25 13:06

Hmm, this isn't possible to fix right now. Lam-alef ligatures like this consist of 2 graphemes that together occupy 1 column. If you backspace over one grapheme of such a ligature, the cursor would (should?) currently move by 0 columns in the terminal, which is wholly unexpected.

Does this mean that the cursor should sometimes move by 2 grapheme clusters? But if we do that, how should, e.g., "cooked read" in CMD behave? If you press the left/right arrow in front of this ligature, should the cursor move by 1 grapheme and visually not move at all, or should it always move by 2 graphemes?

Edit: The conclusion of our discussion is that, in the context of terminals, grapheme clusters have a minimum width of 1. As such, the lam-alef ligature should form 1 "grapheme cluster" from our point of view.
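A rough Python sketch of that conclusion follows. `simple_clusters` is a hypothetical, much-simplified stand-in for UAX #29 grapheme segmentation, and `terminal_clusters` is a hypothetical terminal-side pass that merges a lam cluster with the alef cluster right after it. Neither is a Windows Terminal API. The point is only to show the shape of the rule: one cursor unit per ligature, so no unit ends up 0 columns wide.

```python
import unicodedata

LAM = "\u0644"
# Alef variants assumed to form the mandatory lam-alef ligature.
ALEFS = {"\u0622", "\u0623", "\u0625", "\u0627"}


def simple_clusters(text: str) -> list[str]:
    """Rough stand-in for UAX #29 (enough for these Arabic letters): every
    non-mark code point starts a cluster; combining marks attach to it."""
    clusters: list[str] = []
    for ch in text:
        if clusters and unicodedata.category(ch) in ("Mn", "Me"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def terminal_clusters(text: str) -> list[str]:
    """Merge a lam cluster with a directly following alef cluster so the
    ligature becomes a single unit for cursor movement and width."""
    merged: list[str] = []
    lam_open = False  # last cluster is a lam that hasn't absorbed an alef yet
    for cluster in simple_clusters(text):
        if lam_open and cluster[0] in ALEFS:
            merged[-1] += cluster
            lam_open = False
        else:
            merged.append(cluster)
            lam_open = cluster[0] == LAM
    return merged


if __name__ == "__main__":
    text = "\u0644\u0622 \u0644\u0627"  # لآ لا
    print(len(simple_clusters(text)), len(terminal_clusters(text)))
```

With `simple_clusters`, لا yields 2 cursor units for a single drawn column, so one Backspace or arrow press would move 0 columns. `terminal_clusters` yields 1 unit, so with the minimum width of 1, every press moves exactly 1 column.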

lhecker, Jun 11 '25 19:06

Oh yeah... You're right

Pepsiman-12, Jun 16 '25 21:06