Special Unicode glyphs
What are the Unicode codepoints of these two weird-looking glyphs (both halves of the Arabic lam-alif ligature)? And do these glyphs belong to a stylistic set?
It would be helpful if you could copy/paste them as actual text here, rather than a screenshot 🙂
The text in the screenshot says لآ لإ لأ لا.
I think these glyphs are rendered in some specific stylistic set, right?
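On the codepoint question: a quick way to check what the pasted text actually consists of is to dump each code point with Python's standard `unicodedata` module. This is only a sketch. It assumes the four sequences are lam followed by a single precomposed alef variant (the pasted text could in principle use combining marks instead), and `describe` is just a helper name made up for this example.

```python
import unicodedata

# The four sequences from the pasted text, written as escapes so that
# bidi reordering in an editor cannot scramble them.
# Assumption: each alef variant is a single precomposed code point.
SEQUENCES = ["\u0644\u0622", "\u0644\u0625", "\u0644\u0623", "\u0644\u0627"]

def describe(text):
    """Return (U+XXXX, Unicode name) for every code point in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

for seq in SEQUENCES:
    for code, name in describe(seq):
        print(code, name)
    print()

# Unicode also encodes the ligature as compatibility presentation forms,
# e.g. U+FEFB, which NFKC normalization maps back to lam + alef.
print(unicodedata.name("\uFEFB"))
print(describe(unicodedata.normalize("NFKC", "\uFEFB")))
```

Under that assumption, every sequence is two code points (U+0644 plus one alef variant). The two "halves" in the screenshot are glyphs chosen by the font's shaping, so they do not necessarily have code points of their own.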
@DHowett can correct me on that, but this appears to be the Cascadia Code style for Arabic. The problem is that we assign 1 cell for each grapheme in the ل glyph cluster. It's supposed to overlap like this (a screenshot from Chromium):
If you compare this with Windows Terminal, you can see where the rendering is coming from:
This is a difficult problem because there's no Unicode specification yet for how to handle the widths of Arabic clusters. I checked the Rust unicode-rs crate (maintained by Manishearth, who works on i18n at Google and is surely far more knowledgeable than us) and found that it assigns 1 column to the glyph cluster.
As such I'll consider this a bug in our implementation. Let's see if I can do something about that...
Hmm, this is not possible to fix right now. Lam-Alef ligatures like this consist of 2 graphemes that together occupy 1 column. If you backspace over one half of such a ligature, the cursor would (should?) currently move by 0 columns in the terminal, which is wholly unexpected.
Does this mean that the cursor should sometimes move by 2 grapheme clusters? But if we do that, how should e.g. "cooked read" in CMD behave? If you press the left/right arrow in front of this ligature, should the cursor move by 1 grapheme and visually not move at all, or always move by 2 graphemes?
Edit: The conclusion of our discussion is that in the context of terminals, grapheme clusters have a minimum width of 1 column. As such, the Lam-Alef ligature should form 1 "grapheme cluster" from our POV.
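To make that conclusion concrete, here is a rough sketch of what "Lam-Alef counts as 1 cluster" could look like. It is not the Windows Terminal implementation and not a full UAX #29 segmenter: `terminal_clusters` and `column_width` are hypothetical names, only the four alef variants from this thread are handled, a lam that carries its own combining mark is not merged, and wide (East Asian) characters are ignored.

```python
import unicodedata

LAM = "\u0644"
# Alef variants that appeared in this thread (assumed list, not exhaustive).
ALEFS = {"\u0622", "\u0623", "\u0625", "\u0627"}

def terminal_clusters(text):
    """Split text into units the cursor would step over as a whole."""
    clusters = []
    for ch in text:
        joins_previous = bool(clusters) and (
            unicodedata.combining(ch) != 0          # combining mark
            or (ch in ALEFS and clusters[-1] == LAM)  # lam + alef ligature
        )
        if joins_previous:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

def column_width(text):
    """Every cluster takes exactly 1 column in this simplified model."""
    return len(terminal_clusters(text))

sample = "\u0644\u0622 \u0644\u0625"  # ligature, space, ligature
print(len(sample), column_width(sample))
```

With this model, backspace or an arrow key always steps over a whole cluster, so the cursor moves by exactly 1 column per keypress and never by 0.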
Oh yeah... You're right