Some Unicode characters (U+2714) displayed incorrectly

Open brainos233 opened this issue 3 years ago • 10 comments

Windows Terminal version

1.12.10393.0

Windows build number

10.0.19044.0

Other Software

PowerShell 7.2.1, wsl.exe

Steps to reproduce

Use PowerShell 7.2.1 to print some Unicode characters such as "✔". The glyph is rendered too small.

Expected Behavior

In Fluent Terminal image

Actual Behavior

In WT image

brainos233 avatar Feb 20 '22 07:02 brainos233

The experimental text rendering engine does a better job:

wt_experimental

and w/out:

wt_unicode_test

elsaco avatar Feb 21 '22 18:02 elsaco

Note that U+2714 is defined in Unicode as a neutral character, which is meant to occupy a single column, while U+274C is defined as a wide character, and is meant to occupy two columns.
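For anyone who wants to check those width classes themselves, here is a minimal sketch using Python's standard `unicodedata` module. The helper name `column_width` is made up for illustration, and mapping "W"/"F" to two columns and everything else to one is a simplifying assumption, not what any particular terminal does. The result also depends on the Unicode version bundled with your Python.

```python
import unicodedata


def column_width(ch: str) -> int:
    """Rough column count for one code point, based only on East Asian Width.

    Assumption for illustration: Wide (W) and Fullwidth (F) take two columns,
    everything else (including Neutral, N) takes one.
    """
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


if __name__ == "__main__":
    for ch in ("\u2714", "\u274c"):
        eaw = unicodedata.east_asian_width(ch)
        print(f"U+{ord(ch):04X} east_asian_width={eaw} columns={column_width(ch)}")
```

With a reasonably recent Unicode database, U+2714 should report as neutral (one column) and U+274C as wide (two columns), which matches the box drawing in the test case.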

As a point of reference, here's what those characters look like in a bunch of different terminals. My test case is: printf "+-+--+\n|\u2714|\u274c|\n+-+--+\n"

image

From left to right, top to bottom: XTerm, Gnome Terminal, MLTerm, WezTerm, Contour, St, Alacritty, and Kitty.

Gnome Terminal might be letting U+2714 overlap into the following column to a certain extent, but it's clearly still a single column character. Everyone else seems to be quite strictly constrained to the one column. So in my opinion, the rendering in Fluent Terminal is incorrect, but it's possible that's deliberate.

That said, the Atlas renderer clearly does a better job of dealing with fonts like this than the DX renderer. However, I don't think we should ever expect it to take up two columns.

j4james avatar Feb 21 '22 20:02 j4james

If you look closely you can see how the glyph is cut off on both sides in the experimental rendering engine ("AtlasEngine"). I'll likely submit a PR in the future which will make the AtlasEngine look similar to our current standard renderer by downsizing glyphs larger than the number of cells allocated for them (1 cell in case of ✔️).

Not because I want to make the size of Emojis/symbols inconsistent like in the screenshot, but rather because some scripts only have proportional fonts, which need to be downsized to remain readable (most noticeable with Hebrew on a standard Windows setup). Maybe I can come up with something clever to find a good middle ground...

lhecker avatar Feb 23 '22 17:02 lhecker

I believe there's a bug here in how terminal is handling emoji. Many emoji also exist as characters:

Emoji Text Emoji
Check ✔︎ ✔️
Email 📧︎ 📧️
Penguin 🐧︎ 🐧️
Stopwatch ⏱︎ ⏱️
Heart ❤︎ ❤️

Anyway ... there's a lot.

The point is: Unicode has variation selectors, and they were made part of the emoji spec in 2018. Bottom line: fe0e selects text mode, and fe0f selects emoji mode and when the selector is missing, it's up to the software to choose which way to render it.
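To make the three cases concrete, here is a small sketch that reports which presentation a sequence asks for. The function name `requested_presentation` is hypothetical, and it only looks at the last code point, which is an assumption that holds for simple base + selector pairs but not for longer sequences.

```python
VS15 = "\ufe0e"  # VARIATION SELECTOR-15: requests text presentation
VS16 = "\ufe0f"  # VARIATION SELECTOR-16: requests emoji presentation


def requested_presentation(sequence: str) -> str:
    """Return 'text', 'emoji', or 'unspecified' for a base + optional selector."""
    if sequence.endswith(VS15):
        return "text"
    if sequence.endswith(VS16):
        return "emoji"
    # No selector: the renderer is free to choose.
    return "unspecified"


if __name__ == "__main__":
    for seq in ("\u2714\ufe0e", "\u2714\ufe0f", "\u2714"):
        codes = " ".join(f"U+{ord(c):04X}" for c in seq)
        print(f"{codes}: {requested_presentation(seq)}")
```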

Here's the current list of all the variation sequences for 14.0.0.

I believe that Windows Terminal is doing the wrong thing here.

  • First: it ignores the value of the selector -- it always renders emoji.
  • Second: when there is no selector, it renders the emoji smaller. Frequently, so tiny you can't see what they are.

It appears to me that terminal is trying to render the emoji in a single character space whenever there's no variation selector. I think this should not be done -- but if you were going to do it, surely you should only do it when it's the "text" variation, and not when the variation is unspecified?!

I, for one, can live with always rendering emoji --as far as I know, no fixed-width font has characters for these anyway-- but making them small when the selector is missing is absolutely a bug.

For what it's worth, a concrete example. Here's what happens in the terminal: image

When I copy/paste that from Terminal into GitHub, GitHub renders it properly. The first is rendered as a character. The second is an emoji. The third one is rendered as a character in code blocks, and an emoji elsewhere (except for the face, hmmmm).

❯ "`u{23f1}`u{fe0e} `u{23f1}`u{fe0f} `u{23f1}"
⏱︎ ⏱️ ⏱
❯ "`u{2714}`u{fe0e} `u{2714}`u{fe0f} `u{2714}"
✔︎ ✔️ ✔
❯ "`u{2764}`u{fe0e} `u{2764}`u{fe0f} `u{2764}"
❤︎ ❤️ ❤
❯ "`u{1F610}`u{fe0e} `u{1F610}`u{fe0f} `u{1F610}"
😐︎ 😐️ 😐

Jaykul avatar Aug 05 '22 01:08 Jaykul

If you look closely you can see how the glyph is cut off on both sides in the experimental rendering engine ("AtlasEngine"). I'll likely submit a PR in the future which will make the AtlasEngine look similar to our current standard renderer by downsizing glyphs larger than the number of cells allocated for them (1 cell in case of ✔️).

BTW I did this just recently in #13549, which will be part of v1.16.


First: it ignores the value of the selector -- it always renders emoji.

Hmm we're feeding the U+FE0E straight to Direct2D to draw it and I was hoping it would just do the correct thing automatically. I'll check with them whether there's anything we can do differently to fix that.

Second: when there is no selector, it renders the emoji tiny (basically, text size).

I believe this is because of what @j4james said:

Note that U+2714 is defined in Unicode as a neutral character, which is meant to occupy a single column, while U+274C is defined as a wide character, and is meant to occupy two columns.

Additionally, if you notice any weird whitespace gaps around Emojis: That's because of #8000 and I'm currently working on fixing that. 🙂

lhecker avatar Aug 05 '22 02:08 lhecker

First: it ignores the value of the selector -- it always renders emoji.

Hmm we're feeding the U+FE0E straight to Direct2D to draw it and I was hoping it would just do the correct thing automatically. I'll check with them whether there's anything we can do differently to fix that.

I suppose they are. As I noted earlier, the fonts we're all using don't have these characters in them, usually, because the characters don't fit in a single cell -- that is, they're totally indecipherable when scaled down.

I'm concerned about this:

I'll likely submit a PR in the future which will make the AtlasEngine look similar to our current standard renderer by downsizing glyphs larger than the number of cells allocated for them (1 cell in case of ✔️).

BTW I did this just recently in https://github.com/microsoft/terminal/pull/13549, which will be part of v1.16.

I'm not sure about that one ... downsizing emoji to fit in a single character cell is not the right move, in my opinion. Isn't that going to result in the super-tiny examples above (the third character in each row of the screenshot)? I'm not clear, guess I'll have to wait and see -- because if the only options are "shrink" it to one character or "stretch" it to two wide, I think emoji are going to end up uniformly hideous 😉

Jaykul avatar Aug 05 '22 02:08 Jaykul

I'm not sure about that one ... downsizing emoji to fit in a single character cell is not the right move, in my opinion. Isn't that going to result in the super-tiny examples above (the third character in each row of the screenshot)? I'm not clear, guess I'll have to wait and see -- because if the only options are "shrink" it to one character or "stretch" it to two wide, I think emoji are going to end up uniformly hideous 😉

The downsizing I implemented is at least somewhat better at this than our current approach.

DxEngine (current): image

AtlasEngine (future): image

We don't want to let glyphs overlap rows for performance and correctness reasons (at the moment, because overlapping glyphs are anything but trivial) and so shrinking large glyphs is an okay-ish and simple solution. For instance this string is about 1.5 rows tall: ⟨င်္က္ကျြွှေို့်ာှီ့ၤဲံ့းႍ⟩

Without scaling: image

With scaling: image
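As a rough illustration of the idea of shrinking a glyph until it fits its allotted cells, the arithmetic could look like the sketch below. This is not the AtlasEngine code. The function `fit_scale`, its parameters, and the choice of a single uniform scale factor are all assumptions made for the example.

```python
def fit_scale(glyph_width: float, glyph_height: float,
              cell_width: float, cell_height: float, cells: int = 1) -> float:
    """Uniform scale factor (<= 1.0) that makes a glyph fit in `cells` columns of one row.

    Glyphs that already fit are left alone (scale 1.0). Larger glyphs are only
    shrunk, never stretched.
    """
    available_width = cell_width * cells
    return min(1.0, available_width / glyph_width, cell_height / glyph_height)


if __name__ == "__main__":
    # Hypothetical numbers: a 10x20 px cell and a glyph about 1.5 rows tall.
    print(fit_scale(glyph_width=10, glyph_height=30, cell_width=10, cell_height=20))
    # A glyph twice as wide as its single cell.
    print(fit_scale(glyph_width=20, glyph_height=20, cell_width=10, cell_height=20))
```

Using the smaller of the two ratios keeps the aspect ratio intact, which is why a wide symbol squeezed into one cell also ends up shorter than the surrounding text.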

(As explained before the extra whitespace is because we assign at least 1 cell to each code point we process. Since that string has 27 code points it's also 27 cells wide. As part of #8000 this situation will be improved though.)
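The cell counting described in that parenthetical can be sketched as follows. Both function names are hypothetical. The second one, which skips combining marks by general category, is only a crude stand-in for proper grapheme cluster handling and is an assumption for contrast, not a description of the planned fix.

```python
import unicodedata


def cells_one_per_code_point(text: str) -> int:
    """Every code point gets at least one cell, so the count is the string length."""
    return len(text)


def cells_skipping_marks(text: str) -> int:
    """Contrast case: code points in a Mark category (Mn, Mc, Me) take no cell."""
    return sum(1 for ch in text if not unicodedata.category(ch).startswith("M"))


if __name__ == "__main__":
    samples = {
        "e + combining acute": "e\u0301",
        "heart + VS15": "\u2764\ufe0e",
    }
    for label, text in samples.items():
        print(label, cells_one_per_code_point(text), cells_skipping_marks(text))
```

Both samples are a single visible character, yet the one-cell-per-code-point rule gives each of them two cells, which is where the extra whitespace comes from.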

With scaling in general being ultimately necessary, scaling Emojis is basically just a side effect and not actually intentional. Unrelated to the U+FE0E issue, Emojis like U+2764 (❤) without a variation selector are fit into a single cell however, simply because that's exactly what other terminals do too (see #2066). So at least that part won't change any time soon. There are also things like https://github.com/microsoft/terminal/issues/12512#issuecomment-1045302361 which I'll probably try to fix as part of #9999.

lhecker avatar Aug 05 '22 12:08 lhecker

Well, I'll have to see what I can do about making my emojis explicitly use the emoji variant. I have to say though -- it seems that if you're going to force the non-variant to a single cell, you really should force the text variant to a single cell too. That's the one where the author meant for it to be a single text character, right?

Jaykul avatar Aug 08 '22 05:08 Jaykul

That the text variant takes up two cells hearkens back to an ages-old issue in the console: FE0E is allocated its own column, because we've never supported zero-width glyphs, and so the grapheme cluster HEART + VARIATION SELECTOR 15 is allocated two columns. :)

Combine that with DirectWrite's propensity for emoji-fying it even when the variation selector is present, and you get a double-width full-color heart. Bah.

DHowett avatar Aug 08 '22 16:08 DHowett

FYI, the issue of variation selectors was discussed at length in #8970. It's a more complicated problem than just supporting zero-width glyphs, because you're changing the width of the character after it has already been output, and the original width can affect the layout of the page in ways that are irreversible. Last I checked, most terminals just ignored them (at least regarding the width change), which seems the most sensible thing to do.
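To show the difference between the two policies, here is a minimal sketch comparing a width calculation that ignores the selectors with one that lets them change the width of the preceding character. All function names are hypothetical, the base width comes from East Asian Width only, and treating VS16 as "force two columns" and VS15 as "force one column" is an assumption for illustration rather than the behavior of any specific terminal.

```python
import unicodedata

VS15 = "\ufe0e"  # text presentation selector
VS16 = "\ufe0f"  # emoji presentation selector


def base_width(ch: str) -> int:
    """One or two columns, from East Asian Width alone."""
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def width_ignoring_selectors(text: str) -> int:
    """Selectors take no columns and never change the base character's width."""
    return sum(base_width(ch) for ch in text if ch not in (VS15, VS16))


def width_honoring_selectors(text: str) -> int:
    """Selectors retroactively resize the character that precedes them."""
    total = 0
    previous = 0  # width of the last base character, 0 if there is none yet
    for ch in text:
        if ch in (VS15, VS16):
            if previous:
                forced = 2 if ch == VS16 else 1
                total += forced - previous  # adjust a width already accounted for
                previous = forced
        else:
            previous = base_width(ch)
            total += previous
    return total


if __name__ == "__main__":
    heart_emoji = "\u2764\ufe0f"
    print(width_ignoring_selectors(heart_emoji), width_honoring_selectors(heart_emoji))
```

The `total += forced - previous` line is the awkward part: by the time the selector arrives, the base character has already been placed, so honoring it means going back and changing a width that may already have influenced wrapping or cursor position.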

j4james avatar Aug 09 '22 08:08 j4james