
Alignment problem when filenames have Unicode grapheme clusters

Open tbali0524 opened this issue 10 months ago • 12 comments

Far Manager version

3.0.6364

OS version

10.0.22631.4751

Other software

No response

Steps to reproduce

The main file list panel has alignment issues; the filename width seems to be calculated incorrectly. Screenshot: [Image]

The same issue comes up in modal windows showing the same filenames. Screenshot: [Image]

Example filename: Üzletágvezetői prémium szabályzat_Belényi Norman_2025 Q1_v1.docx

The Hungarian Unicode characters ÜűÁáőéí are usually handled correctly by Far. I have the issue ONLY when I get a file that was created on a Mac with MS Office for Mac. Maybe the Mac does not use the proper Unicode character for these, but uses a base character without an accent (e.g. ouiae) and adds separate special Unicode code points (e.g. " ' double dots etc.) that put an accent on the previous character. (Sorry, I may not be using the correct words, codepoints, graphemes, etc.; I am not familiar with them.)

Expected behavior

Proper alignment on the file list panel.

Actual behavior

Alignment errors, see screenshots in "Steps to reproduce" section.

tbali0524 avatar Feb 12 '25 08:02 tbali0524

Indeed, the é in prémium is actually a grapheme cluster: e (U+0065) + ◌́ (U+0301). Mac software for some reason produces these combining characters instead of precomposed ones, e.g. é (U+00E9).
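To make the difference concrete, here is a minimal Python sketch (not Far code) that compares the two spellings of the word from the example filename. The variable names are illustrative; the only library used is Python's standard `unicodedata` module.

```python
import unicodedata

# "e" (U+0065) followed by a combining acute accent (U+0301), as in the Mac-created file
decomposed = "pre\u0301mium"
# the single precomposed character "é" (U+00E9)
precomposed = "pr\u00e9mium"

# Both render as the same word, but the code point counts differ.
print(len(decomposed), len(precomposed))  # 8 7

# The strings compare unequal until they are normalized to the same form.
print(decomposed == precomposed)                                 # False
print(unicodedata.normalize("NFC", decomposed) == precomposed)   # True
```

Any layout code that pads or truncates by string length will treat the first spelling as one cell wider than the second.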

You see alignment issues because in recent Windows (and/or Windows Terminal) versions Microsoft tries to properly support grapheme clusters, e.g. not only render the whole sequence as a single character, but occupy only a single cell for it, so the rest of the line inevitably shifts left. Conceptually it's the right thing to do of course, but it requires all the third party software (e.g. Far) to also fully support grapheme clusters, i.e. take character composition into account when aligning text. It's doable, but it requires a lot of work, and we're not Microsoft, so it's unlikely to happen soon.

For now we recommend disabling grapheme cluster support in Windows Terminal and/or Windows Console Host:

  • For Windows Terminal it's in Settings - Compatibility - Text measurement mode - Windows Console.
  • For Windows Console Host it's in Registry, DWORD TextMeasurement = 2, in HKCU\Console and its subkeys.

After that grapheme clusters will look uglier, e.g. pré mium, but at least aligned.
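For the Console Host case, the registry value can also be set from a script. The following is only a sketch using Python's standard `winreg` module; the function name is made up for illustration, and it writes the value to `HKCU\Console` only, so any subkeys mentioned above would need the same treatment separately.

```python
import sys

VALUE_NAME = "TextMeasurement"  # value name as given in the comment above
VALUE_DATA = 2                  # DWORD value as given in the comment above


def set_console_text_measurement(value: int = VALUE_DATA) -> bool:
    """Write the DWORD under HKCU\\Console. Returns False on non-Windows systems."""
    if sys.platform != "win32":
        return False
    import winreg  # only available on Windows

    # Open (or create) HKCU\Console with permission to set values.
    with winreg.CreateKeyEx(
        winreg.HKEY_CURRENT_USER, "Console", 0, winreg.KEY_SET_VALUE
    ) as key:
        winreg.SetValueEx(key, VALUE_NAME, 0, winreg.REG_DWORD, value)
    return True
```

Calling `set_console_text_measurement()` once should be enough for the top-level key; newly opened console windows are assumed to pick the value up.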

alabuzhev avatar Feb 12 '25 17:02 alabuzhev

Thanks for the insights. Shall I close this bug report issue, or re-label as a low-priority feature request?

tbali0524 avatar Feb 13 '25 06:02 tbali0524

I don't mind keeping it open as a reminder.

alabuzhev avatar Feb 13 '25 23:02 alabuzhev

...it requires all the third party software (e.g. Far) to also fully support grapheme clusters...

Well, maybe not, not grapheme clusters. I think at least part of the problem is that what looks to the user like individual visual elements (columns, column separators, content text) is often rendered internally as a single sequence of characters, with separators (and other decorations, if any) inserted into this sequence at positions calculated from the lengths of the content text strings. Now, the length of the content text strings is the issue.

We can imagine another approach to rendering, where graphical elements are drawn "faithfully." In the example panel picture above, we have panels with the border, two columns, two separator lines (maybe something else) and so many text strings which appear in the columns starting at some position and extending to the right (and why not to the left, BTW, though I do not care about RTL).

If we first render the text strings starting where they should start and letting them extend as they happen, then render all decoration on top of already laid out text overwriting and hiding everything which should not be visible, we will never need to know text string lengths.
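A toy model of this idea, in Python. Everything here is hypothetical: `FakeTerminal` stands in for a grapheme-aware terminal row (it merges combining marks into the previous cell), and `draw_row` is the application side, which never measures the filename.

```python
import unicodedata


class FakeTerminal:
    """One screen row of a hypothetical grapheme-aware terminal."""

    def __init__(self, width: int):
        self.cells = [" "] * width

    def write(self, col: int, text: str) -> None:
        """Write text starting at cell `col`; combining marks join the previous cell."""
        pos = col
        for ch in text:
            if unicodedata.combining(ch) and pos > col:
                self.cells[pos - 1] += ch
                continue
            if pos >= len(self.cells):
                break  # clipped at the right edge
            self.cells[pos] = ch
            pos += 1


def draw_row(term: FakeTerminal, name: str, size: str, sep_col: int = 12) -> None:
    # 1. Text first: start where it should start and let it extend as it happens.
    term.write(0, name)
    # 2. Decorations on top: the separator and the size column go to fixed cells,
    #    overwriting whatever part of the name overflowed into them.
    rest = len(term.cells) - sep_col - 1
    term.write(sep_col, "│" + size.rjust(rest))


for name in ("pre\u0301mium.docx", "pr\u00e9mium.docx", "a_very_long_file_name.docx"):
    term = FakeTerminal(20)
    draw_row(term, name, "12 K")
    print("".join(term.cells))
```

In this model the separator always lands in cell 12, whichever spelling of é the name uses. The sketch covers only left-aligned text with decorations at fixed positions.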

Here I am obligated to refer to another recent comment by @alabuzhev.

MKadaner avatar Feb 15 '25 21:02 MKadaner

@MKadaner such an approach does work of course, except when it doesn't. Imagine that you need to draw the user menu with the same file names of unknown lengths on top of the panel (e.g. F2). Or centered panel titles, or any centered dialogs (e.g. F8), or wrapped text in help or viewer and so on and so forth.

alabuzhev avatar Feb 15 '25 21:02 alabuzhev

@alabuzhev, yeah, right. Well, centering can be mostly avoided (I think it was not a coincidence that the Windows UI shifted mainly to left-aligned strings). String boxing can be done by rendering strings where they should start, then detecting where they actually ended, and drawing the right border there. Line wrapping is indeed tricky...

With pixel-based rendering, people use platform functions to calculate string width in pixels or, if those are not available, render to a side buffer and bit-blit the result. We did it in my previous life. Something similar can probably be done with character-position-based rendering. Whether all these complications are worth the effort is a different question.

MKadaner avatar Feb 15 '25 23:02 MKadaner

String boxing can be done by rendering strings where they should start, then detecting where they actually ended, and draw the right border there

Image

alabuzhev avatar Feb 16 '25 00:02 alabuzhev

One step is missing, viz. "detecting where they actually ended." So, on the second attempt, the teal rectangle should be of the right width.

Is there a good way to measure actual width of a string in character positions on Windows?

MKadaner avatar Feb 16 '25 00:02 MKadaner

Of course, one might say that grapheme clusters will always be visually shorter than their corresponding characters, and, by extension, the visual string length will always be smaller than string.size(), so cutting the strings as if they're in ASCII English should work, but it's not necessarily the case: a lot of single characters can occupy double (and probably more) cells, so finding out the actual visible length is inevitable.
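Both directions of that mismatch can be illustrated with a rough heuristic in Python. This is only an approximation based on Unicode character properties from the standard `unicodedata` module, not what Windows Terminal or Console Host actually does; the function name is made up, and emoji sequences, variation selectors and control characters are deliberately ignored.

```python
import unicodedata


def estimated_cells(text: str) -> int:
    """Rough guess at how many terminal cells `text` occupies."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # assumed to share the cell of the preceding character
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2  # wide and fullwidth characters usually take two cells
        else:
            width += 1
    return width


# Combining sequence: fewer cells than code points.
print(len("pre\u0301mium"), estimated_cells("pre\u0301mium"))  # 8 7
# CJK characters: more cells than code points.
print(len("日本語.txt"), estimated_cells("日本語.txt"))          # 7 10
```

So the string length is neither an upper nor a lower bound on the visible width, and a guess like this is only as good as its agreement with the terminal that does the actual rendering.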

Not to mention that this "faithful" rendering will be slow and blinky, probably slideshow-like.

So, on the second attempt, the teal rectangle should be of the right width

I was too lazy to make a gif :)

Is there a good way to measure actual width of a string in character positions on Windows?

Unfortunately, no.

alabuzhev avatar Feb 16 '25 00:02 alabuzhev

Thank you for the link. It's horrible. Around 2005, a proprietary niche Motorola OS (for phones) did provide applications with a documented API to measure the pixel width of a given Unicode string. And it actually worked: RTL, end-glyphs and everything. Now, 20 years later, we cannot have it on the desktop in the most widely used OS.

MKadaner avatar Feb 16 '25 01:02 MKadaner

we cannot have it on desktop in the most widely used OS

For correctness's sake, in the (most used) graphical context, we can (and have long been able to). It's only the console/terminal subsystem that is deprived.

HamRusTal avatar Feb 16 '25 12:02 HamRusTal

Thank you for the correction, @HamRusTal. Sure, in the context of Far development I omitted this clarification.

MKadaner avatar Feb 16 '25 17:02 MKadaner