htop icon indicating copy to clipboard operation
htop copied to clipboard

Shorten CWD path

Open onlined opened this issue 3 years ago • 12 comments

For showing the CWD path in a more compact way, this function

  • shortens path parts before last from right to left, until first character of the path stays (directory -> direc~ or d~)
  • combines parts before last from right with left (a~/b~ -> a~~)
  • shortens the last part from middle, preferring to keep characters from left (longdirectoryname -> long~ame)

in order (of 2nd and 3rd, the one that removes more characters is applied first) until given maximum length is reached.

Closes #462.

onlined avatar Oct 21 '21 21:10 onlined

@BenBE I see that you indicated that the code must be multibyte aware. I totally agree with you and I started working on it. I have one question: Is it OK to use standard library multibyte functions (mblen, mbsrtowcs etc.) or do I need another solution for it?

onlined avatar Oct 22 '21 08:10 onlined

@BenBE I see that you indicated that the code must be multibyte aware. I totally agree with you and I started working on it. I have one question: Is it OK to use standard library multibyte functions (mblen, mbsrtowcs etc.) or do I need another solution for it?

Using standard library is totally fine and much preferred over including additional libraries we do not yet link with.

BenBE avatar Oct 22 '21 09:10 BenBE

I finally have a chance to test your code, and it might be late to inform you that there's one thing you need to address. htop CWD with CJK characters screenshot

People who didn't live in East Asia wouldn't know, that not all characters occupy the same width in the terminal. Specifically,

  • CJK characters may occupy twice the width of the Latin letters, and
  • Combining characters occupy no width, but modifies the character before it.

The above screenshot shows CJK characters (Chinese numerals '一二三四五六'), I've also tested with some Zalgo text and it displays weirdly, too.

Explorer09 avatar Oct 26 '21 02:10 Explorer09

@Explorer09 @BenBE I tried to find a solution for finding out how many columns a character occupies. I saw wcwidth and wcswidth but their behavior changes according to locale, so it's not much sane to use them. Do you know any portable (POSIX is enough probably) and reliable way for finding how much space a wide character occupies?

onlined avatar Oct 26 '21 22:10 onlined

@online I already expected you would get stuck with the problem when I mentioned it. It troubled me too (when I worked in a project of my company), and I have a conservative approach:

  1. Set terminal cursor position to where the character would be printed (make sure it's safe to print junk characters there).
  2. Print the character (one at a time).
  3. Query cursor position to determine the width.
  4. Repeat step 2.

Yes, the safest way is to just print it out. And you are right that the width is locale-dependent. If you have read the UAX #11 East Asian Width, there is an "East Asian Ambiguous" category of characters, whose rendering widths are determined by the terminal's default character encoding (full-width when a CJK encoding is selected, and half-width when a non-CJK encoding is selected). I don't think the locale data is more reliable than the terminal setting when it comes to rendering, and that's why I suggest the above steps instead.

Explorer09 avatar Oct 27 '21 02:10 Explorer09

@Explorer09 But I need to preprocess the string and decide which characters to keep or remove, before actually printing the characters. Do you suggest printing to a non-visible or dummy terminal?

onlined avatar Oct 30 '21 15:10 onlined

@onlined My idea is to use the space of the CWD table cell -- you print the characters just as you process it. It's okay to overprint the space multiple time until you finally shorten the path.

Explorer09 avatar Oct 30 '21 15:10 Explorer09

@onlined My idea is to use the space of the CWD table cell -- you print the characters just as you process it. It's okay to overprint the space multiple time until you finally shorten the path.

Problem is, this cell is not actually available for reference in that routine. The function should be independent of which screen position it is later on printed too. In particular, the printed string is only rendered into a RichString later on and that is finally written onto the display. So shortening to 25 code points with wcwidth/wcswidth is fully acceptable IMHO (even if this may differ somewhat from what the terminal will finally end up actually writing). Cf. mbswidth(3)

BenBE avatar Oct 30 '21 18:10 BenBE

@BenBE That's a structural problem. It would be better to fix the routine to accept a terminal position as the parameter. You don't need to prepare the string until it's printed at least once. So it's not unreasonable to request a terminal position for the routine.

Explorer09 avatar Oct 30 '21 18:10 Explorer09

@BenBE That's a structural problem. It would be better to fix the routine to accept a terminal position as the parameter. You don't need to prepare the string until it's printed at least once. So it's not unreasonable to request a terminal position for the routine.

Yes and no.

Yes, we should delay the preparation of this string to when we first actually write it to the display. But that's not how this works internally in htop right now and would take a bit of structural work to change it that way. That's not part of this PR and if we go that route, then we'd need to do quite a bit of related work elsewhere too to keep things consistent.

And no, you shouldn't print temporary things to a terminal. True story: I had a text-based app once that had a progress bar for showing the load progress when it was initializing different things. So far, so normal. One of the users had a braille display connected to the computer and sent in a complaint about the braille display going brrrrrrrrrr whenever the program started. Long story short: DON'T write temporary or fast-changing information to a terminal.

BenBE avatar Oct 30 '21 18:10 BenBE

@BenBE But htop is ncursed-based! If htop prints everything sequentially (to maintain compatibility with line terminals), I won't suggest this approach. I know what you mean when you say progress bar as an example: wget is the example of having two ways to output the progress. One prints everything sequentially (incuding the progress indicator), the other uses overprinting to indicate the progress (and it prints the filename being downloaded in a scrolling text).

Explorer09 avatar Oct 31 '21 01:10 Explorer09

@BenBE But htop is ncursed-based! If htop prints everything sequentially (to maintain compatibility with line terminals), I won't suggest this approach. I know what you mean when you say progress bar as an example: wget is the example of having two ways to output the progress. One prints everything sequentially (incuding the progress indicator), the other uses overprinting to indicate the progress (and it prints the filename being downloaded in a scrolling text).

Both methods cause the mal-behavior I described. The problem is not the sequential printing, but the printing itself.

BenBE avatar Oct 31 '21 01:10 BenBE