
CommandScreen: handle multi-byte UTF-8 code sequences while wrapping

Open ryenus opened this issue 6 months ago • 3 comments

In the earlier implementation of the command screen, the process command was treated as a sequence of single-byte characters. When the command contains wide characters, in particular multi-byte UTF-8 sequences, part of a sequence could end up wrapped onto the next line.

To fix that, we now use mbrtowc and wcwidth to determine the byte count and display width of non-ASCII characters, so that lines wrap near the window edge and, more importantly, at character boundaries.
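For readers unfamiliar with the two functions, here is a minimal sketch of the idea, not the code from this PR. The helper name `wrapPoint` and its fallback rules (one column for invalid bytes and non-printable characters, always consuming at least one character) are assumptions made for illustration. It also assumes the caller has already selected a locale whose encoding matches the string, for example with `setlocale(LC_CTYPE, "")` in a UTF-8 environment.

```c
#define _XOPEN_SOURCE 700 /* exposes wcwidth() on glibc */

#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not taken from the PR): returns how many bytes of
 * `str` (at most `len`) fit into `maxCols` terminal columns without
 * splitting a multi-byte character. Decoding follows the current
 * LC_CTYPE locale, which the caller is assumed to have set up. */
static size_t wrapPoint(const char* str, size_t len, int maxCols) {
   assert(maxCols > 0);

   mbstate_t state;
   memset(&state, 0, sizeof(state));

   size_t bytes = 0; /* bytes accepted so far */
   int cols = 0;     /* columns used so far */

   while (bytes < len) {
      wchar_t wc;
      size_t n = mbrtowc(&wc, str + bytes, len - bytes, &state);
      int width;

      if (n == 0)
         break; /* embedded NUL: nothing more to display */

      if (n == (size_t)-1 || n == (size_t)-2) {
         /* Invalid or truncated sequence: assume one byte, one column,
          * and reset the conversion state before continuing. */
         memset(&state, 0, sizeof(state));
         n = 1;
         width = 1;
      } else {
         width = wcwidth(wc);
         if (width < 0)
            width = 1; /* non-printable: assume a one-column placeholder */
      }

      /* Stop before overflowing the line, but always accept the first
       * character so that a caller looping over lines makes progress. */
      if (cols + width > maxCols && bytes > 0)
         break;

      cols += width;
      bytes += n;
   }

   return bytes;
}
```

Calling the helper repeatedly on the remaining bytes would yield one wrapped line per call. Because the break decision is made per decoded character rather than per byte, a line can end one column short of the window edge when the next character is double-width.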

@BenBE Would you mind having a look at this when you get a chance?

ryenus avatar Jun 22 '25 14:06 ryenus

I have been working on a few PRs (pull requests) that would bring Unicode character width support to htop, so you are not the first one to propose the idea.

There are many places in the htop codebase that need to be upgraded for Unicode character width support, so it would be better if your width calculation function were not limited to use in CommandScreen.c.

To avoid duplicate effort, maybe you could look at my PR #1642 to see what you can do with the width calculation.

Explorer09 avatar Jun 22 '25 15:06 Explorer09

@Explorer09, thanks a lot for sharing your thoughts and pointing me to your work on related topics. I really appreciate your insights and all the effort you’ve put into this area.

For this PR, I’m mainly focused on implementing proper line wrapping at or near the window border. I agree that a broader, unified solution for wide character support, especially improved control character handling, is worth looking into. However, I think those are best addressed in their own dedicated PRs and discussions, such as what you’re working on in #1642. For now, I’d prefer to keep the scope here limited so we can resolve this specific issue first.

Thanks again for your feedback and for helping move things forward!

ryenus avatar Jun 25 '25 03:06 ryenus

@BenBE Actually, the code was originally like that, in the sense that everything was in just one function; it was split up later :-) I just pushed the earlier commits for your reference.

ryenus avatar Jun 25 '25 09:06 ryenus