CommandScreen: handle multi-byte UTF-8 code sequences while wrapping
The earlier implementation of the command screen treated the process command as a sequence of single-byte characters. When the command contains multi-byte characters, in particular UTF-8 byte sequences, part of a multi-byte sequence could end up wrapped onto the next line.
To fix that, we now use mbrtowc and wcwidth to determine the byte count and display width of non-ASCII characters, so that lines wrap near the window edge and, more importantly, at character boundaries.
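For illustration, here is a minimal sketch of the idea, not the actual code in this PR. The helper name `bytesThatFit` and its parameters are hypothetical. It assumes the caller has already selected a UTF-8 locale with `setlocale()`, and it treats invalid bytes and non-printable characters as one column wide, which is a simplifying assumption.

```c
#define _XOPEN_SOURCE 700 /* needed for wcwidth() on some systems */
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not htop's real function): returns how many bytes
 * of `s` fit into `maxWidth` terminal columns without splitting a
 * multi-byte character. Decoding depends on the current locale. */
static size_t bytesThatFit(const char* s, size_t len, int maxWidth) {
   assert(maxWidth > 0);

   mbstate_t state;
   memset(&state, 0, sizeof(state));

   size_t used = 0; /* bytes accepted so far */
   int cols = 0;    /* columns occupied so far */

   while (used < len) {
      wchar_t wc;
      size_t n = mbrtowc(&wc, s + used, len - used, &state);
      int w;

      if (n == (size_t)-1 || n == (size_t)-2) {
         /* Invalid or truncated sequence: assume one byte, one column */
         memset(&state, 0, sizeof(state));
         n = 1;
         w = 1;
      } else {
         if (n == 0)
            n = 1; /* embedded NUL byte still occupies one byte */
         w = wcwidth(wc);
         if (w < 0)
            w = 1; /* non-printable: assume one column */
      }

      /* Stop before the character that would cross the edge, but always
       * accept at least one character so the caller makes progress. */
      if (used > 0 && cols + w > maxWidth)
         break;

      cols += w;
      used += n;
   }

   return used;
}
```

A caller would invoke this repeatedly, emitting the returned number of bytes as one line and advancing the pointer, until the whole command has been consumed. Because the break point is always computed after a complete `mbrtowc` conversion, a line never ends in the middle of a UTF-8 sequence.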
@BenBE Would you mind having a look at this when you get a chance?
I have been working on a few PRs (pull requests) that would bring Unicode character width support to htop, so you are not the first to propose the idea.
There are many places in the htop codebase that need to be upgraded for Unicode character width support, so your width calculation function should preferably not be limited to use in CommandScreen.c.
To avoid duplicated effort, maybe you could look at my PR #1642 to see what you can do with the width calculation.
@Explorer09, thanks a lot for sharing your thoughts and pointing me to your work on related topics. I really appreciate your insights and all the effort you’ve put into this area.
For this PR, I’m mainly focused on implementing proper line wrapping at or near the window border. I agree that a broader, unified solution for wide character support, especially improved control character handling, is worth looking into. However, I think those are best addressed in their own dedicated PRs and discussions, such as what you’re working on in #1642. For now, I’d prefer to keep the scope here limited so we can resolve this specific issue first.
Thanks again for your feedback and for helping move things forward!
@BenBE Actually, the code used to be like that, in the sense that everything was in just one function; it was split up later :-) I just pushed the earlier commits for your reference.