htop
Substitute non-printable characters (e.g. TAB, CR, LF) in the CMDLINE
bug 1/2 from https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1057714
From the Debian bug report, I can't see why newlines in CMDLINE should be replaced with spaces. A visible newline character should be considered a feature. If we need an alternative graphic character, how about U+240A (␊)?
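A minimal sketch of the U+240A idea, generalised to the whole Unicode Control Pictures block: U+2400..U+241F mirror the C0 controls 0x00..0x1F (so LF becomes ␊ and TAB becomes ␉), and U+2421 stands for DEL. The helper names are hypothetical, not htop API, and the sketch assumes a UTF-8 terminal; it does not address the ASCII fallback.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: encode the Control Pictures glyph for a C0 control
 * byte (0x00-0x1F) or DEL (0x7F) as UTF-8 into out.
 * U+2400..U+241F mirror 0x00..0x1F (TAB 0x09 -> U+2409, LF 0x0A -> U+240A),
 * and DEL maps to U+2421. Returns 3 (bytes written), or 0 if c is not a
 * control byte (nothing is written in that case). */
size_t control_picture_utf8(unsigned char c, char out[3]) {
    unsigned int cp;
    if (c < 0x20)
        cp = 0x2400u + c;
    else if (c == 0x7F)
        cp = 0x2421u;
    else
        return 0;
    /* Code points in U+0800..U+FFFF take three UTF-8 bytes. */
    out[0] = (char)(0xE0 | (cp >> 12));
    out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (char)(0x80 | (cp & 0x3F));
    return 3;
}

/* Hypothetical helper: copy src into dst, replacing every control byte with
 * its control picture. Each input byte expands to at most 3 output bytes, so
 * dst must hold at least 3 * strlen(src) + 1 bytes. */
void substitute_control_pictures(const char *src, char *dst) {
    assert(src != NULL && dst != NULL);
    for (; *src != '\0'; src++) {
        size_t n = control_picture_utf8((unsigned char)*src, dst);
        if (n == 0)
            *dst++ = *src;   /* printable or non-ASCII byte: copy as-is */
        else
            dst += n;        /* glyph was written directly into dst */
    }
    *dst = '\0';
}
```

With this, a command line containing a<TAB>b<LF> would be displayed as a␉b␊, so the control characters stay visible instead of being collapsed into spaces.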
We still need an ASCII fallback, cf. htop --no-unicode.
The original bug report didn't mention ASCII mode, so I assumed Unicode mode, and in that mode U+FFFD would not be a bug.
I think the proper way to fix this is to introduce an "escaped/quoted" string for CMDLINE display, in which all non-printable characters (as determined by the C function isprint()) are presented in escaped form, like \n or \u2000, and quotation marks are escaped as \".
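A minimal sketch of the escaped form described above, working byte by byte with isprint(). The function name and the \xHH form for other control bytes are assumptions. The backslash itself is also escaped (an addition to the proposal) so that an argument literally containing \n stays distinguishable from an escaped newline. Bytes >= 0x80 pass through unchanged here, because emitting \uXXXX escapes would first require decoding UTF-8, which this sketch leaves out.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write an escaped form of src into dst (capacity
 * dst_size). Non-printable ASCII bytes become C-style escapes, and '"' and
 * '\\' are escaped so the output can be decoded unambiguously. Bytes >= 0x80
 * are copied unchanged, assuming they belong to UTF-8 text.
 * Returns the output length, or (size_t)-1 if dst is too small (dst then
 * holds a NUL-terminated prefix). */
size_t escape_cmdline(const char *src, char *dst, size_t dst_size) {
    size_t len = 0;
    assert(src != NULL && dst != NULL && dst_size > 0);
    for (const unsigned char *p = (const unsigned char *)src; *p; p++) {
        char esc[5];   /* longest escape is "\xHH" plus NUL */
        switch (*p) {
        case '\n': strcpy(esc, "\\n"); break;
        case '\t': strcpy(esc, "\\t"); break;
        case '\r': strcpy(esc, "\\r"); break;
        case '"':  strcpy(esc, "\\\""); break;
        case '\\': strcpy(esc, "\\\\"); break;
        default:
            if (*p >= 0x80 || isprint(*p)) {
                esc[0] = (char)*p;   /* printable: copy as-is */
                esc[1] = '\0';
            } else {
                snprintf(esc, sizeof esc, "\\x%02X", *p);
            }
        }
        size_t n = strlen(esc);
        if (len + n >= dst_size) {   /* no room for escape + NUL */
            dst[len] = '\0';
            return (size_t)-1;
        }
        memcpy(dst + len, esc, n);
        len += n;
    }
    dst[len] = '\0';
    return len;
}
```

For example, an argument a<TAB>b<LF> would be displayed as a\tb\n, and say "hi" as say \"hi\".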
It's not trivial to implement this. But simply replacing newlines with spaces just doesn't look right.
To shed some light on the issue, since the reporter mixed this up in their description (it took me a while to notice):
The problem is that we copy the command line 1:1 onto the screen, including control characters. The U+FFFD is caused by U+0009 (TAB) being unprintable (not U+000A (LF), as initially assumed), so it gets replaced.
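The diagnosis can be checked with isprint(): in the C locale it rejects TAB (0x09) just as it rejects LF (0x0A), so a fix has to cover every control character, not only newlines. A minimal sketch of a check that could run before the string is drawn; the helper name is hypothetical, and skipping bytes >= 0x80 assumes UTF-8 command lines.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: count the ASCII bytes in s that isprint() rejects
 * (the C0 controls such as TAB 0x09 and LF 0x0A, plus DEL 0x7F). Bytes
 * >= 0x80 are skipped on the assumption that they belong to UTF-8 text.
 * A result of 0 means a 1:1 copy would put no ASCII control characters
 * on the screen. */
size_t count_control_bytes(const char *s) {
    size_t count = 0;
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;
        if (c < 0x80 && !isprint(c))
            count++;
    }
    return count;
}
```

A caller could keep the current 1:1 copy as a fast path when the count is 0 and only build a substituted string otherwise.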
For a bug fix, we should decide how to handle non-printable characters. We are free to choose, but if we decide to use Unicode and print 💩, we need a fallback for non-Unicode too.
Ideally, the replacement preserves information AND is unambiguous to the user.
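For the non-Unicode fallback, one option (an assumption, not something agreed in this thread) is the caret notation familiar from terminals: ^I for TAB, ^J for LF, ^? for DEL. A minimal sketch with a hypothetical function name follows. It preserves the byte value, but on its own it is not unambiguous: an argument that literally contains the two characters ^J renders the same as an embedded LF.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ASCII fallback: copy src into dst, rendering each control
 * byte in caret notation. C0 controls 0x00-0x1F map to '^' followed by the
 * byte XOR 0x40 (TAB 0x09 -> "^I", LF 0x0A -> "^J"), and DEL 0x7F maps to
 * "^?" (0x7F XOR 0x40 == '?'). All other bytes are copied unchanged.
 * dst must hold at least 2 * strlen(src) + 1 bytes. Returns the length. */
size_t caret_escape(const char *src, char *dst) {
    size_t len = 0;
    assert(src != NULL && dst != NULL);
    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;
        if (c < 0x20 || c == 0x7F) {
            dst[len++] = '^';
            dst[len++] = (char)(c ^ 0x40);
        } else {
            dst[len++] = (char)c;
        }
    }
    dst[len] = '\0';
    return len;
}
```

So a<TAB>b<LF> would appear as a^Ib^J in ASCII mode.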