htop icon indicating copy to clipboard operation
htop copied to clipboard

Accept wide chars in filter input (#1038)

Open adsr opened this issue 3 years ago • 5 comments

(For hacktoberfest)

~Notably this doesn't handle wcwidth!=1 or extended grapheme clusters, but you can filter for your emoji process now.~

Handles wcwidth!=1 now if the input is validly encoded. See FunctionBar_displayWidth.

adsr avatar Oct 13 '22 06:10 adsr

Added your fixes and squashed. Thanks @BenBE.

adsr avatar Oct 14 '22 00:10 adsr

I have a concern about Panel_getCh function. The API was designed to return a byte despite us implementing multi-byte support for it. And the function maintains a state (mbbuf, mbbuf_i and mbbuf_len) to make it work with multi-byte. I wonder if it is possible to avoid the state by changing the function to return the multi-byte sequence directly. I'm thinking of a function prototype like this:

int Panel_readMultiByteChar(Panel* this, char *mbBuf)
// mbBuf must have at least (MB_LEN_MAX + 1) bytes. The function outputs the
// multi-byte sequence in mbBuf and the sequence is terminated by a 0x00 byte.

This API would be more future-proof as it no longer returns a byte at the middle of the sequence, which could lead to a (potential) synchronization problem (I know this synchronization problem won't affect UTF-8 though).

Explorer09 avatar Oct 17 '22 04:10 Explorer09

@BenBE Could you add the hacktoberfest-accepted label? It takes a few days for it to register and I'd like to get credit for it if you think it's worthy of course. Sorry to bother.

adsr avatar Oct 18 '22 03:10 adsr

@Explorer09 Are there any other open issues with this PR that you currently see?

AFAICS we have one issue re escaping of non-printables/invalid encodings for passing as a filter (the escape handling is out of scope, the proper filtering should be dealt with).

Also there's the last_wkey global I don't quite like. How much work would it take to avoid it?

@adsr We discussed this in the team and have not reached agreement on this just yet.

BenBE avatar Oct 20 '22 11:10 BenBE

Also there's the last_wkey global I don't quite like. How much work would it take to avoid it?

I think the main alternative is to modify Panel_getCh and its callers to use wint_t instead of int or char as @Explorer09 suggested. We might be able limit the scope of the change by introducing some proxy functions that call the existing functions in a loop, e.g., InfoScreen_run invokes Panel_getWch and then InfoScreen_run_inner in a loop with the individual bytes of the wide char. IMO this would be uglier than the current implementation, so I'd vote either for changing it all or sticking with the current implementation.

adsr avatar Oct 21 '22 05:10 adsr