llama.cpp icon indicating copy to clipboard operation
llama.cpp copied to clipboard

Fix color codes emitting mid-UTF8 code.

Open blackhole89 opened this issue 1 year ago • 1 comments

Some moving around of ANSI color code emissions in recent patches has left us in a situation where RESET codes were getting defensively emitted after every token, resulting in multibyte UTF-8 codes that are split across tokens being broken up:

image

This fixes that, while cleaning up the entire color control architecture, stopping unnecessary emission of color codes and probably making it more conducive to portability later on, e.g. to platforms where color has to be set by out-of-band API calls.

image

blackhole89 avatar Mar 20 '23 01:03 blackhole89

Thanks for the review! I do wish we could find a more elegant solution, but it's not easy.

In that sense maybe using a "global state" for the console isn't a very good long-term solution, and it would be better to have to the "console state" as a param which could be carried over with the params context wherever needed.

The thing is, the actual console state, which the con_st variable is supposed to reflect, is in fact global. If the existing code sets the color to blue, control passes to an unrelated routine that was written without knowledge of our code, and that routine then prints to the console, the text will come out blue all the same. This is arguably a design flaw of the console itself, as a shared resource with internal state that is not easily interrogated, but doing anything about that is obviously way outside of our scope. However, given that, I'm not convinced making con_st more local on our end would amount to an improvement as far as the overall design is concerned. At most you could argue that it should be associated with a particular fd, but even that has problems as multiple fds can (and often will: setting the color on stdout will affect output to stderr) point to the same console.

Note also that we do in fact already have a piece of state that is "global" / associated with main.cpp-the-linked-module, namely static bool is_interacting. To be fair, that one was my doing as well, but it was somewhat necessitated by the requirement to have the SIGINT handler interact with the main loop (which you could argue is also downstream from the console being a stateful shared resource). Therefore, even if con_use_color and con_st are awkward, I'd argue that they don't really push the envelope.

blackhole89 avatar Mar 20 '23 12:03 blackhole89