Embrace Standardized Color Control
I was trying to get people to embrace this crate before as a unified way to control colors but it did not get far: https://crates.io/crates/clicolors-control (that crate also turns on ansi colors for the windows console as an added bonus)
The idea was that you have one common crate that turns on and off colors for the entire cli app.
I'm fine finding a different mechanism for this but right now controlling colored output is really not a particularly fun experience.
Anyone got some thoughts on this matter?
Relevant internals thread: https://internals.rust-lang.org/t/terminal-platform-abstraction/6746
I've looked at clicolors-control before, but didn't use it for a variety of reasons on which reasonable people may disagree:
- It doesn't look at the TERM environment variable. I personally don't really care about the clicolors standard, so I think it's fine to use it, but it shouldn't be used to the exclusion of other things. When people set TERM=dumb, for example, then people expect that to be respected. (I've even found that some environments that don't support colors will do this automatically, and if you don't respect it, you end up emitting a bunch of escape codes.)
- Its tty detection on Windows is insufficient for my needs. I need it to work in MSYS2 terminals. The atty crate does this using a hack.
- My termcolor crate (which predates clicolors-control, AFAIK) already did everything that clicolors-control does, and assimilates it with common end users inputs via the ColorChoice type. (With one exception: only recently did it acquire the ability to enable ANSI escapes on Windows 10.)
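To make the TERM point concrete, here is a minimal sketch of such a check. The function name term_allows_color is hypothetical (it is not part of clicolors-control or termcolor), and treating an unset TERM as "no color" is an assumption that only makes sense on Unix-like systems.

```rust
/// Hypothetical helper: decide from the value of TERM whether emitting
/// color escape codes is plausible. In real code the argument would come
/// from `std::env::var("TERM").ok()`.
fn term_allows_color(term: Option<&str>) -> bool {
    match term {
        // Assumption: an unset TERM tells us nothing, so stay conservative.
        // (On Windows TERM is normally unset, so this rule would not fit there.)
        None => false,
        // `dumb` is the conventional way to ask for no escape codes at all.
        Some("dumb") => false,
        // Anything else is assumed to understand basic color sequences.
        Some(_) => true,
    }
}
```

The result would be only one input to the final decision; a tty check and any explicit user choice would still take part.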
I would in principle support coalescing around one crate that just did color detection, but by that same token, I'd also like to see people coalesce around one crate for dealing with a Windows console for coloring, e.g., wincolor.
@BurntSushi for what it's worth I do not really want to defend either the crate or the defaults. I mostly just want a crate that holds the state of color on/off with a sensible default.
My termcolor crate (which predates clicolors-control, AFAIK) already did everything that clicolors-control does
The reason I never ended up using termcolor was that it does not actually (to the best of my knowledge) store the state of color on/off. It just has the auto color choice which derives the default for the writers. You cannot query it and ask: does the user want colors.
(My goal effectively is that I can parse --color=never and then tell the entire ecosystem of crates I use to please not emit colors)
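A rough sketch of that flow, assuming a hypothetical ColorPref type and helper names (none of these come from termcolor or clicolors-control): the flag value is parsed once and resolved to a plain on/off answer that other code can query.

```rust
/// Hypothetical user-facing preference, as it might come from `--color=<value>`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ColorPref {
    Always,
    Never,
    Auto,
}

/// Parse the value part of a `--color=<value>` flag.
fn parse_color_flag(value: &str) -> Option<ColorPref> {
    match value {
        "always" => Some(ColorPref::Always),
        "never" => Some(ColorPref::Never),
        "auto" => Some(ColorPref::Auto),
        _ => None,
    }
}

/// Turn the preference into a yes/no answer. `detected` stands for whatever
/// environment detection (tty check, TERM, ...) the application performs.
fn resolve(pref: ColorPref, detected: bool) -> bool {
    match pref {
        ColorPref::Always => true,
        ColorPref::Never => false,
        ColorPref::Auto => detected,
    }
}
```

The resolved boolean is the piece of state that would need to be shared across crates.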
The way I would like to see terminal colours work is a wrapper around std(out/err) that interprets ansi escape sequences. If we are on windows before windows 10, we have to use the win32 api to process escape sequences, but on most platforms we can just pass them through. There is also terminfo/cap that should be respected.
If we do this, everyone just uses ansi escape sequences, and the wrapper can choose to discard them based on anything (TERM, configuration, etc.).
Idea from this post.
(My goal effectively is that I can parse --color=never and then tell the entire ecosystem of crates I use to please not emit colors)
Using this option, disabling colors would happen in the consuming crate, all crates further down the chain just write their ansi escape sequences as usual.
A bit of code:
use std::io;

struct AnsiPolyfillWrapper<W: io::Write>(W);
impl<W: io::Write> io::Write for AnsiPolyfillWrapper<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        // read buf and if necessary modify the escape codes, with many write calls, otherwise
        self.0.write(buf)
    }
    fn flush(&mut self) -> io::Result<()> {
        self.0.flush()
    }
}
There is also terminfo/cap that should be respected.
I explicitly chose to not respect this, at least for things like colors or other simplistic styling like underlining or bold. The reason why is because when I did use a library that respected terminfo, I got numerous bug reports where the wrong escape codes were being emitted. Instead, I chose to follow in the footsteps of GNU grep: just stick to the basics. GNU grep is widely deployed, and I never hear anyone complaining about whether its colors work or not, which is compelling evidence from my perspective that it has the right approach here.
For CLI tools that need to do more extensive manipulation of the terminal, then this may be the wrong calculus.
In general, I agree that an ANSI interpreter would be great, and it would, for example, slide nicely into termcolor's existing API (while simultaneously removing the need for the buffer and buffer writer types). It's a nice-to-have though IMO, and we can make progress without it. :-) In particular, the kind of support you're talking about would require a dedicated maintainer.
@BurntSushi what do you think are the most pressing issues in this area? Sorry if this question is answered elsewhere.
@derekdreery you still need to know if colors are on or off for other reasons. For instance I shell out to other processes which will typically not detect color correctly so I want to propagate my understanding of if color is on or off to that subprocess.
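One way this propagation could look, as a sketch only: the helper name propagate_color is hypothetical, and it assumes the child process honors the CLICOLOR, CLICOLOR_FORCE and NO_COLOR environment variable conventions, which many tools do not.

```rust
use std::process::Command;

/// Hypothetical helper: export our own color decision to a child process
/// through environment variables. This only has an effect if the child
/// actually honors these conventions.
fn propagate_color(cmd: &mut Command, enabled: bool) {
    if enabled {
        // Ask the child to emit color even though its stdout is likely a pipe.
        cmd.env("CLICOLOR_FORCE", "1").env_remove("NO_COLOR");
    } else {
        // Ask the child to stay plain, using both conventions.
        cmd.env("NO_COLOR", "1")
            .env("CLICOLOR", "0")
            .env_remove("CLICOLOR_FORCE");
    }
}
```

Tools that only understand a `--color` style flag would need that flag passed explicitly instead.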
WRT the wrapping stream: I agree that would be nice. Sadly right now there is only an undocumented API used by rusttest to intercept prints.
(Also, now that Windows 10 supports ANSI colors once they are enabled, I'm hoping we can just pretend ANSI color is a thing and get rid of more complex wrappers around it)
@derekdreery The hardest possible problem: getting people to agree on one library. It may be the case that you need an ANSI interpreter to do that, because termcolor's API today is not nearly as simple as crates that only handle ANSI formatting, and that annoys people enough to go out and write their own library.
@mitsuhiko's idea about at least getting all of them to agree on whether colors are enabled or not (and permit the caller to control) is a really nice middle ground and possibly much easier to achieve. I would certainly be on board with doing that for termcolor (subject to my concerns listed above).
The idea of interpreting ANSI codes alarms me for a number of reasons:
- The number of potential ANSI codes somebody might conceivably want to send is very large, and drawing any hard line will rule out some number of reasonable use-cases
- What happens with things that look like ANSI codes but this crate doesn't know how to handle?
  - If they're passed through, you're likely to get output that works perfectly on some terminals and blurts ANSI goo in others
  - If they're silently ignored, that makes formatting problems difficult to debug
  - The library could raise an error, but is it worth killing the entire operation over an output formatting failure? Even if the culprit is just improperly-escaped user-input?
- "half assed ANSI code handling" has caused problems before (although in that case the Rust program was the producer of such codes, not the consumer)
- Even with helper functions with friendly names like make_green_text(), if your library's officially-supported API includes "put raw ANSI codes in your string literals", then people will do that, and raw ANSI codes are not an API surface that existing API tools like rustdoc are equipped to handle gracefully
- The reason that raw ANSI codes are a popular API is because it's so easy to bang strings together, and so much more difficult to work with a more structured API. This is also the reason for the popularity of cross-site-scripting attacks and SQL injection vulnerabilities. ANSI codes probably aren't as security sensitive as those other things, but still worth considering.
All that said, it may still be the case that the ease and flexibility of a custom ANSI parser outweighs all the downsides, in which case fair enough.
However, taking a cue from the "cross site scripting" analogy, I'd be interested to see/use a terminal colour API based on the ideas of HTML templating libraries like Elm's HTML DSL, ScalaTags or rust-tags. That is, building up a type-safe data-structure that represents text and its formatting, which can be stored and later serialized to ANSI-codes or Windows console API calls or HTML with inline CSS or whatever. That would hopefully have all the thread-safety advantages of ripgrep, with a more ergonomic and teachable API.
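As a small sketch of what such a data-structure approach could look like: the Span and Color types and both serializer functions below are hypothetical and not taken from any existing crate. Text is stored as data and only turned into ANSI codes or HTML at the edge.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Color {
    Red,
    Green,
}

/// One run of text with its formatting, kept as plain data.
#[derive(Debug, Clone, PartialEq)]
struct Span {
    text: String,
    fg: Option<Color>,
    bold: bool,
}

/// Serialize to ANSI SGR sequences. The text is written as-is; sanitizing
/// escape bytes embedded in `text` is out of scope for this sketch.
fn to_ansi(spans: &[Span]) -> String {
    let mut out = String::new();
    for s in spans {
        let mut codes: Vec<&str> = Vec::new();
        if s.bold {
            codes.push("1");
        }
        match s.fg {
            Some(Color::Red) => codes.push("31"),
            Some(Color::Green) => codes.push("32"),
            None => {}
        }
        if codes.is_empty() {
            out.push_str(&s.text);
        } else {
            out.push_str(&format!("\x1b[{}m{}\x1b[0m", codes.join(";"), s.text));
        }
    }
    out
}

/// Serialize the same data to HTML with inline CSS, escaping the text.
fn to_html(spans: &[Span]) -> String {
    let mut out = String::new();
    for s in spans {
        let text = s.text.replace('&', "&amp;").replace('<', "&lt;").replace('>', "&gt;");
        let mut css = String::new();
        if s.bold {
            css.push_str("font-weight:bold;");
        }
        match s.fg {
            Some(Color::Red) => css.push_str("color:red;"),
            Some(Color::Green) => css.push_str("color:green;"),
            None => {}
        }
        if css.is_empty() {
            out.push_str(&text);
        } else {
            out.push_str(&format!("<span style=\"{}\">{}</span>", css, text));
        }
    }
    out
}
```

A Windows console backend would be a third function over the same slice of spans.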
@Screwtapello you raise a lot of good points I hadn't considered! Briefly, I will just respond to one thing here, and that's the idea of building up something more structured. In particular, for tools like ripgrep, printing output is a performance sensitive aspect of execution. It is likely that any overhead associated with emitting colors would need to be avoided, and I imagine execution templates probably wouldn't be acceptable. With that said, a convenient API doesn't need to solve every use case, so long as there are lower level APIs for when that is necessary.
The wider discussions about ansi, terminal abstractions etc. are quite complex, which is why I thought about reducing this problem to initial color detection and control.
I think it might make sense to have a crate like clicolors control that just has a flippable global and then maybe various compile features to pick a sensible default (on, off, CLICOLOR, term detection etc.)
Then it should be easy for everybody to use it as a base and the rest of the discussion can be held separately.
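A minimal sketch of such a flippable global, assuming a default of "on" (a real crate would pick its default from features or detection). The names here are illustrative and should not be read as the API of any published crate.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Process-wide color switch. Assumption: colors start enabled.
static COLORS_ENABLED: AtomicBool = AtomicBool::new(true);

/// Query the shared state: "does the user want colors?"
pub fn colors_enabled() -> bool {
    COLORS_ENABLED.load(Ordering::Relaxed)
}

/// Flip the shared state, e.g. after parsing `--color=never`.
pub fn set_colors_enabled(enabled: bool) {
    COLORS_ENABLED.store(enabled, Ordering::Relaxed);
}

/// Example of a downstream crate consulting the switch before styling.
pub fn paint_red(text: &str) -> String {
    if colors_enabled() {
        format!("\x1b[31m{}\x1b[0m", text)
    } else {
        text.to_string()
    }
}
```

Because the state lives in one place, the application sets it once and every crate that consults it agrees.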
@Screwtapello Thanks for explaining all that! I had not considered ideas like cross-site scripting.
What I'm trying to define is the most general, minimal, and fast low-level API that can do everything that's required.
The number of potential ANSI codes somebody might conceivably want to send is very large, and drawing any hard line will rule out some number of reasonable use-cases.
We would draw a line somewhere, probably text styling and cursor movement to start with. This line could be moved in future, and if people used the library it would show where the gaps are and what features are most missed. These could be prioritised.
What happens with things that look like ANSI codes but this crate doesn't know how to handle?
I think there are a few options here. I believe that in production (release mode) the correct thing to do is to strip out these sequences, but in debug mode it may be useful to log their presence.
If they're silently ignored, that makes formatting problems difficult to debug
If someone comes to you with a formatting issue, you can hand them a version of the program compiled to write out a warning when it comes across an unexpected sequence.
"half assed ANSI code handling" has caused problems before (although in that case the Rust program was the producer of such codes, not the consumer)
I'd argue this would just be a bug that should be fixed. Any implementation may have bugs.
Even with helper functions with friendly names like make_green_text(), if your library's officially-supported API includes "put raw ANSI codes in your string literals", then people will do that, and raw ANSI codes are not an API surface that existing API tools like rustdoc are equipped to handle gracefully
I agree. This library should not be for user consumption. An alternative library should be advertised that implements a programmatic API, using the type system to prevent invalid input. But we still need the implementation of that API, and I propose that this is the best way to do it, or at least that it is worth investigation.
This is also the reason for the popularity of cross-site-scripting attacks and SQL injection vulnerabilities.
I need to think more about this to fully understand the risks. I think it would be the role of the higher-level API to sanitize any input, so that escape sequences are only emitted from the API calls. This would need careful auditing.
So, in summary you would have 2 levels
- (lower) escape sequences polyfill
  - fast (this is a guess because I imagine you could write a cache-friendly wrapper around the write calls; would need to check whether it's true or not)
  - flexible (can be sent over stream before being converted into platform-specific, pre-serialized)
  - complicated, and error prone if you write your own escape sequences
- (higher) programmatic API
  - safe (type system and sanitization)
  - simpler, more ergonomic
  - slower (maybe, or will API calls just get inlined into string operations?)
  - risky (sanitization needs to work; is there a definitive list of escape sequences, or patterns of escape sequences, that need to be stripped?)
If you went straight for a higher level API you still have the issue of what to do with escape sequences in the input. Should you remove them, or leave them in? If you remove them, how do you know you are getting them all?
@mitsuhiko what are the specific things that are missing now that you would like to see as a minimum?
@derekdreery there needs to be an agreed upon way to turn color on and off. clicolors-control exists but nobody uses it and @BurntSushi outlined some of his reasons for not doing it. So what I can propose right now is to take that crate, incorporate his suggestions but I won't be able to force the rest of the community to use it :)
Could you simplify it to:
- If CLICOLOR == '0' force disable color
- If CLICOLOR != '0' force enable color
- If CLICOLOR is unset, make a guess (using TERM etc.)
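The three rules above map directly onto a small function. This is a sketch of the simplified proposal in this comment, not of the published CLICOLOR convention (which also defines CLICOLOR_FORCE); the function name is hypothetical.

```rust
/// `clicolor` is the value of the CLICOLOR variable if it is set, e.g. from
/// `std::env::var("CLICOLOR").ok()`. `guess` is the result of whatever
/// fallback detection (TERM, tty check, ...) the application performs.
fn clicolor_decision(clicolor: Option<&str>, guess: bool) -> bool {
    match clicolor {
        Some("0") => false, // CLICOLOR == '0': force disable
        Some(_) => true,    // CLICOLOR != '0': force enable
        None => guess,      // unset: fall back to the guess
    }
}
```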
FYI: There is an attempt to make standard with NO_COLOR environment variable.
http://no-color.org/
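Checking for it is short. The sketch below assumes the wording on the no-color.org page as I understand it, where the variable counts when it is present and not empty, regardless of its value; the function name is hypothetical.

```rust
/// `no_color` is the value of NO_COLOR if set, e.g. from
/// `std::env::var("NO_COLOR").ok()`.
/// Assumption: an empty value is treated the same as an unset variable.
fn no_color_requested(no_color: Option<&str>) -> bool {
    matches!(no_color, Some(v) if !v.is_empty())
}
```

Note that under this reading even NO_COLOR=0 disables color, since the value itself is not interpreted.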
I've played with this a bit, reading the xterm guide. I've also been reading the standard. My plan is to implement the ansi standard, at least to recognise all escape sequences specified there. Then I can process a subset of them and skip the rest.
For the issue with hostile escape sequence injection, any utf-8 sanitized text will not contain escape sequences. Therefore it is sufficient to check that input is utf-8 (see the xterm guide near the beginning for more details).
@derekdreery If I'm understanding you right, I'm not sure that will work. In particular, assuming or requiring that output (or input) is UTF-8 is unfortunately inappropriate in almost any UNIX command line tool.
@BurntSushi my argument is that you can avoid injection by using utf8, but that if you are not then there may be escape sequences present. My approach (at least to start with) will be to just strip any escape sequence as defined in the spec (including things like STX), and add back in the ones that I'm willing to handle for the given platform.
I think what I'm doing is an experiment at this point. I think having a working (by my probably incorrect definition) library will help to spark more debate.
For the issue with hostile escape sequence injection, any utf-8 sanitized text will not contain escape sequences.
I think you're talking about C1 control codes, which overlap with UTF-8 continuation bytes. While it's true that a C1 control code like 0x9B (CSI, Control Sequence Introducer) is not directly valid in UTF-8, terminals date back to a time when only 7-bit ASCII was reliably available, so every C1 control-code has a 7-bit encoding of ESC followed by an ASCII character. Thus, any ANSI code defined to begin with CSI can also begin with ESC [ (0x1b 0x5b), and that is valid UTF-8.
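This is easy to check with the standard library. The helper name below is made up for illustration; it only reports whether a byte string is valid UTF-8 and still carries an ESC byte.

```rust
/// Returns true if `bytes` is valid UTF-8 and nevertheless contains ESC (0x1b),
/// i.e. UTF-8 validation alone did not keep escape sequences out.
fn is_utf8_with_escape(bytes: &[u8]) -> bool {
    std::str::from_utf8(bytes).is_ok() && bytes.contains(&0x1b)
}
```

For example, `is_utf8_with_escape(b"\x1b[31mred\x1b[0m")` is true, while a sequence starting with the raw 8-bit CSI byte 0x9B fails UTF-8 validation because a lone continuation byte is not valid.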
@Screwtapello you are correct, I'm still learning :).
@iquiw there was an earlier attempt to standardize on CLICOLOR which managed to gain some traction and that's what clicolors-control currently uses: https://bixense.com/clicolors/
One thing I would like to point out when developing a terminal color API is the annoyance of testing it. It should be easy to:
- Represent the formatting as data, i.e. as a list of structs which may have embedded structs
- It should be possible to serialize/deserialize these structs using serde so that tests can use yaml or some other format for writing expected output.
- Ideally there would even be some kind of "html" like language for representing the formatting.
- It should be possible to mutate the formatting, especially "force plain". It is really annoying to have if/else blocks depending whether the user specified --plain. I want to just be able to call text_items.map(|t| t.set_plain()) and be done with it.
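A sketch of the "force plain" idea on formatting-as-data. The TextItem and Style types are hypothetical and not necessarily how termstyle models things; serde derives are left out to keep the example free of dependencies.

```rust
/// Formatting attributes kept as plain data. `fg` is an ANSI color index
/// (an assumption made for this sketch).
#[derive(Debug, Clone, PartialEq, Default)]
struct Style {
    bold: bool,
    fg: Option<u8>,
}

#[derive(Debug, Clone, PartialEq)]
struct TextItem {
    text: String,
    style: Style,
}

impl TextItem {
    /// Drop all formatting while keeping the text.
    fn set_plain(mut self) -> Self {
        self.style = Style::default();
        self
    }
}

/// Apply `--plain` in one place instead of branching at every print site.
fn force_plain(items: Vec<TextItem>) -> Vec<TextItem> {
    items.into_iter().map(|t| t.set_plain()).collect()
}
```

Since the items derive PartialEq and Debug, a test can compare expected and actual formatting directly instead of comparing escape-code strings.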
I wrote termstyle to address some of these issues, but I thought it would be good to bring them up here. (not saying termstyle is the be-all-end-all -- I've only just started using it myself).
I totally agree that the actual coloring should be done through a single low-level (standardized) crate. I didn't actually know the trick with windows and I'm now inclined to use ANSI escape codes for all platforms.
For using colours in CLI output, also consider that for example less has a pass-through mode for only ESC [ ... m sequences using the -R option (by default). This works solidly, e.g. with line-wrapping. So if you want to be compatible with less (as git and grep and so on are), then you limit yourself to ESC [ ... m. So this is relatively simple to handle, e.g. embedded in strings, or for stripping out or skipping when counting visible characters.
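To illustrate how little is needed when only ESC [ ... m sequences are in play, here is a sketch that strips them and counts the remaining characters. The function names are hypothetical, only digit and ';' parameters are recognised, and every other escape sequence is deliberately left untouched.

```rust
/// Remove `ESC [ <digits and ';'> m` sequences and return the remaining text.
fn strip_sgr(input: &str) -> String {
    let bytes = input.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == 0x1b && bytes.get(i + 1) == Some(&b'[') {
            // Scan the parameter bytes that follow `ESC [`.
            let mut j = i + 2;
            while j < bytes.len() && (bytes[j].is_ascii_digit() || bytes[j] == b';') {
                j += 1;
            }
            if bytes.get(j) == Some(&b'm') {
                // A complete colour/style sequence: skip it entirely.
                i = j + 1;
                continue;
            }
        }
        out.push(bytes[i]);
        i += 1;
    }
    // Only whole ASCII sequences were removed, so the rest is still valid UTF-8.
    String::from_utf8(out).expect("removing ASCII bytes keeps UTF-8 valid")
}

/// Count characters that remain after stripping. This counts Unicode scalar
/// values, which is not the same as terminal display width.
fn visible_chars(input: &str) -> usize {
    strip_sgr(input).chars().count()
}
```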
Terminfo may include other control codes in its colour sequences, which are fine for a TUI but not what you want in this case.
Considering that a CLI tool (but not a TUI) generally needs to handle having its output redirected to a file and then read later in a pager, this really cuts down on what ANSI sequences need to be handled.
Other ANSI sequences might be used inside a readline-like library, but this is not visible outside of that library, and wouldn't get into a log file from non-interactive use of the CLI tool (e.g. where there is no use of stdin, or stdin is redirected from a file).
If anyone's interested in progress, I'm currently gaining the ability to recognise all ansi escape sequences according to the spec. At that point I'll be able to strip escape sequences. After that, I will start to interpret some of the simpler sequences like colour.
On 9 Mar 2018 18:20, "Jim Peters" wrote:
Considering that a CLI tool (but not a TUI) generally needs to handle having its output redirected to a file and then read later in a pager, this really cuts down on what ANSI sequences need to be handled.
Other ANSI sequences might be used inside a readline-like library, but this is not visible outside of that library, and wouldn't get into a log file from non-interactive use of the CLI tool (e.g. where there is no use of stdin, or stdin is redirected from a file).
For the sorts of tools I write, declaring that you can only have color support on Windows 10 is just fine. Most command-line tools tend to be developer-focused, and most developers aren't running old versions of Windows anyway. I think as long as tools work on older versions of Windows, color is simply an added bonus, so I don't think we should be writing foundational crates that jump through hoops to support color in that situation.
If anyone's interested in progress, I'm currently gaining the ability to recognise all ansi escape sequences according to the spec. At that point I'll be able to strip escape sequences. After that, I will start to interpret some of the simpler sequences like colour.
You might be interested in the strip-ansi-escapes crate I put together recently for sccache. sccache runs compilers and captures their stdout/stderr and then outputs it elsewhere, so the standard "is stdout a tty" checks don't work well. With this crate I can force color output on from compilers and then do the tty check when it's time to output and strip escapes at that point. (My crate uses the vte crate for the actual parsing, so I have fair confidence that it does the right thing.)
Most command-line tools tend to be developer-focused, and most developers aren't running old versions of Windows anyway.
Do you have numbers on this?
My day job is primarily in the administration and security field, which is heavily command line focused. The number of Windows 10 machines we use is absolutely dwarfed by the number of Windows 7 (and even still Windows XP) machines. We're also heavy users of Windows Server 2008/2012.
I'd imagine shops like mine, while not a majority, certainly aren't only a tiny subset of command line tool users on Windows.
@kbknapp @BurntSushi regardless of the number of Win7 (or XP) developer machines currently in use, it is important to note that that number is guaranteed to drop over time. So the question we should ask is: should the design of color add additional complexity in order to support a use case which will only become more and more deprecated?
@vitiral I don't know. All I know is what I endeavor to support today. I certainly would never attempt to enter the business of predicting when older versions of Windows will become such a small source of users that they aren't worth supporting fully. I mean, in theory, it could be years. But nobody has any hard numbers as far as I can tell.