Refactor ansi stripping into `nu-util` functions
Description
Allows use of slightly optimized variants that check if they have to use the heavier vte parser. Tries to avoid unnnecessary allocations. Initial performance characteristics proven out in #4378.
Also reduces boiler plate with right-ward drift.
Tests + Formatting
Make sure you've done the following, if applicable:
- Add tests that cover your changes (either in the command examples, the crate/tests folder, or in the /tests folder)
- Try to think about corner cases and various ways how your changes could break. Cover those in the tests
Make sure you've run and fixed any issues with these commands:
-
cargo fmt --all -- --checkto check standard code formatting (cargo fmt --allapplies these changes) -
cargo clippy --workspace --features=extra -- -D warnings -D clippy::unwrap_used -A clippy::needless_collectto check that you're using the standard code style -
cargo test --workspace --features=extrato check that all tests pass
User-Facing Changes
If you're making changes that will affect the user experience of Nushell (ex: adding/removing a command, changing an input/output type, adding a new flag):
- Get another regular contributor to review the PR before merging
- Make sure that there is an entry in the documentation (https://github.com/nushell/nushell.github.io) for the feature, and update it if necessary
Not quite happy with my naming yet... Taps the drawing of the bikeshed.
src/formats/to/html.rs also does ansi stripping if given the --no-colors flag. It currently uses a simple regex; perhaps it should use your code.
For what it's worth,to html --html-color uses a more complicated table of regular expressions to convert ansi coloring to html-with-css-styling. It's incomplete, fragile, and doesn't really work.
src/formats/to/html.rsalso does ansi stripping if given the--no-colorsflag. It currently uses a simple regex; perhaps it should use your code.
Well spotted, thanks! That regex looks indeed a bit horrendous. There might be a slight difference: The strip-ansi-escapes crate we were using in most places also strips non-escape ASCII control sequences. Most of that makes sense on the terminal (like backspace etc.) the only odd one out that it also removes is TAB. In most places we currently use strip it it is primarily used for display calculations (where the tabstopwidth is undefined anyways) and operations like the file ops that shouldn't regularly encounter tabs.
For what it's worth,
to html --html-coloruses a more complicated table of regular expressions to convert ansi coloring to html-with-css-styling. It's incomplete, fragile, and doesn't really work.
With the limitations of both of those things in mind we might want to work some improved ANSI parsing instead of the current simple crate. @zhiburt 's ansitok also uses the base parsing logic but provides slightly different functions. That might be a place to start for better color/formatting reverse engineering.
ansitok now uses utilities built on the VTE state machine iterator from delta, which is well-tested for a couple of years in delta, and does no allocations in the state machine iterator (the utilities built on top may allocate). Is it maybe worth focusing/consolidating efforts on utilities built on top of the VTE iterator? One advantage is they handle OSC8 hyperlinks which the regex alternatives tend not to, and I believe the VTE state machine approach has proven to have good performance (better than others that handle OSC8 hyperlinks I think).
ansitok now uses utilities built on the VTE state machine iterator from delta, which is well-tested for a couple of years in delta, and does no allocations in the state machine iterator (the utilities built on top may allocate). Is it maybe worth focusing/consolidating efforts on utilities built on top of the VTE iterator? One advantage is they handle OSC8 hyperlinks which the regex alternatives tend not to, and I believe the VTE state machine approach has proven to have good performance (better than others that handle OSC8 hyperlinks I think).
Yeah I saw the discussions you were having.
strip-ansi-escapes under the hood uses the vte crate from alacritty as well but will always try to write bytes. That's why hacked together the conditional use with the bytes that cause a state transition in the vte parser in #4378.
For doing stuff like wrapping/size calculations that was previously a noticeable overhead.
But I see you also build a function for Unicode width calculations on top of it.
Consolidating the like 5 different ANSI crates in nushell should be a good goal.
Consolidating the like 5 different ANSI crates in nushell should be a good goal.
Haha :) Cool, I think we're saying the same thing!