bat
bat copied to clipboard
Issue handling text with escape sequences already embedded
What steps will reproduce the bug?
- Create a file with multiple embedded escape sequences in one line:
# shell one-liner to create a simple demo file for the bug
printf '\033[1;32mgreen,bold \033[39mcolorless,bold \033[35mpurple,bold \033[39mcolorless,bold \033[0mcolorless\n' > bat-bug-test
-
bat -pp
the new file
What happens?
The first escape sequence overrides all following escape sequences before the final '[0m' (clear formatting) escape sequence
What did you expect to happen instead?
The escape sequences would render properly, or not at all
For context, both cat
from coreutils version 8.32-4ubuntu2 and busybox cat
from busybox 1:1.30.1-6ubuntu3.1 on Pop!_OS 21.04 result in the following:
Meanwhile, bat -pp
from bat version 0.20.0 results in the following:
For context, I have a "shell collection" consisting of 17 command-line shells, with different colors for the prompts, depending on the shell, and I have an executable that prints out the different prompts, which can be seen here. I first ran into this issue when running bat -pp
on that file.
How did you install bat
?
From the GitHub releases .deb
file
bat version and environment
Software version
bat 0.20.0 (0655ecf)
Operating system
Linux 5.16.19-76051619-generic
Command-line
bat --diagnostic -pp bat-bug-test
Environment variables
SHELL=/usr/bin/bash
PAGER=<not set>
LESS=-Q
LANG=en_US.UTF-8
LC_ALL=<not set>
BAT_PAGER=<not set>
BAT_CACHE_PATH=<not set>
BAT_CONFIG_PATH=<not set>
BAT_OPTS=<not set>
BAT_STYLE=<not set>
BAT_TABS=<not set>
BAT_THEME=<not set>
XDG_CONFIG_HOME=/home/eliminmax/.config
XDG_CACHE_HOME=<not set>
COLORTERM=truecolor
NO_COLOR=<not set>
MANPAGER=<not set>
Config file
# This is `bat`s configuration file. Each line either contains a comment or
# a command-line option that you want to pass to `bat` by default. You can
# run `bat --help` to get a list of all possible configuration options.
# Specify desired highlighting theme (e.g. "TwoDark"). Run `bat --list-themes`
# for a list of all available themes
--theme="Visual Studio Dark+"
# Enable this to use italic text on the terminal. This is not supported on all
# terminal emulators (like tmux, by default):
--italic-text=always
# Uncomment the following line to disable automatic paging:
#--paging=never
# Uncomment the following line if you are using less version >= 551 and want to
# enable mouse scrolling support in `bat` when running inside tmux. This might
# disable text selection, unless you press shift.
# --pager="less --RAW-CONTROL-CHARS --quit-if-one-screen --mouse"
# Syntax mappings: map a certain filename pattern to a language.
# Example 1: use the C++ syntax for .ino files
# Example 2: Use ".gitignore"-style highlighting for ".ignore" files
#--map-syntax "*.ino:C++"
#--map-syntax ".ignore:Git Ignore"
# treat all files in a directory called ".bashrc.d" as bash scripts
--map-syntax '**/.bashrc.d/*:Bourne Again Shell (bash)'
# default options
--style plain,header,numbers,changes
--tabs 4
--wrap never
Compile time information
- Profile: release
- Target triple: x86_64-unknown-linux-gnu
- Family: unix
- OS: linux
- Architecture: x86_64
- Pointer width: 64
- Endian: little
- CPU features: fxsr,sse,sse2
- Host: x86_64-unknown-linux-gnu
Less version
> less --version
less 551 (GNU regular expressions)
Copyright (C) 1984-2019 Mark Nudelman
less comes with NO WARRANTY, to the extent permitted by law.
For information about the terms of redistribution,
see the file named README in the less distribution.
Home page: http://www.greenwoodsoftware.com/less
Additional context: when replacing the ESC character with ^[
, the output of bat -pp bat-bug-test
is as follows:
^[[1;32m^[[38;2;220;220;220m^[[32m^[[1;32mgreen,bold ^[[0m^[[39m^[[38;2;220;220;220m^[[1;32mcolorless,bold ^[[0m^[[35m^[[38;2;220;220;220m^[[35m^[[1;32mpurple,bold ^[[0m^[[39m^[[38;2;220;220;220m^[[1;32mcolorless,bold ^[[0m^[[0m^[[38;2;220;220;220mcolorless^[[0m
The coreutils cat
results in the following:
^[[1;32mgreen,bold ^[[39mcolorless,bold ^[[35mpurple,bold ^[[39mcolorless,bold ^[[0mcolorless
I understand how odd of an edge case this is, and I'm sure it's difficult to handle existing escape sequences while adding new sequences for the syntax highlighting, so it's no big deal if this is a low-priority or WONTFIX situation.
Thanks for reporting, it's useful to know even about edge cases :) I'm curious how bat
prior to https://github.com/sharkdp/bat/pull/1596 behaves. Maybe you can try v0.18.3 and let us know the outcome? - it could give us an idea where to look.
I just installed v0.18.3 with cargo
and tried it out, and got the following:
^[[38;2;220;220;220m^[[1;32mgreen,bold ^[[39mcolorless,bold ^[[35mpurple,bold ^[[39mcolorless,bold ^[[0mcolorless^[[0m
The visual appearance is pixel-for-pixel identical to busybox cat
and coreutils cat
, but it does have the extra escape sequences at the beginning and end.
This is a known limitation and is mentioned in the readme: https://github.com/sharkdp/bat#garbled-output
You are advised to turn off syntax highlighting for such input files
Don't know how I missed that. My bad.
No worries
I just realized that the issue persists even with --wrap never --color never
.
The result is as follows:
^[[1;32m^[[32m^[[1;32mgreen,bold ^[[39m^[[1;32mcolorless,bold ^[[35m^[[35m^[[1;32mpurple,bold ^[[39m^[[1;32mcolorless,bold ^[[0mcolorless
It seems like bat
is inserting the first escape sequence it encounters after all following escape sequences other that the final (clear formatting) one.
I can confirm this is a regression introduced by 63ad53817d (first released in bat v0.19.0). The bug reproduces with that commit. With the commit before, bat
has the same output as cat
. If you make bat use loop-through mode by piping the bat
output through cat
, then it also works as it should: bat bat-bug-test | cat
.
Thanks a lot for reporting this. My apologies for closing this issue.
Our code for ANSI escape passthrough is very complicated. It was never fully clear to me why we should have special code to handle input with ANSI escape characters, since syntax highlighting does not work with such files anyway. I should do more research before proposing this, but maybe we should simply remove our ANSI escape passthrough code? The benefits would be:
- We behave more similarly to
cat
(at least for this case, see details in linked PR below) - Performance would improve
- The code would be significantly simplified
Prototyped code for the above proposal that pass all regression tests (after changing one of them) and that also fixes the bug this issue is about can be found here: #2189
To be clear, I was the one who closed the issue myself - no need to apologize.
So with v0.22.1, this is an issue again, which is a shame because this fix/adjustment was useful to me. Any chance of bringing this back as a parameter and doing a v0.22.2 release?
Sorry about that. But my cheat (removing ANSI code completely) didn't fly. So we have to pretend I never did it.
In other words, someone needs to figure out what the problem is with the existing code, and then write as many regression tests as possible, because the ANSI processing code is going cause us quite a bit of trouble I am afraid.
Sorry about that. But my cheat (removing ANSI code completely) didn't fly. So we have to pretend I never did it.
In other words, someone needs to figure out what the problem is with the existing code, and then write as many regression tests as possible, because the ANSI processing code is going cause us quite a bit of trouble I am afraid.
So a --ansi-preprocessing=never
parameter is out of the question?
Aha you mean like that. That sounds reasonable to me, but the code would have to be refactored quite a bit to make that possible, I think. If you want to take a shot at it, I recommend to do it in several small incremental PRs, so that code review and performance regression testing becomes easy to do.
I'm going to try and tackle this issue, since ANSI escape sequences (and the parsing of) is my domain of expertise :)
Our code for ANSI escape passthrough is very complicated. It was never fully clear to me why we should have special code to handle input with ANSI escape characters, since syntax highlighting does not work with such files anyway.
@Enselic, it's meant to support the drop-in-replacement aspect of bat
and serves three purposes:
- It allows
bat
to ignore ANSI sequences when calculating the width of a line. - It allows ANSI coloring to persist across wrapped lines, which would otherwise be reset at the end of the line.
- The plain text syntax highlighting emits colors at the start of the line, and certain sequences (e.g.
\x1B[39m
,\x1B[0m
) would wipe out the default coloring used bybat
, even though the intent of those sequences is to reset the coloring to default.
Removing ANSI processing entirely simplifies the code a bit, but I don't think the performance increase is worth breaking support for that. I did a toy test by stripping out ANSI handling in the printer and ran it against a 100,000 line file, and the result was a difference of 19.3 ms between the two—or about 193 nanoseconds per line.
When running it against a non-benchmark example, the difference was negligible enough that variances in clock frequency boosting made bat
's master branch benchmark look better.
Meanwhile, these are the consequences of stripping support for ANSI parsing:
Fixed in #2856:
It actually should have been fixed in #2544, but I didn't account for SGR sequences that combine text attributes with colors, so it ended up grouping the green color into the bold attribute. I've now fixed it and added a regression test.
Yep, it's fixed alright.
Really hope this necessitates a new stable version sometime soon.