bat
Some utf8 output from man appears as escaped bytes
What steps will reproduce the bug?
- execute
man bat (or other manpages) with MANPAGER="sh -c 'col -bx | bat -l man -p'" on a terminal with a width small enough that man hyphenates some words.
What happens?
man output such as
It also communicates with git(1) to show modifications with re\xe2\x80\x90
spect to the git index
What did you expect to happen instead?
The output should be:
It also communicates with git(1) to show modifications with re‐
spect to the git index
How did you install bat?
Occurs with bat v0.22.1 installed by brew on Ubuntu 22.04 and v0.19 installed on the same system via apt.
bat version and environment
> bat --diagnostic
Software version
bat 0.22.1
Operating system
Linux 5.15.0-60-generic
Command-line
bat --diagnostic
Environment variables
SHELL=/usr/bin/zsh
PAGER=less
LESS=-R
LANG=en_GB.UTF-8
LC_ALL=<not set>
BAT_PAGER=<not set>
BAT_CACHE_PATH=<not set>
BAT_CONFIG_PATH=<not set>
BAT_OPTS=<not set>
BAT_STYLE=<not set>
BAT_TABS=<not set>
BAT_THEME=<not set>
XDG_CONFIG_HOME=<not set>
XDG_CACHE_HOME=<not set>
COLORTERM=truecolor
NO_COLOR=<not set>
MANPAGER='sh -c '\''col -bx | bat -l man -p'\'''
System Config file
Could not read contents of '/etc/bat/config': No such file or directory (os error 2).
Config file
Could not read contents of '/home/jscdev/.config/bat/config': No such file or directory (os error 2).
Custom assets metadata
Could not read contents of '/home/jscdev/.cache/bat/metadata.yaml': No such file or directory (os error 2).
Custom assets
'/home/jscdev/.cache/bat' not found
Compile time information
- Profile: release
- Target triple: x86_64-unknown-linux-gnu
- Family: unix
- OS: linux
- Architecture: x86_64
- Pointer width: 64
- Endian: little
- CPU features: fxsr,sse,sse2
- Host: x86_64-unknown-linux-gnu
Less version
> less --version
less 590 (GNU regular expressions)
Copyright (C) 1984-2021 Mark Nudelman
less comes with NO WARRANTY, to the extent permitted by law.
For information about the terms of redistribution,
see the file named README in the less distribution.
Home page: https://greenwoodsoftware.com/less
More details
A partial workaround I've discovered is to run man with --no-hyphenation|--nh but there are still some unicode code points that are making it to the output. Here's a snippet of MANPAGER="sh -c 'col -bx | bat -l man -p'" man --nh man
See the \xe2\x80\x9cWarnings\xe2\x80\x9d node in info groff
and then MANPAGER="bat -A" man --nh man
··See·the·\u{201c}Warnings\u{201d}·node·in·i␈in␈nf␈fo␈o·g␈gr␈ro␈of␈ff␈f·
I've checked that it's not caused by less by running with BAT_PAGER set and empty.
Terminal emulator is alacritty, but I can't see what difference that would make.
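For reference, the escaped sequences in the snippets above are the UTF-8 encodings of ordinary typographic characters: \xe2\x80\x90 is U+2010 (HYPHEN), and \xe2\x80\x9c / \xe2\x80\x9d are U+201C / U+201D (curly double quotes), which matches the \u{201c} and \u{201d} that bat -A prints. A minimal sketch for checking this with standard tools follows. The helper name utf8_hex is made up for illustration, and the script is assumed to be saved as UTF-8:

```shell
# Print the bytes of a string as lowercase hex with no separators.
# utf8_hex is a hypothetical helper, not part of bat or man.
utf8_hex() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
}

utf8_hex '‐'; echo   # U+2010 HYPHEN              -> e28090
utf8_hex '“'; echo   # U+201C LEFT DOUBLE QUOTE   -> e2809c
utf8_hex '”'; echo   # U+201D RIGHT DOUBLE QUOTE  -> e2809d
```

So the bytes reaching bat are valid UTF-8; the open question in this report is why they get printed in escaped form rather than rendered.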
Thanks for reporting. Interestingly, I haven't been able to replicate this. For a concrete example, I tried with xfce4-terminal v0.8.10 sized 52 x 24:

Similar problem for me on Kitty (Fedora 36 on Sway).
1mNAME0m
ls - list directory contents
1mSYNOPSIS0m
1mls 22m[4mOPTION24m]... [4mFILE24m]...
1mDESCRIPTION0m
List information about the FILEs (the current directory by default). Sort entries alphabetically if none of 1m-cftuvSUX 22mnor 1m--sort 22mis
specified.
@SaElAh Yours looks like a different problem with ANSI escape codes, not with unicode characters. Please search the issue tracker if this has been reported and open a new ticket otherwise.
I cannot reproduce this either.
What is your locale? Maybe it's related to that?
Seems like I don't even get Unicode characters in the first place:
▶ LANG=C MANPAGER="sh -c 'col -bx | grep Warnings | hexdump -C'" man man
00000000 20 20 20 20 20 20 20 20 20 20 20 20 20 20 64 65 | de|
00000010 66 61 75 6c 74 20 69 73 20 22 6d 61 63 22 2e 20 |fault is "mac". |
00000020 20 53 65 65 20 74 68 65 20 22 57 61 72 6e 69 6e | See the "Warnin|
00000030 67 73 22 20 6e 6f 64 65 20 69 6e 20 69 6e 66 6f |gs" node in info|
00000040 20 67 72 6f 66 66 20 20 66 6f 72 20 20 61 20 20 | groff for a |
00000050 6c 69 73 74 20 20 6f 66 20 20 61 76 61 69 6c 61 |list of availa|
00000060 62 6c 65 20 20 77 61 72 6e 69 6e 67 0a |ble warning.|
0000006d
I have groff 1.22.4
Currently also an issue on Alacritty on Arch Linux (running man ls)
@ChocolateOverflow Have you tried MANROFFOPT="-c" as suggested in the readme?
I had the same problem and this helped.
@christoph-heinrich Yeah MANROFFOPT="-c" seems to fix my issue.
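For anyone landing here with the same symptoms, the setup discussed above amounts to two environment variables. This is only a sketch of a shell profile fragment: the MANPAGER value is the one from the original report, MANROFFOPT="-c" is the readme suggestion mentioned above, and whether it helps seems to depend on the man/groff version in use.

```shell
# Pager pipeline from the original report: col strips overstrikes,
# then bat highlights the result with its man syntax in plain style.
export MANPAGER="sh -c 'col -bx | bat -l man -p'"

# Readme suggestion quoted above: man adds this option to the roff
# formatter's command line. According to a later comment in this thread,
# -c makes the formatter fall back to overtyping instead of ANSI escapes.
export MANROFFOPT="-c"
```

After sourcing this, running man ls again should show whether the stray escape codes are gone.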
Sorry for not replying to this for ages!
I have LANG=en_GB.UTF-8.
Running LANG="C" MANPAGER="sh -c 'col -bx | bat -l man -p'" man man displays the expected output, so it's clearly a locale-related issue. I'm not sure if it's expected to need to use LANG="C", but aliasing man='LANG=C man' is a usable workaround.
FWIW, I stumbled across something similar today — and I can see how, in *my* case, the Problem Exists Between the Keyboard And Chair...
(I'm just documenting it here in case it helps anybody else, as well as for posterity — i.e., when I run into the same problem again in six months, this will show up when I Google it, hehe!)
Anyway, I'm used to doing, e.g.:
# Run a command and save its output:
bash% someCmd > /tmp/out.1
# Then making some changes and re-running:
bash% someCmd > /tmp/out.2
# So I can:
bash% diff /tmp/out.{1,2}
# Which was fine until I ran:
bash% less /tmp/out.1
As a Minimal Reproducible Example, say I have two files named, e.g., /tmp/one.1 and /tmp/two.2:
bash% printf '\033[31mRed\033[m\n' > /tmp/one.1
bash% od -c /tmp/one.1
0000000 033 [ 3 1 m R e d 033 [ m \n
0000014
### Prepend a backslash...
bash% printf '\\\033[31mRed\033[m\n' > /tmp/two.2
bash% od -c /tmp/two.2
0000000 \ 033 [ 3 1 m R e d 033 [ m \n
0000015
Note that /tmp/two.2 is the same as /tmp/one.1 except it has a preceding \ backslash before the escape character...
Now, if I run:
### Sanitize environment...
bash% unset BAT_STYLE BAT_THEME; export BAT_CONFIG_PATH=/dev/null
bash% cat /tmp/one.1 | bat # Works
───────┬──────────────────────────────────
│ STDIN
───────┼──────────────────────────────────
1 │ Red
───────┴──────────────────────────────────
bash% cat /tmp/two.2 | bat # Works
───────┬──────────────────────────────────
│ STDIN
───────┼──────────────────────────────────
1 │ \Red
───────┴──────────────────────────────────
bash% bat /tmp/one.1 # Works
───────┬──────────────────────────────────
│ File: /tmp/one.1
───────┼──────────────────────────────────
1 │ Red
───────┴──────────────────────────────────
bash% bat /tmp/two.2 # Not What I Was Expecting!
───────┬──────────────────────────────────
│ File: /tmp/two.2
───────┼──────────────────────────────────
1 │ \[0m[31mRed
───────┴──────────────────────────────────
### However...
bash% bat -l txt /tmp/two.2 # Works
───────┬──────────────────────────────────
│ File: /tmp/two.2
───────┼──────────────────────────────────
1 │ \Red
───────┴──────────────────────────────────
### And...
bash% cat /tmp/one.1 | bat -l troff # Gives The Funny Output 💡
───────┬──────────────────────────────────
│ File: STDIN
───────┼──────────────────────────────────
1 │ \[0m[31mRed
───────┴──────────────────────────────────
So my mistake was using .<digit> for something other than "nroff -man" files! «grin»
PS — I will add that nroff -man /usr/share/man/man1/bash.1 | bat -l man gives me some funny:
SEE ALSO
Bash Reference Manual, Brian Fox and Chet Ramey
The Gnu Readline Library, Brian Fox and Chet Ramey
The Gnu History Library, Brian Fox and Chet Ramey
Portable Operating System Interface [0m4m(POSIX) Part 2: Shell and Utili‐
ties, IEEE
sh[0m24m(1), ksh[0m24m(1), csh[0m24m(1)
_______emacs[0m24m(1), vi[0m24m(1)
_______readline[0m24m(3)
output under macOS... mandoc does better there, but only colors "SEE" instead of "SEE ALSO" — and the latter does *not* like the x^Hx pseudo-bold hack! — but I don't trust my understanding of -l man to know whether or not *I'm* the one doing it wrong... again! :-}
PS — I will add that nroff -man /usr/share/man/man1/bash.1 | bat -l man gives me some funny:
SEE ALSO
Bash Reference Manual, Brian Fox and Chet Ramey
The Gnu Readline Library, Brian Fox and Chet Ramey
The Gnu History Library, Brian Fox and Chet Ramey
Portable Operating System Interface [0m4m(POSIX) Part 2: Shell and Utili‐
ties, IEEE
sh[0m24m(1), ksh[0m24m(1), csh[0m24m(1)
_______emacs[0m24m(1), vi[0m24m(1)
_______readline[0m24m(3)
I did some investigating a week ago, and it appears the man/nroff/groff implementation used by most Linux distros has switched to emitting ANSI escape sequences by default instead of overtyping (the pseudo-bold hack).
The man syntax definition doesn't handle ANSI escape sequences and bat's ANSI parsing doesn't work across highlighting regions, which is likely why you're encountering broken sequences. You'll want to pass -c to nroff to have it revert back to using overtyping.
output under macOS...
mandoc does better there, but only colors "SEE" instead of "SEE ALSO" — and the latter does *not* like the x^Hx pseudo-bold hack! — but I don't trust my understanding of -l man to know whether or not *I'm* the one doing it wrong... again! :-}
macOS's mandoc still uses overtyping by default, which is why it behaves a bit better there. I'm on my phone, so I can't test this myself, but try piping into col -bx before piping to bat. That will remove the overtyping, which should help determine if your issue is caused by the backspace character.
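To make the overtyping point concrete: in the overstrike scheme a bold character is emitted as the character, a backspace, and the same character again (the x^Hx hack mentioned above). col -b keeps only the last character written to each column. A small sketch, assuming col (shipped with util-linux or bsdmainutils on most Linux systems) is installed; the sample string is made up for illustration:

```shell
# "NAME" in overstrike bold: each letter is written, backspaced over, rewritten.
overstruck=$(printf 'N\bNA\bAM\bME\bE')

# Show the raw bytes, backspaces (\b) included.
printf '%s\n' "$overstruck" | od -c

# col -b drops the backspaces and keeps the last character per column;
# -x emits spaces instead of tabs. Skipped if col is not available.
if command -v col >/dev/null 2>&1; then
    printf '%s\n' "$overstruck" | col -bx    # prints: NAME
fi
```

If the nroff output still looks broken after col -bx, the backspaces are probably not the culprit and the ANSI escape sequences described above are the more likely cause.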