`man` syntax doesn't highlight bold functions correctly

Open LunarLambda opened this issue 5 years ago • 33 comments

Terminals tested: alacritty, mate-terminal, urxvt

bat --version: 0.12.0 (Installed via cargo install bat)

$MANPAGER: bat --paging=never -pl man [1] [2]

[1]: I disabled paging to make sure it's not a problem with less(1).

[2]: The documentation suggests setting MANPAGER to sh -c "col -b | bat -pl man". However, I found that using col actually garbled the output even more (see the screenshot further down).

Output with MANPAGER='bat -pl man': [screenshot]

Output with MANPAGER='' or MANPAGER='less': [screenshot]

The issue seems to be with highlighting functions / page references (foo(...)) when bold output is used.

When using col -b as suggested, it becomes even worse:

Output with MANPAGER='sh -c "col -b | bat -pl man"': [screenshot]

LunarLambda • Sep 05 '19 14:09

Thank you for the detailed bug report!

I'm going to assume that you are using man sprintf in your examples(?).

To figure out what's going on in detail, we can actually use bat -A to show what exactly man outputs:

MANPAGER="bat -A" man sprintf

After finding the corresponding section, we can take a look at how man prints bold text. It is both fascinating and infuriating. Instead of using ANSI escape sequences, it prints

p␈pr␈ri␈in␈nt␈tf␈f

for a bold printf (bat -A shows ␈ in place of the \b backspace character). I believe this is how "bold" was done in the days of typewriters: you would hit backspace and then retype the same character to give it more weight.

On today's terminal emulators, that doesn't actually work. If you use MANPAGER="" or MANPAGER="cat", no bold text will be shown. To make sure, we can also call

printf "p\bpr\bri\bin\bnt\btf\bf\n"

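which just prints a plain printf on the terminal: each backspace moves the cursor back one column, and the repeated character simply overwrites the first one.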
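To make the encoding concrete, here is a minimal POSIX sh sketch. The helper name overstrike_bold is hypothetical and not part of man, less, or bat. It emits each character, then a backspace, then the same character again, which should produce the same bytes as the printf call above:

```shell
# overstrike_bold WORD: print WORD in the typewriter "bold" encoding,
# i.e. every character followed by a backspace and the same character again.
# The function body runs in a subshell, so its variables do not leak out.
overstrike_bold() (
    word=$1
    bs=$(printf '\b')           # a literal backspace (0x08)
    out=''
    while [ -n "$word" ]; do
        rest=${word#?}          # everything except the first character
        c=${word%"$rest"}       # ...so c is the first character
        out="$out$c$bs$c"
        word=$rest
    done
    printf '%s\n' "$out"
)
```

For example, overstrike_bold printf | less should show a bold printf, and overstrike_bold printf | cat -v makes the backspaces visible as p^Hpr^Hri^Hin^Hnt^Htf^Hf.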

Interestingly, less has a special feature that shows such sequences in bold. Quoting from man less: "Also, backspaces which appear between two identical characters are treated specially: the overstruck text is printed using the terminal's hardware boldface capability. Other backspaces are deleted, along with the preceding character". This is why we see a bold face printf, when we call

printf "p\bpr\bri\bin\bnt\btf\bf\n" | less

There is also a similar feature for underlined text:

printf "p\b_r\b_i\b_n\b_t\b_f\b_\n" | less
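The underline variant can be sketched the same way. The helper name is hypothetical again. It uses the character-backspace-underscore order from the printf above. The sed command later in this thread matches the underscore-first order (_\bX) instead. As far as I know, less underlines a backspace next to an underscore in either position.

```shell
# overstrike_underline WORD: print WORD as <char><BS>_ for every character,
# the overstrike encoding that less displays as underlined text.
overstrike_underline() (
    word=$1
    bs=$(printf '\b')           # a literal backspace (0x08)
    out=''
    while [ -n "$word" ]; do
        rest=${word#?}          # everything except the first character
        c=${word%"$rest"}       # ...so c is the first character
        out="$out$c${bs}_"
        word=$rest
    done
    printf '%s\n' "$out"
)
```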

Back to bat. When I initially played with this, I noticed that these backspace characters were causing problems when intermixed with bat's syntax highlighting. Imagine we have

int printf(const char* format, ...);

in a man page, and the whole line is printed in bold (as at the beginning of man sprintf). The syntax highlighter will try to highlight certain special characters like the opening parenthesis "(". The color escape sequences it inserts can then end up between a character and its backspace. That breaks the backspace-for-bold trick, and actual backspace characters start appearing in your output.

For this reason, I originally used col -b (col --no-backspaces), which turns something like "p\bpr\bri\bin\bnt\btf\bf" into printf:

▶ printf "p\bpr\bri\bin\bnt\btf\bf\n" | bat -Ap         
p␈pr␈ri␈in␈nt␈tf␈f␊

▶ printf "p\bpr\bri\bin\bnt\btf\bf\n" | col -b | bat -Ap
printf␊

Unfortunately, I missed that col -b "also replaces any whitespace characters with tabs where possible". This is what breaks the table layout in the above example. Fortunately, we can switch this off via col's -x/--spaces option.

The following works for me:

MANPAGER="sh -c 'col -bx | bat -p -lman'" man sprintf

[screenshot]

I think we should update the instructions in the README to suggest col -bx.
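For comparison, the backspace-removal part of col -b can be approximated with plain sed. This is only a sketch. It assumes that every backspace is preceded by exactly one single-byte character. That holds for the ASCII sequences above, but not necessarily for multibyte text in a byte-oriented locale. The name strip_overstrike is made up.

Like col -b, it keeps only the last character written to each column. Unlike col, it never rewrites spaces as tabs, so there is nothing to switch off with -x. It also ignores everything else col handles, such as reverse line feeds.

```shell
# strip_overstrike: read text on stdin and delete every "<char><BS>" pair,
# keeping only the character written last (roughly what col -b does with
# backspaces). Spaces and tabs are passed through untouched.
strip_overstrike() (
    bs=$(printf '\b')      # a literal backspace (0x08) for the sed pattern
    sed "s/.$bs//g"
)
```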

Unfortunately, it looks like your col command does things a little differently. I couldn't exactly reproduce your screenshots above. My version is:

▶ col --version 
col from util-linux 2.34

sharkdp • Sep 06 '19 20:09

I have col from util-linux 2.33.2.

Unfortunately, MANPAGER='sh -c "col -bx | bat -plman"' man sprintf yields the following:

[screenshot]

LunarLambda • Sep 06 '19 20:09

In this case, it does not seem like col is the problem. Could you please post the output of alias bat and the output of the following bash script?

set -x

bat --version
bat --config-file
bat --cache-dir
less --version

bat "$(bat --config-file)"
ls "$(bat --cache-dir)"

set +x

echo "BAT_PAGER = '$BAT_PAGER'"
echo "BAT_CONFIG_PATH = '$BAT_CONFIG_PATH'"
echo "BAT_STYLE = '$BAT_STYLE'"
echo "BAT_THEME = '$BAT_THEME'"
echo "BAT_TABS = '$BAT_TABS'"
echo "PAGER = '$PAGER'"
echo "LESS = '$LESS'"

sharkdp • Sep 06 '19 20:09

++ alias bat
bash: alias: bat: not found
++ bat --version
bat 0.11.0
++ bat --config-file
/home/luna/.config/bat/config
++ bat --cache-dir
/home/luna/.cache/bat
++ less --version
less 551 (POSIX regular expressions)
Copyright (C) 1984-2019  Mark Nudelman

less comes with NO WARRANTY, to the extent permitted by law.
For information about the terms of redistribution,
see the file named README in the less distribution.
Home page: http://www.greenwoodsoftware.com/less
+++ bat --config-file
++ bat /home/luna/.config/bat/config
[bat error]: '/home/luna/.config/bat/config': No such file or directory (os error 2)
+++ bat --cache-dir
++ ls --color=auto /home/luna/.cache/bat
ls: cannot access '/home/luna/.cache/bat': No such file or directory
++ set +x
BAT_PAGER = ''
BAT_CONFIG_PATH = ''
BAT_STYLE = ''
BAT_THEME = ''
BAT_TABS = ''
PAGER = ''
LESS = ''

LunarLambda • Sep 06 '19 20:09

Hm, nothing unusual there.

It would be great if you could show two other screenshots:

One for:

MANPAGER='sh -c "col -bx | bat -plman --color=never"' man sprintf

and one for

MANPAGER='sh -c "col -bx | bat -Ap"' man sprintf

sharkdp • Sep 06 '19 20:09

1: [screenshot]

2: [screenshot]

These are once again using alacritty, but I got the same results with various VTE-based terminals (gnome-terminal, etc.) and urxvt.

LunarLambda • Sep 06 '19 20:09

I've got an idea. What does type man or which man say for you? Is it calling /usr/bin/man or is it some shell function wrapping the real man (and possibly trying to add some colors itself)?

sharkdp • Sep 06 '19 21:09

/usr/bin/man, nothing special here.

I'm using Zsh, but with little to no configuration (no oh-my-zsh, no aliases replacing commands, etc.).

file $(which man) reports an ELF executable, so there is no wrapper script there either.

LunarLambda • Sep 06 '19 21:09

Okay. So the output is definitely already messed up when it reaches bat (messed up = contains parts of ANSI escape sequences like 1m, 24m etc.). It could be either man itself (does MANPAGER="" man sprintf show colors for you?) or col -bx.

If col is the problem, you could check the output of

MANPAGER="bat -Ap" man sprintf

directly. It should contain plenty of backspace characters, but no ANSI escape sequences.
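One way to check this without screenshots is a small classifier. This is a hypothetical helper, not something bat or man provides. It reads text on stdin and reports whether the text contains backspaces, SGR escape sequences (ESC followed by "["), both, or neither:

```shell
# man_output_kind: classify the emphasis encoding found on stdin.
#   overstrike -> backspace sequences such as X<BS>X or _<BS>X
#   sgr        -> ANSI SGR escape sequences such as ESC[1m
#   both / none
man_output_kind() (
    bs=$(printf '\b')
    esc=$(printf '\033')
    input=$(cat)
    has_bs=no
    has_sgr=no
    case $input in *"$bs"*) has_bs=yes ;; esac
    case $input in *"$esc["*) has_sgr=yes ;; esac
    if [ "$has_bs" = yes ] && [ "$has_sgr" = yes ]; then
        echo both
    elif [ "$has_bs" = yes ]; then
        echo overstrike
    elif [ "$has_sgr" = yes ]; then
        echo sgr
    else
        echo none
    fi
)
```

On systems using man-db, the MAN_KEEP_FORMATTING variable keeps the formatting characters even when man writes to a pipe. So MAN_KEEP_FORMATTING=1 man sprintf | man_output_kind should report overstrike for the traditional output and sgr (or both) for output like the one described below.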

Thank you very much for following along!

sharkdp • Sep 06 '19 21:09

MANPAGER="" man sprintf shows bold and underlined text (no pager, though).

MANPAGER="bat -Ap" man sprintf shows this: [screenshot]

Oh, thank you for taking on the issue! bat has become an indispensable tool for me (so much so that I have an alias b='bat -pn', haha).

LunarLambda • Sep 06 '19 21:09

I also ran it with MANPAGER="cat -A"

Plenty of ANSI sequences, but no backspaces. Very weird...

^[[1m -> bold on
^[[0m -> reset all attributes (so bold off, too)
^[[4m -> underline on
^[[24m -> underline off
^[[22m -> bold off (normal intensity)

[screenshot]

LunarLambda • Sep 06 '19 21:09

Ok. It looks like your version of man actually uses ANSI escape sequences already.

It might be worth going through man man or man --help to see if there is anything to turn this off. It might also be worth checking the values of man-related environment variables (e.g. MANOPT).

sharkdp • Sep 06 '19 21:09

man itself has no such option.

Using a very hacky strace one-liner, I got the execution chain for a man invocation. One of these programs probably has an option for it, but I can't find anything right now...

[screenshot]

LunarLambda • Sep 06 '19 21:09

grotty can use the old format (using backspaces) if you pass the -c option or set GROFF_NO_SGR.

grotty -c -b -u would use the old format (no SGR sequences) and suppress overstriking for bold and underlining for italic, respectively. However, I have no clue how to propagate that option through the entire chain, short of writing a wrapper script around grotty...

Perhaps just being able to pass -c would be enough.
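Environment variables are inherited by every process in the man → groff → grotty chain. The GROFF_NO_SGR variable mentioned above should therefore sidestep the propagation problem. Below is a minimal shell-startup sketch. It assumes a grotty that honors GROFF_NO_SGR as its man page describes. Debian's groff reportedly inverts the default and uses GROFF_SGR instead (see the Debian bug reports linked later in this thread). As with -c, bold and underline are stripped rather than displayed.

```shell
# Hypothetical ~/.bashrc or ~/.zshrc snippet.
# Ask grotty for the legacy overstrike output (same effect as grotty -c)...
export GROFF_NO_SGR=1
# ...so that col -bx only has backspaces to strip before bat highlights the page.
export MANPAGER="sh -c 'col -bx | bat -p -l man'"
```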

LunarLambda • Sep 06 '19 22:09

Hm. We could try to remove ANSI codes from the output (instead of using col -bx). See this page, for example. It won't be pretty :smile:

Might make sense to move this to a separate script that can be used as MANPAGER.
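Here is a minimal sketch of the ANSI-stripping idea. It assumes the escape sequences are plain SGR codes of the form ESC [ digits/semicolons m, like the ^[[1m, ^[[22m, ^[[4m, ^[[24m and ^[[0m codes above. The name strip_sgr is hypothetical.

```shell
# strip_sgr: delete SGR escape sequences (ESC [ <digits or ;> m) from stdin,
# leaving the text between them intact.
strip_sgr() (
    esc=$(printf '\033')   # a literal ESC (0x1b) for the sed pattern
    sed "s/$esc\[[0-9;]*m//g"
)
```

A shell function is not visible inside the sh -c that MANPAGER runs. As suggested above, this would have to live in a separate script that MANPAGER points to, for example a hypothetical ~/bin/bat-man running the sed line followed by | col -bx | bat -p -l man. Like col -bx, it drops the emphasis instead of preserving it.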

In the future, we could potentially also try to find a proper/better solution by pre-processing within bat.

sharkdp • Sep 06 '19 22:09

Well.

This finally worked:

MANROFFOPT="-c" MANPAGER="sh -c 'col -bx | bat -plman'" man sprintf

No bold or underlined text, but it finally displays correctly :D

While this is a working solution for now, I'd suggest either keeping this issue open or opening a new one, as this is rather hacky (although it was a fun learning experience about the joys of old Unix tech!).

[screenshot]

LunarLambda • Sep 06 '19 22:09

I'd like to close this. It is now described in the README, and I currently don't see a better solution.

sharkdp • Oct 15 '19 19:10

Understandable ^^

LunarLambda • Oct 15 '19 23:10

You should mention in the README that bold highlighting is unsupported - I was quite confused, and this issue doesn't really go into that.

xeruf • Jun 21 '20 12:06

Seriously? This issue "doesn't really go into that"? We have spent hours to debug this and have written extremely detailed comments that document everything.

> You should mention in the README that bold highlighting is unsupported

Nobody "should" do anything here, but I agree that it's probably a good idea to add that. Contributions to the documentation are always welcome.

sharkdp • Jun 22 '20 20:06

Hey, sorry if that came across as unappreciative. I did read the comments and they were quite informative, but to me they seemed mostly concerned with the control characters used for boldness messing up the output.

What I was wondering is whether this could actually be changed to interpret boldness. I am writing a man page myself and would like to see it as the end users see it, so I currently have to use less, but much prefer the overall look of bat :)

xeruf • Jun 23 '20 11:06

I'm going to reopen this, as there might actually be a way to solve this, if we write a man preprocessor within bat.

sharkdp • Jul 25 '20 20:07

I ran into this as well using Windows Terminal with bat as a man pager. The settings recommended by @LunarLambda in https://github.com/sharkdp/bat/issues/652#issuecomment-529032263 resolved my problem. 👍

damien • Nov 18 '20 23:11

Program versions

Arch Linux
man 2.9.4
col from util-linux 2.37
bat 0.18.1

Comparison

MANPAGER='less' man printf

[screenshot]

MANPAGER='bat -pl man' man printf

[screenshot]

MANPAGER="sh -c 'col -bx | bat -pl man'" man printf

[screenshot]

Neither MANROFFOPT nor adding/removing -b for col seem to change anything for me.

Conclusion

Adding colors is nice, but since bat currently does not display the essential highlighting (bold and underline), I am considering switching back to less or finding an interactive man viewer where I can follow links.

xeruf • Jul 01 '21 09:07

For me, on Fedora 35, export MANROFFOPT="-c" helped. Thank you @xeruf @LunarLambda!

avimehenwal • Nov 13 '21 18:11

> for me, working on fedora 35 export MANROFFOPT="-c" helped Thankyou @xeruf @LunarLambda

Same here. Maybe this could be added to the README?

leppaott • Feb 04 '22 12:02

I've done a little more digging into this, as I'm running into it on one Linux system and on macOS. Ideally, both color and bold/underline would be output as ANSI codes and bat would happily interpret them, but groff still appears to generate X^HX even when it also uses color output!

It seems that in Debian it might be possible to achieve with GROFF_SGR=1 or editing /etc/groff files: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=750202 https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=963490

So far I have not found a working option for macOS or the CentOS system where I'm still seeing the issue, but I'm working on trying an option to "dumb replace" them, something like

# doesn't work quite right...
MANPAGER="sed -r 's/(.)\x08\1/\033[1m\1\033[0m/g' | bat -plman"

This StackExchange thread also has lots of relevant details, which suggests that many of the options here are, unfortunately, distro-dependent... Maybe preprocessing in bat really would make it simpler 😢

Edit: one more resource explaining some different behavior on Arch (where everything seems to work... better? differently? for me at least) and Debian

ian-h-chamberlain • Jan 06 '23 17:01

Update from my side: I'm using nvim/emacs as my man viewer now, as these can follow links as well ;)

xeruf • Jan 10 '23 00:01

Okay, phew! I dug in a little more and got a usable sed command, but unfortunately there still seems to be an issue with --language Manpage, even when using ANSI codes instead of overstrike.

Here's the command I'm using:

sed=gsed # needed on macOS, it seems
# sed=sed # linux

export MANPAGER="$sed -E 's/(.)\x08\1/\x1b[1m\1\x1b[22m/g' |
	$sed -E 's/_\x08(.)/\x1b[4m\1\x1b[24m/g' |
	bat -p"
man sprintf

This displays non-colored but correctly decorated pages, as you might expect! less, cat etc. should also work here.
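The same conversion can be written without GNU-only \x escapes, which may avoid the need for gsed on macOS. This is a sketch that assumes a POSIX-conforming sed. It builds the backspace and escape bytes with printf and uses only basic regular expressions with back-references, which BSD sed should support. The function name is hypothetical.

```shell
# overstrike_to_sgr: rewrite overstrike emphasis on stdin as SGR sequences.
#   X<BS>X  -> ESC[1m X ESC[22m   (bold on / bold off)
#   _<BS>X  -> ESC[4m X ESC[24m   (underline on / underline off)
overstrike_to_sgr() (
    bs=$(printf '\b')      # literal backspace (0x08)
    esc=$(printf '\033')   # literal ESC (0x1b)
    sed -e "s/\(.\)$bs\1/$esc[1m\1$esc[22m/g" \
        -e "s/_$bs\(.\)/$esc[4m\1$esc[24m/g"
)
```

Each emphasized character gets its own pair of codes, which is verbose but harmless for bat -p or less. As with the command above, using it as MANPAGER requires a standalone script or an sh -c string, because MANPAGER cannot see shell functions.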

[screenshot]

However, when using bat --language Manpage, it seems the color of the syntax highlight gets garbled with the bold/underline codes, similar to the OP report:

export MANPAGER="$sed -E 's/(.)\x08\1/\x1b[1m\1\x1b[22m/g' | 
	$sed -E 's/_\x08(.)/\x1b[4m\1\x1b[24m/g' |
	bat -plman"
man sprintf

[screenshot]

Is it expected that bat would correctly handle the syntax highlighting intermingled with the source data having control characters? If so, I'd propose that as the actionable item here, and have it be the user's responsibility to ensure the input manpage data is "normalized" (i.e. using all ANSI or all overstrike decorations). Thoughts?

ian-h-chamberlain • Jan 13 '23 14:01

On macOS this happens if you use the man binary provided by brew's man-db package. I don't remember why I added it, so brew uninstall man-db brought me back to using the system man implementation, which is more well-behaved about escape sequences.

Not sure if that's viable for anybody else, but removing it was a huge QoL improvement for me (back to bat's highlighting, and no more broken escapes written in my manpages), so I figured I'd mention it here in case someone else in the same situation hits it.

Example of what the brokenness looked like, since it doesn't quite seem the same as the others, although it's basically the same problem.

(Before)

LOCATE(1)                                BSD General Commands Manual                                LOCATE(1)

1mNAME0m
     1mlocate 22m— find filenames quickly

1mSYNOPSIS0m
     1mlocate 22m[1m-0Scims22m] [1m-l 4m22mlimit24m] [1m-d 4m22mdatabase24m] 4mpattern24m 4m...0m

1mDESCRIPTION0m
     The 1mlocate 22mprogram searches a database for all pathnames which match the specified 4mpattern24m.  The data‐

(After)

LOCATE(1)                                   General Commands Manual                                  LOCATE(1)

NAME
     locate – find filenames quickly

SYNOPSIS
     locate [-0Scims] [-l limit] [-d database] pattern ...

DESCRIPTION
     The locate program searches a database for all pathnames which match the specified pattern.  The database
     is recomputed periodically (usually weekly or daily), and contains the pathnames of all files which are
     publicly accessible.

Both had some amount of bat highlighting, but with the extra text it was just unreadable before.

thomcc • Aug 03 '23 15:08