w3m
w3mman does not render ansi escape sequences on redhat based distributions
OS: Fedora 34 package: w3m-0.5.3-50.git20210102.fc34.x86_64
Man pages use ANSI escape sequences for bold:
$ PAGER='cat -A' /usr/bin/man bash
BASH(1) General Commands Manual BASH(1)$
$
^[[1mNAME^[[0m$
bash - GNU Bourne-Again SHell$
$
^[[1mSYNOPSIS^[[0m$
^[[1mbash ^[[22m[options] [command_string | file]$
$
^[[1mCOPYRIGHT^[[0m$
Bash is Copyright (C) 1989-2020 by the Free Software Foundation, Inc.$
$
^[[1mDESCRIPTION^[[0m$
...
Run directly from bash, man displays it correctly:
$ /usr/bin/man bash
BASH(1) General Commands Manual BASH(1)
NAME
bash - GNU Bourne-Again SHell
SYNOPSIS
bash [options] [command_string | file]
COPYRIGHT
Bash is Copyright (C) 1989-2020 by the Free Software Foundation, Inc.
DESCRIPTION
...
But w3mman does not render them correctly:
$ /usr/bin/w3mman bash
BASH(1) General Commands Manual BASH(1)
[1mNAME [0m
bash - GNU Bourne-Again SHell
[1mSYNOPSIS [0m
[1mbash [22m[options] [command_string | file]
[1mCOPYRIGHT [0m
Bash is Copyright (C) 1989-2020 by the Free Software Foundation, Inc.
[1mDESCRIPTION [0m
...
Are there any settings I'm missing? I see this working correctly on Arch Linux.
Please change the title to w3mman does not render ansi escape sequences on redhat based distributions
That's a more accurate description of what's going on.
/usr/local/libexec/w3m/cgi-bin/w3mman2html.cgi man > man.html
generates proper HTML on other platforms, but on Red Hat and friends the resulting output contains escape codes.
I compiled man-db like it's compiled on arch, and I get exactly the same problem...
Arch Linux:
/usr/lib/w3m/cgi-bin/w3mman2html.cgi man >man.html
$ cat -A man.html | head
Content-Type: text/html$
$
<html>$
<head><title>man man</title></head>$
<body>$
<pre>$
MAN(1) Utilidades de paginador del manual MAN(1)$
$
<b>NOMBRE</b>$
man - interfaz de los manuales de referencia del sistema$
Fedora:
/usr/libexec/w3m/cgi-bin/w3mman2html.cgi man > man.html
$ cat -A man.html | head
Content-Type: text/html$
$
<html>$
<head><title>man man</title></head>$
<body>$
<pre>$
MAN(1) Utilidades del paginador del manual MAN(1)$
$
^[[1mNOMBRE^[[0m$
man - interfaz de los manuales de referencia del sistema$
I started adding substitute commands:
diff --git i/w3mman2html.cgi w/w3mman2html.cgi
index b121470..0fa90f5 100755
--- i/w3mman2html.cgi
+++ w/w3mman2html.cgi
@@ -162,7 +162,15 @@ EOF
next;
}
- s@[1m(\w+)[0m$@<b>$1</b>@g;
+ my $printchar='[\wÁÉÍÓÚáéíóú /\'.:;,&()\\"~=%*\$\?|!#\`\@\{\}\<\>_-]';
+ s@[1m($printchar+)[0m@<b>$1</b>@g;
+ s@[4m($printchar+)[24m@<u>$1</u>@g;
+ s@[1m($printchar+)[0m@<b>$1</b>@g;
+ s@[1m($printchar+)[22m@<b>$1</b>@g;
+ s@[1m($printchar+)[4m@<b>$1</b>@g;
+ s@[22m($printchar+)[0m@<u>$1</u>@g;
+ s@[22m($printchar+)[24m@<u>$1</u>@g;
+ s@[4m([\wÁÉÍÓÚáéíóú /'.:;,&()\\"~=%*\$\?|!#\`\@\{\}\<\>_-]+)[0m@<u>$1</u>@g;
s@(http|ftp)://[\w.\-/~]+[\w/]@<a href="$&">$&</a>@g;
s@\b(mailto:|)(\w[\w.\-]*\@\w[\w.\-]*\.[\w.\-]*\w)@<a href="mailto:$2">$1$2</a>@g;
s@(\W)(\~?/[\w.][\w.\-/~]*)@$1 . &file_ref($2)@ge;
This almost does it. I tested with man bash and there are still some errors. Basically we need to match anything that is a printable character, instead of listing all of these:
[\wÁÉÍÓÚáéíóú /'.:;,&()\"~=%*$?|!#`@{}<>_-]
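Rather than enumerating every character that may appear between the escapes, one option is to match the escape sequences themselves. A minimal sketch in POSIX shell, assuming every sequence has the SGR form ESC [ digits m; strip_sgr is a hypothetical helper name, and it drops the styling instead of converting it to <b> or <u>:

```shell
# strip_sgr: remove SGR escape sequences (ESC [ ... m) from stdin.
# Hypothetical helper for illustration; not part of w3m.
strip_sgr() {
    # Build a literal ESC byte portably; sed has no standard \e escape.
    esc=$(printf '\033')
    # Match ESC, a literal '[', any run of digits/semicolons, then 'm'.
    sed "s/${esc}\[[0-9;]*m//g"
}
```

For example, printf '\033[1mNAME\033[0m\n' | strip_sgr prints NAME. Because the pattern keys on the sequence and not on the text it wraps, colour codes such as ESC[34m are removed as well.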
Please commit #238 Thanks!
I have found one that is still not handled: [34m in dbus-run-session(1).
On Sat, Dec 24, 2022 at 11:18:08AM -0800, Han Boetes wrote:
> I have found one still, [34m in dbus-run-session(1)
It sounds like we want a test for this. I'm not deep into the topic at the moment, but maybe something like how we test entities.
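One possible shape for such a test is to assert that the generated HTML contains no SGR escape sequence at all. A minimal sketch in POSIX shell; has_sgr is a hypothetical helper and is not modelled on w3m's existing tests:

```shell
# has_sgr: exit 0 if stdin contains an SGR escape sequence (ESC [ ... m),
# exit non-zero otherwise. Hypothetical helper for illustration.
has_sgr() {
    esc=$(printf '\033')
    grep -q "${esc}\[[0-9;]*m"
}
```

A check could then pipe the output of w3mman2html.cgi through has_sgr and fail when it succeeds.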
So in the end, the missing setting is
GROFF_NO_SGR=1
https://github.com/tats/w3m/compare/8891eab5b55647d8f2ab5a8dd9754c660200c280...760d7ad7295bb762a7bef3f5dc17b58278a06ac4
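To try the setting by hand, the variable can be set for a single man invocation. A minimal sketch in POSIX shell, assuming groff is the formatter behind man; man_no_sgr is a hypothetical wrapper name:

```shell
# man_no_sgr: run man with GROFF_NO_SGR set for that one invocation.
# Per grotty(1), setting this variable makes groff use the old
# backspace-overstrike scheme for bold and underline instead of SGR escapes.
# Hypothetical wrapper for illustration.
man_no_sgr() {
    GROFF_NO_SGR=1 man "$@"
}
```

Running PAGER='cat -A' man_no_sgr bash on an affected system should then show no ^[[1m sequences.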
I wonder if we should reopen this and consider a configure parameter that depends on the distro. Or should this just force the same behaviour on all distros?
I assume adding GROFF_NO_SGR=1 causes no problems with
- groff >=1.18 with Debian default
- groff >=1.18 default
- groff <1.18, or
- non-groff.
I don't expect SGR to be forcibly enabled when GROFF_NO_SGR=1 is set.
Anyway, if you really found a problem, please reopen.