w3m
Turn ANSI escape sequences into HTML tags
fixes #201
Much better! Thanks!
I just set up a new laptop and applied this patch again.
I found another corner case with ^ symbols, so the PR is updated.
Is anything missing before it can be applied? Just check with w3mman bash to see the result.
In w3mman bash I spotted this problem:
Compound Commands
[1m[[ [4mexpression [1m]] [0m
With your latest version. We'll get there. :-)
Can you explain how you "translate" a missed character into a w3mman2html.cgi entry? It would be a shame if this fix does not get implemented.
Let me check your last gotcha:
Compound Commands
[1m[[ [4mexpression [1m]] [0m
Looks like I forgot to add [ and ] to the allowed inner characters.
It should be fixed now.
Basically I use:
/usr/bin/man bash
and compare with
w3mman bash
Basically, the code takes any run of characters from printchar that is surrounded by these ANSI escape sequences and replaces it with HTML tags.
Anyway, checking what is different in some Fedora distros to cause this would be a better solution.
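For illustration, the replacement described above can be sketched roughly as follows. This is Python rather than the Perl of w3mman2html.cgi, and the PRINTCHAR class, the sgr helper and the tag mapping are assumptions made for the sketch, not the PR's actual code.

```python
import re

ESC = "\x1b"  # the escape byte that starts every SGR sequence

# Hypothetical stand-in for the script's printchar class (an assumption,
# not the character list used in the PR).
PRINTCHAR = r"[\w\[\]+\- ]"


def sgr(code):
    """Return a regex fragment matching ESC [ <code> m literally."""
    return re.escape(f"{ESC}[{code}m")


# (compiled pattern, replacement) pairs: bold ends at 0 or 22,
# underline ends at 0 or 24.
RULES = [
    (re.compile(f"{sgr(1)}({PRINTCHAR}+)(?:{sgr(0)}|{sgr(22)})"), r"<b>\1</b>"),
    (re.compile(f"{sgr(4)}({PRINTCHAR}+)(?:{sgr(0)}|{sgr(24)})"), r"<u>\1</u>"),
]


def ansi_to_html(line):
    """Replace each start/text/end triple with the matching HTML tag."""
    for pattern, replacement in RULES:
        line = pattern.sub(replacement, line)
    return line
```

Any character missing from PRINTCHAR makes the pattern fail for that run and leaves the raw sequence in the output.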
Thanks! I found a few more.
So [0m [1m [4m and [22m
% w3mman bash|grep '\[.m'
-c If the -c option is present, then commands are read from the first non-option argument command_string. If there are arguments after the [4mcom‐ [0m
[1m[-+]O [ [4mshopt_option [1m] [0m
value of that option; [1m+O [22munsets it. If shopt_option is not supplied, the names and values of the shell options accepted by shopt are printed on the
standard output. If the invocation option is [1m+O [22m, the output is displayed in a format that may be reused as input.
[1m! case coproc do done elif else esac fi for function if in select then until while { } time [[ ]] [0m
[1m[[ [4mexpression [1m]] [0m
low under CONDITIONAL EXPRESSIONS. Word splitting and pathname expansion are not performed on the words between the [1m[[ [22mand [1m]] [22m; tilde expan
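The grep check above can also be written as a small script that reports which sequences are left over on which line. This is only a sketch in Python. The leftover_sgr name is hypothetical, and the ESC byte is treated as optional because it is not visible in the pasted output.

```python
import re

# An SGR sequence such as [1m or [22m, with or without the leading ESC byte.
SGR_LEFTOVER = re.compile(r"\x1b?\[\d+(?:;\d+)*m")


def leftover_sgr(lines):
    """Return (line_number, [sequences]) for every line that still has SGR codes."""
    hits = []
    for number, line in enumerate(lines, 1):
        found = SGR_LEFTOVER.findall(line)
        if found:
            hits.append((number, found))
    return hits


if __name__ == "__main__":
    sample = [
        "value of that option; [1m+O [22munsets it.",
        "standard output.",
    ]
    for number, found in leftover_sgr(sample):
        print(number, " ".join(found))
```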
The symbol + is not covered by \w. Should be fixed now.
I see other paths to fix with:
w3mman bash|grep '\[[^-]m'
working on it
Added more symbols like \. I just realized I'm only fixing the accents and ñ of my own locale, so the accents part probably needs a better regex to deal with all languages.
Thanks for the updates, almost there.
Since it happens at the ends of lines I suspect it has something to do with the line-breaks. This is with
COLUMNS=80 w3mman bash
--rcfile file
Execute commands from file instead of the standard personal ini‐
tialization file ~/.bashrc if the shell is interactive (see [1mIN‐ [0m
VOCATION below).
It also happens if you don't set COLUMNS, but isn't as visible, since it happens in the wrapped line. Setting COLUMNS makes it stand out.
Ah yes, wrapped text does not include the newline; fixing it.
Note that when the column limit splits a word, man wraps it with a start and an end sequence: here \x1b[1m and \x1b[0m.
that pattern is here:
https://github.com/tats/w3m/pull/238/files#diff-7bd451f4ef63311cbda7ddcbbae207707823c3892c19ab25c3daec3e9bf093e4R166
so I think split words are correctly covered.
I tested it and it works on my side. Can you try again?
The diff hasn't changed, and I still see the same problem.
% echo $COLUMNS
80
--rcfile file
Execute commands from file instead of the standard personal ini‐
tialization file ~/.bashrc if the shell is interactive (see [1mI
VOCATION below).
Yes, it works for me. The only thing I can think of is that the missing symbol is that low dash; as you can see, for me it is -. I added _ previously, but yours looks smaller, so I have to check which Unicode character that is.
A fair point: when using env COLUMNS=80 LC_ALL=C w3mman bash the output is indeed clean.
The UTF-8 char is: ‐
Here is the hexl output:
87654321 0011 2233 4455 6677 8899 aabb ccdd eeff 0123456789abcdef
00000000: e280 900a 2d0a 0ae2 8090 0a2d 0a ....-......-.
So 0a is a LF, 2d is the normal -, and our UTF-8 char is e28090, which is …drum roll… U+2010 ‐ e2 80 90 HYPHEN.
Does that help?
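The same identification can be reproduced without a hex editor. A minimal Python sketch, which assumes the bytes from the hexl dump above are copied in by hand and uses a hypothetical describe helper:

```python
import unicodedata

# The 13 bytes shown in the hexl dump, with the spacing removed.
DUMP = bytes.fromhex("e280900a2d0a0ae280900a2d0a")


def describe(data):
    """Decode UTF-8 and list each distinct character as (code point, name)."""
    seen = []
    for ch in data.decode("utf-8"):
        # Control characters such as LF have no Unicode name, hence the default.
        entry = (f"U+{ord(ch):04X}", unicodedata.name(ch, "<no name>"))
        if entry not in seen:
            seen.append(entry)
    return seen


if __name__ == "__main__":
    for code_point, name in describe(DUMP):
        print(code_point, name)
```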
Cool, I think now we have a solution that works for any char: anything that is not an escape.
Let me know if that works now
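As a rough illustration of the "anything that is not an escape" idea, here is a Python sketch (not the Perl from the PR; the bold_to_html name is made up). The negated class accepts U+2010 and newlines alike, so hyphenated line wraps no longer break the match.

```python
import re

# Bold start, then one or more characters that are not ESC, then a
# reset (0) or bold-off (22) sequence.
BOLD = re.compile(r"\x1b\[1m([^\x1b]+)\x1b\[(?:0|22)m")


def bold_to_html(text):
    """Wrap every bold run in <b> tags, whatever printable characters it holds."""
    return BOLD.sub(r"<b>\1</b>", text)
```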
It looks like it should, much appreciated!
Added an option for:
s@^[\[34m^[\[1m($printchar+)^[\[0m@<u><b>$1</b></u>@g;
Yet another one bites the dust. 😊
Merged, thanks for your contribution.
I've found another gem in maildirmake(1) from the maildrop package:
\-q \fIquota\fR
.RS 4
install a quota on the maildir\&. See
\m[blue]\fB\fBmaildirquota\fR(7)\fR\m[]\&\s-2\u[1]\d\s+2
for more information\&.
Which results in:
-q quota
install a quota on the maildir. See [34mmaildirquota(7) [0m[1] for more
information.
This is problematic because currently nested syntax is not allowed:
[34m [1mmaildirquota [22m(7) [0m[1]
There's a line for [34m ... [0m and another for [1m ... [22m, but the [34m match stops at the first escape. We need to find a different way to parse this, probably by checking which nested escape sequences are valid.
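One possible "different way" is to stop matching start/end pairs and instead track the active attributes while scanning. That handles nesting without a rule per combination. The Python sketch below is an assumption-laden illustration, not what was merged: the function names are hypothetical, and blue (34) is rendered as underline only because the rule added earlier in this thread did so.

```python
import re
from html import escape

SGR = re.compile(r"\x1b\[([0-9;]*)m")


def render(chunk, state):
    """Wrap one text run in the tags implied by the current state."""
    if not chunk:
        return ""
    html = escape(chunk)
    if state["bold"]:
        html = f"<b>{html}</b>"
    if state["underline"] or state["blue"]:
        html = f"<u>{html}</u>"
    return html


def sgr_to_html(text):
    """Convert SGR-styled text to HTML by tracking attribute state."""
    state = {"bold": False, "underline": False, "blue": False}
    out, pos = [], 0
    for match in SGR.finditer(text):
        out.append(render(text[pos:match.start()], state))
        for param in match.group(1).split(";"):
            code = int(param or 0)  # an empty parameter means reset
            if code == 0:
                state = dict.fromkeys(state, False)
            elif code in (1, 22):
                state["bold"] = code == 1
            elif code in (4, 24):
                state["underline"] = code == 4
            elif code in (34, 39):
                state["blue"] = code == 34
        pos = match.end()
    out.append(render(text[pos:], state))
    return "".join(out)
```

Each text run is wrapped on its own, so the output has more tags than strictly necessary, but no sequence can be left behind by an unexpected nesting order.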
Fixed by setting GROFF_NO_SGR.
Note that Debian disables the use of SGR escape sequences by default; cf. man grotty.
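For anyone reproducing this outside w3mman, the effect of the variable can be checked with a few lines of Python. This is a sketch that assumes a groff-backed man on PATH; man_env and render_man are hypothetical helper names. grotty only checks whether GROFF_NO_SGR is set, so the value "1" is arbitrary.

```python
import os
import subprocess


def man_env(base=None):
    """Copy the environment and set GROFF_NO_SGR so grotty emits no SGR codes."""
    env = dict(os.environ if base is None else base)
    env["GROFF_NO_SGR"] = "1"
    return env


def render_man(page):
    """Run man for one page with SGR disabled and return its output."""
    result = subprocess.run(
        ["man", page],
        env=man_env(),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```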
Looking much better, thanks!
-q quota
install a quota on the maildir. See maildirquota(7)[1] for more
information.
So that probably removes any need for the changes merged into the CGI?
Reverted this pull request. cf. https://github.com/tats/w3m/compare/8891eab5b55647d8f2ab5a8dd9754c660200c280...760d7ad7295bb762a7bef3f5dc17b58278a06ac4