Turn ansi escape sequences into html tags

Open albfan opened this issue 2 years ago • 16 comments

fixes #201

albfan avatar Jun 06 '22 05:06 albfan

Much better! Thanks!

hboetes avatar Jun 11 '22 22:06 hboetes

I just set up a new laptop and applied this patch again.

I found another corner case with ^ symbols, so the PR is updated.

Is anything missing before it can be applied? Just run w3mman bash to see the result.

albfan avatar Aug 08 '22 10:08 albfan

In w3mman bash I spotted this problem:

Compound Commands
[1m[[  [4mexpression  [1m]] [0m

With your latest version. We'll get there. :-)

hboetes avatar Aug 12 '22 05:08 hboetes

Can you explain how you "translate" a missed character into a w3mman2html.cgi entry? It would be a shame if this fix does not get implemented.

hboetes avatar Sep 22 '22 19:09 hboetes

Let me check your last gotcha:

Compound Commands
[1m[[  [4mexpression  [1m]] [0m

Looks like I forgot to add [ and ] to the allowed inner characters.

It should be fixed now.

Basically I use:

/usr/bin/man bash

and compare with

w3mman bash

Basically, the code replaces any run of characters from printchar that is surrounded by these ANSI escape sequences with HTML tags.
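
As a rough illustration of that substitution (in Python rather than the CGI's Perl), a minimal sketch might look like the following. The PRINTCHAR class is a hypothetical stand-in, not the real $printchar definition, and the tag mapping (bold to <b>, underline to <u>) is an assumption.

```python
import re

# Hypothetical stand-in for the CGI's $printchar: word characters plus a few
# of the symbols mentioned in this thread. The real definition may differ.
PRINTCHAR = r"[\w\[\]+^. -]"

# ESC [ 1 m ... ESC [ 0 m  (bold) and ESC [ 4 m ... ESC [ 0 m  (underline)
BOLD = re.compile(r"\x1b\[1m(" + PRINTCHAR + r"+)\x1b\[0m")
UNDERLINE = re.compile(r"\x1b\[4m(" + PRINTCHAR + r"+)\x1b\[0m")


def sgr_to_html(text):
    """Replace simple, non-nested bold/underline SGR pairs with HTML tags."""
    text = BOLD.sub(r"<b>\1</b>", text)
    text = UNDERLINE.sub(r"<u>\1</u>", text)
    return text


print(sgr_to_html("see \x1b[1mbash\x1b[0m and \x1b[4mfile\x1b[0m"))
```

Any character outside PRINTCHAR between the two sequences makes the match fail, so the raw sequence stays in the output. That is the failure mode discussed in the comments that follow.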

Anyway, finding out what differs in some Fedora distros to cause this would be a better solution.

albfan avatar Sep 23 '22 08:09 albfan

Thanks! I found a few more.

So [0m [1m [4m and [22m

% w3mman bash|grep '\[.m'
       -c        If the -c option is present, then commands are read from the first non-option  argument  command_string.   If  there  are  arguments  after  the   [4mcom‐ [0m
        [1m[-+]O [ [4mshopt_option [1m] [0m
                 value  of  that option;  [1m+O  [22munsets it.  If shopt_option is not supplied, the names and values of the shell options accepted by shopt are printed on the
                 standard output.  If the invocation option is  [1m+O [22m, the output is displayed in a format that may be reused as input.
        [1m! case  coproc  do done elif else esac fi for function if in select then until while { } time [[ ]] [0m
        [1m[[  [4mexpression  [1m]] [0m
              low  under CONDITIONAL EXPRESSIONS.  Word splitting and pathname expansion are not performed on the words between the  [1m[[  [22mand  [1m]] [22m; tilde expan
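
For reference, the standard SGR meanings of the codes seen above are 0 (reset), 1 (bold), 4 (underline) and 22 (normal intensity, i.e. bold off). Below is a small Python sketch for listing leftover sequences. It assumes the ESC byte is still present in the raw output, and the helper name is made up for illustration.

```python
import re

# Standard SGR parameter meanings for the codes that show up in this thread.
SGR_MEANING = {
    "0": "reset all attributes",
    "1": "bold",
    "4": "underline",
    "22": "normal intensity (bold off)",
}

# ESC [ <digits> m ; single-parameter sequences only.
SGR_SEQUENCE = re.compile(r"\x1b\[(\d+)m")


def leftover_codes(line):
    """Return the SGR codes still present in a line, in order of appearance."""
    return SGR_SEQUENCE.findall(line)


sample = "value of that option; \x1b[1m+O \x1b[22munsets it."
for code in leftover_codes(sample):
    print(code, SGR_MEANING.get(code, "not covered in this sketch"))
```

Unlike the grep pattern '\[.m', which only allows one character between [ and m, \d+ also matches two-digit codes such as 22 on their own.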

hboetes avatar Sep 23 '22 19:09 hboetes

The + symbol is not covered by \w. It should be fixed now.
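
A quick way to see why + needed its own entry, sketched here in Python (whose \w, like Perl's, covers letters, digits and underscore but not punctuation):

```python
import re

WORD_ONLY = re.compile(r"\w+")        # letters, digits, underscore
WORD_OR_PLUS = re.compile(r"[\w+]+")  # same, with + added to the class


def fully_matches(pattern, text):
    """True if the whole string consists of characters the pattern allows."""
    return pattern.fullmatch(text) is not None


print(fully_matches(WORD_ONLY, "O"))      # True
print(fully_matches(WORD_ONLY, "+O"))     # False: + is not a word character
print(fully_matches(WORD_OR_PLUS, "+O"))  # True
```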

I see other cases to fix with:

w3mman bash|grep '\[[^-]m'

Working on it.

albfan avatar Sep 25 '22 08:09 albfan

Added more symbols, such as \. I just realized I'm only handling the accents and ñ from my own locale, so the accents part probably needs a better regex to cope with all languages.

albfan avatar Sep 25 '22 08:09 albfan

Thanks for the updates, almost there.

Since it happens at the ends of lines I suspect it has something to do with the line-breaks. This is with

COLUMNS=80 w3mman bash

       --rcfile file
              Execute commands from file instead of the standard personal ini‐
              tialization file ~/.bashrc if the shell is interactive (see   [1mIN‐ [0m
              VOCATION below).

It also happens if you don't set COLUMNS, but isn't as visible, since it happens in the wrapped line. Setting COLUMNS makes it stand out.

hboetes avatar Sep 25 '22 10:09 hboetes

Ah yes, wrapped text does not include the newline. Fixing it.

albfan avatar Sep 25 '22 12:09 albfan

See, if COLUMNS causes a word to be split, man wraps each part with a start and an end sequence: here \x1b[1m \x1b[0m

That pattern is handled here:

https://github.com/tats/w3m/pull/238/files#diff-7bd451f4ef63311cbda7ddcbbae207707823c3892c19ab25c3daec3e9bf093e4R166

so I think split words are correctly covered.

I tested it and it works on my side. Can you try again?

Screenshot from 2022-09-30 12-50-00

albfan avatar Sep 30 '22 10:09 albfan

The diff hasn't changed, and I still see the same problem.

% echo $COLUMNS 
80
       --rcfile file
              Execute commands from file instead of the standard personal ini‐
              tialization file ~/.bashrc if the shell is interactive (see   [1mI
              VOCATION below).

hboetes avatar Sep 30 '22 15:09 hboetes

Yes, it works for me. I can only think the missing symbol is that dash; as you can see, for me it is -. I added _ previously, but yours looks smaller, so I have to check which Unicode character it is.

albfan avatar Sep 30 '22 17:09 albfan

A fair point: with env COLUMNS=80 LC_ALL=C w3mman bash the output is indeed clean.

The UTF-8 char is: ‐

Here is the hexl output:

87654321  0011 2233 4455 6677 8899 aabb ccdd eeff  0123456789abcdef
00000000: e280 900a 2d0a 0ae2 8090 0a2d 0a         ....-......-.

So 0a is a LF, 2d is the normal -, and our UTF char is e28090 which is …drum roll… U+2010 ‐ e2 80 90 HYPHEN
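
The same identification can be double-checked with a few lines of Python. This is only a sketch; the helper name is made up for illustration.

```python
import unicodedata


def describe(raw):
    """Decode UTF-8 bytes and name every character except line feeds."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch)}"
        for ch in raw.decode("utf-8")
        if ch != "\n"
    ]


# First six bytes of the dump above: e2 80 90, LF, 2d, LF
for line in describe(bytes.fromhex("e280900a2d0a")):
    print(line)
```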

Does that help?

hboetes avatar Sep 30 '22 18:09 hboetes

Cool, I think we now have a solution that works for any char: anything that is not an escape.
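
The idea, sketched in Python rather than the CGI's Perl, is to stop enumerating allowed characters and accept anything up to the next ESC. This only illustrates the approach and is not the merged code.

```python
import re

# Capture any run of characters that are not ESC (0x1b) between the bold
# start sequence and the reset sequence.
BOLD_ANY = re.compile(r"\x1b\[1m([^\x1b]+)\x1b\[0m")


def bold_to_html(text):
    """Wrap bold SGR runs in <b>, whatever characters they contain."""
    return BOLD_ANY.sub(r"<b>\1</b>", text)


# U+2010 HYPHEN at the end of a wrapped word no longer breaks the match.
print(bold_to_html("(see \x1b[1mIN\u2010\x1b[0m"))
```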

Let me know if that works now.

albfan avatar Oct 01 '22 06:10 albfan

It looks like it should, much appreciated!

hboetes avatar Oct 01 '22 10:10 hboetes

Added option for

s@^[\[34m^[\[1m($printchar+)^[\[0m@<u><b>$1</b></u>@g;

albfan avatar Dec 27 '22 10:12 albfan

Yet another one bites the dust. 😊

hboetes avatar Dec 27 '22 17:12 hboetes

Merged, thanks for your contribution.

tats avatar Jan 05 '23 11:01 tats

I've found another gem in maildirmake(1) from the maildrop package:

\-q \fIquota\fR
.RS 4
install a quota on the maildir\&. See
\m[blue]\fB\fBmaildirquota\fR(7)\fR\m[]\&\s-2\u[1]\d\s+2
for more information\&.

Which results in:

       -q quota
           install a quota on the maildir. See  [34mmaildirquota(7) [0m[1] for more
           information.

hboetes avatar Jan 15 '23 15:01 hboetes

This is problematic because currently nested syntax is not allowed:

[34m [1mmaildirquota [22m(7) [0m[1]

There's a rule for [34m ... [0m and another for [1m ... [22m, but the [34m match stops at the first escape. We need to find a different way to parse this, probably by checking which nested escape sequences are valid.
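
One possible direction, sketched in Python and not taken from the PR, is to walk the sequences one by one and keep a stack of open attributes instead of matching start/end pairs with a single regex. The tag mapping, including the "blue" class name, is hypothetical. The codes 22, 24 and 39 are the standard "off" codes for bold, underline and foreground colour.

```python
import re

SGR = re.compile(r"\x1b\[(\d*)m")  # single-parameter sequences only

# Assumed mapping from SGR code to opening/closing HTML.
TAGS = {
    "1": ("<b>", "</b>"),
    "4": ("<u>", "</u>"),
    "34": ('<span class="blue">', "</span>"),
}
# "Off" codes and the attribute each one ends.
OFF = {"22": "1", "24": "4", "39": "34"}


def sgr_to_html(text):
    out, active, pos = [], [], 0  # active: stack of open codes, innermost last
    for m in SGR.finditer(text):
        out.append(text[pos:m.start()])
        pos = m.end()
        code = m.group(1) or "0"  # an empty parameter means reset
        if code == "0":
            while active:
                out.append(TAGS[active.pop()][1])
        elif code in TAGS and code not in active:
            active.append(code)
            out.append(TAGS[code][0])
        elif code in OFF and OFF[code] in active:
            # Close tags down to the target, then reopen the ones above it
            # so the HTML stays properly nested.
            reopen = []
            while True:
                top = active.pop()
                out.append(TAGS[top][1])
                if top == OFF[code]:
                    break
                reopen.append(top)
            for c in reversed(reopen):
                active.append(c)
                out.append(TAGS[c][0])
        # Any other code is silently dropped in this sketch.
    out.append(text[pos:])
    while active:  # close anything left open at end of input
        out.append(TAGS[active.pop()][1])
    return "".join(out)


print(sgr_to_html("\x1b[34m\x1b[1mmaildirquota\x1b[22m(7)\x1b[0m[1]"))
```

Multi-parameter sequences such as ESC[1;4m and HTML escaping of the text itself are left out to keep the sketch short.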

albfan avatar Jan 15 '23 17:01 albfan

Fixed by setting GROFF_NO_SGR.

Note that Debian disables the use of SGR escape sequences by default; cf. man grotty.
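
For context: per grotty(1), setting GROFF_NO_SGR makes grotty fall back to the old backspace overstrike scheme, where bold is c BACKSPACE c and underline is _ BACKSPACE c, with no SGR escapes at all. The Python sketch below shows what that format looks like and one way it could be turned into HTML. The helper names are made up, and this is not the code used in the CGI.

```python
import re

# Bold: each character is printed, backspaced over, and printed again.
BOLD_RUN = re.compile(r"(?:([^\x08])\x08\1)+")
# Underline: an underscore, a backspace, then the character itself.
UNDERLINE_RUN = re.compile(r"(?:_\x08[^\x08])+")


def env_without_sgr(base):
    """Copy an environment mapping and set GROFF_NO_SGR in the copy."""
    env = dict(base)
    env["GROFF_NO_SGR"] = "1"
    return env


def overstrike_to_html(text):
    """Convert runs of overstruck characters to <b>/<u> (simple cases only)."""
    def wrap(tag):
        def repl(match):
            # Drop every "char + backspace" pair, keeping the final glyphs.
            plain = re.sub(r".\x08", "", match.group(0))
            return f"<{tag}>{plain}</{tag}>"
        return repl

    text = BOLD_RUN.sub(wrap("b"), text)
    return UNDERLINE_RUN.sub(wrap("u"), text)


print(overstrike_to_html("see b\bba\bas\bsh\bh and _\bf_\bi_\bl_\be"))
```

An overstruck underscore (_ BACKSPACE _) is ambiguous between bold and underline; this sketch treats it as bold because the bold pass runs first.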

tats avatar Jan 15 '23 21:01 tats

Looking much better, thanks!

       -q quota
           install a quota on the maildir. See maildirquota(7)[1] for more
           information.

hboetes avatar Jan 16 '23 15:01 hboetes

So that probably removes any need for the merged changes to the CGI?

albfan avatar Jan 18 '23 09:01 albfan

Reverted this pull request. cf. https://github.com/tats/w3m/compare/8891eab5b55647d8f2ab5a8dd9754c660200c280...760d7ad7295bb762a7bef3f5dc17b58278a06ac4

tats avatar Jan 18 '23 10:01 tats