ansi2html
Stray characters in html output
Not totally sure what's going on here, but this is diagnostic output from GCC 9:
^[[01m^[[K/path/to/my/file.cpp:127:30:^[[m^[[K ^[[01;35m^[[Kwarning: ^[[m^[[Kcomparison of integer expressions of different signedness: ~@~X^[[01m^[[Kunsigned int^[[m^[[K~@~Y and ~@~X^[[01m^[[Kint^[[m^[[K~@~Y [^[[01;35m^[[K-Wsign-compare^[[m^[[K]
127 | for (unsigned int y = 0; ^[[01;35m^[[Ky < height^[[m^[[K; y++)
| ^[[01;35m^[[K~~^~~~~~~~^[[m^[[K
Which is rendered by ansi2html into:
<span class="ansi1">/path/to/my/file.cpp:127:30:</span> <span class="ansi1 ansi35">warning: </span>comparison of integer expressions of different signedness: â<span class="ansi1">unsigned int</span>â and â<span class="ansi1">int</span>â [<span class="ansi1 ansi35">-Wsign-compare</span>]
127 | for (unsigned int y = 0; <span class="ansi1 ansi35">y < height</span>; y++)
| <span class="ansi1 ansi35">~~^~~~~~~~</span>
Anyone know what those extra â sequences are, and if they can be filtered out somehow?
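One possible reading, which nothing in this report confirms: GCC wraps type names in the curly quotes ‘ and ’, and each of those is three bytes in UTF-8 (E2 80 98 and E2 80 99). If those bytes get decoded as Latin-1 somewhere between GCC and the browser, the first byte turns into â and the other two become invisible control characters, which would look like the output above. A small Python sketch of that effect (the variable names are made up for illustration):

```python
# Assumption: the stray characters come from UTF-8 bytes being read as Latin-1.
left_quote = "\u2018"             # the opening curly quote GCC prints around type names
raw = left_quote.encode("utf-8")  # the three bytes that actually travel through the pipe
garbled = raw.decode("latin-1")   # the same bytes, misread one byte per character

print(raw)            # b'\xe2\x80\x98'
print(repr(garbled))  # 'â\x80\x98'
```

If that is what is happening, the locale of the shell running GCC and the encoding used to read the resulting file would be the first things to check, rather than filtering the characters out afterwards.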
When trying to reproduce it…
# echo 'int main(int argc, char ** argv) { return argc < (unsigned)argc; }' > main.c
# gcc -Wextra -fdiagnostics-color=always main.c |& ansi2html > gcc.htm
# gcc -dumpversion
11.2.1
# ansi2html --version
ansi2html 1.7.1.dev1 # i.e. Git master
…what I see in the browser is this:
[screenshot]
Which looks sane. So I'll need help with reproducing.
PS: Here's what I get for your very example pasted into input.txt. Note the sed call, which turns the literal ^[ pairs from the paste back into real ESC bytes on the fly:
# sed $'s,\^\[,\x1b,g' input.txt | ansi2html > gcc.htm
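For anyone without a shell that understands the $'…' quoting, the same repair can be sketched in Python. This is only an illustration of what the sed expression does; `restore_escapes` and `sample` are hypothetical names, not part of ansi2html:

```python
def restore_escapes(text: str) -> str:
    """Replace the two-character caret notation "^[" with a real ESC byte (0x1b)."""
    return text.replace("^[", "\x1b")


# A fragment of the pasted diagnostic, still in caret notation.
sample = "^[[01;35m^[[Kwarning: ^[[m^[[K"
repaired = restore_escapes(sample)
print(repr(repaired))
```

The repaired text can then be fed to ansi2html the same way as the sed output.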
Then in Chromium:
[screenshot]