mustache.sh

Another encoding problem

Open · malk opened this issue 11 years ago · 4 comments

Per my last patch, mustache.sh should be encoding-agnostic. Alas:

The newline-detection trickery done with sed fails when an accented character in a non-Unicode encoding sits just before the newline.

Suppose, for example, the input is ISO-8859-1:

echo 'A=é
b=c' | mustache

gives us

A=éb=c
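To reproduce this from a UTF-8 terminal, the ISO-8859-1 bytes can be produced directly with printf instead of typing é. This is a minimal sketch of the input side only; the helper name latin1_input is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: emit the test input as ISO-8859-1 bytes.
# \351 is octal for 0xE9, which is "é" in ISO-8859-1, so the
# terminal's own encoding does not affect the bytes produced.
latin1_input() {
    printf 'A=\351\nb=c\n'
}

# Dump the raw bytes: 41 3d e9 0a 62 3d 63 0a
latin1_input | od -An -tx1
```

Piping latin1_input into mustache from a UTF-8 locale is then expected to show the joined A=éb=c output.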

malk, Sep 27 '13 14:09

The bug does not occur with UTF-8 (Unicode) input.

malk, Sep 27 '13 14:09

The bug comes from this sed invocation in mustache.sh:

sed -r "
        s/./&\\n/g
        s/\\\\/\\\\\\\\/g
    "

as can be seen by running it on its own with ISO-8859-1 input:

echo 'A=é
b=c' | sed -r "
        s/./&\\n/g
        s/\\\\/\\\\\\\\/g
    "
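The sketch below spells out what the two expressions do and how their output might be read back. It rests on an assumption suggested by the hex dumps later in this thread rather than taken from mustache.sh's source: each character ends up on its own line, and an empty line marks an original newline. join_chars is a hypothetical toy reader, not mustache.sh's real code. The expressions are single-quoted so sed receives exactly what the double-quoted original passes it, and -r is dropped because neither expression needs extended regexes.

```shell
#!/bin/sh
# The two sed expressions from mustache.sh, single-quoted.
# Note: \n in the replacement is a GNU sed extension.
split_chars() {
    # 1) s/./&\n/g   : newline after every character '.' matches, so an
    #                  original newline later shows up as an empty line.
    # 2) s/\\/\\\\/g : double every backslash.
    sed 's/./&\n/g; s/\\/\\\\/g'
}

# Hypothetical toy reader (NOT mustache.sh's code): one character per
# line, with an empty line standing for an original newline.
join_chars() {
    while IFS= read -r c; do
        if [ -z "$c" ]; then
            printf '\n'
        else
            printf '%s' "$c"
        fi
    done
}

# ASCII round-trips: prints "A=x" and "b=c" on two lines.
printf 'A=x\nb=c\n' | split_chars | join_chars

# Feeding the toy reader the bytes from the ISO-8859-1 hex dump later in
# this thread (only one 0a after e9) loses the newline: A=\351B=c
printf 'A\n=\n\351\nB\n=\nc\n\n' | join_chars | od -An -tx1
```

Under that assumption, a single missing 0a after the accented byte is enough to glue two lines together, which matches the A=éb=c symptom.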

malk, Sep 27 '13 14:09

I poked around a bit with iconv(1) (character sets are mysterious beasts and my shell loves UTF-8) and I think this confirms the issue. Note the extra 0a in the UTF-8 version: c3 a9 is followed by two 0a bytes, while the ISO-8859-1 byte e9 is followed by only one:

$ printf 'A=é\nB=c\n' | iconv -f UTF-8 -t UTF-8 | sed -r "
> s/./&\\n/g
> s/\\\\/\\\\\\\\/g
> " | hd
00000000  41 0a 3d 0a c3 a9 0a 0a  42 0a 3d 0a 63 0a 0a     |A.=.....B.=.c..|
0000000f
$ printf 'A=é\nB=c\n' | iconv -f UTF-8 -t ISO_8859-1 | sed -r "
s/./&\\n/g
s/\\\\/\\\\\\\\/g
" | hd
00000000  41 0a 3d 0a e9 0a 42 0a  3d 0a 63 0a 0a           |A.=...B.=.c..|
0000000d
$
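The dumps fit a locale explanation: in a UTF-8 locale a lone 0xE9 byte is not a valid character, and GNU sed's `.` does not match invalid multibyte sequences, so no newline is inserted after it. One way to check this, sketched here assuming GNU sed, is to run the same expressions with LC_ALL=C, where every byte counts as one character. split_chars_bytes is a hypothetical name, and this has not been tried against mustache.sh itself.

```shell
#!/bin/sh
# Hypothetical variant: the same sed expressions under byte semantics.
split_chars_bytes() {
    # In the C locale every byte is one character, so '.' also matches
    # 0xE9 and a newline is inserted after it.
    LC_ALL=C sed 's/./&\n/g; s/\\/\\\\/g'
}

# ISO-8859-1 test input; \351 is octal for the byte 0xE9.
printf 'A=\351\nB=c\n' | split_chars_bytes | od -An -tx1
```

With byte semantics the e9 byte is followed by two 0a bytes, the same shape as the UTF-8 dump. The catch is that UTF-8 input would then be split byte by byte too (c3 and a9 on separate lines), so whether the rest of mustache.sh reassembles such sequences correctly remains an open assumption.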

I am not sure how to fix this, unfortunately.

rcrowley, Sep 28 '13 23:09

Ostensibly it is a sed bug. To file a bug report against sed I'd have to join their mailing list (not happening).

My current workaround is to replace the first sed invocation with Perl:

perl -pe 's/([^\n])/$1\n/sg' | sed -r "s/\\\\/\\\\\\\\/g" | _mustache

but that change means my mustache.sh is not really pure .sh anymore.

Horrible, but it works :/ I'll just live with that.

malk, Sep 30 '13 09:09