mustache.sh
Another encoding problem
Per my last patch, mustache.sh should be encoding-agnostic. Alas:
the newline-detection trickery done with sed fails when a non-UTF-8 accented character sits just before the newline.
Suppose, for example, the input is ISO-8859-1:
echo 'A=é
b=c' | mustache
gives us
A=éb=c
The bug does not occur with UTF-8 input.
The bug comes from this sed call:
sed -r "
s/./&\\n/g
s/\\\\/\\\\\\\\/g
"
as illustrated by running it on its own (with ISO-8859-1 input):
echo 'A=é
b=c' | sed -r "
s/./&\\n/g
s/\\\\/\\\\\\\\/g
"
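The same reproduction can be written so that it does not depend on how the terminal encodes a typed é, by emitting the ISO-8859-1 byte (0xE9, octal 351) directly. This is only a sketch: it assumes GNU sed and a POSIX od, and it has to be run in a UTF-8 locale to show the problem. The variable name out is just for illustration.

```shell
# Emit the ISO-8859-1 byte for é (octal 351 = 0xE9) directly, so the
# result does not depend on how the terminal encodes a typed "é".
out=$(printf 'A=\351\nb=c\n' | sed -r "
s/./&\\n/g
s/\\\\/\\\\\\\\/g
" | od -An -tx1)

# Hex dump of what sed produced; look at what follows the e9 byte.
printf '%s\n' "$out"
```

If e9 is followed by a single 0a, the `.` did not match that byte and only the original newline is left, which is the A=éb=c symptom. Two 0a bytes after e9 mean the split worked.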
I poked around a bit with iconv(1) (character sets are mysterious beasts and my shell loves UTF-8) and I think this confirms the issue. Note the extra 0a in the UTF-8 version:
$ printf 'A=é\nB=c\n' | iconv -f UTF-8 -t UTF-8 | sed -r "
> s/./&\\n/g
> s/\\\\/\\\\\\\\/g
> " | hd
00000000 41 0a 3d 0a c3 a9 0a 0a 42 0a 3d 0a 63 0a 0a |A.=.....B.=.c..|
0000000f
$ printf 'A=é\nB=c\n' | iconv -f UTF-8 -t ISO_8859-1 | sed -r "
s/./&\\n/g
s/\\\\/\\\\\\\\/g
" | hd
00000000 41 0a 3d 0a e9 0a 42 0a 3d 0a 63 0a 0a |A.=...B.=.c..|
0000000d
$
I am not sure how to fix this, unfortunately.
Ostensibly it is a sed bug. To file a bug report against sed I'd have to join their mailing list (not happening).
My current workaround is to replace the first sed call with perl:
perl -pe 's/([^\n])/$1\n/sg' | sed -r "s/\\\\/\\\\\\\\/g" | _mustache
but that change makes my mustache.sh not really .sh anymore.
Horrible, but it works :/ I'll just live with that.
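A possible sed-only alternative to the perl step is to run that first sed in the C locale. As far as I know, GNU sed in a multibyte locale does not let `.` match a byte sequence that is invalid in that encoding, which would explain the missing newline after 0xE9. Under LC_ALL=C every byte counts as one character. The following is a sketch only: it assumes GNU sed, it has not been tried against the whole of mustache.sh, and split_bytes is a made-up name.

```shell
# split_bytes: hypothetical drop-in for the first sed call.
# LC_ALL=C makes "." match any single byte, valid UTF-8 or not.
split_bytes() {
  LC_ALL=C sed -r "
s/./&\\n/g
s/\\\\/\\\\\\\\/g
"
}

# ISO-8859-1 input: 'A=', the byte 0xE9, newline, 'b=c', newline.
printf 'A=\351\nb=c\n' | split_bytes | od -An -tx1
```

Here e9 should come out followed by two 0a bytes, the inserted one and the original newline. The caveat is that UTF-8 input is now split byte by byte too (é becomes c3, newline, a9, newline), which should be the same as what the perl one-liner does when run without -C, so whatever reassembles the stream has to tolerate that.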