Template2
The join vmethod for lists produces garbled output if its argument is utf8
Consider the following template file (call it test.tmpl):
[% a=['φοο','βαρ']; a.join('•') %]
[% a=['foo','bar']; a.join('•') %]
and the following Perl script that uses the above template:
#!/usr/bin/perl
use Template;
binmode STDOUT,'utf8';
Template->new(ENCODING=>'utf8')->process('test.tmpl');
The script produces the following output:
φοο•βαρ
fooâ¢bar
As you can see, the • character is garbled in the second line but comes out fine in the first. If I define my own custom list vmethod that simply calls Perl's join, I get the correct output, i.e.:
φοο•βαρ
foo•bar
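For reference, a workaround of that kind might look like the sketch below. The vmethod name pjoin is made up for illustration, and the template is passed as a string instead of test.tmpl to keep the example self-contained; define_vmethod on the context object is the documented way to register a vmethod, but I have only sketched this, not checked it against every Template version.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                       # the template string below contains a literal •
use Template;

binmode STDOUT, ':encoding(UTF-8)';

my $tt = Template->new(ENCODING => 'utf8') or die Template->error;

# 'pjoin' is a hypothetical name: a list vmethod that defers to Perl's own
# join, so the separator keeps its UTF8 flag instead of going through char*.
$tt->context->define_vmethod('list', 'pjoin', sub {
    my ($list, $sep) = @_;
    return join(defined $sep ? $sep : ' ', @$list);
});

# Same data as the failing case, but using pjoin instead of join.
my $src = "[% a=['foo','bar']; a.pjoin('•') %]\n";
my $out = '';
$tt->process(\$src, {}, \$out) or die $tt->error;
print $out;
```

This sidesteps the XS join entirely, so it hides the bug rather than fixing it.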
This only happens when using Template::Stash::XS, not Template::Stash (the bug is in the C code). It affects every join string that is appended after a non-SvUTF8-flagged string, up to the first SvUTF8-flagged item. For example, [% a=['foo','φοο','βαρ']; a.join('•') %] becomes fooâ¢φοο•βαρ: the second join is fine because by then the string has had the SvUTF8 flag turned on.
The code at issue is https://github.com/abw/Template2/blob/4c602d0b9577ff87172a420607663cdb72146211/xs/Stash.xs#L1028-L1058. I'm not an XS expert by any means, but I assume that if the join string weren't converted to a char* it wouldn't lose its UTF8 state. (It looks like perl 5.16 added a new flag to make this sort of thing easier: "sv_catpvn_flags takes a couple of new internal-only flags, SV_CATBYTES and SV_CATUTF8, which tell it whether the char array to be concatenated is UTF8. This allows for more efficient concatenation than creating temporary SVs to pass to sv_catsv." I don't know whether that can be used here if older perls need to be supported, but something along those lines.)
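To make the mechanism concrete, here is a rough pure-Perl model of what I think is going on. It is not the XS code. The hash with a byte buffer and a flag is a toy stand-in for an SV, all sub names are invented, and "contains a non-ASCII character" stands in for "has SvUTF8 set". It assumes items are appended flag-aware (like sv_catsv) while the separator is appended as raw bytes (like sv_catpvn).

```perl
use strict;
use warnings;
use utf8;                          # literals below are character strings
use Encode qw(encode decode);

# Toy model of an SV: a byte buffer plus a "these bytes are UTF-8" flag.
sub new_sv { return { bytes => '', utf8 => 0 } }

# Models a flag-aware append: upgrades the buffer when a wide string arrives.
sub cat_sv {
    my ($sv, $str) = @_;
    if ($str =~ /[^\x00-\x7f]/) {              # stands in for "SvUTF8 is set"
        if (!$sv->{utf8}) {
            # Upgrade: every existing byte is taken to be a Latin-1 character.
            $sv->{bytes} = encode('UTF-8', decode('ISO-8859-1', $sv->{bytes}));
            $sv->{utf8}  = 1;
        }
        $sv->{bytes} .= encode('UTF-8', $str);
    }
    else {
        $sv->{bytes} .= $str;                  # ASCII: same bytes either way
    }
    return;
}

# Models a raw append: copies bytes and never looks at or changes the flag.
sub cat_pvn {
    my ($sv, $bytes) = @_;
    $sv->{bytes} .= $bytes;
    return;
}

# Turn the model SV back into a Perl character string.
sub to_string {
    my ($sv) = @_;
    return decode($sv->{utf8} ? 'UTF-8' : 'ISO-8859-1', $sv->{bytes});
}

# $flag_aware = 0 models the current behaviour (separator reduced to bytes);
# $flag_aware = 1 models appending the separator with its UTF8 state intact.
sub model_join {
    my ($sep, $items, $flag_aware) = @_;
    my $sv = new_sv();
    for my $i (0 .. $#$items) {
        cat_sv($sv, $items->[$i]);
        next if $i == $#$items;
        if ($flag_aware) { cat_sv($sv, $sep) }
        else             { cat_pvn($sv, encode('UTF-8', $sep)) }
    }
    return to_string($sv);
}

binmode STDOUT, ':encoding(UTF-8)';
print model_join('•', ['foo', 'bar'], 0), "\n";          # garbled separator
print model_join('•', ['foo', 'φοο', 'βαρ'], 0), "\n";   # first garbled, second fine
print model_join('•', ['foo', 'φοο', 'βαρ'], 1), "\n";   # foo•φοο•βαρ
```

In the model, the raw bytes of • land in an unflagged buffer, get reinterpreted as three Latin-1 characters when the first flagged item forces an upgrade, and only separators appended after that point are read as UTF-8. Appending the separator flag-aware avoids this in every case.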