Template2

The join vmethod for lists produces garbled output if its argument is utf8

Open • redneb opened this issue 6 years ago • 1 comment

Consider the following template file (call it test.tmpl):

[% a=['φοο','βαρ']; a.join('•') %]
[% a=['foo','bar']; a.join('•') %]

and the following perl script that uses the above template:

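#!/usr/bin/perl
use strict;
use warnings;
use Template;

binmode STDOUT, ':encoding(UTF-8)';    # encode output as UTF-8
my $tt = Template->new(ENCODING => 'utf8');
$tt->process('test.tmpl') or die $tt->error(), "\n";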

The script produces the following output:

φοο•βαρ
fooâ¢bar

As you can see, the • separator gets garbled in the second line, while the first line is fine. If I define my own custom list vmethod that simply calls Perl's join, I get the correct output, i.e.:

φοο•βαρ
foo•bar
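A minimal sketch of that workaround, assuming the `$Template::Stash::LIST_OPS` hash that Template Toolkit documents for defining custom list vmethods. The vmethod name `utf8_join` is hypothetical, and the template is passed as a scalar reference (already a character string, so no ENCODING option is needed) to keep the example self-contained.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Template;
use Template::Stash;    # load it first so LIST_OPS is not reset later

# Hypothetical custom list vmethod: delegate to Perl's built-in join(),
# which copes with mixing SvUTF8 and non-SvUTF8 strings.
$Template::Stash::LIST_OPS->{utf8_join} = sub {
    my ($list, $sep) = @_;
    $sep = ' ' unless defined $sep;    # default separator: a space
    return join($sep, @$list);
};

binmode STDOUT, ':encoding(UTF-8)';
my $tt  = Template->new();
my $src = "[% a=['foo','bar']; a.utf8_join('\x{2022}') %]\n";
my $out = '';
$tt->process(\$src, {}, \$out) or die $tt->error(), "\n";
print $out;
```

A new name is used instead of overriding join, so the sketch does not depend on whether Template::Stash::XS prefers its built-in join over a Perl-level definition.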

redneb, Nov 28 '17 13:11

This only happens with Template::Stash::XS, not with Template::Stash (the bug is in the C code). It affects every join separator that comes after a non-SvUTF8-flagged string, up until the first SvUTF8-flagged string. For example, [% a=['foo','φοο','βαρ']; a.join('•') %] produces fooâ¢φοο•βαρ: the first separator is garbled, but the second one is fine because by then the result string has had its SvUTF8 flag turned on.
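To make that mechanism concrete, here is a toy Perl model. It is a sketch, not Perl's real internals or the actual Stash.xs code: each "SV" is a hash holding a byte buffer and a UTF-8 flag, `cat_bytes` mimics sv_catpvn appending raw bytes, and `cat_sv` mimics sv_catsv upgrading the byte side as Latin-1 when the flags differ. It assumes, as described above, that the C code fetches the separator once as plain bytes and appends it after every item but the last. All sub names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;    # the Greek and bullet literals below are characters

# Toy model of an SV: a byte buffer plus a UTF-8 flag. A simplified
# sketch for illustration only, not Perl's real internals.
sub char_sv {
    my $s = shift;
    utf8::encode($s);                       # characters -> UTF-8 bytes
    return { buf => $s, utf8_flag => 1 };
}
sub byte_sv {
    my $s = shift;
    return { buf => $s, utf8_flag => 0 };
}

# Mimics sv_catpvn(): append raw bytes with no conversion.
sub cat_bytes {
    my ($sv, $bytes) = @_;
    $sv->{buf} .= $bytes;
    return;
}

# Mimics sv_catsv(): when only one side is UTF-8, the byte side is
# upgraded first, each of its bytes treated as a Latin-1 character.
sub cat_sv {
    my ($sv, $src) = @_;
    my $bytes = $src->{buf};
    if ($src->{utf8_flag} && !$sv->{utf8_flag}) {
        utf8::encode($sv->{buf});           # upgrade the target
        $sv->{utf8_flag} = 1;
    }
    elsif ($sv->{utf8_flag} && !$src->{utf8_flag}) {
        utf8::encode($bytes);               # upgrade a copy of the source
    }
    $sv->{buf} .= $bytes;
    return;
}

# Hypothetical model of the XS join: the separator's bytes are fetched
# once (like SvPV into a char*), so its UTF-8 flag is forgotten.
sub xs_style_join {
    my ($sep, @items) = @_;
    my $joint = $sep->{buf};
    my $out   = byte_sv('');
    for my $i (0 .. $#items) {
        cat_sv($out, $items[$i]);
        cat_bytes($out, $joint) if $i < $#items;
    }
    return $out;
}

# What a ':utf8' output handle emits: a non-UTF-8 string is upgraded
# (Latin-1 -> UTF-8) on the way out; a UTF-8 one is written as-is.
sub output_bytes {
    my $sv    = shift;
    my $bytes = $sv->{buf};
    utf8::encode($bytes) unless $sv->{utf8_flag};
    return $bytes;
}

binmode STDOUT;    # the strings below are already encoded bytes
for my $items (
    [ char_sv('φοο'), char_sv('βαρ') ],
    [ byte_sv('foo'), byte_sv('bar') ],
    [ byte_sv('foo'), char_sv('φοο'), char_sv('βαρ') ],
) {
    my $line = output_bytes(xs_style_join(char_sv('•'), @$items));
    print $line, "\n";
}
```

Running it should print three lines matching the behavior reported in this issue: the all-Greek line comes out intact, while the foo/bar line and the first separator of the mixed line are garbled.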

The code at issue is https://github.com/abw/Template2/blob/4c602d0b9577ff87172a420607663cdb72146211/xs/Stash.xs#L1028-L1058. I'm not an XS expert by any means, but I assume that if the join string weren't converted to a char* it wouldn't lose its UTF8 state. (It looks like Perl 5.16 made this sort of thing easier: sv_catpvn_flags takes a couple of new internal-only flags, SV_CATBYTES and SV_CATUTF8, which tell it whether the char array to be concatenated is UTF8; this allows for more efficient concatenation than creating temporary SVs to pass to sv_catsv. I don't know whether that can be used here if older perls still need to be supported, but something along those lines.)

dracos, Jan 02 '18 15:01