Template2

use locale in template::filters [rt.cpan.org #119992]

Open · atoomic opened this issue 6 years ago · 8 comments

Migrated from rt.cpan.org#119992 (status was 'new')

Requestors:

From [email protected] on 2017-01-26 08:02:45:

Seeing warnings like

Wide character (U+434) in substitution (s///) at /usr/local/lib/x86_64-linux-gnu/perl/5.22.1/Template/Filters.pm line 62

perldiag says:

====
Wide character in %s

(S utf8) Perl met a wide character (>255) when it wasn't expecting one. This warning is by default on for I/O (like print). The easiest way to quiet this warning is simply to add the :utf8 layer to the output, e.g. binmode STDOUT, ':utf8' . Another way to turn off the warning is to add no warnings 'utf8'; but that is often closer to cheating. In general, you are supposed to explicitly mark the filehandle with an encoding, see open and binmode.
====

If I remove 'use locale' from T::F, the warning goes away.

So why is it there, and how should it be used with Unicode strings?

atoomic, Oct 05 '18 16:10

Hi, I don't have any example code here, but the issue is that you're not doing the following (adjust STDOUT to whatever your filehandle is):

binmode(STDOUT, ":utf8");
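A minimal sketch of the suggestion above, assuming the goal is to encode wide characters on output. It writes to an in-memory filehandle (a stand-in for STDOUT, so the bytes can be inspected) and uses the `:encoding(UTF-8)` layer, the stricter alternative to `:utf8`. Note that an output layer only governs I/O warnings such as "Wide character in print"; it has no effect on warnings raised inside s/// itself, which is consistent with the later report in this thread that binmode did not help.

```perl
use strict;
use warnings;

# A string containing U+0434 (CYRILLIC SMALL LETTER DE), a "wide" character.
my $text = "\x{434}";

# In-memory filehandle standing in for STDOUT; the same binmode call
# works on STDOUT or any other handle.
open my $fh, '>', \my $buffer or die "open: $!";
binmode $fh, ':encoding(UTF-8)';

# The layer encodes the character, so no "Wide character in print" warning.
print {$fh} $text;
close $fh or die "close: $!";

# $buffer now holds the two UTF-8 bytes 0xD0 0xB4.
printf "%d bytes: %vX\n", length $buffer, $buffer;
```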

toddr, Oct 08 '18 23:10

Please reopen this issue; it is still relevant.

No, setting the output layer to UTF-8 with binmode(STDOUT, ":utf8"); does not help. But removing the use locale line from the T::F code does help.

"Since locales are more a form of nationalization than of internationalization, the use of locales may interact oddly with Unicode." (c) O'Reilly & Associates, "Programming Perl"

docstore.mik.ua/orelly/perl2/prog/ch31_14.htm

I will provide a PoC if needed.

@toddr

evengar2008, Oct 18 '21 09:10

Note: use locale was added as part of commit d56b9a8c43c1876ac655d0d87efc9c459ab5dc65.

From the changelog:

* Added "use locale" to Template::Filters to enable locale-specific
  filters.

It seems that use locale was added to fix http://rt.cpan.org/Ticket/Display.html?id=9094 and http://rt.cpan.org/Ticket/Display.html?id=5695

atoomic, Nov 01 '21 20:11

@evengar2008 could you please provide steps to reproduce the issue you noticed? Thanks.

atoomic, Nov 01 '21 21:11

@evengar2008 can you confirm that this fixes your issue?

diff --git a/lib/Template/Filters.pm b/lib/Template/Filters.pm
index fdd82b85..a0d1de86 100644
--- a/lib/Template/Filters.pm
+++ b/lib/Template/Filters.pm
@@ -57,9 +57,9 @@ our $FILTERS = {
     'ucfirst'         => sub { ucfirst $_[0] },
     'lcfirst'         => sub { lcfirst $_[0] },
     'stderr'          => sub { print STDERR @_; return '' },
-    'trim'            => sub { for ($_[0]) { s/^\s+//; s/\s+$// }; $_[0] },
+    'trim'            => sub { use bytes; for ($_[0]) { s/^\s+//; s/\s+$// }; $_[0] },
     'null'            => sub { return '' },
-    'collapse'        => sub { for ($_[0]) { s/^\s+//; s/\s+$//; s/\s+/ /g };
+    'collapse'        => sub { use bytes; for ($_[0]) { s/^\s+//; s/\s+$//; s/\s+/ /g };
                                $_[0] },

     # dynamic filters

atoomic, Nov 01 '21 21:11

@atoomic

The doc https://perldoc.perl.org/bytes says that "Use of this module for anything other than debugging purposes is strongly discouraged". It does not work well with Unicode, and trim under the use bytes pragma won't work correctly; for example, it won't trim Unicode spaces. Even though this solution removes some warnings, it is unreliable and not fully compatible with Unicode.
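As an alternative to `use bytes`, here is a sketch (not the Template::Filters code, and not a fix proposed in this thread) of trim and collapse helpers that request Unicode rules explicitly with the `/u` regex modifier. perlre states that a modifier applied to the whole pattern takes priority over the default set by `use locale`, so `\s` should keep matching Unicode white space such as U+3000 and U+00A0 regardless of LC_CTYPE. The names `unicode_trim` and `unicode_collapse` are hypothetical.

```perl
use strict;
use warnings;
use locale;   # mirrors the pragma in Template::Filters

# Hypothetical helpers (not part of Template::Filters). The explicit /u
# modifier selects Unicode rules for \s, overriding the /l default that
# 'use locale' would otherwise apply to these patterns.
sub unicode_trim {
    my ($s) = @_;
    $s =~ s/^\s+//u;
    $s =~ s/\s+$//u;
    return $s;
}

sub unicode_collapse {
    my ($s) = @_;
    $s = unicode_trim($s);
    $s =~ s/\s+/ /gu;
    return $s;
}

# U+3000 IDEOGRAPHIC SPACE and U+00A0 NO-BREAK SPACE are Unicode white space,
# as is the ordinary space; the words are Cyrillic "да" and "чаю".
my $in  = "\x{3000} \x{434}\x{430}\x{A0}\x{A0}\x{447}\x{430}\x{44E} \x{3000}";
my $out = unicode_collapse($in);
print $out eq "\x{434}\x{430} \x{447}\x{430}\x{44E}" ? "ok\n" : "not ok\n";
```

The design choice here is to keep `use locale` for whatever locale-specific behaviour it was added for, while pinning only the white-space patterns to Unicode semantics.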

Working on a PoC now. It is not reproducible with a plain TT call, only within Catalyst::View::TT processing.

evengar2008, Nov 23 '21 13:11

@atoomic here's the PoC

test.pl

use POSIX qw( setlocale LC_ALL );
use warnings;
use strict;

use utf8;

setlocale( LC_ALL, 'POSIX' );

use Template::Filters;
use Template;


my $tt = Template->new({
    INCLUDE_PATH => 'templates/regru/',
    FILTERS => $Template::Filters::FILTERS,
    INTERPOLATE  => 1,
    ENCODING => 'UTF-8',
}) || die "$Template::ERROR\n";

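my $output;
my $text = 'Съешь ещё этих мягких французских булок, да выпей же чаю ';

$tt->process( 'test.inc', { test => $text }, \$output )
    or die $tt->error(), "\n";

# Print the result through a UTF-8 output layer.
binmode STDOUT, ':encoding(UTF-8)';
print $output;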

test.inc

<p>[% test | trim %]</p>
<p>[% test | ucfirst %]</p>

It seems that code which calls setlocale itself and imports T::F is not compatible with the use locale in T::F. If I remove the use locale pragma from T::F, everything works well.
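A smaller reproduction that leaves Template Toolkit out, sketched under the assumption that the trim filter's substitution is the trigger (the original report points at a line in Template::Filters). It combines `use locale` with the `POSIX` locale, as the PoC does. On Perl 5.22 and later this is expected to collect a "Wide character (U+434) in substitution (s///)" warning. That message appears to be perldiag's separate "Wide character (U+%X) in %s" entry in the locale category, not the I/O warning quoted in the original report, which would explain why binmode does not help. This has not been verified on every Perl version, so the check only looks at the trimmed string.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_ALL);
use locale;   # same pragma as Template::Filters

# Switch to a single-byte (non-UTF-8) locale, as the PoC does.
setlocale(LC_ALL, 'POSIX');

# Collect warnings so they can be inspected instead of going to STDERR.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, @_ };

# Cyrillic "день" (U+0434 U+0435 U+043D U+044C) followed by a space.
my $text = "\x{434}\x{435}\x{43D}\x{44C} ";

# The same kind of substitution the trim filter performs; under 'use locale'
# the pattern defaults to /l, i.e. locale rules for \s.
$text =~ s/\s+$//;

print scalar(@warnings), " warning(s) collected\n";
print for @warnings;
```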

Just rechecked the solution with use bytes in T::F::Filters::trim(). It does not work: the warnings about wide characters in substitution still appear.

evengar2008, Nov 23 '21 14:11

The issue is that if the locale is set to POSIX somewhere else, then all string operations on Unicode strings in T::F are affected, because the use locale pragma is enabled within T::F.

This can be fixed either by removing the use locale pragma from T::F or by switching to a UTF-8 locale in the outer scope where T::F is imported.
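A sketch of the second workaround, assuming the calling program is free to change the process locale. `POSIX::setlocale` returns undef when a locale is not installed, so the helper below (hypothetical name `select_utf8_locale`) tries a few common names in turn; which names exist varies by system (compare the output of `locale -a`). Keep in mind that the locale is process-wide rather than lexically scoped, so this affects every locale-aware operation in the program.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_ALL);

# Hypothetical helper: try some common UTF-8 locale names. These names are
# assumptions; availability depends on the operating system.
sub select_utf8_locale {
    for my $name (qw(C.UTF-8 C.utf8 en_US.UTF-8)) {
        my $got = setlocale(LC_ALL, $name);
        return $got if defined $got;
    }
    return;   # none available: fall back to another fix
}

my $locale = select_utf8_locale();
print defined $locale ? "using locale $locale\n" : "no UTF-8 locale found\n";

# ...then create the Template object and call process() as before.
```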

evengar2008, Nov 24 '21 08:11