Is there a way to use utf8::all in other third-party modules?

Open caileiYanxiu opened this issue 6 years ago • 11 comments

For example, I am using the Path::Tiny module, and that module calls readdir. Is there a way to make it respect utf8::all without modifying its source code?

Currently, I have to modify its code and add use utf8::all right after the package Path::Tiny declaration.

caileiYanxiu avatar Aug 14 '18 08:08 caileiYanxiu

If you use utf8::all in your own code, the changes that are brought by it are (also) active for Path::Tiny (unless it overrides what is overridden by utf8::all again, but that is not the case as far as I can tell).

If this doesn't work for you, can you show me an example where it fails?

HayoBaan avatar Aug 14 '18 09:08 HayoBaan

My folder structure is like this:

├── test.pl
└── 中文目录
    └── 中文文件.txt

It outputs correct UTF-8 filenames with the following code (when debugging, I saw it enter _utf8_readdir):

use v5.18;
use utf8::all 'GLOBAL';
opendir DIR, "./中文目录";
for (readdir DIR) {
    say $_;
}
closedir DIR;

I got

.
..
中文文件.txt

But it goes wrong when I try to use Path::Tiny (it never enters the replacement sub for readdir):

use v5.18;
use utf8::all 'GLOBAL';
use Path::Tiny;
my $path = path("./中文目录");
for ( $path->children ) {
    say $_;
}

I got

中文目录/中文文件.txt

As far as I can see, the children sub in Path::Tiny doesn't respect the replacement of readdir. The $target that utf8::all replaces subs for is always main, unless I modify the code of Path::Tiny to something like the following.

# /Library/Perl/5.18/Path/Tiny.pm
.
.
.
package Path::Tiny;
use utf8::all 'global';
.
.
.

caileiYanxiu avatar Aug 14 '18 09:08 caileiYanxiu

> If you use utf8::all in your own code, the changes that are brought by it are (also) active for Path::Tiny

Pretty sure it isn't; the only global effects are the utf8-ness of STDIN/STDOUT/STDERR and @ARGV. Making it global would be possible, but that kind of monkey patching is rarely a good idea.

Leont avatar Aug 14 '18 09:08 Leont

@Leont, so what's the best way to make a third-party package respect utf8::all? It is annoying to have to modify code from CPAN only to enable the UTF-8 feature.

caileiYanxiu avatar Aug 14 '18 10:08 caileiYanxiu

Hmm, @Leont may have a point; while readdir is overridden (just like readlink and glob), it may be that the override is limited to the current module. I'll have a look at the issue and see if I can come up with something that works and isn't too hard to implement in your own code.

HayoBaan avatar Aug 14 '18 10:08 HayoBaan

I've been trying a couple of things (e.g. using Import::Into, an eval construct, overriding the readdir function manually, etc.), but I can't get it to work ☹️. @Leont do you have a suggestion?

HayoBaan avatar Aug 14 '18 12:08 HayoBaan

> I've been trying a couple of things (e.g. using Import::Into, an eval construct, overriding the readdir function manually, etc.), but I can't get it to work ☹️

The utf8 readdir is guarded by the lexical pragma, and since the scope that calls it (inside Path::Tiny) doesn't have the pragma enabled, the override is a no-op there. This is intentional.
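To illustrate what "guarded by the lexical pragma" means, here is a minimal sketch. It is not utf8::all's actual code: My::Pragma, its hint key, and Third::Party are hypothetical names. It uses only the documented %^H hint hash and caller() to show that a function can check whether the scope calling it has a pragma enabled.

```perl
use strict;
use warnings;

{
    # Hypothetical pragma, for illustration only (not utf8::all's code).
    package My::Pragma;

    # Record a hint in %^H; it applies to the lexical scope being compiled.
    sub import   { $^H{'My::Pragma/on'} = 1 }
    sub unimport { $^H{'My::Pragma/on'} = 0 }

    # Report whether the scope that called us has the pragma enabled.
    sub describe {
        my $hints = (caller(0))[10];    # hint hash of the calling statement
        return $hints && $hints->{'My::Pragma/on'} ? 'pragma on' : 'pragma off';
    }
}

{
    # Stands in for a third-party module such as Path::Tiny: it was
    # compiled without the pragma in its own scope.
    package Third::Party;
    sub call { My::Pragma::describe() }
}

# Same effect as "use My::Pragma;" for the rest of this file scope.
BEGIN { My::Pragma->import }

print My::Pragma::describe(), "\n";    # called from a scope with the pragma
print Third::Party::call(),   "\n";    # called from a scope without it
```

This prints "pragma on" followed by "pragma off": enabling the pragma in the main script does not change what code compiled in another scope sees.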

Leont avatar Aug 16 '18 00:08 Leont

TinyJ.pm.txt

For example, I could solve it by rewriting Path::Tiny as a new module (PathJ::TinyJ). FYI. I hope this is helpful; apologies if it is off-topic.

test_exe utf8_test.pl

#!/usr/bin/perl -W
# -C64
#@ test utf-8 chinese japanese
use strict;
use v5.26;
use PathJ::TinyJ;
# use Path::Tiny;
use utf8::all 'GLOBAL';
#use Encode::JP;
# use locale;
# use utf8;

my $path = path("./中文目录");

for ( $path->children ) {
    say $_;
}

__END__

Path::Tiny copied to PathJ::TinyJ, file PathJ/TinyJ.pm:

use 5.008001;
use strict;
use warnings;
use utf8::all;
package PathJ::TinyJ;
# ABSTRACT: File path utility

our $VERSION = '0.104J';#JAPANESE


sub children {
    my ( $self, $filter ) = @_;
    my $dh;


    opendir $dh, $self->[PATH] or $self->_throw('opendir');
    my @children = readdir $dh;
    closedir $dh or $self->_throw('closedir');

    use Encode;    # ★ change: added
    @children = map { decode('utf-8', $_) } @children;    # ★ change: added (decode raw bytes to characters)

    if ( not defined $filter ) {

hitobashira avatar Aug 17 '18 03:08 hitobashira

New TinyJ.pm:

$ ./utf8_test.pl
中文目录/中文文件.txt
中文目录/日本語文章.txt

Normal Tiny.pm (the directory part is valid, but the basename is broken):

$ ./utf8_test.pl
中文目录/中æ‡æ‡ä»¶.txt
中文目录/æ¥æœ¬èªžæ‡ç« .txt
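The same decode step can also be applied in the calling code instead of inside a copy of the module. The sketch below is only an illustration under assumptions: Third::Party::children is a hypothetical stand-in for a module that returns raw bytes from readdir, and the file system is assumed to store names as UTF-8.

```perl
use strict;
use warnings;
use utf8;                                  # string literals in this file are UTF-8
use Encode qw(decode encode);
use File::Temp qw(tempdir);

{
    # Hypothetical stand-in for a third-party module that calls readdir
    # without utf8::all in scope, so it returns undecoded bytes.
    package Third::Party;
    sub children {
        my ($dir) = @_;
        opendir my $dh, $dir or die "opendir $dir: $!";
        my @names = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
        closedir $dh or die "closedir $dir: $!";
        return @names;
    }
}

# Create a scratch directory containing one file with a non-ASCII name.
my $dir  = tempdir( CLEANUP => 1 );
my $name = '中文文件.txt';
open my $fh, '>', "$dir/" . encode( 'UTF-8', $name ) or die "open: $!";
close $fh or die "close: $!";

# Decode the raw bytes returned by the module in our own code.
my @decoded = map { decode( 'UTF-8', $_ ) } Third::Party::children($dir);

binmode STDOUT, ':encoding(UTF-8)';
print "$_\n" for @decoded;
```

This leaves the module's source untouched, at the cost of having to decode at every call site that receives file names from it.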

hitobashira avatar Aug 17 '18 04:08 hitobashira

@hitobashira, the source code of utf8::all replaces readdir only for the calling module, which is the "main" package most of the time.

Actually, if you just add one line to Path::Tiny telling it to use utf8::all, the problem is solved.

But it is annoying to have to find out whether a third-party module uses readdir or similar functions, and then modify it so that it respects utf8::all.

So my question is: is there a way to provide API support within utf8::all to turn UTF-8 support on and off in a truly "global" way? Currently, use utf8::all 'global' is not actually global, which is confusing.

caileiYanxiu avatar Aug 17 '18 05:08 caileiYanxiu

@caileiYanxiu I would certainly be happy if an Aladdin's magic lamp for handling UTF-8 in Perl 5 were available.

hitobashira avatar Aug 17 '18 17:08 hitobashira