pause What to do about Unicode package names

Perl 5.16 allows Unicode package names. While we can't have Unicode package names map to .pm files, a .pm file with an ASCII name could still contain "package NAME" statements that use Unicode names.

Should Unicode package names be detected and indexed?
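To make the question concrete, here is a minimal sketch of what detection might look like. It is not how the PAUSE indexer works; the sample file contents, the `find_packages` helper, and the regex are all assumptions made for illustration. It scans decoded source text for "package NAME" statements and flags any name that contains non-ASCII characters.

```perl
use strict;
use warnings;
use utf8;    # this example's own source is UTF-8

# Hypothetical contents of an ASCII-named file such as lib/Acme/o_o.pm.
my $source = <<'END_SOURCE';
use utf8;
package Acme::ಠ_ಠ;
package Acme::Plain;
1;
END_SOURCE

# Collect names from "package NAME" statements. \w matches Unicode word
# characters here because $source is a decoded character string.
# (Illustrative helper, not part of any real indexer.)
sub find_packages {
    my ($text) = @_;
    my @names;
    while ($text =~ /^\s*package\s+(\w+(?:::\w+)*)/mg) {
        push @names, $1;
    }
    return @names;
}

my @packages  = find_packages($source);
my @non_ascii = grep { /[^\x00-\x7F]/ } @packages;

binmode STDOUT, ':encoding(UTF-8)';
print "found: $_\n"     for @packages;
print "non-ASCII: $_\n" for @non_ascii;
```

A real scanner would also have to decide how to decode the file in the first place (for example, only treating it as UTF-8 when it says "use utf8"), which this sketch sidesteps by starting from an already decoded string.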

Mar 06 '13 11:03 dagolden

> we can't have Unicode package names map to .pm files

I'll be "that guy" to ask the dumb question: why not?

Mar 06 '13 19:03 karenetheridge

How do you know the filesystem supports Unicode? If so, what encoding should be used? What library functions do you call? Does Perl have any clue how to do these things properly and portably?

Mar 06 '13 20:03 dagolden

> How do you know the filesystem supports Unicode? If so, what encoding should be used?

Unix is flawed on that matter. But other operating systems such as Windows have a clearly defined encoding for file names and paths.

Mar 06 '13 21:03 dolmen

On Wed, Mar 06, 2013 at 01:21:56PM -0800, Olivier Mengué wrote:

>> How do you know the filesystem supports Unicode? If so, what encoding should be used?

> Unix is flawed on that matter. But other operating systems such as Windows have a clearly defined encoding for file names and paths.

The problem is that it's a property of the filesystem, not the operating system. Even if NTFS and HFS+ both do specify an encoding and normalization form for their files, that doesn't help when you mount a network drive (or ext3 data drive, or something) on Windows or OS X.
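As a rough illustration of why the encoding and normalization form matter, the sketch below turns one package name into several different byte sequences that could each plausibly be "the" file name on disk. The package name `Café::Module` and the `module_path` helper are hypothetical; `Encode` and `Unicode::Normalize` are core modules.

```perl
use strict;
use warnings;
use utf8;    # this example's own source is UTF-8
use Encode qw(encode);
use Unicode::Normalize qw(NFC NFD);

# Map a package name to a relative path the way "require" does for
# barewords: "::" becomes "/" and ".pm" is appended.
# (Illustrative helper.)
sub module_path {
    my ($package) = @_;
    (my $path = $package) =~ s{::}{/}g;
    return "$path.pm";
}

# Hypothetical package name with one accented letter.
my $path = module_path('Café::Module');

# Candidate on-disk byte sequences for the same logical name.
my %candidates = (
    'UTF-8, NFC' => encode('UTF-8',      NFC($path)),
    'UTF-8, NFD' => encode('UTF-8',      NFD($path)),
    'Latin-1'    => encode('ISO-8859-1', NFC($path)),
);

for my $label (sort keys %candidates) {
    printf "%-11s %2d bytes  %s\n",
        $label, length $candidates{$label}, unpack('H*', $candidates{$label});
}
```

All three byte strings differ, so a lookup that builds one of them will miss a file stored under another unless something in between (the filesystem, its driver, or Perl) translates.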

-doy

Mar 06 '13 21:03 doy

@karenetheridge it is a theoretically solvable problem, but not one that Perl yet solves. It would be nice if it did, but that's a topic for perlbug. :-)

Mar 06 '13 23:03 dagolden

@doy Windows has Unicode file APIs. That's what is available to applications (it also has 8-bit APIs for legacy applications) and that's also how it works in the kernel. Translating Unicode to what the filesystem can understand is the task of the filesystem driver. So, I repeat: Unix is flawed. Don't expect all operating systems to have the same flaws.

Mar 08 '13 13:03 dolmen

On Fri, Mar 08, 2013 at 05:31:08AM -0800, Olivier Mengué wrote:

> @doy Windows has Unicode file APIs. That's what is available to applications (it also has 8-bit APIs for legacy applications) and that's also how it works in the kernel. Translating Unicode to what the filesystem can understand is the task of the filesystem driver.

> So, I repeat: Unix is flawed. Don't expect all operating systems to have the same flaws.

My point is that it's Unix filesystems that are flawed, not (just) Unix itself. If you mount a Unix filesystem on a Windows computer, there's no way for Windows to get it right in all cases because there is no standard for encoding Unicode paths.

-doy

Mar 08 '13 15:03 doy

More thoughts on filesystems and encodings: I've just shipped a new version of Acme-LookOfDisapproval (0.006) that includes the Acme::ಠ_ಠ module. Makefile.PL installs it dynamically from lib/Acme/o_o.pm, to get around potential differences between the author's and user's filesystems. I'm interested to see what the smokers do with this module, e.g. whether 'use Acme::ಠ_ಠ' works by finding an Acme/ಠ_ಠ.pm file in PERL5LIB.

My hope is that PAUSE will eventually be able to index this file, at least if given an explicit piece of 'provides' metadata, if not by scanning the distribution tarball.
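For reference, a 'provides' entry of the kind described might look roughly like the sketch below, written as a Perl data structure of the sort a Makefile.PL could merge into its metadata. This is an assumption for illustration, not the distribution's actual metadata: the `%meta_fragment` name is made up, and the version is assumed to match the release.

```perl
use strict;
use warnings;
use utf8;    # this example's own source is UTF-8

# Hypothetical metadata fragment; the distribution's real metadata may differ.
my %meta_fragment = (
    'meta-spec' => { version => 2 },
    provides    => {
        'Acme::ಠ_ಠ' => {
            file    => 'lib/Acme/o_o.pm',    # ASCII path inside the tarball
            version => '0.006',              # assumed to match the release
        },
    },
);

# An indexer that trusts "provides" could take package names from the keys
# without scanning file contents or touching a Unicode file name.
binmode STDOUT, ':encoding(UTF-8)';
for my $package (sort keys %{ $meta_fragment{provides} }) {
    my $entry = $meta_fragment{provides}{$package};
    print "$package => $entry->{file} ($entry->{version})\n";
}
```

The point of the design is that the only file path involved stays ASCII, so the Unicode name lives purely in metadata and in the "package" statement.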

Nov 29 '13 18:11 karenetheridge

I think we should leave the status quo as it is. Perl can't cope with this situation well, so I don't think PAUSE should do anything except maybe ban non-ASCII packages for now.

Apr 28 '23 11:04 rjbs
