
What to do about Unicode package names

Open dagolden opened this issue 11 years ago • 8 comments

Perl 5.16 allows Unicode package names. While we can't have Unicode package names map to .pm files, an ASCII .pm file could contain Unicode "package NAME" statements.

Should Unicode package names be detected and indexed?
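To make the detection question concrete, here is a minimal sketch in Python of scanning source text for `package NAME` statements and flagging non-ASCII names. It is not how PAUSE indexes anything: `find_packages` and `PACKAGE_RE` are hypothetical names, and the regex is a deliberate simplification that ignores POD, heredocs, and string literals.

```python
import re

# Hypothetical helper, not part of PAUSE. Simplified pattern: it matches
# "package NAME" at the start of a line and does not skip POD or strings.
PACKAGE_RE = re.compile(r"^\s*package\s+([\w:]+)", re.MULTILINE)


def find_packages(source):
    """Return a list of (name, is_ascii) pairs, one per package statement."""
    return [(name, name.isascii()) for name in PACKAGE_RE.findall(source)]


sample = "use utf8;\npackage Acme::ಠ_ಠ;\npackage Acme::Plain;\n1;\n"

for name, ascii_only in find_packages(sample):
    print(name, "ASCII" if ascii_only else "non-ASCII")
```

An indexer could use a check like this either to index such names or to reject them; the sketch only shows that detection itself is straightforward once the source is decoded.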

dagolden avatar Mar 06 '13 11:03 dagolden

> we can't have Unicode package names map to .pm files

I'll be "that guy" to ask the dumb question: why not?

karenetheridge avatar Mar 06 '13 19:03 karenetheridge

How do you know the filesystem supports Unicode? If so, what encoding should be used? What library functions do you call? Does Perl have any clue how to do these things properly and portably?

dagolden avatar Mar 06 '13 20:03 dagolden

> How do you know the filesystem supports Unicode? If so, what encoding should be used?

Unix is flawed on that matter. But other operating systems, such as Windows, have a clearly defined encoding for file names and paths.

dolmen avatar Mar 06 '13 21:03 dolmen

On Wed, Mar 06, 2013 at 01:21:56PM -0800, Olivier Mengué wrote:

> > How do you know the filesystem supports Unicode? If so, what encoding should be used?
>
> Unix is flawed on that matter. But other operating systems, such as Windows, have a clearly defined encoding for file names and paths.

The problem is that it's a property of the filesystem, not the operating system. Even if NTFS and HFS+ both do specify an encoding and normalization form for their files, that doesn't help when you mount a network drive (or ext3 data drive, or something) on Windows or OSX.
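The normalization point can be seen without touching a filesystem. As a small illustrative sketch in Python (the file name `Café.pm` is a made-up example), the same visible name yields two different byte sequences depending on the normalization form, so a filesystem that stores bytes as given would treat them as two distinct names:

```python
import unicodedata

# Hypothetical file name; \u00e9 is "e with acute" as a single code point.
name = "Caf\u00e9.pm"

nfc = unicodedata.normalize("NFC", name)  # composed form
nfd = unicodedata.normalize("NFD", name)  # decomposed form: "e" + combining accent

nfc_bytes = nfc.encode("utf-8")
nfd_bytes = nfd.encode("utf-8")

# The strings render identically but differ as code points and as bytes.
print(nfc == nfd)
print(len(nfc_bytes), len(nfd_bytes))
```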

-doy

doy avatar Mar 06 '13 21:03 doy

@karenetheridge it is a theoretically solvable problem, but not one that Perl solves yet. It would be nice if it did, but that's a topic for perlbug. :-)

dagolden avatar Mar 06 '13 23:03 dagolden

@doy Windows has Unicode file APIs. That's what is available to applications (it also has 8-bit APIs for legacy applications) and that's also how it works in the kernel. Translating Unicode to what the filesystem can understand is the task of the filesystem driver. So, I repeat: Unix is flawed. Don't expect all operating systems to have the same flaws.

dolmen avatar Mar 08 '13 13:03 dolmen

On Fri, Mar 08, 2013 at 05:31:08AM -0800, Olivier Mengué wrote:

> @doy Windows has Unicode file APIs. That's what is available to applications (it also has 8-bit APIs for legacy applications) and that's also how it works in the kernel. Translating Unicode to what the filesystem can understand is the task of the filesystem driver.
>
> So, I repeat: Unix is flawed. Don't expect all operating systems to have the same flaws.

My point is that it's Unix filesystems that are flawed, not (just) Unix itself. If you mount a Unix filesystem on a Windows computer, there's no way for Windows to get it right in all cases because there is no standard for encoding Unicode paths.
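A rough sketch of why the bytes alone are not enough, again in Python and purely illustrative (`interpret` is a hypothetical helper): a path stored as raw bytes decodes to different names depending on which encoding the reader assumes, and nothing in the bytes says which one the writer intended.

```python
def interpret(raw, encoding):
    """Decode raw path bytes under an assumed encoding (hypothetical helper)."""
    return raw.decode(encoding)


# Bytes as a filesystem might store them, written here by a UTF-8 system.
raw = "Acme/ಠ_ಠ.pm".encode("utf-8")

as_utf8 = interpret(raw, "utf-8")      # the name the author intended
as_latin1 = interpret(raw, "latin-1")  # what a Latin-1 reader would see

# Both decodes succeed, but they produce different path strings.
print(len(raw), len(as_utf8), len(as_latin1))
print(as_utf8 == as_latin1)
```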

-doy

doy avatar Mar 08 '13 15:03 doy

More thoughts on filesystems and encodings -- I've just shipped a new version of Acme-LookOfDisapproval (0.006) that includes the Acme::ಠ_ಠ module, which is installed dynamically by Makefile.PL via lib/Acme/o_o.pm, to get around potential differences between the author's and user's filesystems. I'm interested to see what the smokers do with this module -- e.g. whether 'use Acme::ಠ_ಠ' works by finding an Acme/ಠ_ಠ.pm file in PERL5LIB.

My hope is that PAUSE will eventually be able to index this file, at least if given an explicit piece of 'provides' metadata, if not by scanning the distribution tarball.

karenetheridge avatar Nov 29 '13 18:11 karenetheridge

I think we should leave the status quo as it is. Perl can't cope with this situation well, so I don't think PAUSE should do anything except maybe ban non-ASCII packages for now.

rjbs avatar Apr 28 '23 11:04 rjbs