Switch to Module::Metadata for parsing packages out of dists.

Open dagolden opened this issue 12 years ago • 4 comments

It would be great if we could have common code used by PAUSE and by the module builders (EU::MM, M::B, M::I, D::Z, etc.)

Would need a lot of testing to be confident in the switch.

dagolden avatar Mar 02 '13 16:03 dagolden

FWIW, MetaCPAN and Pinto both use Module::Metadata to parse packages out of dists. So looking at the results from either of those should give you a sense of how close they might come to what PAUSE does (or did).

For example, I have a Pinto repository that contains all of CPAN. So in theory, you should be able to compare its index with an index from CPAN and estimate how closely the two agree.
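A comparison like that could be sketched roughly as follows. This is only an illustration, not an existing tool: it assumes both indexes are in the `02packages.details.txt` layout (header lines, a blank line, then whitespace-separated `package version path` rows), and the function names `parse_index` and `compare_indexes` are made up for this example.

```python
def parse_index(text):
    """Map package name -> (version, dist path) from a 02packages-style index.

    Assumes the header block is separated from the package rows by a blank
    line; everything before that blank line is ignored.
    """
    _header, _sep, body = text.partition("\n\n")
    packages = {}
    for line in body.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            packages[fields[0]] = (fields[1], fields[2])
    return packages


def compare_indexes(reference, candidate):
    """Report packages missing from, extra in, or differing in `candidate`."""
    ref_names, cand_names = set(reference), set(candidate)
    common = ref_names & cand_names
    return {
        "missing": sorted(ref_names - cand_names),
        "extra": sorted(cand_names - ref_names),
        "differing": sorted(n for n in common if reference[n] != candidate[n]),
    }
```

The share of packages that land in `missing`, `extra`, or `differing` gives a crude accuracy estimate; it says nothing about why the two indexers disagree.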

thaljef avatar Apr 18 '13 05:04 thaljef

I think Matthew Horsfall was trying to write a canonical "index this tarball like PAUSE would" library at QAH2013, but I'm not sure if he ever finished. That would be even more useful than just Module::Metadata.

dagolden avatar Apr 18 '13 09:04 dagolden

+1_000_000 for a canonical "index this distribution" library -- like say Dist::Metadata?

karenetheridge avatar Nov 29 '13 18:11 karenetheridge

The problem isn't parsing META or .pm files. Dist::Metadata does that pretty well (via Module::Metadata). As I understand things, the difficult problem is that PAUSE relies on past indexing results to determine the indexable packages in any given distribution.

PAUSE has a feature called a "simile" which basically ignores a package if it doesn't match the filename and that package was previously seen in a filename that did match. I believe this is intended to allow you to monkey patch other packages without violating the namespace permissions.

For example, if I upload a distribution that contains package Carp; inside a file called Foo.pm, then PAUSE will ignore my Carp package because it has already seen the Carp package inside a file called Carp.pm. But if I had named the file Carp.pm, then PAUSE would consider the distribution "unauthorized" and refuse to add its contents to the index.

This means that it is impossible to properly index any distribution in isolation. You must know about every other distribution before deciding what packages can be indexed. So, only PAUSE can index like PAUSE. I've been able to come pretty close, by replaying every distribution that has ever been uploaded. But that's not really practical for most situations.
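To make the dependence on history concrete, here is a toy model of the rule as I described it above. It is not PAUSE's actual code: the `history` dict (package name to the author who first uploaded it in a matching file), the function names, and the choice to index a never-seen package that doesn't match its filename are all simplifying assumptions.

```python
def file_matches_package(path, package):
    """True if the file's basename (minus .pm) equals the last package component."""
    basename = path.rsplit("/", 1)[-1]
    if basename.endswith(".pm"):
        basename = basename[:-3]
    return basename == package.split("::")[-1]


def index_distribution(author, packages, history):
    """Decide which (package, path) pairs get indexed, given past results.

    `history` maps a package to the author who previously uploaded it in a
    file whose name matched. It is updated only when the upload is accepted.
    """
    indexed, skipped, newly_seen = [], [], []
    for package, path in packages:
        owner = history.get(package)
        matches = file_matches_package(path, package)
        if not matches and owner is not None:
            # Seen before in a matching file: ignore it here.
            skipped.append(package)
            continue
        if matches and owner is not None and owner != author:
            # Matching filename for someone else's package: reject the upload.
            return {"status": "unauthorized", "indexed": [], "skipped": skipped}
        indexed.append(package)  # assumption: everything else is indexable
        if matches:
            newly_seen.append(package)
    for package in newly_seen:
        history.setdefault(package, author)
    return {"status": "ok", "indexed": indexed, "skipped": skipped}
```

Calling this with an empty `history` indexes both Foo and Carp from the Foo.pm example, while the same upload replayed after Carp.pm has been seen indexes only Foo. That difference is the whole problem.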

thaljef avatar Nov 29 '13 22:11 thaljef