metacpan-api
XML::RegExp Incorrectly Thinks it's In libxml-enno
The XML::RegExp module API response contains:
"download_url" : "https://cpan.metacpan.org/authors/id/E/EN/ENNO/libxml-enno-1.02.tar.gz",
"author" : "ENNO",
"maturity" : "released",
"sloc" : 25,
"id" : "k9yJlRll1ivaM6LiuI5PAcgicFc",
"distribution" : "libxml-enno",
"directory" : false,
However, the copy of XML::RegExp in libxml-enno-1.02 (released in 2000) has no version. Meanwhile, the copy of XML::RegExp in the XML-RegExp distribution is 0.04 (released in 2012). So I think that should be the canonical distribution for XML::RegExp.
This also helps to explain why my build service started choking on libxml-enno recently, as reported in #591. Probably the libxml-enno distribution should be removed from CPAN.
02packages on PAUSE supports this:
XML::RegExp 0.04 T/TJ/TJMATHER/XML-RegExp-0.04.tar.gz
Note that the download URL endpoint does it right:
$ curl -fsS https://fastapi.metacpan.org/v1/download_url/XML::RegExp
{
"status" : "latest",
"date" : "2012-03-26T17:32:44",
"version" : "0.04",
"download_url" : "https://cpan.metacpan.org/authors/id/T/TJ/TJMATHER/XML-RegExp-0.04.tar.gz"
}
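To make the mismatch concrete, here is a minimal sketch that compares the two responses quoted above. The JSON is copied from this thread (the module record is trimmed to three of its fields); the function name `endpoints_agree` is made up for illustration and is not part of any MetaCPAN client.

```python
import json

# Fields copied from the module API excerpt quoted in this thread.
MODULE_RECORD = json.loads("""{
  "download_url": "https://cpan.metacpan.org/authors/id/E/EN/ENNO/libxml-enno-1.02.tar.gz",
  "author": "ENNO",
  "distribution": "libxml-enno"
}""")

# Response of the download_url endpoint, as quoted in this thread.
DOWNLOAD_URL_RECORD = json.loads("""{
  "status": "latest",
  "date": "2012-03-26T17:32:44",
  "version": "0.04",
  "download_url": "https://cpan.metacpan.org/authors/id/T/TJ/TJMATHER/XML-RegExp-0.04.tar.gz"
}""")


def endpoints_agree(module_record, download_record):
    """Hypothetical check: do both endpoints point at the same tarball?"""
    return module_record["download_url"] == download_record["download_url"]


print(endpoints_agree(MODULE_RECORD, DOWNLOAD_URL_RECORD))  # False
```

A build service could run a check like this against live responses and prefer the download_url endpoint whenever the two disagree.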
Any chance of an update on this? I have a build system that fails as long as this issue exists. Maybe I'll ask that libxml-enno be removed from CPAN, but it does seem like there might be an issue with the indexer here, too.
Looks like it's fixed now.
olaf@lw-mc-03:~$ sudo sh metacpan-sysadmin/bin/api/reindex.sh https://cpan.metacpan.org/authors/id/T/TJ/TJMATHER/XML-RegExp-0.04.tar.gz
2017/01/06 03:22:28 I release: Downloading https://cpan.metacpan.org/authors/id/T/TJ/TJMATHER/XML-RegExp-0.04.tar.gz
2017/01/06 03:22:32 I release: Processing /home/metacpan/metacpan-api/var/tmp/http/authors/id/T/TJ/TJMATHER/XML-RegExp-0.04.tar.gz
2017/01/06 03:22:44 I release: XML-RegExp consists of 1 modules
2017/01/06 03:22:44 I release: Upgrading release XML-RegExp-0.04
2017/01/06 03:22:44 I release: Downgrading release libxml-enno-1.02
I wonder if there's an issue about the order in which these modules got indexed. If so, we might see an issue after another full re-index.
Sadly, the module API still has it in libxml-enno, which is wrong.
@mickeyn thoughts?
The issue is probably with the module in libxml-enno being authorized.
I tried reprocessing it, but XML::RegExp in libxml-enno is still marked as authorized.
The reason is this entry in 06perms:
XML::RegExp,ENNO,c
XML::RegExp,TJMATHER,f
02packages only marks XML-RegExp as the distribution for this module name space, but I don't think we cross that info for permissions.
What does it mean to be authorized?
It means the author is allowed to release modules with this namespace (XML::RegExp).
The PAUSE permissions for it are listed in CPAN's 06perms file.
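As a rough illustration of that authorization check, the sketch below parses the two 06perms entries quoted earlier in this thread. It assumes each data line has the form `module,PAUSE id,flag` and that any listed flag counts as authorized; the function names are hypothetical and this is not MetaCPAN's actual code.

```python
from collections import defaultdict

# The two data lines quoted from 06perms in this thread.
PERMS_LINES = """\
XML::RegExp,ENNO,c
XML::RegExp,TJMATHER,f
"""


def parse_perms(text):
    """Map each module name to a {pause_id: permission_flag} dict."""
    perms = defaultdict(dict)
    for line in text.splitlines():
        parts = line.strip().split(",")
        if len(parts) != 3:
            continue  # skip blank lines and anything that is not a data line
        module, pause_id, flag = parts
        perms[module][pause_id] = flag
    return perms


def is_authorized(perms, module, pause_id):
    """Assumption: holding any permission flag makes the author authorized."""
    return pause_id in perms.get(module, {})


perms = parse_perms(PERMS_LINES)
print(is_authorized(perms, "XML::RegExp", "ENNO"))      # True
print(is_authorized(perms, "XML::RegExp", "TJMATHER"))  # True
```

Both authors pass, which is why a check based on 06perms alone cannot tell the two distributions apart.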
Since this is an authorized, indexed and 'latest' (version matches 02packages) distribution, the query cannot distinguish between the two: it just (consistently) picks the first one that passes all checks, which in this case is the wrong one.
The 'version' argument may be valid in this case, but we don't check for it. I don't think it's guaranteed that a competing name will have a zero or lower version as it comes from a different distribution that may have a completely different versioning system (it could have easily been a larger number than yours and still be wrong).
I don't see any other releases by Enno, so I suggest maybe revoke his co-maint on this module (you can do it in PAUSE) and then we can reprocess his distribution, making his XML::RegExp unauthorized and therefore it shouldn't appear in this query.
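The behaviour described above, and the effect of revoking the co-maint permission, can be sketched roughly as follows. The record fields, their values and the ordering are illustrative assumptions, not MetaCPAN's real schema or query.

```python
def pick_first_passing(records):
    """Return the first record that passes every check, or None."""
    for record in records:
        if record["authorized"] and record["indexed"] and record["status"] == "latest":
            return record
    return None


# Assumed state before the permission change: both records pass all checks.
before = [
    {"distribution": "libxml-enno", "authorized": True, "indexed": True, "status": "latest"},
    {"distribution": "XML-RegExp", "authorized": True, "indexed": True, "status": "latest"},
]

# Assumed state after revoking co-maint and reprocessing libxml-enno.
after = [dict(r, authorized=(r["distribution"] != "libxml-enno")) for r in before]

print(pick_first_passing(before)["distribution"])  # libxml-enno
print(pick_first_passing(after)["distribution"])   # XML-RegExp
```

Nothing in the first case marks either record as better than the other, so whichever happens to come first wins.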
I can't do it; it's not my module.
Also, what would prevent this from happening again with some other module? This can't be the only one with this issue.
How does search.cpan.org decide which to think is the latest?
@theory this may be of interest https://github.com/metacpan/metacpan-api/blob/master/docs/indexing.md (if not direct use in this instance).
We don't know what search.cpan.org does, it's closed source code
Removing ENNO as co-maint might be a PITA, because it's not just XML::RegExp that's a problem. See all these modules which have since been moved to other distributions. Here's an example: MetaCPAN thinks the latest download file for XML::DOM is also libxml-enno:
{
"documentation" : "XML::DOM",
"download_url" : "https://cpan.metacpan.org/authors/id/E/EN/ENNO/libxml-enno-1.02.tar.gz",
So I can see two ways to address this:
- Figure out what the proper relationship with 06perms should be. This is the preferred approach, since I don't expect that this issue will be limited to the modules in libxml-enno.
- Remove libxml-enno from CPAN. This would be a reasonable short-term solution to at least get things working properly again.
"MetaCPAN thinks"... MetaCPAN doesn't think, it has a deterministic logic :)
to your suggestions:
As I posted above, the XML::RegExp entries in 06perms mean that PAUSE allows both ENNO and TJMATHER to release a module with this name, which is why the two distributions are 'authorized'.
Consider a bigger problem, if both distributions were released by the same author (then 06perms won't help at all).
What can maybe be done here is to also look at 02packages, which will indicate the PAUSE 'latest' module:
XML::RegExp 0.04 T/TJ/TJMATHER/XML-RegExp-0.04.tar.gz
By itself I don't think it's enough, as I suspect that a newer release (in this case of libxml-enno) by an authorized author would take over that record (can someone confirm or refute that?).
I agree that if libxml-enno has no place in CPAN other than causing problems, it should be removed.
If that is not possible, removing the author's permissions is an alternative as it invalidates their releases of modules they have no permissions for (requires a reprocessing by MetaCPAN - we can do that once perms are changed)
By having the 'f' permissions to this module name, TJMATHER can remove the co-maint permissions on ENNO (or I guess PAUSE admins if necessary).
Who here can make those changes to CPAN described in @mickeyn's #2? @oalders?
@theory have you been in touch with either of the authors? maybe try https://metacpan.org/author/TJMATHER
I work at TJ's company. I can mention this to him today.
ENNO's perms have now been removed.
reprocessed and indeed it's now unauthorized. the above links now seem to point to the right place.
@theory can you please confirm?
Yay, my RPM builder finally went back to normal!
I went through the libxml-enno module list and found there are still two where MetaCPAN incorrectly points to libxml-enno:
Looks like both are also now owned by T.J. So maybe he should just do a quick audit of all of his modules and remove ENNO from them all.
Anyway, while this addresses my immediate issue, I'm not sure I wouldn't still call this a bug in MetaCPAN. But I admit to not knowing enough about MetaCPAN to say how difficult it would be to address.
@theory regardless of MetaCPAN's logic, if you follow my previous explanation you can see how the way PAUSE permissions work would make it impossible to determine which release should be pointed to in such a case (e.g. what if ENNO released a new version of libxml-enno today? it would be newer, would you expect it to show up for the module?).
We can modify MetaCPAN's logic of course, but the question is - to what? not seeing the release you wish to see doesn't mean MetaCPAN gets it wrong. This is more of a race problem, but it's caused by an external source that MetaCPAN can't control, but only reads and reflects.
For any given module, the API should always return the latest distribution release containing the highest version of that module (with unversioned modules defaulting to v0), from an approved maintainer of that module. If ENNO released a new version of libxml-enno with those modules, and they were still an official maintainer (or co-maintainer), and the module versions were higher than or equal to the versions in other distributions, then I would expect that new libxml-enno release to appear.
In short:
- Assume a module with no version is v0.
- Look for the highest stable version of the module.
- Look for the latest release from a sanctioned maintainer (at the time of release) that contains that version.
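The three steps above can be sketched as follows. This is only an illustration under stated assumptions: versions are read as plain decimal numbers, the stable/developer distinction is left out, and the record fields and function names are hypothetical. The libxml-enno date is given as "2000" because the thread only mentions the year.

```python
def module_version(raw):
    """Step 1: treat a missing version as 0; otherwise read it as a decimal."""
    return float(raw) if raw else 0.0


def pick_release(candidates, sanctioned):
    """Steps 2 and 3: highest module version, then latest sanctioned release."""
    allowed = [c for c in candidates if c["author"] in sanctioned]
    if not allowed:
        return None
    best = max(module_version(c["module_version"]) for c in allowed)
    with_best = [c for c in allowed if module_version(c["module_version"]) == best]
    # ISO-style date strings sort chronologically when compared as text.
    return max(with_best, key=lambda c: c["date"])


candidates = [
    {"release": "libxml-enno-1.02", "author": "ENNO",
     "module_version": None, "date": "2000"},
    {"release": "XML-RegExp-0.04", "author": "TJMATHER",
     "module_version": "0.04", "date": "2012-03-26T17:32:44"},
]

chosen = pick_release(candidates, sanctioned={"ENNO", "TJMATHER"})
print(chosen["release"])  # XML-RegExp-0.04
```

With both authors sanctioned, the unversioned copy counts as 0 and loses to 0.04, so the XML-RegExp release is selected without any permission change.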
@theory, to your suggestion points:
- not sure this is an indicator we can use - we'll have to collect some data to see how relevant it is.
- that does not work between unrelated releases of different distributions, the version number itself is not a very good indicator.
- we don't (currently) have the history of permissions/latest at the time of release, we can only work with current permissions (06perms) / latest (02packages)
What I think we can / should do is use 02packages info to identify the distribution that currently "owns" the module namespace. @oalders: does this sound reasonable, or do you think there is another way?
@mickeyn I think we really should defer to 02packages wherever possible.
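A minimal sketch of deferring to 02packages, using the single entry quoted in this thread. It assumes data lines are three whitespace-separated columns (module, version, path), and the distribution-name regex is a simplification written for this example rather than the logic MetaCPAN or CPAN tooling actually uses.

```python
import re

# The 02packages entry quoted in this thread.
PACKAGES_LINE = "XML::RegExp 0.04 T/TJ/TJMATHER/XML-RegExp-0.04.tar.gz"


def parse_packages_line(line):
    """Split one 02packages data line into its three columns."""
    module, version, path = line.split()
    return {"module": module, "version": version, "path": path}


def distribution_name(path):
    """Simplified guess: drop the directory, archive suffix and trailing version."""
    filename = path.rsplit("/", 1)[-1]
    stem = re.sub(r"\.(tar\.gz|tgz|tar\.bz2|zip)$", "", filename)
    return re.sub(r"-v?\d[\w.]*$", "", stem)


def owns_module(record_distribution, index_entry):
    """Hypothetical check: does a module record match the indexed distribution?"""
    return record_distribution == distribution_name(index_entry["path"])


entry = parse_packages_line(PACKAGES_LINE)
print(distribution_name(entry["path"]))    # XML-RegExp
print(owns_module("libxml-enno", entry))   # False
print(owns_module("XML-RegExp", entry))    # True
```

A record whose distribution does not match the indexed one could then be ranked below, or excluded in favour of, the one that does.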
then it might help to index it... something to think about
Is there something we can do in the short term to fix the invalid value reported in #614?
FWIW, the v0 API does not have this bug:
- v0 XML::RegExp correctly has "distribution" : "XML-RegExp",
- v0 ExtUtils::Depends correctly has "distribution" : "ExtUtils-Depends", (see #614)
So I'll just switch back to it until this bug can get fixed. :-(
Bah, I see that the v0 API is not up-to-date. 😖
@theory thanks, we'll see about what the v0 problem is.