pause
From one feed of uploads to three ...
When we did the PSIXDISTS trial uploads, a lot of things that follow the PAUSE upload feed to spot new perl5 distributions became somewhat confused (both software and liveware).
I think, for the moment, the sensible way to deal with this is to have three feeds:
- The current feed, modified to exclude ^Perl6/
- A new 6uploads feed, which contains only ^Perl6/ uploads
- An alluploads feed, which contains both.
Naming I care not about, just about being able to e.g. point a GumbyNET at a 6uploads feed for #perl6, and being able to not confuse people expecting only perl5 stuff.
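As a rough illustration of the three-feed split, here is a minimal Python sketch. It assumes each upload is identified by a path relative to the author's directory, so that Perl 6 uploads begin with `Perl6/` as in the `^Perl6/` pattern above. The function name `split_feeds` and the example filenames are hypothetical, and the dictionary keys other than `6uploads` and `alluploads` are placeholders, since naming is explicitly left open.

```python
import re

# Assumption: paths are relative to the author's directory, so Perl 6
# uploads start with "Perl6/" (the ^Perl6/ pattern from the proposal).
PERL6_PREFIX = re.compile(r"^Perl6/")


def split_feeds(uploads):
    """Return the three proposed feeds as a dict of lists."""
    feeds = {"uploads": [], "6uploads": [], "alluploads": []}
    for path in uploads:
        # The combined feed always receives the upload.
        feeds["alluploads"].append(path)
        if PERL6_PREFIX.search(path):
            feeds["6uploads"].append(path)
        else:
            feeds["uploads"].append(path)
    return feeds


# Hypothetical uploads, for illustration only.
example = ["Foo-Bar-1.00.tar.gz", "Perl6/Baz-0.01.tar.gz"]
feeds = split_feeds(example)
```

With this shape, a consumer that only wants perl5 uploads keeps reading the first feed unchanged, while a bot for #perl6 reads only `6uploads`.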
I suspect we're going to find that this isn't exactly a PAUSE issue in its entirety, but I still can't think of a better place to put the ticket.
Brain dumping -
9:37 @kentnl https://metacpan.org/feed/recent http://search.cpan.org/recent
http://search.cpan.org/uploads.rdf # related?
Where does GumbyNET get its data from?
If we have multiple consumers of the pause indexing emails or the daemon tail, can we play with things so that people's regexps do what we want them to?
20:07 < ranguard> mst: https://github.com/CPAN-API/cpan-api/blob/master/lib/MetaCPAN/Script/Watcher.pm#L269 - It watches the RECENT-*.json files for changes to the CPAN directory every 15 seconds
Well, that's one target. I guess 'RECENT-1h', 'RECENT6-1h' and 'RECENTALL-1h' ?
On Sun, 10 Jan 2016 12:10:14 -0800, shadowcat-mst [email protected] said:
> 20:07 < ranguard> mst: https://github.com/CPAN-API/cpan-api/blob/master/lib/MetaCPAN/Script/Watcher.pm#L269 - It watches the RECENT-*.json files for changes to the CPAN directory every 15 seconds
> Well, that's one target. I guess 'RECENT-1h', 'RECENT6-1h' and 'RECENTALL-1h' ?
Now I understand what you meant with three feeds. So I said "wfm" too early.
I don't think I can be persuaded to change File::Rsync::Mirror::Recent. That's the system that owns the RECENT* files. I'd strongly prefer to have these unchanged. Any split of these files can happen downstream, no?
andreas
Would one option be to keep the RECENT* (all) but add:
- p5-RECENT...
- p6-RECENT...
The disadvantage of this is that most downstream clients (assuming most currently just want p5) then have to be updated, but at least it should be as simple as looking at a different file path, rather than having to parse the content differently.
On Tue, 12 Jan 2016 12:50:20 -0800, Leo Lapworth [email protected] said:
> Would one option be to keep the RECENT* (all) but add:
> - p5-RECENT...
> - p6-RECENT...
> The disadvantage of this is that most downstream clients (assuming most currently just want p5) then have to be updated, but at least it should be as simple as looking at a different file path, rather than having to parse the content differently.
That's a piece of software that can run anywhere besides pause. But the way you describe it, it comes at a price for everybody: additional files in the system that eat resources in the form of space, download time and complexity. So I'd say such a splitter should run elsewhere. I'm open to better suggestions.
One option might be to teach File::Rsync::Mirror::Recent to do filtering or splitting. And/or to read the indexes from different places than the one that offers the files for download.
andreas
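A downstream splitter of the kind discussed here could be quite small. The Python sketch below is only an illustration under stated assumptions: it assumes a RECENT JSON document has a top-level `recent` list whose events carry a `path` such as `id/A/AU/AUTHOR/Perl6/Dist-0.01.tar.gz`, and that Perl 6 uploads can be recognised by a `Perl6/` component directly under the author directory. The function name `split_recent`, the regex, and the sample data are hypothetical, and nothing here touches File::Rsync::Mirror::Recent itself.

```python
import json
import re

# Assumption: event paths look like "id/A/AU/AUTHOR/Perl6/Dist-0.01.tar.gz".
PERL6_PATH = re.compile(r"^id/[^/]/[^/]{2}/[^/]+/Perl6/")


def split_recent(recent_doc):
    """Split a parsed RECENT document into (perl5_doc, perl6_doc) copies."""
    events = recent_doc.get("recent", [])
    p6 = [e for e in events if PERL6_PATH.search(e.get("path", ""))]
    p5 = [e for e in events if not PERL6_PATH.search(e.get("path", ""))]
    # Every other top-level key (for example "meta") is copied unchanged.
    base = {k: v for k, v in recent_doc.items() if k != "recent"}
    return {**base, "recent": p5}, {**base, "recent": p6}


# Hypothetical sample data; the field names are assumptions.
sample = json.loads("""
{
  "meta": {"interval": "1h"},
  "recent": [
    {"epoch": 1452456614.0, "path": "id/A/AU/AUTHOR/Foo-Bar-1.00.tar.gz", "type": "new"},
    {"epoch": 1452456620.0, "path": "id/A/AU/AUTHOR/Perl6/Baz-0.01.tar.gz", "type": "new"}
  ]
}
""")
p5_doc, p6_doc = split_recent(sample)
```

Because the original document is left untouched, a splitter like this can run on any mirror or consumer host rather than on PAUSE.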
I also hadn't realized that the suggestion was to alter the rrr files. I think that's pretty much a non-starter, unfortunately. It isn't crazy to suggest that PAUSE itself produce RSS files, and that would be easy… but it does mean that downstream things would have to be updated.
I hadn't realised when I first looked at this that the files used for rrr are the same ones people are using to find newly uploaded dists (as opposed to files in general). If it turns out only to be metacpan, that can surely be changed, but if it's other things as well, we're into nasty trade-off land.
I think we need to check how search.cpan.org, the various utility bots, cpanmetadb, and cpantesters handle this. I shall start poking people.
(Edited to add: IRC pings left for preaction wrt cpantesters, BinGOs wrt GumbyNET*, and miyagawa wrt cpanmetadb; mail sent to the search.cpan.org contact address in the hopes Graham will take pity on me for that question ;)
The current version of cpanmetadb doesn't use rrr files. Instead it just fetches the whole 02packages file and replaces everything in one big transaction. I also clone & fetch the PAUSE-batch git repo to retrieve the history.
Previously I was looking at the RRR files when it was running on GAE in Python, but not anymore. https://github.com/miyagawa/cpanmetadb/blob/master/main.py#L135
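For readers unfamiliar with the "fetch everything and replace it in one transaction" approach, here is a minimal Python sketch. It is not cpanmetadb's actual code: the use of SQLite, the table layout, and the names `parse_02packages` and `replace_index` are all assumptions made for illustration. The only format detail relied on is that 02packages.details.txt has a header block, a blank line, and then whitespace-separated package, version and path columns.

```python
import sqlite3


def parse_02packages(text):
    """Yield (package, version, path) rows from 02packages.details.txt text."""
    _header, _, body = text.partition("\n\n")  # a blank line ends the header
    for line in body.splitlines():
        fields = line.split()
        if len(fields) == 3:
            yield tuple(fields)


def replace_index(conn, text):
    """Replace the whole index in a single transaction; return the row count."""
    rows = list(parse_02packages(text))
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("DELETE FROM packages")
        conn.executemany("INSERT INTO packages VALUES (?, ?, ?)", rows)
    return len(rows)


# Hypothetical in-memory database and sample index, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE packages (package TEXT PRIMARY KEY, version TEXT, path TEXT)"
)
sample = (
    "File:         02packages.details.txt\n"
    "Line-Count:   2\n"
    "\n"
    "Foo::Bar    1.00   A/AU/AUTHOR/Foo-Bar-1.00.tar.gz\n"
    "Foo::Baz    undef  A/AU/AUTHOR/Foo-Bar-1.00.tar.gz\n"
)
count = replace_index(conn, sample)
```

A consumer built this way never reads the RECENT files, which is why it is unaffected by how those files are split.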
@miyagawa Thanks! That rules cpanmetadb out of us needing to worry about it. Molto Bene.
So, the Gumbys are apparently NNTP + regex based -
17:13 <@mst> BinGOs: what are the Gumbys using? please either answer here and I'll ticket it or answer on GH above
17:22 <@BinGOs> mst: it tails the nntp.perl.org newsgroup 'perl.cpan.uploads' for upload emails.
17:22 <@mst> ooooo.
17:23 <@mst> how does it tell one's an upload?
17:24 <@BinGOs> $subject =~ m!^CPAN Upload: (.+\.tar\.gz|\.tgz|\.zip)$!i
so I guess we can eliminate this particular vector by having the Perl6/ directory generate a subject line of 'Perl6 Upload: ...' - would anybody see any particular issue with that? (notably @andk @rjbs)
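To see why a different subject prefix would keep Perl 6 uploads away from the Gumbys, here is a small Python rendering of the quoted pattern (Perl's `/i` flag becomes `re.I`). The subject lines are hypothetical examples, and `is_cpan_upload` is an illustrative name rather than anything from the bot's code.

```python
import re

# Python rendering of the pattern quoted from IRC; re.I stands in for /i.
UPLOAD_SUBJECT = re.compile(r"^CPAN Upload: (.+\.tar\.gz|\.tgz|\.zip)$", re.I)


def is_cpan_upload(subject):
    """Return True if the subject line would be treated as an upload."""
    return UPLOAD_SUBJECT.search(subject) is not None


# Hypothetical subject lines, for illustration only.
perl5_subject = "CPAN Upload: A/AU/AUTHOR/Foo-Bar-1.00.tar.gz"
perl6_subject = "Perl6 Upload: A/AU/AUTHOR/Perl6/Baz-0.01.tar.gz"

perl5_seen = is_cpan_upload(perl5_subject)
perl6_seen = is_cpan_upload(perl6_subject)
```

Since the pattern is anchored at the start of the subject with `^CPAN Upload: `, a message whose subject starts with `Perl6 Upload: ` does not match, so the existing consumer would need no changes.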
It at least doesn't seem crazy. I don't think I have any strong opinion beyond that.
On Sat, 16 Jan 2016 09:27:18 -0800, shadowcat-mst [email protected] said:
> So, the Gumbys are apparently NNTP + regex based -
> 17:13 <@mst> BinGOs: what are the Gumbys using? please either answer here and I'll ticket it or answer on GH above
> 17:22 <@BinGOs> mst: it tails the nntp.perl.org newsgroup 'perl.cpan.uploads' for upload emails.
> 17:22 <@mst> ooooo.
> 17:23 <@mst> how does it tell one's an upload?
> 17:24 <@BinGOs> $subject =~ m!^CPAN Upload: (.+\.tar\.gz|\.tgz|\.zip)$!i
> so I guess we can eliminate this particular vector by having the Perl6/ directory generate a subject line of 'Perl6 Upload: ...' - would anybody see any particular issue with that? (notably @andk @rjbs)
I see no nasty effects of such a change. The subject line gets written here:
https://github.com/andk/pause/blob/master/bin/paused#L561
andreas