www.pm.org
History grokker
So for years I've been threatening to write a history grokker that reads our XML file and produces an RSS-style history log, a graph of the number of active groups over time, etc. @rGeoffrey is presenting a "State of the Onion" at OSCON this Thursday. Maybe I ought to get around to it NOW...
grep 'status'
on the current file is trivial. The trick is writing a program to understand the context of each of the 311 changes to that file since 2006.
git log --patch --reverse perl_mongers.xml
is a conceptual starting point. But I suspect we'll need some pretty heavy lifting before the context of many of those changes can be understood in human-readable terms suitable for an RSS feed. Something like:
- 3c25cdc53 - 2006-04-30 - ignore
- c8d7d2ca5 - 2006-05-24 - Helsingborg.pm new group leader: Stefan Midjich
- 98ae765b3 - 2006-05-26 - New group: Kaiserslautern.pm
- ... 308 more :)
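A minimal sketch of that starting point, in Python for brevity (the same shape should port to Perl). It lists the commits that touched the file, oldest first, and fetches the file as it stood at a given commit. The function names and the "hash date" log format are assumptions made for illustration; nothing here is from an existing grokker.

```python
import subprocess
from typing import List, Tuple

# Assumed log format: abbreviated hash, a space, then the author date.
LOG_FORMAT = "%h %ad"


def parse_log(text: str) -> List[Tuple[str, str]]:
    """Turn 'hash date' lines into (hash, date) pairs, skipping blank lines."""
    commits = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        sha, date = line.split(" ", 1)
        commits.append((sha, date))
    return commits


def list_commits(repo_dir: str, path: str = "perl_mongers.xml") -> List[Tuple[str, str]]:
    """Commits that touched `path`, oldest first, as (hash, YYYY-MM-DD)."""
    result = subprocess.run(
        ["git", "log", "--reverse", "--date=short",
         "--format=" + LOG_FORMAT, "--", path],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    return parse_log(result.stdout)


def file_at(repo_dir: str, sha: str, path: str = "perl_mongers.xml") -> bytes:
    """Raw bytes of `path` as of commit `sha` (bytes, so the XML parser can
    honor whatever encoding the file declares)."""
    result = subprocess.run(
        ["git", "show", f"{sha}:{path}"],
        cwd=repo_dir, check=True, capture_output=True,
    )
    return result.stdout
```

This only gets you the raw snapshots. Turning each change into a human-readable line like the ones above is the heavy lifting and is not attempted here.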
A graph of simple stats is easier. For each of those commits, just pull out the XML at that point in time and do a grep status | sort | uniq -c (sort first, since uniq -c only collapses adjacent duplicates).
...
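The per-commit count could be sketched like this, parsing the XML rather than grepping it. It assumes each group is a `<group>` element carrying a `status` attribute; check that against the real perl_mongers.xml before relying on it. Groups without a status are counted as "undef", and `status_counts` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def status_counts(xml_data) -> Counter:
    """Count groups per status in one snapshot of the XML.

    Assumes <group status="..."> elements (an assumption, not verified
    against the real file). Accepts str or bytes.
    """
    root = ET.fromstring(xml_data)
    counts = Counter()
    for group in root.iter("group"):
        # Groups with no status attribute are reported as "undef".
        counts[group.get("status", "undef")] += 1
    return counts


# Made-up sample input, not taken from the real file.
SAMPLE = """
<perl_mongers>
  <group id="1" status="active"><name>Alpha.pm</name></group>
  <group id="2" status="active"><name>Beta.pm</name></group>
  <group id="3" status="sleeping"><name>Gamma.pm</name></group>
  <group id="4"><name>Delta.pm</name></group>
</perl_mongers>
"""

if __name__ == "__main__":
    for status, count in sorted(status_counts(SAMPLE).items()):
        print(f"Status: {status} Count: {count}")
```

Running this once per commit and recording the "active" count against the commit date would give the time series for the graph.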
irc.perl.org #mongers is the IRC channel for discussion. :+1:
Sounds like fun... Do we have a sample input and sample output?
One could make it with XML::MyXML (to publicize my module a bit)
I was just starting an XML::Rabbit dist for the XML file.
@evaddnomaid https://github.com/perlorg/www.pm.org/blob/master/perl_mongers.xml
@djgoku thanks, so perl_mongers.xml (and its history) is our input? And we want a single RSS file as output, showing the changes in number of active groups over time?
Should we get a discussion going, on IRC maybe?
$ perl -Ilib t/01-pm.t
ok 1 - The object isa PM::Grokker
1..1
wow there are 717 groups! Fun stat.
Total Groups 717:
Group statuses and counts below:
Status: Vetoed by Robert. See RT 57812. Count: 1
Status: active Count: 254
Status: dead Count: 25
Status: disabled Count: 1
Status: disbanded to make room for other groups -jhannah 20061203 Count: 1
Status: gone Count: 17
Status: inactive Count: 172
Status: leb Count: 13
Status: mlb Count: 40
Status: on hold Count: 1
Status: sleeping Count: 35
Status: spam Count: 1
Status: undef Count: 152
Status: unknown Count: 4
I've always been fond of XML::Twig for plucking bits from XML.
- step 0. Omit all but latest version per date.
- step 1. Culling just Group Name and Status into a proxy file for each version.
- step 2. Diff the proxy files to generate status change events for each date. Delete dates with no change in group name/status.
- step 3. Statistics for each date, create time series.
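Step 2 could be sketched as follows, in Python rather than with XML::Twig. Each "proxy file" is represented as a dict mapping group name to status, and two consecutive versions are compared to produce change events. The function name, the event wording, and the group names in the example are all made up for illustration.

```python
from typing import Dict, List, Optional


def status_events(old: Dict[str, Optional[str]],
                  new: Dict[str, Optional[str]]) -> List[str]:
    """Compare two name -> status maps and describe what changed.

    Returns an empty list when nothing changed, which is the signal
    to drop that date from the history.
    """
    events = []
    for name in sorted(old.keys() | new.keys()):
        before, after = old.get(name), new.get(name)
        if name not in old:
            events.append(f"New group: {name} ({after})")
        elif name not in new:
            events.append(f"Removed group: {name} (was {before})")
        elif before != after:
            events.append(f"{name}: {before} -> {after}")
    return events


if __name__ == "__main__":
    # Hypothetical snapshots for two consecutive dates.
    older = {"Alpha.pm": "active", "Gamma.pm": "active"}
    newer = {"Alpha.pm": "sleeping", "Beta.pm": "active"}
    for event in status_events(older, newer):
        print(event)
```

Keying on group name means a rename shows up as one removal plus one new group. Keying on a stable group id, if the XML has one, would avoid that.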
Is it possible to regenerate the numbers that @djgoku ran back in 2013?
I don't remember writing this, but it has what you want, I think. lol
https://github.com/djgoku/PM-Grokker/blob/master/bin/grokker.pl
That's it! Thanks, @djgoku. 😄
Status: Vetoed by Robert. See RT 57812. Count: 1
Status: active Count: 210
Status: dead Count: 24
Status: disabled Count: 1
Status: disbanded to make room for other groups -jhannah 20061203 Count: 1
Status: gone Count: 16
Status: inactive Count: 237
Status: leb Count: 13
Status: mlb Count: 40
Status: on hold Count: 1
Status: sleeping Count: 34
Status: spam Count: 1
Status: undef Count: 148
Status: unknown Count: 4
So, the diff would be:
git diff --no-index before.txt after.txt
diff --git a/before.txt b/after.txt
index c38a104..719d684 100644
--- a/before.txt
+++ b/after.txt
@@ -1,14 +1,14 @@
-Status: active Count: 254
-Status: dead Count: 25
+Status: active Count: 210
+Status: dead Count: 24
 Status: disabled Count: 1
 Status: disbanded to make room for other groups -jhannah 20061203 Count: 1
-Status: gone Count: 17
-Status: inactive Count: 172
+Status: gone Count: 16
+Status: inactive Count: 237
 Status: leb Count: 13
 Status: mlb Count: 40
 Status: on hold Count: 1
-Status: sleeping Count: 35
+Status: sleeping Count: 34
 Status: spam Count: 1
-Status: undef Count: 152
+Status: undef Count: 148
 Status: unknown Count: 4
 Status: Vetoed by Robert. See RT 57812. Count: 1
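A line diff shows which counts moved, but not by how much. A small sketch that parses the "Status: ... Count: N" lines from two runs and reports the signed change per status (the function names are hypothetical):

```python
import re
from typing import Dict

# Matches lines like "Status: on hold Count: 1"; the status text may contain spaces.
SUMMARY_LINE = re.compile(r"^Status: (.*) Count: (\d+)$")


def parse_summary(text: str) -> Dict[str, int]:
    """Read 'Status: X Count: N' lines into a status -> count dict."""
    counts = {}
    for line in text.splitlines():
        match = SUMMARY_LINE.match(line.strip())
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


def count_deltas(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Signed change per status, omitting statuses whose count did not move."""
    deltas = {}
    for status in sorted(before.keys() | after.keys()):
        change = after.get(status, 0) - before.get(status, 0)
        if change:
            deltas[status] = change
    return deltas


if __name__ == "__main__":
    # A few lines from each of the two runs quoted in this thread.
    before = parse_summary("Status: active Count: 254\n"
                           "Status: inactive Count: 172\n"
                           "Status: leb Count: 13\n")
    after = parse_summary("Status: active Count: 210\n"
                          "Status: inactive Count: 237\n"
                          "Status: leb Count: 13\n")
    for status, change in count_deltas(before, after).items():
        print(f"{status}: {change:+d}")
```

Applied to the two full listings above, the changes are active -44, dead -1, gone -1, inactive +65, sleeping -1, and undef -4, a net of +14 groups (717 to 731).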