
History grokker

Open jhannah opened this issue 11 years ago • 13 comments

So for years I've been threatening to write a history grokker that reads our XML file and produces an RSS-style history log, a graph of the number of active groups over time, etc. @rGeoffrey is presenting a "State of the Onion" at OSCON this Thursday. Maybe I ought to get around to it NOW...

grep 'status' on the current file is trivial. The trick is writing a program to understand the context of each of the 311 changes to that file since 2006. git log --patch --reverse perl_mongers.xml is a conceptual starting point. But I suspect we'll need some pretty heavy lifting before the context of many of those changes can be understood in human-readable terms suitable for an RSS feed. Something like:

  • 3c25cdc53 - 2006-04-30 - ignore
  • c8d7d2ca5 - 2006-05-24 - Helsingborg.pm new group leader: Stefan Midjich
  • 98ae765b3 - 2006-05-26 - New group: Kaiserslautern.pm
  • ... 308 more :)

A graph of simple stats is easier. For each of those commits, just pull out the XML at that point in time and do a grep status | sort | uniq -c...
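A minimal sketch of that per-commit tally, in Python for illustration. It assumes each group is a `<group>` element carrying an optional `status` attribute (groups without one are counted as "undef"), and that the script runs from the repository root. The function names are hypothetical, and older snapshots that are not well-formed XML would raise a parse error that this sketch does not handle.

```python
import subprocess
import xml.etree.ElementTree as ET
from collections import Counter


def count_statuses(xml_text):
    """Count groups per status in one snapshot of the XML (str or bytes)."""
    root = ET.fromstring(xml_text)
    # Assumption: <group status="..."> elements; a missing attribute
    # is tallied under "undef".
    return Counter(g.get("status", "undef") for g in root.iter("group"))


def snapshots(path="perl_mongers.xml"):
    """Yield (commit, date, xml_bytes) for each commit touching path, oldest first."""
    log = subprocess.run(
        ["git", "log", "--reverse", "--date=short", "--format=%H %ad", "--", path],
        capture_output=True, text=True, check=True,
    ).stdout
    for line in log.splitlines():
        commit, date = line.split()
        # Keep the file as bytes so the parser can honour the encoding
        # declared in the XML itself.
        xml_bytes = subprocess.run(
            ["git", "show", f"{commit}:{path}"],
            capture_output=True, check=True,
        ).stdout
        yield commit, date, xml_bytes


def status_time_series(path="perl_mongers.xml"):
    """Return a list of (date, Counter) pairs, one per commit."""
    return [(date, count_statuses(xml)) for _, date, xml in snapshots(path)]
```

Plotting `counts["active"]` against the date for each pair would give the active-groups graph.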

irc.perl.org #mongers is the IRC channel for discussion. :+1:

jhannah avatar Jul 22 '13 19:07 jhannah

Sounds like fun... Do we have a sample input and sample output?

evaddnomaid avatar Jul 22 '13 19:07 evaddnomaid

One could make it with XML::MyXML (to publicize my module a bit)

akarelas avatar Jul 22 '13 19:07 akarelas

I was just starting an XML::Rabbit dist for the XML file.

djgoku avatar Jul 22 '13 19:07 djgoku

@evaddnomaid https://github.com/perlorg/www.pm.org/blob/master/perl_mongers.xml

djgoku avatar Jul 22 '13 19:07 djgoku

@djgoku thanks, so perl_mongers.xml (and its history) is our input? And we want a single RSS file as output, showing the changes in number of active groups over time?

Should we get a discussion going, on IRC maybe?

evaddnomaid avatar Jul 22 '13 20:07 evaddnomaid

$ perl -Ilib t/01-pm.t 
ok 1 - The object isa PM::Grokker
1..1

djgoku avatar Jul 22 '13 20:07 djgoku

wow there are 717 groups! Fun stat.

djgoku avatar Jul 22 '13 20:07 djgoku

Total Groups 717:
Group statuses and counts below:

Status: Vetoed by Robert. See RT 57812. Count: 1    
Status: active               Count: 254  
Status: dead                 Count: 25   
Status: disabled             Count: 1    
Status: disbanded to make room for other groups -jhannah 20061203 Count: 1    
Status: gone                 Count: 17   
Status: inactive             Count: 172  
Status: leb                  Count: 13   
Status: mlb                  Count: 40   
Status: on hold              Count: 1    
Status: sleeping             Count: 35   
Status: spam                 Count: 1    
Status: undef                Count: 152  
Status: unknown              Count: 4

djgoku avatar Jul 22 '13 20:07 djgoku

I've always been fond of XML::Twig for plucking bits from XML.

  • Step 0. Omit all but the latest version per date.
  • Step 1. Cull just Group Name and Status into a proxy file for each version.
  • Step 2. Diff the proxy files to generate status change events for each date. Delete dates with no change in group name/status.
  • Step 3. Compute statistics for each date to create a time series.
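Steps 1 and 2 above could be sketched roughly as follows. This uses Python's standard library rather than XML::Twig, and it assumes each `<group>` has a `status` attribute and a `<name>` child. The function names are hypothetical. Keying on the name means a renamed group shows up as one removal plus one addition.

```python
import xml.etree.ElementTree as ET


def group_statuses(xml_text):
    """Map group name -> status for one snapshot (the "proxy file" of step 1)."""
    root = ET.fromstring(xml_text)
    # Assumption: <group status="..."> with a <name> child element.
    return {
        g.findtext("name", default="(unnamed)").strip(): g.get("status", "undef")
        for g in root.iter("group")
    }


def status_events(before, after):
    """Describe name/status changes between two snapshots (step 2)."""
    events = []
    for name in sorted(after.keys() - before.keys()):
        events.append(f"New group: {name} ({after[name]})")
    for name in sorted(before.keys() - after.keys()):
        events.append(f"Removed group: {name}")
    for name in sorted(before.keys() & after.keys()):
        if before[name] != after[name]:
            events.append(f"{name}: {before[name]} -> {after[name]}")
    return events
```

An empty list from `status_events` marks a date that can be dropped, as step 2 suggests.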

n1vux avatar Jul 23 '13 13:07 n1vux

Is it possible to regenerate the numbers that @djgoku ran back in 2013?

oalders avatar Jul 08 '23 21:07 oalders

> Is it possible to regenerate the numbers that @djgoku ran back in 2013?

I don’t remember writing this, but it has what you want, I think. lol

https://github.com/djgoku/PM-Grokker/blob/master/bin/grokker.pl

djgoku avatar Jul 08 '23 23:07 djgoku

That's it! Thanks, @djgoku. 😄

oalders avatar Jul 09 '23 00:07 oalders

Status: Vetoed by Robert. See RT 57812. Count: 1
Status: active               Count: 210
Status: dead                 Count: 24
Status: disabled             Count: 1
Status: disbanded to make room for other groups -jhannah 20061203 Count: 1
Status: gone                 Count: 16
Status: inactive             Count: 237
Status: leb                  Count: 13
Status: mlb                  Count: 40
Status: on hold              Count: 1
Status: sleeping             Count: 34
Status: spam                 Count: 1
Status: undef                Count: 148
Status: unknown              Count: 4

So, the diff would be:

git diff --no-index before.txt after.txt
diff --git a/before.txt b/after.txt
index c38a104..719d684 100644
--- a/before.txt
+++ b/after.txt
@@ -1,14 +1,14 @@
-Status: active               Count: 254
-Status: dead                 Count: 25
+Status: active               Count: 210
+Status: dead                 Count: 24
 Status: disabled             Count: 1
 Status: disbanded to make room for other groups -jhannah 20061203 Count: 1
-Status: gone                 Count: 17
-Status: inactive             Count: 172
+Status: gone                 Count: 16
+Status: inactive             Count: 237
 Status: leb                  Count: 13
 Status: mlb                  Count: 40
 Status: on hold              Count: 1
-Status: sleeping             Count: 35
+Status: sleeping             Count: 34
 Status: spam                 Count: 1
-Status: undef                Count: 152
+Status: undef                Count: 148
 Status: unknown              Count: 4
 Status: Vetoed by Robert. See RT 57812. Count: 1
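The same comparison can be expressed as per-status deltas instead of a textual diff. A minimal sketch, assuming the "Status: ... Count: N" line format shown in the listings above (the function names are hypothetical):

```python
import re

# Lazily capture the status text up to the first "Count: N" on the line.
LINE = re.compile(r"^Status: (?P<status>.*?)\s+Count: (?P<count>\d+)\s*$")


def parse_counts(text):
    """Parse 'Status: X Count: N' lines into a {status: count} dict."""
    counts = {}
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            counts[m["status"]] = int(m["count"])
    return counts


def deltas(before, after):
    """Return {status: after - before} for every status whose count changed."""
    return {
        s: after.get(s, 0) - before.get(s, 0)
        for s in sorted(before.keys() | after.keys())
        if after.get(s, 0) != before.get(s, 0)
    }
```

Applied to the two listings in this thread, this should report active -44, dead -1, gone -1, inactive +65, sleeping -1 and undef -4, a net change of +14 groups (717 to 731).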

oalders avatar Jul 09 '23 00:07 oalders