
Look at Jesus M. Gonzalez-Barahona (Bitergia) information, e.g., Polarsys Maturity Model, GrimoireLab

Open david-a-wheeler opened this issue 8 years ago • 3 comments

Jesus M. Gonzalez-Barahona (Bitergia) has done a lot of work on measuring OSS projects; we should look further at his (their) work. We had an interesting conversation at the 2016 Linux Foundation Collab Summit.

They've participated in some developments by the Eclipse Polarsys WG, which focuses on maturity and future availability. The most interesting is probably the Polarsys Maturity Model, which is based in part on data they collect with MetricsGrimoire (it uses other sources too, such as Sonar): http://dashboard.polarsys.org/

The metric definitions are at http://dashboard.polarsys.org/documentation/metrics.html, and the GQM (Goal-Question-Metric) model is at http://dashboard.polarsys.org/documentation/quality_model.html (Jesus notes that it takes a while to load).

Example: http://projects.bitergia.com/opnfv

Information on MetricsGrimoire is here: https://metricsgrimoire.github.io/

They are writing a whole new metrics collection system called GrimoireLab (a complete redesign based on their experience with MetricsGrimoire). As I understand it, the software itself is OSS (in Python3), and they sell services on top of it. GrimoireLab's architecture includes:

  • Perceval: Grabs data from VCS. One backend per kind of repository, in Python3, produces “data items”. Supports GitHub, SourceForge, etc. (Many!). It's not hard to write them.
  • Arthur: Orchestrates retrieval, Python3.
  • Kibiter: fork of the Elasticsearch dashboard (Kibana); they're trying to upstream their changes.
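To make the Perceval/Arthur split a bit more concrete, here is a minimal Python sketch of the pattern described above: one backend per kind of data source, each producing uniform "data items", plus an orchestrator that runs the retrieval jobs. The names `Backend`, `StaticBackend`, and `run_jobs`, and the item fields, are hypothetical illustrations, not Perceval's or Arthur's actual API; the toy backend returns canned records instead of contacting a real server.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Iterator, List

Item = Dict[str, Any]


class Backend:
    """Hypothetical base class: one subclass per kind of data source."""

    name = "base"

    def __init__(self, origin: str) -> None:
        self.origin = origin  # where the data comes from, e.g. a repository URL

    def fetch_raw(self) -> Iterator[Item]:
        """Subclasses yield raw records from their source."""
        raise NotImplementedError

    def fetch(self) -> Iterator[Item]:
        """Wrap every raw record in a uniform "data item" envelope (fields are illustrative)."""
        for raw in self.fetch_raw():
            yield {
                "backend_name": self.name,
                "origin": self.origin,
                "retrieved_on": datetime.now(timezone.utc).isoformat(),
                "data": raw,
            }


class StaticBackend(Backend):
    """Toy backend that returns canned records instead of contacting a server."""

    def __init__(self, name: str, origin: str, records: List[Item]) -> None:
        super().__init__(origin)
        self.name = name
        self._records = records

    def fetch_raw(self) -> Iterator[Item]:
        yield from self._records


def run_jobs(backends: List[Backend]) -> List[Item]:
    """Orchestrator role: run each backend in turn and collect all data items."""
    items: List[Item] = []
    for backend in backends:
        items.extend(backend.fetch())
    return items


if __name__ == "__main__":
    jobs = [
        StaticBackend("git", "https://example.org/project.git", [{"commit": "abc123"}]),
        StaticBackend("issues", "https://example.org/tracker",
                      [{"id": 1, "title": "Crash on start"}, {"id": 2, "title": "Typo"}]),
    ]
    for item in run_jobs(jobs):
        print(item["backend_name"], item["origin"], item["data"])
```

Because every backend emits the same envelope, downstream analysis (or a dashboard) only needs to understand one item shape, whatever the source.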

More about GrimoireLab is at: http://grimoirelab.github.io/

A related book, "Evaluating Free / Open Source Software Projects", is here: https://github.com/jgbarah/evaluating-foss-projects

On a related note, lots of GitHub-related information is available via: http://ghtorrent.org/

david-a-wheeler, Apr 05 '16 15:04

Continuing: his new library “GrimoireLab” (http://grimoirelab.github.io/) is a complete redesign of his previous work and seems like a very promising step forward. It’s divided into 3 parts:

  • Perceval: Grabs data from VCS. One backend per kind of repository, in Python3, produces “data items”. Supports GitHub, SourceForge, etc. (Many!). License: GPLv3.
  • Arthur: Orchestrates retrieval, Python3. License: GPLv3.
  • Kibiter: fork of the Elasticsearch dashboard (Kibana); he’s trying to upstream those changes. License: Apache 2.0.

The current software is MIT-licensed and thus compatible with GPLv3. Apache 2.0 is also known to be compatible with GPLv3. So combining them should not raise any licensing issues.

david-a-wheeler, May 30 '16 00:05

The nice thing about GrimoireLab is that it's OSS and works to provide a general solution for accessing data about OSS projects.

The list of Perceval backends is instructive:

    bugzilla         Fetch bugs from a Bugzilla server
    gerrit           Fetch reviews from a Gerrit server
    git              Fetch commits from a Git log file
    github           Fetch issues from GitHub
    jenkins          Fetch builds from a Jenkins server
    jira             Fetch issues from JIRA issue tracker
    mbox             Fetch messages from MBox files
    pipermail        Fetch messages from a Pipermail archiver
    stackexchange    Fetch questions from StackExchange sites

The Perceval code for the backends is in "perceval/perceval/backends/". There are several; e.g., "github" acquires issues from GitHub, and "git" fetches the commits from a Git repository. There aren't many, however. In particular, a number of the "most risky" programs predate GitHub (and SourceForge) and don't necessarily use the same tools. If we are going to analyze, at least in part, the set of OSS we originally identified, then we'll need to add more back-ends and such to support accessing data from other sources. E.g., we might need to add ways to acquire issues from GitLab / SourceForge / Savannah, acquire commits from CM systems like Mercurial and Subversion, and so on.

Adding the back-ends might not be too bad, though. The infrastructure is there, and it's clearly intended to support more back-ends. Pretty much all of these alternatives have some documented API. If we just focus on the "most important for our purposes" ones, and build on the existing code, it might be good for everyone.
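As a rough illustration of what one such back-end might involve, the sketch below turns the XML produced by the Subversion client's `svn log --xml` command into commit records. The function names and the data-item fields are hypothetical and do not follow Perceval's real backend interface; `fetch_svn_commits` assumes an `svn` client is installed and on the PATH.

```python
import subprocess
import xml.etree.ElementTree as ET
from typing import Any, Dict, Iterator, List, Union

Item = Dict[str, Any]


def parse_svn_log_xml(xml_text: Union[str, bytes]) -> List[Item]:
    """Turn `svn log --xml` output into one plain dict per revision."""
    root = ET.fromstring(xml_text)
    commits: List[Item] = []
    for entry in root.iter("logentry"):
        commits.append({
            "revision": int(entry.attrib["revision"]),
            # author may be absent for some revisions, so default to ""
            "author": entry.findtext("author", default=""),
            "date": entry.findtext("date", default=""),
            "message": entry.findtext("msg", default="").strip(),
        })
    return commits


def fetch_svn_commits(repo_url: str) -> Iterator[Item]:
    """Hypothetical backend entry point: run the svn client and wrap each commit as a data item."""
    # Keep stdout as bytes so the XML parser honours the document's own encoding declaration.
    xml_bytes = subprocess.run(
        ["svn", "log", "--xml", repo_url],
        check=True, capture_output=True,
    ).stdout
    for commit in parse_svn_log_xml(xml_bytes):
        yield {"backend_name": "svn", "origin": repo_url, "data": commit}
```

Parsing is kept separate from running `svn`, so the parser can be exercised on saved output without network access; a Mercurial back-end could follow the same split.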

david-a-wheeler, Jun 02 '16 15:06

The older tool MetricsGrimoire (https://metricsgrimoire.github.io/) appears, on initial look, to support more back-ends, which might better meet our needs... but I hate to hitch our wagon to a dying horse. If Bitergia is interested in moving to GrimoireLab, they might be very interested in transitioning those capabilities to their newer tool suite.

david-a-wheeler, Jun 02 '16 15:06