Metric Idea: Maintainer Activity
Metric basics
- Metric title: Maintainer Activity
- Metric summary (1-2 sentences): What is the activity of a maintainer across all of open source?
- Why should this metric be created? (1-2 sentences): For libraries that are "complete", there may not be activity on the project but the maintainer is active and would be responsive if the library needed attention. Thus, Maintainer Activity outside the focal project is a good health signal.
Data collection and measurement
Are there existing tools that could collect this data? If yes, list them:
- GitHub Archive can be a source for activity in GitHub-hosted projects.
- Bitergia offers the "footprint" analysis as a paid service, which can produce this metric.
If this metric involves a lot of raw data, what filters would you use to narrow down the metric? If applicable, describe ways to filter the data into smaller segments:
- Filter by the specific maintainer of a project.
- Group data by the repository they are active in.
How would you visualize this metric? If you have an idea on how this metric should be visualized or displayed so it makes the most sense to a viewer, describe that here:
- # of Commits a maintainer made during the last 6 months.
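As a rough illustration of the filter, grouping, and six-month commit count described above, here is a minimal sketch. The event record shape (`author`, `type`, `repo`, `timestamp`) and the function name `commits_by_repo` are hypothetical, chosen only for this example; they are not the schema of GitHub Archive or any other tool named in this issue.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def commits_by_repo(events, maintainer, now, window_days=182):
    """Count one maintainer's commits per repository in the trailing window.

    `events` is an iterable of dicts with a hypothetical shape:
    {"author": str, "type": str, "repo": str, "timestamp": datetime}.
    """
    cutoff = now - timedelta(days=window_days)
    counts = Counter()
    for event in events:
        # Filter by the specific maintainer and by commit events only.
        if event["author"] != maintainer or event["type"] != "commit":
            continue
        # Keep only events inside the (roughly six-month) window.
        if cutoff <= event["timestamp"] <= now:
            counts[event["repo"]] += 1  # group by repository
    return counts


def _utc(year, month, day):
    return datetime(year, month, day, tzinfo=timezone.utc)


# Made-up sample data, for illustration only.
now = _utc(2024, 6, 10)
events = [
    {"author": "alice", "type": "commit", "repo": "org/a", "timestamp": _utc(2024, 5, 1)},
    {"author": "alice", "type": "commit", "repo": "org/a", "timestamp": _utc(2024, 3, 15)},
    {"author": "alice", "type": "commit", "repo": "org/b", "timestamp": _utc(2023, 1, 1)},
    {"author": "alice", "type": "comment", "repo": "org/a", "timestamp": _utc(2024, 5, 2)},
    {"author": "bob", "type": "commit", "repo": "org/a", "timestamp": _utc(2024, 5, 3)},
]

activity = commits_by_repo(events, "alice", now)
total_commits = sum(activity.values())
```

The per-repository counts could feed a simple bar chart, with the total as the headline number.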
About you
- Are you interested in authoring this metric together with the Working Group?: yes
- Have you attended a CHAOSS Working Group meeting before?: yes
- If not, would you consider joining one to discuss your metric idea?: yes -- because the WG meetings ended, I'll join the community call
- Anything else you would like us to know?: The metric idea came from Jordan Harband.
In other words, the goal of measuring activity on a project is to determine the likelihood that a vuln will be fixed, and project activity simply doesn’t tell you this with any accuracy whatsoever - but maintainer activity can.
Is the maintainer the right scope? Organization-level activity may be a more reasonable scope. I can easily see how people who do open source work at a company might change employers, or leave and stay active in other organizations, and still never look back at the bug you are reporting in the "abandoned" repository.
The vast majority of open source is not under the control of a company, but certainly for those projects that are, you could consider the activity of the overall company.
The most significant question I have -- and I do think it is a significant question -- is: "what is a maintainer"? Is it the platform designation for permission level on a repository? If so, it's cut and dried. More likely it includes that AND some other folk construction of "maintainer". Therein lies the challenge.
That depends on the ecosystem, and how publishes are done.
In npm, it'd be "whoever can publish" + "whoever has write access to a repo that has a publish-from-ci workflow". In go, it'd be "whoever has write access to the repo". … etc.
In other words, just as what's actually being sought here is "will a vulnerability be fixed?", the people whose activity you want to track are "whoever is capable of releasing a fix for the vulnerability".
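A minimal sketch of that ecosystem-dependent definition, covering only the two ecosystems named above. The function name, its parameters, and the idea of passing in pre-collected sets of usernames are assumptions made for illustration; how those sets would be gathered from a registry or hosting platform is left open.

```python
def release_capable(ecosystem, publishers, repo_writers, publishes_from_ci):
    """Return the set of people able to release a fix (hypothetical helper).

    publishers:        usernames allowed to publish to the package registry
    repo_writers:      usernames with write access to the source repository
    publishes_from_ci: True if the repository has a publish-from-CI workflow
    """
    if ecosystem == "npm":
        # "whoever can publish" ...
        people = set(publishers)
        # ... plus repo writers, but only when CI can publish on their behalf.
        if publishes_from_ci:
            people |= set(repo_writers)
        return people
    if ecosystem == "go":
        # "whoever has write access to the repo"
        return set(repo_writers)
    # Other ecosystems are not spelled out in this thread, so do not guess.
    raise ValueError(f"no rule defined for ecosystem: {ecosystem!r}")


# Made-up usernames, for illustration only.
npm_with_ci = release_capable("npm", {"alice"}, {"alice", "bob"}, True)
npm_without_ci = release_capable("npm", {"alice"}, {"bob"}, False)
go_module = release_capable("go", {"alice"}, {"bob"}, False)
```

The resulting set is the group of people whose activity elsewhere would be tracked.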
I really like the idea of Maintainer Activity as a metric. It would be my opinion that there are two paths to follow, and each could be adjusted to best fit the purpose for analysis.
Option A: If looking at a specific project or set of projects, Maintainer Activity should be composed of not only # of commits, but also # of issues closed, # of PRs, and # of comments on issues/PRs. Based upon the above suggestions, the time window used to track these metrics for an individual will be inherently important. I do believe that a dashboard could be created to compare, contrast, and potentially combine the above individual metrics; if that becomes difficult or breaks, a paid analysis solution may be required.
Option B: Option A answers the question surrounding a maintainer's commitment to a specific source. This leaves quite a gap for a developer looking to broaden their scope or transition to any other project. Option B serves as a suggestion to create a new library, or collection of libraries. These libraries would require a high level of maintenance (at least at first) but could be funded through a subscription/membership model. In the proposed library, all of the notable development history for a maintainer could be compiled, resulting in a 'card' style visualization. This 'card' could become a new resume/CV requirement; it could also be shared or attached, similar to a badge. I am interested to see whether a general library would make more sense, or whether libraries should be delineated based upon types of code.
Via either option, or continued suggestion, I am eager to participate in the development of a maintainer activity metric. Thanks!
"number of comments" wouldn't help you, though, because anyone can make a comment. The only signals that can be used - assuming you're trying to confirm that there's someone around to actually release a vuln fix - are things that maintainer privileges would require.
I am suggesting/referring to the ability to track an individual contributor's comments, in this case by repo. Some of this could indicate the need for a manual review; however, an active maintainer is likely to be highly participative via comments if there are delays in approving new changes/PRs.
Discussion from the Community Call on June 10:
- How is the activity of a maintainer in other projects telling us about a focal project? How do we mitigate drawing a false conclusion about a project that is abandoned?
- Could we combine the activity across open source with the response time to past issues or pull requests on the focal project, signaling that the maintainer responds when something does come up?
- How about activity in projects that depend on the focal project? If the maintainer is working downstream, they may be more likely to re-activate the upstream project when an issue arises and needs attention.
- A lot of activity in other projects may indicate that a contributor has no bandwidth to work on the focal project.
- What could be a better name? The current name may indicate that the activity is on the focal project - after someone is in that mindset, it is hard to change the perspective to what this metric is actually intending to do.
- Maintainer availability on an otherwise inactive project, judged by activity in other open source projects
- There needs to be a distinction between projects that appear inactive but are actively monitored and maintained, and projects that are actually abandoned.
- Could there be a signal that a maintainer is present and available? A note in the repository stating the current state of the project and a pointer to how to reach the maintainer?
- State Declaration: Active/Maintenance
- Does the maintainer respond to “Are you there?” messages?
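One idea from the notes above, combining activity across open source with past response times on the focal project, could be sketched roughly as follows. The input shapes, the function names, and both thresholds (`recent_days`, `max_response_days`) are assumptions picked for illustration; nothing here was agreed on the call.

```python
from datetime import datetime, timedelta
from statistics import median


def median_response_days(issues):
    """Median days from issue/PR creation to the first maintainer response.

    `issues` is a list of (opened, first_response) datetime pairs, where
    first_response is None if the maintainer never responded.
    """
    delays = [
        (response - opened).total_seconds() / 86400
        for opened, response in issues
        if response is not None
    ]
    return median(delays) if delays else None


def availability_signal(last_activity_elsewhere, issues, now,
                        recent_days=182, max_response_days=30):
    """Combine activity elsewhere with past responsiveness on the focal project."""
    active_elsewhere = (
        last_activity_elsewhere is not None
        and now - last_activity_elsewhere <= timedelta(days=recent_days)
    )
    response = median_response_days(issues)
    return {
        "active_elsewhere": active_elsewhere,
        "median_response_days": response,
        "responsive_in_past": response is not None and response <= max_response_days,
    }


# Made-up history for a quiet focal project, for illustration only.
now = datetime(2024, 6, 10)
focal_issues = [
    (datetime(2022, 1, 1), datetime(2022, 1, 4)),   # answered after 3 days
    (datetime(2022, 6, 1), datetime(2022, 6, 11)),  # answered after 10 days
    (datetime(2023, 2, 1), None),                   # never answered
]
signal = availability_signal(datetime(2024, 5, 20), focal_issues, now)
```

Keeping the two signals separate, rather than folding them into one score, leaves room for the concern raised above that heavy activity elsewhere may also mean less bandwidth for the focal project.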
These notes may make more sense after viewing the recording of the meeting, which will be made available on YouTube: https://www.youtube.com/chaosstube
One method for identifying who is a maintainer on a repository: ecosyste.ms has logic for identifying who is likely a maintainer.
Source: https://youtu.be/KnqQzp9plFA?feature=shared&t=1225 (timestamps from: 20:25 to: 22:55)
Regarding the first question, you don't need to know about the focal project directly - all you need is to know if you can get in contact with someone who's empowered to fix it. Even if the project is implicitly abandoned, a maintainer who's active elsewhere can still show up, release the fix, and then explicitly deprecate the project.
For a better metric name, "Maintainer Reachability"?