warehouse
Idea: Replace "Files" with "Publishers" on front page
Currently the front page stats summary has 3 software-focused stats (projects, releases, files) and 2 people-focused stats (projects, users). (I consider projects to be collections of people interested in a particular piece of software, hence the appearance on both lists.)
What do you think about changing the front page statistics to be: Projects, Releases, Publishers, Users?
Where "Publishers" would be a count of the distinct accounts with admin access to one or more PyPI projects.
My goal with such a change would be to highlight a couple of things:
- that PyPI is, first and foremost, a software publishing platform, and publication doesn't happen without publishers
- getting a rough sense of the project/publisher ratio (i.e. if we have more projects than publishers, that means the typical bus factor is necessarily less than 1, with each publisher maintaining multiple projects on average)
It would also be interesting to get a sense of the publisher/user ratio, but we know that will be way off, since there isn't much reason to register if you're not a software publisher yourself.
> It would also be interesting to get a sense of the publisher/user ratio, but we know that will be way off, since there isn't much reason to register if you're not a software publisher yourself.
I've had a similar assumption, that nobody would register for PyPI unless they were publishing a project, but just for kicks I checked: we have a total of 159,246 users, but only 34,832 users show up in our roles table (which is where we store the mapping of Project -> User for permissions). This means that we're averaging 2.5 projects per "contributing" user.
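For a rough sense of scale, the two counts quoted above can be turned into a ratio. This is only arithmetic on the numbers in this comment; the variable names are made up for illustration.

```python
# Figures quoted in the comment above.
total_users = 159_246
users_with_roles = 34_832

# Share of registered accounts that appear in the roles table at all.
share_with_roles = users_with_roles / total_users

print(f"{share_with_roles:.1%} of registered users hold at least one role")
```

So roughly one in five registered accounts currently has a role on a project.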
As it stands, if we add "Publishers", I'm not sure that it makes much sense to keep "Users", but my immediate reaction is that "Publishers" probably fits the spirit of those statistics we show better than Users does anyway. The only reason it's really "Users" at all is because it was the most obvious thing at the time (the other 3 are just COUNT(*) queries on their tables, but "Publishers" is a COUNT(DISTINCT user)).
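To illustrate the difference between the two kinds of query, here is a minimal sketch using an in-memory SQLite database. The table and column names (`users`, `roles`, `user_id`) and the sample rows are assumptions for the example, not Warehouse's actual schema.

```python
import sqlite3

# Hypothetical, simplified schema; not Warehouse's real tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE roles (user_id INTEGER, project TEXT, role_name TEXT);
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO roles VALUES
        (1, 'proj-a', 'Owner'),
        (1, 'proj-b', 'Owner'),
        (2, 'proj-b', 'Maintainer');
""")

# "Users" style stat: a plain row count on the table.
user_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]

# A plain row count on roles counts role assignments, not people.
role_rows = conn.execute("SELECT COUNT(*) FROM roles").fetchone()[0]

# "Publishers" style stat: each account counted once, however many roles it holds.
publisher_count = conn.execute(
    "SELECT COUNT(DISTINCT user_id) FROM roles"
).fetchone()[0]

print(user_count, role_rows, publisher_count)
```

In this toy data, carol is registered but holds no role, and alice holds two, so the three numbers differ in exactly the way the comment describes.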
My rationale behind keeping both would be "Every PyPI user is a potential future publisher". We don't really know what a healthy ratio looks like, but it's a safe bet that there is a healthy range, even if we don't currently know what it is.
@ewdurbin and I chatted about this briefly today, and concluded:
- Files is a useful metric, let's keep that
- Users is the one that likely needs some change, as pointed out by @dstufft
If someone wanted to pick this up, here are some details to be aware of:
- the stats are driven from database triggers that keep the stats (RowCount) up to date
- we now have multiple `role` tables - Role, TeamRole, OrganizationRole - so those need to be taken into consideration
- we have owner, maintainer Roles, as well as Org manager, member, Team member - all of these should be counted as publishers
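As a starting point, counting publishers across several role tables could look something like the sketch below. It uses an in-memory SQLite database with made-up table names (`roles`, `team_roles`, `organization_roles`) and a made-up `user_id` column; the real Warehouse models and columns may differ, and this does not cover how the count would be wired into the trigger-maintained stats.

```python
import sqlite3

# Hypothetical stand-ins for the Role, TeamRole and OrganizationRole tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE roles (user_id INTEGER, project TEXT, role_name TEXT);
    CREATE TABLE team_roles (user_id INTEGER, team TEXT, role_name TEXT);
    CREATE TABLE organization_roles (user_id INTEGER, organization TEXT, role_name TEXT);
    INSERT INTO roles VALUES (1, 'proj-a', 'Owner'), (2, 'proj-a', 'Maintainer');
    INSERT INTO team_roles VALUES (2, 'team-x', 'Member'), (3, 'team-x', 'Member');
    INSERT INTO organization_roles VALUES (3, 'org-y', 'Manager'), (4, 'org-y', 'Member');
""")

# UNION (without ALL) removes duplicates, so an account that holds roles in
# more than one table is still counted once.
PUBLISHERS_SQL = """
SELECT COUNT(*) FROM (
    SELECT user_id FROM roles
    UNION
    SELECT user_id FROM team_roles
    UNION
    SELECT user_id FROM organization_roles
)
"""

publisher_count = conn.execute(PUBLISHERS_SQL).fetchone()[0]
print(publisher_count)
```

Users 2 and 3 each appear in two tables here but contribute one publisher each, which is the behaviour the combined stat would need.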
So if someone wanted to pick this up, hopefully there's enough context to get started.