
Improved metrics: Measure and report the download count

Open · ShivamArora opened this issue 5 years ago · 43 comments

As a developer and owner of a package, I would like to see the download stats for my package.

The npm package repository on npmjs.com provides the download statistics for the libraries.

I would like to see the same happening on the pub.dev website.

This will help in getting to know how many people/projects are using my package.

[Screenshot: download statistics panel as shown on npmjs.com]

ShivamArora avatar Aug 28 '19 10:08 ShivamArora

See also https://github.com/dart-lang/pub-dev/issues/1229.


Defining the download count is a bit non-trivial though. Should we remove bots (and how)? If not, travis-ci will dominate the result. Do we want to measure downloads, or the number of times a client asked if a newer version was available?

I think we ultimately want a more transparent popularity metric along the lines of download count.

jonasfj avatar Aug 28 '19 14:08 jonasfj

I don't think asking for a newer version should count as a download.

By download count, I meant to measure the downloads made by different projects using the pub tool.

However, for reference, can we check out how npm defines its download count?

ShivamArora avatar Aug 30 '19 05:08 ShivamArora

By download count, I meant to measure the downloads made by different projects using the pub tool.

I only download a package to my machine once, then it's in my ~/.pub-cache/, no matter how many projects I used said package in.

However, if my travis setup isn't configured to cache, it can download the package for every single test run.

jonasfj avatar Sep 01 '19 09:09 jonasfj

I only download a package to my machine once, then it's in my ~/.pub-cache/, no matter how many projects I used said package in.

The same happens for npm packages as well, since they are stored in node_modules.

It would be worth checking out how the npm repository measures the download count of packages.

ShivamArora avatar Sep 06 '19 05:09 ShivamArora

It would be worth checking out how the npm repository measures the download count of packages.

https://blog.npmjs.org/post/92574016600/numeric-precision-matters-how-npm-download-counts

The main gist:

npm’s download stats are naïve by design: they are simply a count of the number of HTTP 200 responses we served that were tarball files, i.e. packages. This means the number includes:

  • automated build servers
  • downloads by mirrors
  • robots that download every package for analysis

[...]

So the count of “downloads” is much larger than the number of people who typed “npm install yourpackage” on any given day.

[...]

Bottom line: most packages get a trickle of downloads every day, and that’s not necessarily indicative that they’re being actively used. Only if your package is getting > 50 downloads/day can you be sure you’re seeing signal instead of noise. You will also get a burst of downloads whenever you publish a new package or a version of that package, because all the mirrors will download it.
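
To make that counting rule concrete, here is a minimal sketch of the naive approach (not pub.dev's or npm's actual implementation): count every HTTP 200 response for a package tarball in the server's access logs, bots and CI included. The log line shape and the `/packages/<name>/versions/<version>.tar.gz` URL pattern are assumptions for illustration only.

```dart
import 'dart:io';

// Naive download counting, as described in the npm post: every 200
// response for a tarball counts, with no attempt to filter bots or CI.
void main(List<String> args) {
  final counts = <String, int>{};
  // Assumed access-log line shape (illustrative only):
  // 1.2.3.4 - - [date] "GET /packages/foo/versions/1.2.3.tar.gz HTTP/1.1" 200 12345
  final tarball =
      RegExp(r'"GET /packages/([^/]+)/versions/[^ ]+\.tar\.gz HTTP/[^"]*" 200 ');
  for (final line in File(args.first).readAsLinesSync()) {
    final match = tarball.firstMatch(line);
    if (match != null) {
      counts.update(match.group(1)!, (c) => c + 1, ifAbsent: () => 1);
    }
  }
  counts.forEach((package, count) => print('$package: $count'));
}
```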

isoos avatar Sep 06 '19 07:09 isoos

@isoos Do you have anything in mind regarding what could be the best approach to measure the download count?

ShivamArora avatar Sep 17 '19 07:09 ShivamArora

Do you have anything in mind regarding what could be the best approach to measure the download count?

@ShivamArora: I don't think there is a good approach to it; by its nature, it cannot be solved correctly. Our current popularity metric is a proxy of the total download count. A lot of thought went into it, e.g. how to filter CI systems automatically, how to balance the long-tail distribution, and I believe it is much closer to the truth than the "feel-good" download counts.

Having said that, I think it can be a valid goal to expose a few more metrics that are also a proxy for popularity, because maybe a single metric won't ever solve it:

  • the "feel-good" / "raw" download counts of the packages
  • the number of visits on the package pages
  • the number of API calls that requested data on a package (especially after these APIs are stable and public)
  • search relevance / click-through rate for a package (+ metrics on the most searched-for expressions related to a package)

We are also planning to introduce some kind of community rating that also provides feedback about the quality and/or use of the package.

Note: we don't have any strict schedule for the above, and can't commit to it, but we are really open to ideas related to these.
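
As an aside, one purely illustrative way the long-tail balancing mentioned above could work (this is not pub.dev's documented formula) is to log-scale raw download counts relative to the most-downloaded package, so small packages aren't all crushed to zero by a single huge one:

```dart
import 'dart:math';

/// Illustrative only: map a raw, long-tailed download count onto a
/// 0-100 scale by comparing log(downloads) to log(max downloads).
int popularityScore(int downloads, int maxDownloads) {
  if (downloads <= 0 || maxDownloads <= 1) return 0;
  return (100 * log(downloads + 1) / log(maxDownloads + 1)).round();
}

void main() {
  const maxDownloads = 2000000; // hypothetical most-downloaded package
  for (final d in [10, 1000, 50000, 2000000]) {
    print('$d downloads -> ${popularityScore(d, maxDownloads)} / 100');
  }
}
```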

isoos avatar Sep 17 '19 08:09 isoos

@isoos Cool.

I agree with the point that we should expose a few more metrics.

Having the feature of community rating would be pretty good. But don't limit it just to ratings; include some reviews as well.

Community ratings and reviews would help anyone learn about the quality of a package and what others think of it after using it.

Apart from that, I would suggest another metric, if that's possible, that provides information about documentation quality.

Good documentation is often just as important as the package itself.

ShivamArora avatar Sep 18 '19 06:09 ShivamArora

@isoos

Our current popularity metric is a proxy of the total download count. A lot of thought went into it, e.g. how to filter CI systems automatically, how to balance the long-tail distribution, and I believe it is much closer to the truth than the "feel-good" download counts.

As a publisher, I'm not certain what 'truth' you are talking about, as the current '0-100' popularity gives me no useful information. Whilst I agree that it would be nice to filter out CI/bots etc., it's probably not a reality. As such, a raw count is probably the best approach.

Raw counts give a sense of scale; the popularity score gives none. If I've only got 10 users then it's probably time to dump the project. If there are 10K users then I'm being appreciated and it's worth continuing the work :)

It is particularly unhelpful that the current metric is undocumented. Is the range linear or exponential? Is it somehow relative to other projects?

I have a feeling that the team tried to be too clever and we have ended up with something far worse than the raw numbers.

bsutton avatar Jan 22 '20 12:01 bsutton

This is actively being considered. We're looking into options for how to get a representative count of downloads / pub gets, even for cases where the package is still in the local cache.

mit-mit avatar Apr 24 '20 12:04 mit-mit

@mit-mit 👍

bsutton avatar Apr 24 '20 22:04 bsutton

I don't know how many authors are publishing prerelease versions, but seeing download numbers for them could help authors get a sense of the number of users helping to test out these versions. This in turn could assist with making a decision on whether it's ready for a stable release.

MaikuB avatar Jul 02 '20 14:07 MaikuB

@mit-mit I think that when a user runs pub get, pub can check the differences from the previous pubspec.lock (what packages have been added/deleted) and then push those changes to the server, which applies them to the global count of usages for that package. That still doesn't solve the Travis CI issue, but it does solve the cache issue.
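
A rough, hypothetical sketch of that client-side idea: diff the package names listed in the old and new pubspec.lock and report only the additions and removals. A naive regex stands in for a real YAML parser, and the file paths are made up for illustration.

```dart
import 'dart:io';

/// Returns the package names listed under `packages:` in a pubspec.lock.
/// Naive: lock entries appear as two-space indented keys, e.g. "  http:".
Set<String> lockedPackages(String path) {
  final entry = RegExp(r'^  (\S+):$', multiLine: true);
  return entry
      .allMatches(File(path).readAsStringSync())
      .map((m) => m.group(1)!)
      .toSet();
}

void main() {
  final before = lockedPackages('pubspec.lock.old'); // hypothetical snapshot
  final after = lockedPackages('pubspec.lock');
  print('added:   ${after.difference(before)}');
  print('removed: ${before.difference(after)}');
  // Only these deltas would be reported to the server, per the proposal above.
}
```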

I know you guys have looked at NPM. https://crates.io (Rust's central package repository) also provides download stats, so it might be worth checking out how they do it.

AKushWarrior avatar Aug 20 '20 00:08 AKushWarrior

To follow up on what I had mentioned: I had changes for a plugin that were in prerelease for a number of months and were then promoted to stable, having figured enough time had been given for the community to help test and give feedback. A couple of weeks later a bug was found. As this is a plugin that's used by a lot of the community, at least based on what I can infer from pub.dev and GitHub, I can only assume the prereleases hadn't gotten much usage to begin with. If there is some concern around displaying download counts publicly, perhaps consider having them visible only to the authors of the plugin. Part of what would particularly help here is knowing the number of downloads for a particular version. An example that does this is https://www.nuget.org/

MaikuB avatar Oct 18 '20 13:10 MaikuB

If it's feasible, I have a use case where it would be very helpful to know why a package was installed, i.e. whether it was a direct dependency or transitive through package x (in percentages).
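
For what it's worth, pubspec.lock already records whether each resolved package is a direct or transitive dependency, which is the data a client could report for this. A local, illustrative sketch (again using a naive regex rather than a YAML parser):

```dart
import 'dart:io';

void main() {
  final text = File('pubspec.lock').readAsStringSync();
  // Lock entries carry a field like:
  //   dependency: "direct main" | "direct dev" | "transitive"
  final direct = RegExp(r'dependency: "direct').allMatches(text).length;
  final transitive = RegExp(r'dependency: "transitive"').allMatches(text).length;
  final total = direct + transitive;
  if (total == 0) return;
  print('direct:     ${(100 * direct / total).toStringAsFixed(1)}%');
  print('transitive: ${(100 * transitive / total).toStringAsFixed(1)}%');
}
```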

Jjagg avatar Dec 13 '20 01:12 Jjagg

+1 to more straightforward metrics.

In the recent user study, users used the popularity score and Pub Points for decision making (e.g., skip packages with <80% popularity), but they didn't necessarily understand the scores when using them.

jayoung-lee avatar Dec 22 '20 01:12 jayoung-lee

+1 to more straightforward metrics.

In the recent user study, users used the popularity score and Pub Points for decision making (e.g., skip packages with <80% popularity), but they didn't necessarily understand the scores when using them.

Well, nobody understands the scores, because they're not transparent. I don't see the issue with just exposing the raw metrics: that's both more useful and more informative.

AKushWarrior avatar Dec 23 '20 05:12 AKushWarrior

Is there any news or ETA about it? Thanks!

pichillilorenzo avatar Mar 25 '21 11:03 pichillilorenzo

No - we don't have an ETA for this yet - but it is something we want to get to.

sigurdm avatar Mar 30 '21 13:03 sigurdm

Any updates on when this can be added as a feature? Even if it's just an API like npm has and not shown on the package page, it's really useful to all of us to know how a package's developer base is growing (or shrinking!)
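
For reference, npm's public download-counts API can be queried as in the sketch below; only the npm URL is real, and the pub.dev endpoint in the trailing comment is purely hypothetical.

```dart
import 'dart:convert';
import 'dart:io';

Future<void> main() async {
  final client = HttpClient();
  // Real npm endpoint: download count for a package over the last month.
  final uri =
      Uri.parse('https://api.npmjs.org/downloads/point/last-month/express');
  final request = await client.getUrl(uri);
  final response = await request.close();
  final body = await response.transform(utf8.decoder).join();
  final data = jsonDecode(body) as Map<String, dynamic>;
  print("${data['package']}: ${data['downloads']} downloads last month");
  client.close();
  // A pub.dev equivalent might one day look like (hypothetical, does not exist):
  //   https://pub.dev/api/packages/<package>/metrics/downloads
}
```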

maxxfrazer avatar Feb 09 '22 10:02 maxxfrazer

This is actively being worked on; we're adding various telemetry to enable this. But it will take a while before this is complete enough that we can start displaying it.

mit-mit avatar Feb 09 '22 11:02 mit-mit

hurry up bro,

gsmental avatar Jun 13 '22 04:06 gsmental

hurry up bro,

steady on, these devs don't owe you or me anything… respectfully show your interest in this feature and move on.

maxxfrazer avatar Jun 13 '22 05:06 maxxfrazer

If bots are what we are worried about for stats such as download counts, then at least the author should have the option to see metrics on how the package is doing. I don't think an author would create a bot just to see themselves getting lots of downloads.

Thanks, Dart team

tashi146 avatar Jun 17 '22 07:06 tashi146

I don't think an author would create a bot just to see themselves getting lots of downloads.

Why not? Inflating download counts is a very popular form of cheating. The more downloads, the more often users choose your package, the more popular you become, and the more investment you get.

I wish good luck to the team to find a solution.

xr0master avatar Feb 07 '23 13:02 xr0master

Any progress on this?

alexobviously avatar Mar 13 '23 12:03 alexobviously

Checking if we have any update on the Dart package statistics. 💥

ishaileshmishra avatar Apr 01 '23 16:04 ishaileshmishra

@mit-mit is this something still on the near-term roadmap?

maxxfrazer avatar Apr 02 '23 14:04 maxxfrazer

+1

incrediblezayed avatar May 18 '23 09:05 incrediblezayed

+1

therdm avatar Jun 06 '23 17:06 therdm