hackage-server

Downloads and statistics API

Open · yamadapc opened this issue 9 years ago · 8 comments

Building on "Proposed Statistics features", I'd like to open an issue about exposing an API for download statistics over time.

Depending on how that task goes, someone (possibly myself) could take on more of the work.

I'd like to be assigned to expose that, along with #332. Not all of the proposed statistics, just an API for downloads over time, per package and per version.

To be honest, I don't understand why /packages/downloads requires admin access.

I briefly discussed this on #hackage, but not as deeply as I'd like, so if this comes across as nonsensical, feel free to close and ignore it. What I have in mind is JSON resources for:

  • /packages/:package_name (with a downloads and downloads_this_month count)
  • /packages/:package_name/downloads (with a "query-able" count over time)
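To make the two proposed resources concrete, here is a minimal sketch of what their JSON bodies could be computed from, assuming Hackage kept download counts per day, per package, and per version. The data layout, the function names `package_summary` and `downloads_over_time`, and the field names other than `downloads` and `downloads_this_month` are all hypothetical. "This month" is interpreted here as the 30 days ending today, which is also an assumption.

```python
from collections import Counter
from datetime import date
from typing import Optional

# Hypothetical storage: Counter keyed by (day, package, version) -> downloads.

def package_summary(daily: Counter, package: str, today: date) -> dict:
    """Hypothetical body for GET /packages/:package_name."""
    total = recent = 0
    for (day, pkg, _version), n in daily.items():
        if pkg != package:
            continue
        total += n
        # Assumption: "this month" = the 30 days ending today (inclusive).
        if 0 <= (today - day).days < 30:
            recent += n
    return {"name": package, "downloads": total, "downloads_this_month": recent}

def downloads_over_time(daily: Counter, package: str, start: date, end: date,
                        version: Optional[str] = None) -> dict:
    """Hypothetical body for GET /packages/:package_name/downloads,
    queryable by date range and, optionally, by a single version."""
    per_day = Counter()
    for (day, pkg, ver), n in daily.items():
        # version=None means "all versions of the package".
        if pkg == package and start <= day <= end and version in (None, ver):
            per_day[day] += n
    return {
        "name": package,
        "version": version,
        "downloads": [{"day": d.isoformat(), "count": per_day[d]}
                      for d in sorted(per_day)],
    }
```

Keeping the raw per-day, per-version counts means both the summary and any date-range or per-version query can be derived from one store, rather than maintaining separate counters for each view.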

I'm not sure I follow how the /packages/top resource works. What are the criteria for a package to be considered a "top" package? The number of downloads, I guess, but how many?

I've started scraping data from that resource in hackage-downloads. The next step would be to add a web service that serves counts over time and to hit the resource every day or so.

But it'd be nice to have this in the Hackage API. I think NPM has an interesting implementation of this; it's very ad hoc, like the repository linked above.

yamadapc, Dec 24 '15 22:12

Yes, I'd very much like to expose download stats in a convenient form. We collect quite a bit of detail in the download stats feature, but not much of it is exposed yet.

@yamadapc you're most welcome to have a go at this. Ask in #hackage if you want advice.

dcoutts, May 15 '16 13:05

Updating this.

Since there now seems to be a Fastly CDN in front of Hackage, exposing the stats through the API isn't worth the trouble: downloads served from the CDN's cache never reach Hackage, so its own counts will be incorrect.

A better approach, closer to what NPM seems to do, would be to generate the download stats from the CDN's logs.

It seems there are ways to enable log aggregation in Fastly, as outlined in:

  • https://docs.fastly.com/guides/streaming-logs/custom-log-formats
  • https://docs.fastly.com/api/logging

A separate system, like NPM's download-counts, would then run on a schedule, parse the logs, and generate download-count statistics. Depending on how the logs are structured, there could even be some reuse of NPM's existing tooling.
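A scheduled job of that kind could look roughly like the sketch below. It assumes a custom Fastly log format emitting one tab-separated line per request with an ISO-8601 timestamp, the HTTP status, and the URL path; that format is hypothetical and would have to be configured. The tarball path layout (`/package/<pkg>-<ver>/<pkg>-<ver>.tar.gz`) is the one Hackage serves; the function names are illustrative.

```python
import re
from collections import Counter
from datetime import datetime
from typing import Iterable, Optional, Tuple

# Package tarball paths, e.g. /package/aeson-1.0.0.0/aeson-1.0.0.0.tar.gz
TARBALL = re.compile(r"^/package/(?P<pkgid>[^/]+)/(?P=pkgid)\.tar\.gz$")
VERSION = re.compile(r"^\d+(\.\d+)*$")

def split_pkgid(pkgid: str) -> Optional[Tuple[str, str]]:
    """Split 'base-compat-0.9.1' into ('base-compat', '0.9.1').
    Package names may contain hyphens, so split on the last one."""
    name, sep, version = pkgid.rpartition("-")
    if not sep or not name or not VERSION.match(version):
        return None
    return name, version

def daily_counts(lines: Iterable[str]) -> Counter:
    """Count tarball downloads per (day, package, version).
    Assumed (hypothetical) log line: '<timestamp>\\t<status>\\t<path>'."""
    counts: Counter = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            continue  # malformed line
        timestamp, status, path = fields
        if status != "200":
            continue  # only count successful downloads
        m = TARBALL.match(path)
        if m is None:
            continue  # not a package tarball (index, docs, pages, ...)
        parsed = split_pkgid(m.group("pkgid"))
        if parsed is None:
            continue
        day = datetime.strptime(timestamp[:10], "%Y-%m-%d").date()
        counts[(day, *parsed)] += 1
    return counts
```

Splitting the package identifier on the last hyphen matters because Hackage package names can themselves contain hyphens, while versions consist only of digits and dots.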

yamadapc, Aug 28 '16 20:08

I would be quite interested in this. It would be great to be able to compare packages using this data. I've even got a web frontend up for a similar (open-source) project:

  • Example: https://trycatchchris.co.uk/archpackagecompare/comparePackage/gnome-terminal/lxterminal/rxvt/rxvt-unicode/st/terminator/termite/xterm
  • Code: https://github.com/chrissound/ArchPackageCompareStats

I'm not too familiar with all these logging services, though. Are they all paid services?

chrissound, May 20 '17 11:05

FWIW, the plan is to inject the CDN download count data into Hackage so it can provide reliable counts again; I just need to finish the S3 logic to reliably fetch and aggregate the daily data.
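The S3 fetching logic itself isn't shown here, but one property that makes "reliably aggregate the daily data" easier is idempotent ingestion. Below is a minimal sketch, assuming counts arrive as one batch per day keyed by (package, version); the class and method names are hypothetical. Each day's batch replaces any earlier batch for that day, so re-running a fetch after a partial or failed run never double-counts.

```python
from collections import Counter
from datetime import date
from typing import Dict

class DownloadStore:
    """Hypothetical per-day store of (package, version) -> download counts."""

    def __init__(self) -> None:
        self._days: Dict[date, Counter] = {}

    def ingest_day(self, day: date, counts: Counter) -> None:
        # Replace (not add to) any earlier data for this day, so
        # re-ingesting the same day is safe and idempotent.
        self._days[day] = Counter(counts)

    def totals(self) -> Counter:
        # All-time totals are recomputed from the per-day batches.
        total: Counter = Counter()
        for counts in self._days.values():
            total.update(counts)  # Counter.update adds counts
        return total
```

Recomputing totals from per-day batches trades a little computation for the ability to correct or re-fetch any single day without touching the others.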

hvr, May 20 '17 11:05

We have a plan to tackle the CDN issue so that shouldn't be a blocker on this.

gbaz, Mar 23 '18 14:03

:+1: /packages/top orders packages by number of downloads in the past 30 days, but it'd be nice if it also supported a query parameter to switch to all-time downloads.

I'm writing a script to run on the X most popular packages (by number of downloads), and /packages/top/ provides a decent proxy, but it'd be better to work off the all-time stats.

(Also, /packages/top doesn't have a JSON option, so I'm having to scrape HTML right now. Could this endpoint also return JSON?)
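For illustration, here is a minimal sketch of what a JSON-returning /packages/top with a window toggle could compute, assuming per-day counts keyed by (day, package, version). The `window` parameter and its values `"30d"`/`"all"` are hypothetical; Hackage does not currently offer them. The 30-day window is taken as the 30 days ending today, inclusive.

```python
import json
from collections import Counter
from datetime import date, timedelta

def top_packages(daily: Counter, today: date, window: str = "30d",
                 limit: int = 10) -> str:
    """Rank packages by downloads and return a JSON array.
    window="30d": the 30 days ending today; window="all": all time.
    (Both the parameter and its values are hypothetical.)"""
    if window not in ("30d", "all"):
        raise ValueError(f"unknown window: {window!r}")
    cutoff = today - timedelta(days=29)  # 30 days including today
    per_package: Counter = Counter()
    for (day, package, _version), n in daily.items():
        if window == "all" or cutoff <= day <= today:
            per_package[package] += n  # sum over all versions
    # Sort by downloads descending; break ties by name for stable output.
    ranked = sorted(per_package.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]
    return json.dumps([{"package": p, "downloads": n} for p, n in ranked])
```

A script like the one described could then request the all-time ranking once and parse it with any JSON library instead of scraping HTML.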

brandonchinn178, Nov 22 '21 23:11