s2fft

Collecting usage statistics and community metrics

Open matt-graham opened this issue 1 year ago • 0 comments

@jasonmcewen suggested we may wish to look at options for collecting statistics about usage of s2fft (and related packages) to support potential future funding applications.

Some notes from a bit of initial research on options / tools and resources in this area:

  • usagestats Python package - a package for collecting opt-in usage statistics from users of a program
    • This is mainly targeted at Python CLI tools, where there is an entry point to which the prompt asking users to opt in can be attached, so may well not be relevant to our case.
    • As it requires explicit user opt-in it is likely to give more useful data about actual users as opposed to automated installations on CI runners etc.
    • Users however might find the idea of collecting usage statistics like this off-putting!
  • github-repo-stats GitHub Action - can be set up as a scheduled workflow to automate collecting and generating reports from GitHub's built-in traffic statistics, overcoming the built-in interface's limitation to 14 days of data.
    • This looks easy to set up and doesn't require any intervention on user side.
    • Only captures statistics of interactions with the GitHub repository, so only gives a partial picture as many (most?) users will install from PyPI, but still useful to get statistics around development activity.
  • pypistats Python package - 'Python interface to PyPI Stats API to get aggregate download statistics on Python packages on the Python Package Index'
    • Allows access to the last 180 days of PyPI download statistics.
    • We could potentially set up a scheduled GitHub Actions job to download and record this data, for example monthly.
  • pypinfo Python package - 'pypinfo is a simple CLI to access PyPI download statistics via Google's BigQuery'
    • Similar to pypistats, but as it directly accesses the underlying Google BigQuery data, it is not limited to a 180-day window.
    • This may be useful for extracting historical PyPI download statistics on demand, as an alternative to running a regular job with pypistats as suggested above.
  • Augur - 'a data engineering tool that makes it possible for data scientists to gather open source software community data'
    • Part of the CHAOSS (Community Health Analytics in Open Source Software) project.
    • Pulls in data from a range of sources and, as it is focused at a community rather than single-repository level, can collect data across multiple linked repositories / projects.
    • Has extensive data visualization, reporting and querying support.
    • Getting set up looks non-trivial and it feels like this may be overkill for our purposes.
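
As a rough illustration of the scheduled-job idea for PyPI downloads, here is a minimal sketch of a script that such a job could run to append a snapshot of recent download counts to a CSV file. It queries the public PyPI Stats JSON API directly with the standard library rather than going through the pypistats package. The endpoint URL, the response layout (`data` with `last_day`, `last_week`, `last_month`), the CSV columns and all function names are assumptions made for illustration, not something checked against the current API, so treat it as a starting point only.

```python
import csv
import json
import urllib.request
from datetime import date
from pathlib import Path

# Assumed endpoint of the public PyPI Stats JSON API (not verified here).
API_URL = "https://pypistats.org/api/packages/{package}/recent"
# Hypothetical column layout for the recorded snapshots.
FIELDS = ["date", "package", "last_day", "last_week", "last_month"]


def fetch_recent(package: str) -> str:
    """Download the raw JSON payload of recent download counts for a package."""
    url = API_URL.format(package=package)
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


def parse_recent(payload: str) -> dict:
    """Extract the download counts, assuming they sit under a 'data' key."""
    data = json.loads(payload)["data"]
    return {key: int(data[key]) for key in ("last_day", "last_week", "last_month")}


def append_record(csv_path, package: str, counts: dict, day: date) -> None:
    """Append one snapshot row to the CSV file, writing a header if it is new."""
    path = Path(csv_path)
    write_header = not path.exists()
    with path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()
        writer.writerow({"date": day.isoformat(), "package": package, **counts})
```

A scheduled job would then call something like `append_record("downloads.csv", "s2fft", parse_recent(fetch_recent("s2fft")), date.today())` and commit the updated file. Keeping the network call separate from the parsing and recording steps means the latter can be tested offline.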

matt-graham, Sep 20 '24 14:09