s2fft
Collecting usage statistics and community metrics
@jasonmcewen suggested we may wish to look at options for collecting statistics about usage of s2fft (and related packages) to support potential future funding applications.
Some notes from a bit of initial research on options / tools and resources in this area:
- usagestats Python package - a package for collecting opt-in usage statistics from the users of a program
  - This is mainly targeted at Python CLI tools, where there is an entry point at which to attach the prompt asking users to opt in, so it may well not be relevant to our case.
  - As it requires explicit user opt-in, it is likely to give more useful data about actual users, as opposed to automated installations on CI runners etc.
  - Users might, however, find the idea of collecting usage statistics like this off-putting!
- github-repo-stats GitHub Action - can be set up as a scheduled workflow to automate collecting and generating reports from GitHub's built-in traffic statistics, overcoming the built-in interface's limit of 14 days of data.
  - This looks easy to set up and doesn't require any intervention on the user side.
  - Only captures statistics of interactions with the GitHub repository, so it gives only a partial picture as many (most?) users will install from PyPI, but it is still useful for getting statistics on development activity.
- pypistats Python package - 'Python interface to PyPI Stats API to get aggregate download statistics on Python packages on the Python Package Index'
  - Gives access to the last 180 days of PyPI download statistics.
  - We could potentially set up a scheduled GitHub Actions job to download and record this data, for example monthly.
- pypinfo Python package - 'pypinfo is a simple CLI to access PyPI download statistics via Google's BigQuery'
  - Similar to `pypistats` but, as it directly accesses the underlying Google BigQuery data, it is not limited to a 180-day window.
  - This may be useful for extracting historical PyPI download statistics on demand, as an alternative to running a regular job with `pypistats` as suggested above.
- Augur - 'a data engineering tool that makes it possible for data scientists to gather open source software community data'
  - Part of the CHAOSS (Community Health Analytics in Open Source Software) project.
  - Pulls in data from a range of sources and, as it is focussed at a community rather than single-repository level, can collect data across multiple linked repositories / projects.
  - Has extensive data visualization, reporting and querying support.
  - Getting set up looks non-trivial, and it feels like this may be overkill for our purposes.
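
To make the idea of a scheduled job recording PyPI download statistics a bit more concrete, here is a minimal sketch of what such a job might run. It is only an illustration under some assumptions: it queries the pypistats.org JSON API directly with the standard library rather than going through the `pypistats` package, the endpoint URL and the response layout (a `data` object with `last_day`, `last_week` and `last_month` counts) are assumptions based on that API's `recent` endpoint, and the helper names and CSV layout are hypothetical.

```python
import csv
import datetime
import json
import urllib.request
from pathlib import Path

# Assumed URL pattern for the "recent downloads" endpoint of the PyPI Stats API.
API_URL = "https://pypistats.org/api/packages/{package}/recent"


def fetch_recent_downloads(package):
    """Fetch the recent download counts for `package` as a parsed JSON dict."""
    with urllib.request.urlopen(API_URL.format(package=package)) as response:
        return json.load(response)


def append_record(payload, csv_path, today=None):
    """Append one dated row of download counts to a CSV file.

    `payload` is assumed to look like
    {"package": "...", "data": {"last_day": ..., "last_week": ..., "last_month": ...}}.
    A header row is written first if the file does not exist yet.
    """
    today = today or datetime.date.today()
    csv_path = Path(csv_path)
    write_header = not csv_path.exists()
    data = payload["data"]
    with csv_path.open("a", newline="") as f:
        writer = csv.writer(f)
        if write_header:
            writer.writerow(
                ["date", "package", "last_day", "last_week", "last_month"]
            )
        writer.writerow(
            [
                today.isoformat(),
                payload["package"],
                data["last_day"],
                data["last_week"],
                data["last_month"],
            ]
        )
```

A monthly GitHub Actions workflow could then call something like `append_record(fetch_recent_downloads("s2fft"), "pypi_downloads.csv")` and commit the updated file. Appending dated rows means the record keeps growing beyond the 180-day window even though each individual query only sees recent data.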