pip-audit

More audit information: package age

Open jenstroeger opened this issue 1 year ago • 5 comments

Is your feature request related to a problem? Please describe.

Not a problem per se, more of a nice-to-have: in addition to the current audit information, I think it would be useful to be alerted to “older” packages, for example when a required package version was uploaded more than N months ago, where N could be a command-line option. Perhaps there is a more recent update available, but perhaps there isn’t, in which case I’d like to be alerted that the package is potentially stale/unmaintained.

Describe the solution you'd like

For example, running pip-audit --warn-aged 12 would list packages whose metadata entry upload_time_iso_8601 is older than a year. I could then check for updates, or consider dropping the package if I suspect it’s not maintained anymore.
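To make the proposed check concrete, here is a minimal sketch of the age comparison itself. The function names are hypothetical, `--warn-aged` is only the flag proposed above (pip-audit does not have it), and a "month" is assumed to be 30 days for simplicity. Fetching the timestamp from PyPI is left out.

```python
from datetime import datetime, timedelta, timezone


def parse_upload_time(value: str) -> datetime:
    """Parse a PyPI-style timestamp such as '2021-01-01T00:00:00.000000Z'."""
    # Older Python versions' fromisoformat() rejects a trailing "Z",
    # so rewrite it as an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def is_aged(upload_time_iso_8601: str, months: int, now: datetime = None) -> bool:
    """Return True if the upload is older than `months` months.

    Assumption: one month is treated as 30 days.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=30 * months)
    return parse_upload_time(upload_time_iso_8601) < cutoff
```

Passing `now` explicitly keeps the check reproducible; with the default it compares against the current UTC time.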

Describe alternatives you've considered

Manually checking repos, or writing a custom script that fetches that data.

Additional context

This is an attempt to weed out package dependencies that may cause issues, e.g. because packages have become stale.

jenstroeger, Aug 09 '22 08:08

Thanks for the request!

I think we'll want to give some extended thought to this -- package age alone is probably not a good security signal, since lots of large packages go years without releases (pyasn1, for example, is in the top 100 and hasn't had a stable release in nearly 3 years). But maybe it's something we can expose in the JSON format, and let users make those decisions for themselves.

In the meantime, PyPI's JSON API should give you the upload time for individual releases of a package. Consuming those can be a little fiddly, since multiple distributions under the same release can have widely ranging upload times (e.g., if a macOS wheel was built a couple of days later, or forgotten and added months later).
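As a rough illustration of that fiddliness, the sketch below reads the per-file `upload_time_iso_8601` values from the `urls` list of a release's JSON document and reports the earliest and latest. The helper names are hypothetical, and which of the two timestamps should count as "the" release date is left open.

```python
import json
from datetime import datetime
from urllib.request import urlopen


def fetch_release(name: str, version: str) -> dict:
    """Download the JSON document for one release from PyPI's JSON API."""
    with urlopen(f"https://pypi.org/pypi/{name}/{version}/json") as resp:
        return json.load(resp)


def upload_time_range(release: dict):
    """Return (earliest, latest) upload time across a release's files.

    Returns None if the release lists no files.
    """
    times = [
        # Rewrite the trailing "Z" so older fromisoformat() versions accept it.
        datetime.fromisoformat(f["upload_time_iso_8601"].replace("Z", "+00:00"))
        for f in release.get("urls", [])
    ]
    if not times:
        return None
    return min(times), max(times)
```

For example, `upload_time_range(fetch_release("Sphinx", "5.1.1"))` would show how far apart the sdist and wheel uploads are for that release.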

woodruffw, Aug 09 '22 14:08

[…] package age alone is probably not a good security signal

Completely agree, which is why an opt-in command line switch may be useful. For me personally the age of a package would be an indicator to check that package and make sure that one of the package’s (deeper) dependencies hasn’t gone stale — and that’s a manual check.

jenstroeger, Aug 09 '22 22:08

Makes sense!

Another hiccup we'll want to consider: package age may not be easily available from all dependency sources, e.g. pip-audit with no arguments, meaning "audit the current environment" rather than "pull down metadata from PyPI".

woodruffw, Aug 10 '22 18:08

[…] meaning "audit the current environment" rather than "pull down metadata from PyPI".

Valid in an offline scenario. Quick check and I think a release-date of sorts isn’t a [project] key in a package’s metadata 🤔
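One way to repeat that quick check locally is to collect every metadata field name that the installed distributions actually carry, using the standard library's `importlib.metadata`. This is only a sketch with a hypothetical function name; it inspects whatever happens to be installed in the current environment rather than the metadata specification.

```python
from importlib.metadata import distributions


def installed_metadata_keys() -> set:
    """Collect the lower-cased metadata field names of all installed distributions."""
    keys = set()
    for dist in distributions():
        meta = dist.metadata
        if meta is None:
            # Skip distributions whose metadata file could not be read.
            continue
        keys.update(key.lower() for key in meta.keys())
    return keys


if __name__ == "__main__":
    # Print the field names so a date-like field, if any, would be easy to spot.
    for key in sorted(installed_metadata_keys()):
        print(key)
```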

jenstroeger, Aug 10 '22 20:08

I’d like to use this issue to scribble down thoughts & ideas, and to keep the conversation going. @woodruffw, unless perhaps you’d like to move the conversation to a more suitable place (e.g. the Python Ideas board).

Assuming the user is online when running pip-audit, the code repository should be available for some/many packages, e.g.

> curl -s https://pypi.org/pypi/Sphinx/5.1.1/json | jq .info.project_urls.Code
"https://github.com/sphinx-doc/sphinx"

Scraping the repository would then give us interesting data points which would help to build some form of “reputation score”. For example:

  • The date of the last commit (docs) and the commit frequency indicate whether a repository is “active”
  • The run results (docs) indicate whether the last commit passed CI
  • The number of open issues (docs), their dates and discussion frequency indicate whether a repository receives attention
  • Pull requests (docs), their dates, discussions and closing frequency indicate code velocity and, if correlated with issues, might indicate whether a package is maintained
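A few of these data points can be sketched without any scoring logic. The example below assumes the repository is hosted on GitHub and that `repo` is the JSON object returned by GitHub's REST endpoint `GET /repos/{owner}/{repo}`, which includes `pushed_at`, `open_issues_count` (a count that also includes open pull requests) and `archived`. The function names are hypothetical, and how to weigh the signals into a “reputation score” is deliberately left open.

```python
from datetime import datetime, timezone
from urllib.parse import urlparse


def github_slug(code_url: str):
    """Extract (owner, repo) from a GitHub URL, or None for other hosts."""
    parsed = urlparse(code_url)
    if parsed.netloc.lower() != "github.com":
        return None
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        return None
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return owner, repo


def repo_signals(repo: dict, now: datetime = None) -> dict:
    """Summarise a few activity signals from a GitHub repository object."""
    now = now or datetime.now(timezone.utc)
    # GitHub timestamps end in "Z"; rewrite it for older fromisoformat() versions.
    pushed = datetime.fromisoformat(repo["pushed_at"].replace("Z", "+00:00"))
    return {
        "days_since_push": (now - pushed).days,
        "open_issues": repo["open_issues_count"],
        "archived": repo["archived"],
    }
```

The slug from `github_slug("https://github.com/sphinx-doc/sphinx")` would identify which repository object to request before passing it to `repo_signals`.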

Now I understand that this isn’t directly related to security auditing for known vulnerabilities, but I’d argue that an unmaintained package is by definition a liability. Based on that, I’d also argue that a poorly maintained package with lower code/CI/CD quality also poses a risk.

Projects like Scorecard address this somewhat (e.g. by generating a “maintained” score) but it’s not specifically targeted for Python package audits. I’ve not noodled through more/other/related projects yet, either…

jenstroeger, Sep 05 '22 00:09