
Analyze PyPI packages with FawltyDeps

Open • mknorps opened this issue 1 year ago • 1 comment

We would like to conduct research on PyPI packages using FawltyDeps to see how often undeclared and unused dependencies occur in publicly available packages.

Research questions

There are many questions one may ask, and among them, it would be interesting to explore:

  1. Does FawltyDeps bring new information about missing and unused dependencies, or do the findings mostly reflect FawltyDeps' own shortcomings?
  2. Among the most commonly used packages, are there any with a critical issue: either not being able to run under some circumstances, or declaring an excessive number of packages?
  3. How does the outcome change for a representative sample of all PyPI packages?
  4. Do the results change depending on the type of project (Data Science, general, tooling, etc.)?
  5. What are the most common problems in the relation between declared dependencies and imports?
  6. Do many packages use namespace packages, which FawltyDeps does not treat separately for now?

Please write more questions in the comments.

How to start

PyPI has an API that exposes package metadata. FawltyDeps is a static analyzer, so to run it, it is enough to collect the codebase of the explored package from PyPI (see example files). FawltyDeps is also able to analyze projects within their virtual environments, but creating those virtual environments will be the costly step. We may think of adding a user-defined mapping to FawltyDeps options for this experiment's purpose.
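As a starting point, here is a minimal sketch of how the source code of a package could be located through the PyPI JSON API. It assumes the `https://pypi.org/pypi/<name>/json` endpoint and the `urls` / `packagetype` / `url` fields of its response. The function names `fetch_metadata` and `pick_sdist_url` are made up for this illustration and are not part of FawltyDeps.

```python
import json
import urllib.request
from typing import Optional

# Assumption: the PyPI JSON API endpoint for the latest release of a project.
PYPI_JSON_URL = "https://pypi.org/pypi/{name}/json"


def fetch_metadata(name: str) -> dict:
    """Download the JSON metadata for the latest release of `name`.

    Hypothetical helper; requires network access.
    """
    with urllib.request.urlopen(PYPI_JSON_URL.format(name=name)) as response:
        return json.load(response)


def pick_sdist_url(metadata: dict) -> Optional[str]:
    """Return the URL of the source distribution, or None if there is none.

    Hypothetical helper; some releases ship only wheels, so callers
    have to handle the None case.
    """
    for file_info in metadata.get("urls", []):
        if file_info.get("packagetype") == "sdist":
            return file_info.get("url")
    return None
```

Calling `pick_sdist_url(fetch_metadata("requests"))` should then give a URL to an archive that can be downloaded, unpacked, and passed to FawltyDeps. Keeping the network call separate from the selection logic makes the latter easy to test offline.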

mknorps, Mar 08 '23 17:03