Improve resolution of Python / PIP dependencies

Open sschuberth opened this issue 4 years ago • 12 comments

ORT's analyzer has various problems with resolving Python / PIP dependencies:

  • [ ] Dependencies on native packages require native system tools to be installed, see #4578.
  • [ ] The Python-3-compatibility check might fail, see #4289.
  • [ ] Any specifically requested Python version is not adhered to, see #3671.
  • [ ] We use some rather obscure helper scripts based on abandoned projects, see #2816.
  • [ ] We might have some general problems with retrieving metadata:
    • [ ] #812
    • [x] #509
    • [ ] #485
    • [x] #5159

sschuberth avatar Oct 28 '21 13:10 sschuberth

Possible solutions to the above include @pombredanne's proposal for an ACT-funded "Project-Multi Python-version dependencies resolver", or leveraging / extending existing tools like https://github.com/ddelange/pipgrip.

sschuberth avatar Oct 28 '21 14:10 sschuberth

or leveraging / extending existing tools like https://github.com/ddelange/pipgrip.

See in particular https://github.com/ddelange/pipgrip/issues/40.

sschuberth avatar Nov 02 '21 13:11 sschuberth

Also maybe worth a look as a helper tool is https://github.com/trailofbits/it-depends which claims to

Finds native dependencies for high level languages like Python

sschuberth avatar Nov 03 '21 15:11 sschuberth

@sschuberth re:

Also maybe worth a look as a helper tool is https://github.com/trailofbits/it-depends which claims to

Finds native dependencies for high level languages like Python

From a quick look they seem to:

  1. create a docker image in https://github.com/trailofbits/it-depends/blob/8f8988330239c6d3eb39f05988fdbe6802f4bbbe/it_depends/pip.py#L35
  2. run pip directly https://github.com/trailofbits/it-depends/blob/8f8988330239c6d3eb39f05988fdbe6802f4bbbe/it_depends/pip.py#L176 or through https://github.com/wimglenn/johnnydep/blob/master/johnnydep/pipper.py

pombredanne avatar Nov 03 '21 16:11 pombredanne

We could also take a deeper look at component-detection's approach for PIP.

sschuberth avatar Jan 04 '22 17:01 sschuberth

Some interesting insights on the general topic from a Python maintainer, and a possible solution.

sschuberth avatar Apr 11 '22 10:04 sschuberth

And yet another interesting discussion with links to:

  • https://github.com/spack/spack
  • https://github.com/thoth-station/solver
  • https://github.com/pypa/pip/pull/10748

sschuberth avatar Apr 11 '22 12:04 sschuberth

@sschuberth FWIW, ScanCode does parse requirements files, setup.py, setup.cfg, pyproject.toml, Pipfile and Pipfile.lock and a few more, and has what is likely the best requirements parser around, https://github.com/nexB/pip-requirements-parser, which is also used in CycloneDX. You can see the code in action in https://github.com/nexB/scancode-toolkit/blob/syspacfiles/src/packagedcode/pypi.py. We also parse various Python metadata files and detect packages in various installed, archive and extracted layouts. We maintain https://github.com/nexB/dparse2 and https://github.com/nexB/pkginfo2 for additional manifest formats, and https://github.com/nexB/univers to parse all versions, including all Python package versions. We also built utilities to resolve, collect and download actual package archives based on these. And we are continuously adding support for new formats as they come.

pombredanne avatar Apr 12 '22 07:04 pombredanne

ScanCode does parse requirements files, setup.py, setup.cfg, pyproject.toml, Pipfile and Pipfile.lock and a few more

Can you clarify on what "parse" means here exactly? I assume in the context of ScanCode only declared license data is parsed, but not declared direct and implied transitive dependencies, incl. resolution of version ranges to concrete versions. Correct?

sschuberth avatar Apr 12 '22 07:04 sschuberth

Can you clarify on what "parse" means here exactly? I assume in the context of ScanCode only declared license data is parsed, but not declared direct and implied transitive dependencies, incl. resolution of version ranges to concrete versions. Correct?

By parse I mean collecting the data as found locally, without making any network call. This means:

  • parsing and normalizing actual package manifests (and of course all the declared data there such as licenses)
  • extracting direct dependency constraints from manifests,
  • extracting resolved dependency versions from lockfiles,
  • collecting any extra data available from lockfiles (some formats have more data in their lockfiles; for example, newer npm lockfiles or PHP Composer lockfiles may contain declared license info).
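To make "parsing without any network call" concrete, here is a minimal sketch of extracting direct dependency constraints from requirements-style text. It is not ScanCode's or pip-requirements-parser's implementation; the names `Requirement` and `parse_requirements` are hypothetical, and the grammar is deliberately simplified (it skips option lines such as `-r`, drops extras and environment markers, and ignores URL requirements).

```python
import re
from typing import List, NamedTuple, Tuple


class Requirement(NamedTuple):
    """A direct dependency as declared, with no resolution applied."""
    name: str
    specifiers: List[Tuple[str, str]]  # (operator, version) pairs


# Simplified patterns; the real requirements grammar has many more cases.
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")
_SPEC = re.compile(r"(===|==|~=|!=|<=|>=|<|>)\s*([A-Za-z0-9.*+!_-]+)")


def parse_requirements(text: str) -> List[Requirement]:
    """Collect declared constraints from local text only (no network access)."""
    requirements = []
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # strip comments
        if not line or line.startswith("-"):      # skip blanks and options like -r
            continue
        line = line.split(";", 1)[0]              # drop environment markers
        match = _NAME.match(line)
        if not match:
            continue
        rest = re.sub(r"^\s*\[[^\]]*\]", "", line[match.end():])  # drop extras
        requirements.append(Requirement(match.group(1).lower(), _SPEC.findall(rest)))
    return requirements
```

The output is only the declared constraints, e.g. `requests>=2.25,<3` yields the name `requests` with the two bounds; nothing is looked up or pinned to a concrete version.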

This does not mean resolving dependencies and getting extra data for these dependencies yet: for Python and PyPI proper that's been the essence of the proposal I had put forward to the ACT project.

Now this will eventually happen as all parts are mostly in place now:

  • ScanCode collects all the explicit dependencies
  • Univers knows how to parse and make sense of most package versions, version constraints and version ranges, and how to resolve and evaluate version constraints to concrete versions given ranges.
  • VulnerableCode and FetchCode both know how to get the list of versions for a package by querying upstream registries APIs.
  • FetchCode knows how to fetch actual package metadata from these APIs and also fetch the code.

The last step will be to bring these together: as it is, this could already be used to resolve transitive dependencies using a simple strategy such as getting the latest version. It would later benefit from adding extra version resolvers to emulate the behaviour of package managers, such as the pip solver (this was the ACT proposal), the pubgrub solver, the maven solver, etc.
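As an illustration of the "simple strategy such as getting the latest version", here is a rough sketch over an in-memory registry. Everything in it is an assumption made for the example: the `EXAMPLE_REGISTRY` data, the function names, and the purely numeric dotted versions. It does not use Univers, FetchCode or any registry API, and unlike a real solver it never backtracks: the first pick for a package wins, even if a constraint discovered later conflicts with it.

```python
import operator
from collections import deque

# Hypothetical registry: package -> version -> {dependency: [(op, version), ...]}
EXAMPLE_REGISTRY = {
    "lib-a": {
        "1.0": {},
        "1.2": {"lib-c": [(">=", "0.1")]},
        "2.0": {"lib-c": [(">=", "0.2")]},
    },
    "lib-b": {"2.0": {}, "2.1": {}},
    "lib-c": {"0.1": {}, "0.3": {}},
}

_OPS = {
    "==": operator.eq, "!=": operator.ne,
    ">=": operator.ge, "<=": operator.le,
    ">": operator.gt, "<": operator.lt,
}


def parse_version(version):
    """Toy version parser: handles purely numeric dotted versions only."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version, constraints):
    """True if the version meets every (operator, bound) constraint."""
    parsed = parse_version(version)
    return all(_OPS[op](parsed, parse_version(bound)) for op, bound in constraints)


def resolve_latest(direct_requirements, registry):
    """Breadth-first walk that pins each package to its latest allowed version."""
    resolved = {}
    queue = deque(direct_requirements.items())
    while queue:
        name, constraints = queue.popleft()
        if name in resolved:
            continue  # first pick wins; no conflict detection or backtracking
        candidates = [v for v in registry.get(name, {}) if satisfies(v, constraints)]
        if not candidates:
            raise LookupError(f"no version of {name} satisfies {constraints}")
        best = max(candidates, key=parse_version)
        resolved[name] = best
        queue.extend(registry[name][best].items())  # follow transitive deps
    return resolved
```

Calling `resolve_latest({"lib-a": [("<", "2.0")], "lib-b": []}, EXAMPLE_REGISTRY)` pins `lib-a` to 1.2, `lib-b` to 2.1, and the transitive `lib-c` to 0.3. Emulating pip's actual behaviour would need the extra resolvers mentioned above.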

pombredanne avatar Apr 12 '22 08:04 pombredanne

See also: https://github.com/oss-review-toolkit/ort/issues/3671#issuecomment-1203248523

Some updates that are likely relevant here: https://github.com/nexB/python-inspector is now out and has been designed specifically to be integrated in ort and to resolve pip dependencies without the constraints of running pip. See https://github.com/nexB/ort/pull/1 for the working ort integration, which we are refining there first before submitting to ort proper.

python-inspector does resolve transitive dependencies.

pombredanne avatar Aug 02 '22 21:08 pombredanne