pypi `PKG-INFO` returns two packages

Open AyanSinhaMahapatra opened this issue 3 years ago • 2 comments

Description

Scanning pip-22.0.4 returns two PackageData objects, and eventually two top-level packages, for the same PKG-INFO file at pip-22.0.4/src/pip.egg-info/PKG-INFO.

These have the datasource IDs pypi_sdist_pkginfo and pypi_editable_egg_pkginfo; see https://github.com/nexB/scancode-toolkit/blob/develop/tests/packagedcode/data/pypi/source-package/pip-22.0.4-pypi-package-expected.json#L973 and https://github.com/nexB/scancode-toolkit/blob/develop/tests/packagedcode/data/pypi/source-package/pip-22.0.4-pypi-package-expected.json#L754.

These data sources have overlapping path patterns. The patterns should be made stricter so that only one PackageData is ever returned per PKG-INFO file.

AyanSinhaMahapatra avatar May 04 '22 14:05 AyanSinhaMahapatra

This has been fixed in 31.0.0b4. Closing now.

pombredanne avatar May 11 '22 07:05 pombredanne

This hasn't been solved btw, I think (what I reported here is a different issue from the duplicated dependencies issue).

See the "path": "src/pip.egg-info/PKG-INFO" entry in this test data: https://github.com/nexB/scancode-toolkit/blob/develop/tests/packagedcode/data/pypi/source-package/pip-22.0.4-pypi-package-expected.json. Two PackageData objects are returned in the file-level package_data because this file satisfies the is_datafile() functions of two data sources:

  • pypi_editable_egg_pkginfo: PythonEditableInstallationPkgInfoFile has path_patterns: ('*.egg-info/PKG-INFO',)
  • pypi_sdist_pkginfo: PythonSdistPkgInfoFile has path_patterns: ('*/PKG-INFO',)

Both path patterns match "src/pip.egg-info/PKG-INFO", so two PackageData objects are returned.

There's also pypi_egg_pkginfo: PythonEggPkgInfoFile has path_patterns: ('*/EGG-INFO/PKG-INFO',).

So if the file path was "src/EGG-INFO/PKG-INFO" it would have also returned two PackageData objects for this file: one with datasource pypi_egg_pkginfo and another with pypi_sdist_pkginfo.
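The overlap can be reproduced outside of scancode with a few lines. This is only a rough sketch: it assumes the path patterns behave like glob patterns as matched by Python's `fnmatch.fnmatchcase`, which may differ from how scancode actually applies them, and `PATH_PATTERNS` and `matching_datasources` are names made up for this illustration.

```python
from fnmatch import fnmatchcase

# Path patterns as quoted in this issue, keyed by datasource ID.
# The dict itself is illustrative and is not scancode's own data structure.
PATH_PATTERNS = {
    "pypi_editable_egg_pkginfo": ("*.egg-info/PKG-INFO",),
    "pypi_sdist_pkginfo": ("*/PKG-INFO",),
    "pypi_egg_pkginfo": ("*/EGG-INFO/PKG-INFO",),
}


def matching_datasources(path):
    """Return the sorted datasource IDs whose patterns match ``path``.

    Assumption: glob-style, case-sensitive matching via fnmatchcase,
    where "*" also matches path separators.
    """
    return sorted(
        datasource_id
        for datasource_id, patterns in PATH_PATTERNS.items()
        if any(fnmatchcase(path, pattern) for pattern in patterns)
    )


print(matching_datasources("src/pip.egg-info/PKG-INFO"))
print(matching_datasources("src/EGG-INFO/PKG-INFO"))
```

Under that assumption, each of the two paths matches the sdist pattern plus one of the more specific patterns, which is the duplication described above.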

We need to update PythonSdistPkgInfoFile.is_datafile() to return False if either PythonEggPkgInfoFile.is_datafile() or PythonEditableInstallationPkgInfoFile.is_datafile() returns True.
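A minimal sketch of that idea follows. The `Handler` base class here is a hypothetical stand-in and not scancode's real handler base class, and the `fnmatchcase` matching is an assumption; only the three class names and their path patterns come from this issue. It is not the fix that was actually merged.

```python
from fnmatch import fnmatchcase


class Handler:
    """Hypothetical, simplified stand-in for a datafile handler."""

    path_patterns = ()

    @classmethod
    def is_datafile(cls, location):
        # Assumption: glob-style matching of the path against each pattern.
        return any(fnmatchcase(location, pat) for pat in cls.path_patterns)


class PythonEggPkgInfoFile(Handler):
    path_patterns = ("*/EGG-INFO/PKG-INFO",)


class PythonEditableInstallationPkgInfoFile(Handler):
    path_patterns = ("*.egg-info/PKG-INFO",)


class PythonSdistPkgInfoFile(Handler):
    path_patterns = ("*/PKG-INFO",)

    @classmethod
    def is_datafile(cls, location):
        # Defer to the more specific handlers so that a PKG-INFO file
        # is claimed by exactly one datasource.
        if PythonEggPkgInfoFile.is_datafile(location):
            return False
        if PythonEditableInstallationPkgInfoFile.is_datafile(location):
            return False
        return super().is_datafile(location)


handlers = (
    PythonEggPkgInfoFile,
    PythonEditableInstallationPkgInfoFile,
    PythonSdistPkgInfoFile,
)
for path in ("src/pip.egg-info/PKG-INFO", "src/EGG-INFO/PKG-INFO", "pip-22.0.4/PKG-INFO"):
    print(path, [h.__name__ for h in handlers if h.is_datafile(path)])
```

With this arrangement each of the three example paths is claimed by a single handler. The trade-off is that the sdist handler has to know about the more specific handlers; tightening the sdist pattern itself would be an alternative.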

AyanSinhaMahapatra avatar May 11 '22 15:05 AyanSinhaMahapatra

This is fixed, closing.

AyanSinhaMahapatra avatar Aug 11 '22 12:08 AyanSinhaMahapatra