pypi `PKG_INFO` returns two packages
Description
Scanning pip-22.0.4 returns two PackageData and eventually two top level packages for the same PKG-INFO file at pip-22.0.4/src/pip.egg-info/PKG-INFO.
These have the datasource IDs pypi_sdist_pkginfo and pypi_editable_egg_pkginfo; see https://github.com/nexB/scancode-toolkit/blob/develop/tests/packagedcode/data/pypi/source-package/pip-22.0.4-pypi-package-expected.json#L973 and https://github.com/nexB/scancode-toolkit/blob/develop/tests/packagedcode/data/pypi/source-package/pip-22.0.4-pypi-package-expected.json#L754.
The two data sources have overlapping path patterns. The patterns should be made stricter so that each PKG-INFO file always returns exactly one PackageData.
This has been fixed in 31.0.0b4. Closing now.
I think this hasn't been solved, btw (I reported an issue different from the duplicated dependencies issue).
See the "path": "src/pip.egg-info/PKG-INFO" entry in this test data: https://github.com/nexB/scancode-toolkit/blob/develop/tests/packagedcode/data/pypi/source-package/pip-22.0.4-pypi-package-expected.json. Two PackageData objects are returned in the file-level package_data because this file satisfies is_datafile() for two data sources:
pypi_editable_egg_pkginfo: PythonEditableInstallationPkgInfoFile has path_patterns: ('*.egg-info/PKG-INFO',)
pypi_sdist_pkginfo: PythonSdistPkgInfoFile has path_patterns: ('*/PKG-INFO',)
Both path patterns match "src/pip.egg-info/PKG-INFO", so two PackageData objects are returned.
There's also the pypi_egg_pkginfo: PythonEggPkgInfoFile has path_patterns: ('*/EGG-INFO/PKG-INFO',).
So if the file path were "src/EGG-INFO/PKG-INFO", it would also return two PackageData objects for this file: one with datasource pypi_egg_pkginfo and another with pypi_sdist_pkginfo.
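The overlap can be reproduced outside the scanner with a small sketch. It assumes the path patterns are applied as case-sensitive glob patterns, which Python's `fnmatch.fnmatchcase` approximates; the actual matching logic in `is_datafile()` may differ. `PATH_PATTERNS` and `matching_datasources` are hypothetical names used only for this illustration.

```python
from fnmatch import fnmatchcase

# Path patterns copied from the comment above, keyed by datasource ID.
# Assumption: patterns are case-sensitive globs where "*" also matches "/".
PATH_PATTERNS = {
    "pypi_editable_egg_pkginfo": ("*.egg-info/PKG-INFO",),
    "pypi_sdist_pkginfo": ("*/PKG-INFO",),
    "pypi_egg_pkginfo": ("*/EGG-INFO/PKG-INFO",),
}


def matching_datasources(path):
    """Return the sorted datasource IDs whose patterns match ``path``."""
    return sorted(
        datasource_id
        for datasource_id, patterns in PATH_PATTERNS.items()
        if any(fnmatchcase(path, pattern) for pattern in patterns)
    )


# Each of these paths is claimed by two data sources at once.
print(matching_datasources("src/pip.egg-info/PKG-INFO"))
print(matching_datasources("src/EGG-INFO/PKG-INFO"))
```

In both cases the broad `*/PKG-INFO` pattern matches in addition to the more specific one, which is why the file ends up with two PackageData objects.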
We need to update PythonSdistPkgInfoFile.is_datafile() to return False if either PythonEggPkgInfoFile.is_datafile() or PythonEditableInstallationPkgInfoFile.is_datafile() is True.
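A minimal sketch of that idea follows. The classes here are simplified stand-ins that reuse the handler names and patterns from this thread; `BaseHandler`, `HANDLERS`, and `claimed_by` are hypothetical, and the real handlers in packagedcode have a different base class and may take extra arguments in `is_datafile()`.

```python
from fnmatch import fnmatchcase


class BaseHandler:
    """Simplified stand-in for a datafile handler (not the real base class)."""
    datasource_id = None
    path_patterns = ()

    @classmethod
    def is_datafile(cls, location):
        # Assumption: patterns are matched as case-sensitive globs.
        return any(fnmatchcase(location, p) for p in cls.path_patterns)


class PythonEggPkgInfoFile(BaseHandler):
    datasource_id = "pypi_egg_pkginfo"
    path_patterns = ("*/EGG-INFO/PKG-INFO",)


class PythonEditableInstallationPkgInfoFile(BaseHandler):
    datasource_id = "pypi_editable_egg_pkginfo"
    path_patterns = ("*.egg-info/PKG-INFO",)


class PythonSdistPkgInfoFile(BaseHandler):
    datasource_id = "pypi_sdist_pkginfo"
    path_patterns = ("*/PKG-INFO",)

    @classmethod
    def is_datafile(cls, location):
        # Defer to the more specific handlers when either one claims the file.
        if PythonEggPkgInfoFile.is_datafile(location):
            return False
        if PythonEditableInstallationPkgInfoFile.is_datafile(location):
            return False
        return super().is_datafile(location)


HANDLERS = (
    PythonEggPkgInfoFile,
    PythonEditableInstallationPkgInfoFile,
    PythonSdistPkgInfoFile,
)


def claimed_by(location):
    """Return the datasource IDs of every handler that accepts ``location``."""
    return [h.datasource_id for h in HANDLERS if h.is_datafile(location)]


print(claimed_by("src/pip.egg-info/PKG-INFO"))
print(claimed_by("src/EGG-INFO/PKG-INFO"))
print(claimed_by("pip-22.0.4/PKG-INFO"))
```

With the override in place, each of the three example paths is claimed by a single handler, and a plain sdist PKG-INFO still falls through to PythonSdistPkgInfoFile.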
This is fixed, closing.