Clarification on how to refer to platform-specific pypi packages
A Python package sometimes differs in content or dependencies depending on the targeted operating system, the version of the Python interpreter (2 vs 3, for instance), and which "extras" (optional features) of the package the consumer is interested in. This effectively results in different "editions" of the same package.
I assume that purl qualifiers would be used to represent these different editions, correct? For example:
pkg:pypi/[email protected]?extras=security,socks
pkg:pypi/[email protected]?python_version=2.7
and similar. The README doesn't elaborate on any special cases like these for pypi purls, whereas, for example, the description of maven purls contains quite a few examples with qualifiers.
So I guess I'd like to know whether qualifiers are the natural construct to represent these Python package "variations". It would also be interesting to learn of any example use of pypi purls in the wild.
Edit: this is not necessarily a request to update the documentation. Consider it a question.
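To make the question concrete, here is a minimal sketch of how such qualifier-based purls could be assembled. The helper name `build_pypi_purl` is hypothetical, the version `1.0.0` is an arbitrary placeholder, and `extras` and `python_version` are the qualifier keys proposed in this question, not keys confirmed by the README. The percent-encoding is simplified and is not meant as a complete implementation of the spec's canonicalization rules.

```python
from urllib.parse import quote


def build_pypi_purl(name, version, qualifiers=None):
    """Return a purl string for a PyPI package (simplified sketch)."""
    # The pypi purl type lowercases the name and replaces "_" with "-".
    normalized = name.lower().replace("_", "-")
    purl = f"pkg:pypi/{quote(normalized, safe='')}@{quote(version, safe='')}"
    if qualifiers:
        # Qualifiers are emitted sorted by key; empty values are dropped.
        pairs = [
            f"{key.lower()}={quote(value, safe=':/,')}"
            for key, value in sorted(qualifiers.items())
            if value
        ]
        if pairs:
            purl += "?" + "&".join(pairs)
    return purl


# "extras" and "python_version" are the keys suggested in the question.
print(build_pypi_purl(
    "requests",
    "1.0.0",  # placeholder version
    {"python_version": "2.7", "extras": "security,socks"},
))
```

Sorting the keys keeps the resulting string stable, so two tools describing the same "edition" would produce the same purl.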
This would be very helpful, as it affects how complete the SBOM for Python packages is. In the current state, the list of dependencies needed to audit a package's supply chain is not complete without the extras, since those often imply additional dependencies to be installed. It would also be highly beneficial to include other specifiers that are present in the package JSON metadata, such as:
- filename
- md5 (or maybe also sha256) digest
- python_version
- packagetype
- requires_python
- url
For auditing purposes, I feel it is important that there is at least one way to precisely select the package file that is or would be installed, such as the MD5 checksum or filename mentioned above. The pkg:pypi/requests purl could mean something different on macOS vs. Windows or another OS, or under a different Python version.
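As a rough illustration of what "precisely select a package file" could look like, the sketch below picks one entry out of a list shaped like the per-file records in the PyPI JSON metadata. The helper `select_file` is hypothetical, and the sample records (package name, URLs, digests) are made up for the example rather than taken from a real release.

```python
def select_file(files, filename=None, sha256=None):
    """Return the first file record matching all given criteria, else None."""
    if filename is None and sha256 is None:
        raise ValueError("give at least one of filename or sha256")
    for entry in files:
        if filename is not None and entry.get("filename") != filename:
            continue
        if sha256 is not None and entry.get("digests", {}).get("sha256") != sha256:
            continue
        return entry
    return None


# Made-up records using the field names listed above.
sample_files = [
    {
        "filename": "example_pkg-1.0.0-py3-none-any.whl",
        "packagetype": "bdist_wheel",
        "python_version": "py3",
        "requires_python": ">=3.7",
        "url": "https://files.example.invalid/example_pkg-1.0.0-py3-none-any.whl",
        "digests": {"sha256": "ab" * 32},
    },
    {
        "filename": "example_pkg-1.0.0.tar.gz",
        "packagetype": "sdist",
        "python_version": "source",
        "requires_python": ">=3.7",
        "url": "https://files.example.invalid/example_pkg-1.0.0.tar.gz",
        "digests": {"sha256": "cd" * 32},
    },
]

wheel = select_file(sample_files, filename="example_pkg-1.0.0-py3-none-any.whl")
print(wheel["packagetype"], wheel["url"])
```

Either the filename or the digest is enough to tell the wheel and the sdist apart, which a bare name-and-version purl cannot do.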
The extras specifier is also helpful because, as mentioned, it changes which dependencies are installed (important for generating an SBOM and for auditing). As far as I am aware, it does not change the location of the package, since the extras are resolved at installation time from the package itself (setup.py or wheel metadata). Nevertheless, it is very helpful to have it included.
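The effect of extras on the dependency list can be sketched as follows. This is a deliberately naive illustration: the helper `dependencies_for` is hypothetical, the requirement strings are illustrative samples rather than the real metadata of any release, and only `extra == "..."` markers are inspected. Other environment markers (Python version, operating system) are ignored, so a real tool would want a proper marker evaluator.

```python
import re

# Matches marker fragments such as: extra == "socks" or extra == 'socks'
_EXTRA_RE = re.compile(r"""extra\s*==\s*['"]([^'"]+)['"]""")


def dependencies_for(requires_dist, extras=()):
    """Return requirement specs that apply for the requested extras."""
    wanted = {extra.lower() for extra in extras}
    selected = []
    for requirement in requires_dist:
        spec, _, marker = requirement.partition(";")
        found = {name.lower() for name in _EXTRA_RE.findall(marker)}
        # Keep unconditional requirements and those gated by a wanted extra.
        if not found or found & wanted:
            selected.append(spec.strip())
    return selected


# Illustrative sample in the style of a "requires_dist" metadata field.
sample_requires = [
    "urllib3>=1.21.1",
    "idna>=2.5",
    'PySocks>=1.5.6; extra == "socks"',
    "pyOpenSSL>=0.14; extra == 'security'",
]

print(dependencies_for(sample_requires))
print(dependencies_for(sample_requires, extras=["socks"]))
```

The two calls return different lists for the same package and version, which is why an SBOM that omits the extras can miss dependencies.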
For reference, here is the package JSON metadata from PyPI: https://pypi.org/pypi/requests/json
The change itself looks quite simple, and it would greatly improve the usability of pypi purls. If there is interest, I could give it a shot and draft a pull request with those changes.
I dug further into the purl specification, and it seems there are a few qualifiers that are valid for all package types even if they are not mentioned explicitly in the PyPI purl documentation, such as download_url, file_name, and checksum. Link to that section of the purl spec: https://github.com/package-url/purl-spec/blob/master/PURL-SPECIFICATION.rst#known-qualifiers-keyvalue-pairs
I guess this solves half of the issue
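A minimal sketch of how those known qualifiers could pin one specific file is shown below. The helper `purl_for_file` and the sample record are hypothetical, the input is assumed to be shaped like a per-file entry in the PyPI JSON metadata, and the `checksum` value follows the `algorithm:hex` form described in the linked section. The encoding is again simplified.

```python
from urllib.parse import quote


def purl_for_file(name, version, entry):
    """Build a pypi purl that identifies one file via known qualifiers."""
    # The pypi purl type lowercases the name and replaces "_" with "-".
    normalized = name.lower().replace("_", "-")
    qualifiers = {
        "checksum": "sha256:" + entry["digests"]["sha256"],
        "download_url": entry["url"],
        "file_name": entry["filename"],
    }
    # Emit qualifiers sorted by key, keeping ":", "/" and "," readable.
    query = "&".join(
        f"{key}={quote(value, safe=':/,')}"
        for key, value in sorted(qualifiers.items())
    )
    return f"pkg:pypi/{quote(normalized, safe='')}@{quote(version, safe='')}?{query}"


# Made-up file record; the URL and digest are placeholders.
sample_entry = {
    "filename": "example_pkg-1.0.0-py3-none-any.whl",
    "url": "https://files.example.invalid/example_pkg-1.0.0-py3-none-any.whl",
    "digests": {"sha256": "ab" * 32},
}

print(purl_for_file("example_pkg", "1.0.0", sample_entry))
```

This covers the "which exact file" half of the problem; the extras are not expressed by any of these qualifiers.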