packaging.python.org
Non-normative guidance on sdist filenames?
This is a copy of what I wrote in https://github.com/pypa/packaging.python.org/pull/1066#issuecomment-1116728314, since it's off-topic there, in reference to the current source distribution filename format guidelines (link):
The file name of a sdist is not currently standardised, although the de facto form is {name}-{version}.tar.gz, where {name} is the canonicalized form of the project name (see PEP 503 for the canonicalization rules) with - characters replaced with _, and {version} is the canonicalized form of the project version (see Version specifiers).
Does PyPI or any other PEP 503-compliant host currently do the - to _ normalization? The standard here suggests that foo-bar==1.2.3 should be normalized to foo_bar-1.2.3.tar.gz, but here's how PyPI's simple index supplies pip-audit's source distributions (link):
pip-audit-2.1.0.tar.gz
pip-audit-2.1.1.tar.gz
pip-audit-2.2.0.tar.gz
pip-audit-2.2.1.tar.gz

(The current behavior is clearly working since pip has a workaround for the "vexing parse" case in https://github.com/pypa/packaging/issues/527, but I wonder if it makes sense to amend this documentation again to emphasize that the normalization suggested currently isn't practiced.)
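To make the difference concrete, here is a rough sketch of the filename the guidance suggests versus what the index actually serves. The name normalization follows the PEP 503 rule; `suggested_sdist_filename` and `candidate_splits` are hypothetical helpers written for this illustration (not functions from pip or `packaging`), and the version is assumed to already be in canonical form.

```python
import re


def canonicalize_name(name: str) -> str:
    """PEP 503 normalization: runs of '-', '_' and '.' collapse to one '-', lowercased."""
    return re.sub(r"[-_.]+", "-", name).lower()


def suggested_sdist_filename(name: str, version: str) -> str:
    """Filename the guidance suggests (hypothetical helper).

    Assumes `version` is already canonical; no version normalization is done here.
    """
    return f"{canonicalize_name(name).replace('-', '_')}-{version}.tar.gz"


def candidate_splits(filename: str) -> list[tuple[str, str]]:
    """Every way to split '<name>-<version>.tar.gz' at a hyphen (hypothetical helper)."""
    stem = filename.removesuffix(".tar.gz")
    parts = stem.split("-")
    return [("-".join(parts[:i]), "-".join(parts[i:])) for i in range(1, len(parts))]


print(suggested_sdist_filename("pip-audit", "2.1.0"))
print(candidate_splits("pip-audit-2.1.0.tar.gz"))  # as served today: two candidates
print(candidate_splits("pip_audit-2.1.0.tar.gz"))  # as suggested: one candidate
```

With the hyphen left in the name, a consumer has to decide between `pip` / `audit-2.1.0` and `pip-audit` / `2.1.0`; with the underscore form there is only one place to split.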
As I read it, the normalisation should be performed by the tool creating the sdist. If it isn't being normalised, that's a bug in the build backend.
I'm not sure the PEP 503 repository should be modifying uploaded files, even to fix this normalisation issue, as then the hash won't match the file the user uploaded.
> I'm not sure the PEP 503 repository should be modifying uploaded files, even to fix this normalisation issue, as then the hash won't match the file the user uploaded.
The problem here is the distribution filename, not the distribution's name within its own metadata. The former isn't included in the hash and doesn't require any file modification at all, since it's just the name that appears in the simple index (which is then used to retrieve the contents and hash them on the end user side).
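As a small illustration of that point, the sketch below writes the same bytes under two different filenames and hashes both. The payload is placeholder data standing in for a real archive, and `sha256_of` is a hypothetical helper; the only thing it is meant to show is that a content digest does not depend on the name the file is listed under.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file's contents (the filename is not an input)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    payload = b"placeholder bytes standing in for an sdist archive"
    as_served = Path(tmp) / "pip-audit-2.1.0.tar.gz"
    as_suggested = Path(tmp) / "pip_audit-2.1.0.tar.gz"
    as_served.write_bytes(payload)
    as_suggested.write_bytes(payload)
    same_digest = sha256_of(as_served) == sha256_of(as_suggested)

print(same_digest)
```

This says nothing about names recorded inside the archive itself (metadata, top-level directory); it only covers the name used in the index listing.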
IMO, it's actually too late for this to be fixed on either the client or index side 🙁 -- instead we should simply update the guidance (in this repository) to indicate that the - to _ normalization is not consistently performed, and that a future PEP (like PEP 625) will address this by specifying a new filename format entirely.
> IMO, it's actually too late for this to be fixed on either the client or index side
We might be able to rename existing files on PyPI. I would have to think through the implications, but it's technically possible.