setuptools
[FR] Implement PEP 625 - File Name of a Source Distribution
What's the problem this feature will solve?
Conform to accepted standards, make it possible to reliably determine a project's (canonical form) name and version from the source distribution filename.
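Here is a minimal sketch of what "reliably determine" makes possible. It recovers the name and version from a conforming sdist filename. The function name `parse_sdist_filename` is hypothetical, not a setuptools or packaging API. It assumes both parts were normalised, so neither contains a hyphen and the single remaining hyphen must be the separator. Only `.tar.gz` is handled.

```python
def parse_sdist_filename(filename: str) -> tuple[str, str]:
    """Split a PEP 625 style sdist filename into (normalised name, version).

    Assumption: the name and version were both normalised, so neither
    contains a hyphen and exactly one hyphen separates them.
    """
    suffix = ".tar.gz"
    if not filename.endswith(suffix):
        raise ValueError(f"not a .tar.gz sdist: {filename!r}")
    stem = filename[: -len(suffix)]
    parts = stem.split("-")
    # Anything other than exactly two non-empty parts cannot be split unambiguously
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"ambiguous or non-conforming sdist name: {filename!r}")
    name, version = parts
    return name, version


print(parse_sdist_filename("django_bleach-3.1.0.tar.gz"))  # ('django_bleach', '3.1.0')
```

With unnormalised names this split has no single correct answer, which is the gap the feature request describes.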
Describe the solution you'd like
See https://peps.python.org/pep-0625/
When creating sdist files, normalise the project name and version parts according to the specification, documented here.
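A rough sketch of the filename construction being requested follows. The name rule is the one from the binary distribution format spec: runs of `-`, `_` and `.` become a single `_`, and the result is lowercased. The helper names are hypothetical. This sketch assumes the version string has already been normalised per PEP 440, because it does not normalise the version itself.

```python
import re


def normalize_dist_name(name: str) -> str:
    """Normalise a project name for use in a distribution filename:
    collapse runs of '-', '_' and '.' into '_' and lowercase the result."""
    return re.sub(r"[-_.]+", "_", name).lower()


def sdist_filename(name: str, version: str) -> str:
    """Build a PEP 625 style sdist filename.

    Assumption: `version` is already PEP 440-normalised; it is used as-is.
    """
    return f"{normalize_dist_name(name)}-{version}.tar.gz"


print(sdist_filename("django-bleach", "3.1.0"))     # django_bleach-3.1.0.tar.gz
print(sdist_filename("SQLAlchemy", "2.0.31.dev0"))  # sqlalchemy-2.0.31.dev0.tar.gz
```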
Alternative Solutions
Continue as at present, which will leave sdist consumers with no reliable way of knowing the name and version of a sdist short of either extracting the metadata from the sdist (if the sdist conforms to PEP 643) or actually building the distribution.
Additional context
Code that wants the project's formal name will still need to read the distribution metadata - that is understood and this specification doesn't affect that.
Code of Conduct
- [X] I agree to follow the PSF Code of Conduct
From Gentoo's standpoint, this will also help us get predictable sdist names, as right now some PEP 517 backends produce normalized filenames and others do not.
In #4302, after releasing v69.3, users are surprised by two behaviors:
- Trailing zeros are stripped.
- The filename of the sdist doesn't match other names inside the sdist.
The latter sounds like a bug. The former sounds like a surprising change implied by the spec or the implementation.
Is there better documentation on what constitutes a canonical version number? The spec is pretty silent about the trailing zeros. The packaging.utils.canonicalize_version, however, has two implementations, one which strips the zeros and the other which doesn't, switched by a boolean flag. Which is the real canonical version?
Since users are reporting that the filename is in fact not canonicalizing the version, that also sounds like a problem that wasn't fully addressed in #4286.
Also, releases on PyPI have trailing zeros.
I'm not aware of anything in any spec that suggests that stripping trailing zero components is necessary when normalising versions. Yes, when comparing versions, extra trailing zeroes are ignored, but that's not the same as normalising.
I would also expect that the name and version in the sdist and wheel filenames should be the same.
I'm not aware of anything in any spec that suggests that stripping trailing zero components is necessary
This section does say:
`{version}` is the canonicalized form of the project version (see Version specifiers).
And that section indicates:
See also Appendix: Parsing version strings with regular expressions which provides a regular expression to check strict conformance with the canonical format
Which leads to a function to check for `is_canonical`:
```python
import re

def is_canonical(version):
    # Strict canonical-format regex from the PEP 440 appendix
    return re.match(r'^([1-9][0-9]*!)?(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*))*((a|b|rc)(0|[1-9][0-9]*))?(\.post(0|[1-9][0-9]*))?(\.dev(0|[1-9][0-9]*))?$', version) is not None
```
Running that confirms that the spec considers both `1` and `1.0` to be canonical for the same version:
```
>>> is_canonical('1.0') and is_canonical('1')
True
```
Therefore, the bug is in packaging, which transforms `1.0` to `1`.
What that does imply, however, is that for a given version, it will not be possible to deterministically infer what the filename will be for that version. If the indicated version is "1.0", the filename will have "1.0", and if the indicated version is "1", the filename will have "1". There is in fact no canonical form of a version if arbitrary trailing zeros are allowed, as any version could have arbitrary trailing zeros appended and still be canonical and conformant, yet divergent.
```
>>> is_canonical('2024.4.13.0.0.0.0.0.0.0')
True
```
@jaraco, yes, but practically this does not cause any significant issue, provided release tools will not allow you to release version `X.0` if you already have version `X` (and vice versa). For example PyPI.
Similarly, PyPI shouldn't allow creating project `A` if there is already project `a` (and I believe it doesn't). And nobody and nothing forces you to use either `A` or `a` as a name for your project. So it should be okay to release version `X.0` or `X` as you wish/need.
@jaraco, yes, but practically this does not cause any significant issue, provided release tools will not allow you to release version X.0 if you already have version X (and vice versa). For example PyPI.
It does cause issues. In addition to the case that @jaraco mentioned, where we can't predict what the filename should be given a project name & version, there are also edge cases around post-releases where the filename is ambiguous. For example, without canonicalization of both, the filename `sampleproject-1.0-2.tar.gz` could be for:
- a project named `sampleproject` with a canonicalized version of `1.post2`
- a project named `sampleproject-1-0` with a canonicalized version of `2`.
There are more details in https://peps.python.org/pep-0625/.
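A short sketch of why an unnormalised stem such as `sampleproject-1.0-2` is ambiguous: when names and versions may both contain hyphens, any hyphen could be the separator. The helper `candidate_splits` is hypothetical and only enumerates the possible readings; it does not validate either half.

```python
def candidate_splits(stem: str) -> list[tuple[str, str]]:
    """Return every (name, version) pair obtained by splitting `stem`
    at one of its hyphens."""
    pieces = stem.split("-")
    return [
        ("-".join(pieces[:i]), "-".join(pieces[i:]))
        for i in range(1, len(pieces))
    ]


print(candidate_splits("sampleproject-1.0-2"))
# [('sampleproject', '1.0-2'), ('sampleproject-1.0', '2')]
```

The first reading is `sampleproject` at `1.0-2` (the legacy spelling of a post-release); the second is a project whose display name is `sampleproject-1.0` at version `2`. Normalising both parts leaves only one hyphen, so only one reading survives.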
Similarly, PyPI shouldn't allow creating project A if there is already project a (and I believe it doesn't). And nobody and nothing forces you to use either A or a as a name for your project. So it should be okay to release version X.0 or X as you wish/need.
We allow projects to be created with whatever capitalization they prefer (as well as separators), but the filename is normalized for them as well (i.e. it will always be `a` for a project named `A`).
Note that this change is only for the filename, which users don't usually see -- the version displayed on PyPI can continue to be the non-canonicalized version, nothing changes there.
I think we need to reopen this: due to https://github.com/pypa/setuptools/commit/df45427cbb67c1149fcf5d2d1e2705e69b3baf0c, the version is no longer being normalized, which is required per PEP 625:
version is the version of the distribution as defined in PEP 440, e.g. `20.2`, and normalised according to the rules in that PEP.
Where the rules are: https://peps.python.org/pep-0440/#normalization. We probably need to introduce a function into `packaging` that handles PEP 440 normalization (retaining trailing zeros) in addition to the existing canonicalization function.
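A stdlib-only sketch of such a function follows. It is adapted from the permissive regular expression in the appendix of PEP 440 and applies the normalisation rules for case, a leading `v`, pre/post/dev spellings and separators, implicit post-releases (`1.0-2`), and integer leading zeros. Trailing zeros in the release segment are kept. The name `normalize_version` is hypothetical and this is not `packaging`'s code. Two choices here are assumptions chosen to match the strict canonical-format regex: dropping a `0!` epoch and rewriting local-segment separators to `.`. Unlike a pass-through, it raises on strings that are not valid versions.

```python
import re

# Permissive version pattern, adapted from the appendix of PEP 440.
_VERSION_RE = re.compile(
    r"""
    ^\s*
    v?
    (?:(?P<epoch>[0-9]+)!)?
    (?P<release>[0-9]+(?:\.[0-9]+)*)
    (?P<pre>
        [-_.]?
        (?P<pre_l>alpha|beta|preview|pre|rc|a|b|c)
        [-_.]?
        (?P<pre_n>[0-9]+)?
    )?
    (?P<post>
        (?:-(?P<post_n1>[0-9]+))
        |
        (?:[-_.]?(?P<post_l>post|rev|r)[-_.]?(?P<post_n2>[0-9]+)?)
    )?
    (?P<dev>
        [-_.]?
        (?P<dev_l>dev)
        [-_.]?
        (?P<dev_n>[0-9]+)?
    )?
    (?:\+(?P<local>[a-z0-9]+(?:[-_.][a-z0-9]+)*))?
    \s*$
    """,
    re.VERBOSE | re.IGNORECASE,
)

# Alternate pre-release spellings and their normalised forms
_PRE_SPELLINGS = {
    "a": "a", "alpha": "a",
    "b": "b", "beta": "b",
    "c": "rc", "rc": "rc", "pre": "rc", "preview": "rc",
}


def normalize_version(version: str) -> str:
    """PEP 440-normalise `version`, keeping trailing zeros."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"invalid PEP 440 version: {version!r}")
    parts = []
    epoch = int(match["epoch"] or 0)
    if epoch:
        parts.append(f"{epoch}!")
    # Strip leading zeros from each component, but keep every component
    parts.append(".".join(str(int(x)) for x in match["release"].split(".")))
    if match["pre_l"]:
        label = _PRE_SPELLINGS[match["pre_l"].lower()]
        parts.append(f"{label}{int(match['pre_n'] or 0)}")
    if match["post_n1"] is not None:  # implicit post-release, e.g. "1.0-2"
        parts.append(f".post{int(match['post_n1'])}")
    elif match["post_l"]:
        parts.append(f".post{int(match['post_n2'] or 0)}")
    if match["dev_l"]:
        parts.append(f".dev{int(match['dev_n'] or 0)}")
    if match["local"]:
        segments = re.split(r"[-_.]", match["local"].lower())
        parts.append("+" + ".".join(str(int(s)) if s.isdigit() else s for s in segments))
    return "".join(parts)


print(normalize_version("1.0-2"))  # 1.0.post2
```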
@jaraco, yes, but practically this does not cause any significant issue, provided release tools will not allow you to release version X.0 if you already have version X (and vice versa). For example PyPI.
It does cause issues. In addition to the case that @jaraco mentioned, where we can't predict what the filename should be given a project name & version, there are also edge cases around post-releases where the filename is ambiguous. For example, without canonicalization of both, the filename `sampleproject-1.0-2.tar.gz` could be for:
- a project named `sampleproject` with a canonicalized version of `1.post2`
- a project named `sampleproject-1-0` with a canonicalized version of `2`.
There are more details in https://peps.python.org/pep-0625/.
PEP 625 says: The name of an sdist should be `{distribution}-{version}.tar.gz`.
- `distribution` is the name of the distribution as defined in PEP 345, and normalised as described in the wheel spec
- `version` is the version of the distribution as defined in PEP 440
PEP 440 says: The canonical public version identifiers MUST comply with the following scheme:
`[N!]N(.N)*[{a|b|rc}N][.postN][.devN]`
This means that `sampleproject-1.0-2.tar.gz` is not a compliant sdist file name. PEP 625 prohibits production of such sdists.
OTOH, both `sampleproject-1.0.tar.gz` and `sampleproject-1.tar.gz` are canonical and valid sdist file names.
Or am I missing something?
Or am I missing something?
Yes, I'm talking about normalization of the version in general according to PEP 440 (which was removed in https://github.com/pypa/setuptools/commit/df45427cbb67c1149fcf5d2d1e2705e69b3baf0c), not just the trailing zeros.
Where the rules are: https://peps.python.org/pep-0440/#normalization. We probably need to introduce a function into `packaging` that handles PEP 440 normalization (retaining trailing zeros) in addition to the existing canonicalization function.
There is almost a function there. Using `functools.partial(packaging.utils.canonicalize_version, strip_trailing_zero=False)` should work.
We should add a test that captures a version that should be normalized but isn't.
```
>>> utils.canonicalize_version('1.0-2', strip_trailing_zero=False)
'1.0.post2'
```
In your example, as long as the project name is known to be normalised (so it doesn't contain a hyphen character), there is no ambiguity.
I agree that I would expect versions to be normalised so that they don't contain hyphens either (and the wheel spec requires that).
Where the rules are: https://peps.python.org/pep-0440/#normalization. We probably need to introduce a function into packaging that handles PEP 440 normalization (retaining trailing zeros) in addition to the existing canonicalization function.
Agreed. That set of rules does not include removing trailing `.0` segments.
It's an unfortunate weirdness that the rules for normalising and the rules for comparison are such that two version strings can compare equal but normalise differently (`1.0` and `1.0.0`). But it's a consequence of trying to fit so many different versioning schemes into one standard.
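To make the "compare equal but normalise differently" point concrete, here is a small sketch that compares only release segments the way PEP 440 comparison treats them, padding the shorter one with zeros. The function `release_equal` is hypothetical and deliberately ignores epochs and pre/post/dev/local parts.

```python
def release_equal(a: str, b: str) -> bool:
    """Compare two release segments (e.g. '1.0' vs '1.0.0'), padding the
    shorter one with zeros. Other version parts are out of scope here."""
    xs = [int(p) for p in a.split(".")]
    ys = [int(p) for p in b.split(".")]
    width = max(len(xs), len(ys))
    xs += [0] * (width - len(xs))
    ys += [0] * (width - len(ys))
    return xs == ys


print(release_equal("1.0", "1.0.0"))  # True: equal for comparison purposes
print("1.0" == "1.0.0")              # False: still two different normalised strings
```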
There is almost a function there. Using `functools.partial(packaging.utils.canonicalize_version, strip_trailing_zero=False)` should work.
Aha, I missed that that had been added. That should work fine.
IMO, a "canonicalize" function should produce a truly canonical, unambiguous version, as `canonicalize_version` does by default, but unfortunately, that's not how wheel does it, and thus it's surprising for users. By my understanding, "canonical" means that two equal versions are identical, which you don't get without stripping the zeros.
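A minimal sketch of the zero-stripping view of "canonical": starting from an already-normalised version, trailing `.0` components are removed from the release segment only, so versions that compare equal map to the same string. The function `strip_trailing_zeros` is hypothetical. It assumes its input is already PEP 440-normalised and does not validate the suffix.

```python
import re


def strip_trailing_zeros(normalized: str) -> str:
    """Drop trailing '.0' components from the release segment of an
    already-normalised version, leaving epoch and suffixes untouched."""
    m = re.match(r"^(\d+!)?(\d+(?:\.\d+)*)(.*)$", normalized)
    if m is None:
        raise ValueError(f"not a normalised version: {normalized!r}")
    epoch, release, rest = m.group(1) or "", m.group(2), m.group(3)
    # Only '.0' components at the very end of the release segment are removed
    release = re.sub(r"(\.0)+$", "", release)
    return f"{epoch}{release}{rest}"


print(strip_trailing_zeros("1.0.0"))      # 1
print(strip_trailing_zeros("2.0.post1"))  # 2.post1
print(strip_trailing_zeros("1.10"))       # 1.10
```

The trade-off discussed in this thread is visible here: the result is unique per version, but it no longer matches the version string the project author wrote.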
Note that there's also a separate problem with `canonicalize_version` in that it won't fail if the value can't be canonicalized.
```
>>> utils.canonicalize_version('1.0-2x3', strip_trailing_zero=False)
'1.0-2x3'
```
But that's probably acceptable for Setuptools' case, as I believe Setuptools validates that the version is a valid `packaging.version.Version`.
This is breaking a fair amount of our builds as well. We use many hyphens in our project names across our entire infrastructure (in Python but also other languages), and the latest setuptools is now the only link in the chain that is converting them to underscores. How can we disable this?
@ds-cbo Can you give a bit more detail about how this is breaking your builds?
@di Sure! We use FreeBSD, where packages ("ports") have the same name as the upstream package, and the source blobs (regardless of language) are always expected to follow the `${PORTNAME}-${DISTVERSIONFULL}${EXTRACT_SUFX}` schema (e.g. `foo-bar-1.2.3.tar.gz`). Quite similar to this proposal, except that the name isn't altered. This aligns with almost all languages currently supported:
- C's xorg-server releases as `xorg-server-21.1.13.tar.gz`
- Perl's IO-Compress distributes as `IO-Compress-2.211.tar.gz`
- Ruby's aws-sdk-core distributes their gems as `aws-sdk-core-3.192.0.gem`
- Rust's cfg-if distributes their sources as `cfg-if-1.0.0.tar.gz`
- (etc)
R (example: bliss) is one exception to this practice, since their source blobs are distributed as `{name}_{version}`. But that's still a trivial fix that works for all R ports.
Python seems to be the first to break the promise of keeping the name untouched. For example: django-bleach (port Makefile) used to build to `django-bleach-3.1.0.tar.gz` but will now build to `django_bleach-3.1.0.tar.gz`. This will no longer follow the expected `{name}-{version}` pattern and will thus fail to match.
Now, it is possible to add this new renaming logic to the general `python.mk` to automatically override DISTNAME for all Python ports (similar to R), but this is not backwards compatible, so it will break for all older sdists. The more likely resolution would be to manually update around 350 ports when they release a new sdist to follow this new scheme. It's not impossible, but also not a fun thing to maintain.
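One way to sidestep the backwards-compatibility problem described above is to look for both spellings during a transition period. The sketch below is hypothetical (`candidate_sdist_names` is not part of any FreeBSD tooling). It assumes older sdists used the project name unchanged and newer ones use the PEP 625 normalised name, with the version string the same in both.

```python
import re


def candidate_sdist_names(portname: str, version: str) -> list[str]:
    """Return the sdist filenames a port might need to look for:
    the legacy '{name}-{version}' form and the PEP 625 normalised form."""
    legacy = f"{portname}-{version}.tar.gz"
    normalized = re.sub(r"[-_.]+", "_", portname).lower()
    pep625 = f"{normalized}-{version}.tar.gz"
    # Avoid listing the same file twice when the name is already normalised
    return [legacy] if legacy == pep625 else [legacy, pep625]


print(candidate_sdist_names("django-bleach", "3.1.0"))
# ['django-bleach-3.1.0.tar.gz', 'django_bleach-3.1.0.tar.gz']
```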
Hatchling already broke this promise before setuptools, but being "not the default" meant that its impact was much smaller. Adding an exception to the source file name was part of adding the exception for hatchling instead of setuptools.
I can imagine (but didn't check) that other distros and their maintainers will have a similar issue to ours on FreeBSD. Otherwise, I'd love to hear their approach to working with this change.
I can imagine (but didn't check) that other distros and their maintainers will have a similar issue to ours on FreeBSD. Otherwise, I'd love to hear their approach to working with this change.
Other distros have already complained that setuptools still hadn't adopted new Python standards, and they actually had to disable name canonicalization for packages using setuptools.
Python has two concepts of a project name. The "display name" (which is the project's choice, and which is what is stored in the project metadata and should be used when displaying to the user) and the "normalised name" (which is the one used for comparison, and for use in places like filenames and URLs). The normalised name enforces certain rules such as never containing hyphens, so that (for example) the name and version parts of a filename can be identified by splitting on a separated hyphen. I'd have expected distros like FreeBSD to use the normalised name in filenames (but obviously that's just my uninformed opinion).
This split between normalised and unnormalised forms is at least in part because Python's version standard allows for a far wider range of version strings than the simple x.y.z. So we can't simply say "the version is everything after the last hyphen" because versions can contain hyphens, and we can't say "the name is everything before the first hyphen" because names can contain hyphens. Normalisation is the only practical way of ensuring deterministic parsing rules.
Yes, this causes rough edges when interfacing with other ecosystems that have different conventions and different rules. It's a matter of compromise in these situations.
[...] and the "normalised name" (which is the one used for comparison, and for use in places like filenames and URLs).
A different normalisation is done for PyPI URLs/slugs, which are hyphenated. There are actually three.
For reference, I've worked around this by putting this line in all python Makefiles relevant to us:
```
DISTNAME= ${PORTNAME:S/-/_/g}-${PORTVERSION}
```
We luckily don't depend on any packages with dots in their name, so this does the job for now.
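The workaround above replaces hyphens only, while the filename rule also collapses runs of `.` and `_` and lowercases the name. The sketch below mimics both in Python to show where they diverge. Both function names are hypothetical, and `makefile_substitution` only models `:S/-/_/g` as a global hyphen replacement.

```python
import re


def makefile_substitution(portname: str) -> str:
    # Models ${PORTNAME:S/-/_/g}: every '-' becomes '_', nothing else changes
    return portname.replace("-", "_")


def spec_normalization(name: str) -> str:
    # Distribution filename rule: runs of '-', '_', '.' become '_', then lowercase
    return re.sub(r"[-_.]+", "_", name).lower()


for name in ["django-bleach", "zope.interface", "SQLAlchemy", "foo--bar"]:
    print(f"{name}: {makefile_substitution(name)} vs {spec_normalization(name)}")
```

The two agree for a lowercase, hyphen-only name like `django-bleach`, but diverge for dotted names, mixed case, and repeated separators.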
Also paging @sunpoet who seems to be the core Python maintainer for FreeBSD.
A different normalisation is done for PyPI URLs/slugs, which are hyphenated. There are actually three.
@layday What URLs/slugs are you referring to?
I assume things like https://pypi.org/project/pykg-config/ - PEP 503 defines normalisation to use hyphens. The wheel and sdist specs are different, but they basically come down to "normalise like PEP 503 but then replace hyphens with underscores".
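A small sketch of that difference, using `pykg-config` from the URL above. It shows the display name, the PEP 503 form (hyphen separator, as used in index URLs) and the wheel/sdist filename form (same rule, underscore separator). The function names are hypothetical, and the sketch ignores the historical details mentioned below.

```python
import re


def pep503_name(name: str) -> str:
    # PEP 503: runs of '-', '_', '.' collapse to '-', then lowercase
    return re.sub(r"[-_.]+", "-", name).lower()


def filename_name(name: str) -> str:
    # Wheel/sdist filenames: the same rule, but with '_' as the separator
    return re.sub(r"[-_.]+", "_", name).lower()


display = "pykg-config"  # as written in the project metadata
print(display, pep503_name(display), filename_name(display))
# pykg-config pykg-config pykg_config
```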
The details are messy, somewhat because of historical constraints (PyPI was using hyphenated names long before we had the standards, and URL stability is important...) I was oversimplifying in my comment, because the details aren't that relevant here.
I was not able to predict the irrelevance. Apologies for the noise.
FWIW, Buildout currently cannot install source distributions with underscores. When a wheel is available, installation still works, at least until the wheel package starts creating normalised distribution names as well. See my issue report at https://github.com/buildout/buildout/issues/647
That needs to be fixed in the Buildout project. I suspect that installation actually works, but that Buildout does not see the new package because it is looking for the wrong name.
Hi -
I'm looking as much as I can, but I am not seeing where this change dictates that the filename generated by sdist must be all lower case, if that's correct. Running `python setup.py sdist` with SQLAlchemy under setuptools 69.2.0 yields:
`SQLAlchemy-2.0.31.dev0.tar.gz`
whereas under 69.3.0 it yields:
`sqlalchemy-2.0.31.dev0.tar.gz`
I don't care what the casing is personally, however I'm about to do a release on pypi and I'm extremely concerned about automated systems / distribution scripts etc. that will be broken by this change.
For example, Fedora's python-sqlalchemy.spec file will break with this change. Now, they can fix it, of course, but this is something I expect to see happening all over the place: https://src.fedoraproject.org/rpms/python-sqlalchemy/blob/rawhide/f/python-sqlalchemy.spec
I'm looking as much as I can but I am not seeing where this change dictates that the filename generated by sdist must be all lower case, if that's correct.
The file name of a sdist was standardised in PEP 625. The file name must be in the form {name}-{version}.tar.gz, where {name} is normalised according to the same rules as for binary distributions (see Binary distribution format)
In distribution names, any run of -_. characters (HYPHEN-MINUS, LOW LINE and FULL STOP) should be replaced with _ (LOW LINE), and uppercase characters should be replaced with corresponding lowercase ones.
(highlight mine)