packaging
packaging copied to clipboard
Normalisation and Requirement's `str()`
If packaging doesn't normalise dependency names when stringifying dependencies, why does it normalise their extras?
Originally posted by @layday in https://github.com/pypa/build/issues/550#issuecomment-1345189462
Build's test suite is broken because packaging 22 is normalizing the extras when producing strings. But it doesn't do this with dependency names. It seems like it should be consistent.
This is actually consistent in the face of https://peps.python.org/pep-0685/ requiring we normalize extras, but nothing requiring we normalize project names. We could potentially normalize project names as well, but that might break code that wants the original string.
@pradyunsg opinions?
The PEP requires the names to be normalized when comparing, it doesn't say they have to be changed to a normalized string by packaging, though, as far as I can tell.
Maybe this:
For tools writing core metadata, they MUST write out extra names in their normalized form.
This seems problematic in the short term, though, as previous versions don't consider these extra matching. I think it would be better to provide a transition period where extras can be written as specified, and comparison is normalized, before also normalizing the writing. Projects that historically have a a_b
extra will suddenly break if the extra gets rewritten to a-b (which I've seen happen when projects were moving to hatchling, by the way - see https://github.com/wntrblm/nox/pull/659#issuecomment-1272325085 , where tox_to_nox
was being rewritten to tox-to-nox
, breaking all workflows with nox[tox_to_nox]` until comparisons are supported in pip and older pips are not in use. Hatchling has an emergency "allow-ambiguous-features" switch that we needed to use to make sure users were not broken).
The latest version of Requirement
implements __eq__
but this isn't very useful if the components (the name and extras) aren't normalised for comparison:
In [1]: from packaging.requirements import Requirement
In [2]: Requirement('foo') == Requirement('foo')
Out[2]: True
In [3]: Requirement('foo') == Requirement('Foo')
Out[3]: False
In [4]: Requirement('foo[foo_bar]') == Requirement('foo[foo_bar]')
Out[4]: True
In [5]: Requirement('foo[foo_bar]') == Requirement('foo[foo-bar]')
Out[5]: False
Normalising the marker does appear to be required by the PEP. I think it'd be beneficial to review how requirements are represented and compared, e.g.:
-
__str__
should retain the original representation; -
__eq__
should normalise all components; - a new property
normalized_str
could produce a normalised representation of the requirement - pipe it back into the constructor to get a requirement with all constituent members normalised.
I believe this change might be related to https://github.com/pypa/pip/issues/11649 ? Specifically, the latest jupyterlab-server
from six days ago is now requesting a normalized extra name jsonschema[format-nongpl]
while packages built with older packaging are still requesting jsonschema[format_nongpl]
, causing a collision.
@pradyunsg opinions?
I think we should preserve them when we load them and represent them as-is; but compare them with normalisation. Basically, what @layday said. :)
I'm pretty sure a PR would be welcome (and, unless @brettcannon feels differently, likely merged subject to regular code reviews). :)
I'm pretty sure a PR would be welcome (and, unless @brettcannon feels differently, likely merged subject to regular code reviews). :)
I was thinking about this today and I think there's a risk with the change, but one I think we an live with (i.e. I would be fine with a PR that changes the string representation to preserve the original string). Essentially it sucks when comparisons are non-obvious; the string you see will not be the string that comparisons are being made against. So we just need to be prepared to deal with that usability concern.
Hopefully as PEP 685 gets adopted this will become less of an issue.
We could also add a method that gives you a normalised form of the requirement. Unlike versions, this isn’t defined directly by the PEPs but we have clearly defined semantics for this?
but we have clearly defined semantics for this
Does this mean to normalise the name, extras, versions in specifiers? Maybe additionally sort the extras and specifiers?
Yes, pretty much. Get the requirement into a string that can be compared with the same semantics as a requirement equality.
Upon reading https://packaging.python.org/en/latest/specifications/core-metadata/#name:
For comparison purposes, the names should be normalized before comparing.
However,
>>> Requirement("A.B-C-D") == Requirement("A.B-C_D")
False
whereas
>>> pkg_resources.Requirement.parse("A.B-C-D") == pkg_resources.Requirement.parse("A.B-C_D")
True
Is this intended?
Is this intended?
I think this issue is about trying to come with a good solution to this to rectify that kind of problem.
I’d say preserving the original format for str()
but normalising for hash()
and ==
makes the most sense, similar to how case-preserving filesystems compare paths. And it probably wouldn’t be too disruptive since I’m not sure it makes too much practice sense for non-normalised forms to compare differently.
I'm personally happy with making equality handle the normalization and not care about the string representation. And if we construct a tuple for everything in the end then we can reuse that tuple for hashing.
Opened a pull request https://github.com/pypa/packaging/pull/696 let me know what you think.
IIUC #696 only canonicalises the name - shouldn't extras be canonicalised as well for comparison?
>>> Requirement('foo[foo_bar]') == Requirement('foo[foo-bar]')
False
The originally reported issue hasn't been addressed either - extra markers are canonicalised when the requirement is stringified but names aren't (nor are their extras):
>>> str(Requirement('scikit_learn; extra == "foo_bar"'))
'scikit_learn; extra == "foo-bar"'
Can we reopen this?
@layday reopened.