packaging Normalisation and Requirement's `str()`

If packaging doesn't normalise dependency names when stringifying dependencies, why does it normalise their extras?

Originally posted by @layday in https://github.com/pypa/build/issues/550#issuecomment-1345189462

Build's test suite is broken because packaging 22 is normalizing the extras when producing strings. But it doesn't do this with dependency names. It seems like it should be consistent.

Dec 13 '22 16:12 henryiii

This is actually consistent in the face of https://peps.python.org/pep-0685/ requiring we normalize extras, but nothing requiring we normalize project names. We could potentially normalize project names as well, but that might break code that wants the original string.

@pradyunsg opinions?

Dec 13 '22 21:12 brettcannon

The PEP requires the names to be normalized when comparing, it doesn't say they have to be changed to a normalized string by packaging, though, as far as I can tell.

Maybe this:

For tools writing core metadata, they MUST write out extra names in their normalized form.

This seems problematic in the short term, though, as previous versions don't consider these extra matching. I think it would be better to provide a transition period where extras can be written as specified, and comparison is normalized, before also normalizing the writing. Projects that historically have a a_b extra will suddenly break if the extra gets rewritten to a-b (which I've seen happen when projects were moving to hatchling, by the way - see https://github.com/wntrblm/nox/pull/659#issuecomment-1272325085 , where tox_to_nox was being rewritten to tox-to-nox, breaking all workflows with nox[tox_to_nox]` until comparisons are supported in pip and older pips are not in use. Hatchling has an emergency "allow-ambiguous-features" switch that we needed to use to make sure users were not broken).

Dec 13 '22 22:12 henryiii

The latest version of Requirement implements __eq__ but this isn't very useful if the components (the name and extras) aren't normalised for comparison:

In [1]: from packaging.requirements import Requirement

In [2]: Requirement('foo') == Requirement('foo')
Out[2]: True

In [3]: Requirement('foo') == Requirement('Foo')
Out[3]: False

In [4]: Requirement('foo[foo_bar]') == Requirement('foo[foo_bar]')
Out[4]: True

In [5]: Requirement('foo[foo_bar]') == Requirement('foo[foo-bar]')
Out[5]: False

Normalising the marker does appear to be required by the PEP. I think it'd be beneficial to review how requirements are represented and compared, e.g.:

__str__ should retain the original representation;
__eq__ should normalise all components;
a new property normalized_str could produce a normalised representation of the requirement - pipe it back into the constructor to get a requirement with all constituent members normalised.

Dec 13 '22 22:12 layday

I believe this change might be related to https://github.com/pypa/pip/issues/11649 ? Specifically, the latest jupyterlab-server from six days ago is now requesting a normalized extra name jsonschema[format-nongpl] while packages built with older packaging are still requesting jsonschema[format_nongpl], causing a collision.

Dec 14 '22 19:12 henryiii

@pradyunsg opinions?

I think we should preserve them when we load them and represent them as-is; but compare them with normalisation. Basically, what @layday said. :)

Dec 17 '22 22:12 pradyunsg

I'm pretty sure a PR would be welcome (and, unless @brettcannon feels differently, likely merged subject to regular code reviews). :)

Dec 17 '22 22:12 pradyunsg

I'm pretty sure a PR would be welcome (and, unless @brettcannon feels differently, likely merged subject to regular code reviews). :)

I was thinking about this today and I think there's a risk with the change, but one I think we an live with (i.e. I would be fine with a PR that changes the string representation to preserve the original string). Essentially it sucks when comparisons are non-obvious; the string you see will not be the string that comparisons are being made against. So we just need to be prepared to deal with that usability concern.

Hopefully as PEP 685 gets adopted this will become less of an issue.

Dec 20 '22 00:12 brettcannon

We could also add a method that gives you a normalised form of the requirement. Unlike versions, this isn’t defined directly by the PEPs but we have clearly defined semantics for this?

Feb 13 '23 08:02 pradyunsg

but we have clearly defined semantics for this

Does this mean to normalise the name, extras, versions in specifiers? Maybe additionally sort the extras and specifiers?

Feb 13 '23 08:02 uranusjr

Yes, pretty much. Get the requirement into a string that can be compared with the same semantics as a requirement equality.

Feb 13 '23 08:02 pradyunsg

Upon reading https://packaging.python.org/en/latest/specifications/core-metadata/#name:

For comparison purposes, the names should be normalized before comparing.

However,

>>> Requirement("A.B-C-D") == Requirement("A.B-C_D")
False

whereas

>>> pkg_resources.Requirement.parse("A.B-C-D") == pkg_resources.Requirement.parse("A.B-C_D")
True

Is this intended?

May 30 '23 08:05 astrojuanlu

Is this intended?

I think this issue is about trying to come with a good solution to this to rectify that kind of problem.

May 31 '23 20:05 brettcannon

I’d say preserving the original format for str() but normalising for hash() and == makes the most sense, similar to how case-preserving filesystems compare paths. And it probably wouldn’t be too disruptive since I’m not sure it makes too much practice sense for non-normalised forms to compare differently.

Jun 01 '23 08:06 uranusjr

I'm personally happy with making equality handle the normalization and not care about the string representation. And if we construct a tuple for everything in the end then we can reuse that tuple for hashing.

Jun 01 '23 19:06 brettcannon

Opened a pull request https://github.com/pypa/packaging/pull/696 let me know what you think.

Jun 02 '23 11:06 astrojuanlu

IIUC #696 only canonicalises the name - shouldn't extras be canonicalised as well for comparison?

>>> Requirement('foo[foo_bar]') == Requirement('foo[foo-bar]')
False

The originally reported issue hasn't been addressed either - extra markers are canonicalised when the requirement is stringified but names aren't (nor are their extras):

>>> str(Requirement('scikit_learn; extra == "foo_bar"'))
'scikit_learn; extra == "foo-bar"'

Can we reopen this?

Jul 07 '23 22:07 layday

@layday reopened.

Jul 07 '23 22:07 brettcannon

packaging packaging copied to clipboard

Normalisation and Requirement's `str()`

packaging
packaging copied to clipboard