packaging icon indicating copy to clipboard operation
packaging copied to clipboard

Normalisation and Requirement's `str()`

Open henryiii opened this issue 2 years ago • 17 comments

If packaging doesn't normalise dependency names when stringifying dependencies, why does it normalise their extras?

Originally posted by @layday in https://github.com/pypa/build/issues/550#issuecomment-1345189462

Build's test suite is broken because packaging 22 is normalizing the extras when producing strings. But it doesn't do this with dependency names. It seems like it should be consistent.

henryiii avatar Dec 13 '22 16:12 henryiii

This is actually consistent in the face of https://peps.python.org/pep-0685/ requiring we normalize extras, but nothing requiring we normalize project names. We could potentially normalize project names as well, but that might break code that wants the original string.

@pradyunsg opinions?

brettcannon avatar Dec 13 '22 21:12 brettcannon

The PEP requires the names to be normalized when comparing, it doesn't say they have to be changed to a normalized string by packaging, though, as far as I can tell.

Maybe this:

For tools writing core metadata, they MUST write out extra names in their normalized form.

This seems problematic in the short term, though, as previous versions don't consider these extra matching. I think it would be better to provide a transition period where extras can be written as specified, and comparison is normalized, before also normalizing the writing. Projects that historically have a a_b extra will suddenly break if the extra gets rewritten to a-b (which I've seen happen when projects were moving to hatchling, by the way - see https://github.com/wntrblm/nox/pull/659#issuecomment-1272325085 , where tox_to_nox was being rewritten to tox-to-nox, breaking all workflows with nox[tox_to_nox]` until comparisons are supported in pip and older pips are not in use. Hatchling has an emergency "allow-ambiguous-features" switch that we needed to use to make sure users were not broken).

henryiii avatar Dec 13 '22 22:12 henryiii

The latest version of Requirement implements __eq__ but this isn't very useful if the components (the name and extras) aren't normalised for comparison:

In [1]: from packaging.requirements import Requirement

In [2]: Requirement('foo') == Requirement('foo')
Out[2]: True

In [3]: Requirement('foo') == Requirement('Foo')
Out[3]: False

In [4]: Requirement('foo[foo_bar]') == Requirement('foo[foo_bar]')
Out[4]: True

In [5]: Requirement('foo[foo_bar]') == Requirement('foo[foo-bar]')
Out[5]: False

Normalising the marker does appear to be required by the PEP. I think it'd be beneficial to review how requirements are represented and compared, e.g.:

  • __str__ should retain the original representation;
  • __eq__ should normalise all components;
  • a new property normalized_str could produce a normalised representation of the requirement - pipe it back into the constructor to get a requirement with all constituent members normalised.

layday avatar Dec 13 '22 22:12 layday

I believe this change might be related to https://github.com/pypa/pip/issues/11649 ? Specifically, the latest jupyterlab-server from six days ago is now requesting a normalized extra name jsonschema[format-nongpl] while packages built with older packaging are still requesting jsonschema[format_nongpl], causing a collision.

henryiii avatar Dec 14 '22 19:12 henryiii

@pradyunsg opinions?

I think we should preserve them when we load them and represent them as-is; but compare them with normalisation. Basically, what @layday said. :)

pradyunsg avatar Dec 17 '22 22:12 pradyunsg

I'm pretty sure a PR would be welcome (and, unless @brettcannon feels differently, likely merged subject to regular code reviews). :)

pradyunsg avatar Dec 17 '22 22:12 pradyunsg

I'm pretty sure a PR would be welcome (and, unless @brettcannon feels differently, likely merged subject to regular code reviews). :)

I was thinking about this today and I think there's a risk with the change, but one I think we an live with (i.e. I would be fine with a PR that changes the string representation to preserve the original string). Essentially it sucks when comparisons are non-obvious; the string you see will not be the string that comparisons are being made against. So we just need to be prepared to deal with that usability concern.

Hopefully as PEP 685 gets adopted this will become less of an issue.

brettcannon avatar Dec 20 '22 00:12 brettcannon

We could also add a method that gives you a normalised form of the requirement. Unlike versions, this isn’t defined directly by the PEPs but we have clearly defined semantics for this?

pradyunsg avatar Feb 13 '23 08:02 pradyunsg

but we have clearly defined semantics for this

Does this mean to normalise the name, extras, versions in specifiers? Maybe additionally sort the extras and specifiers?

uranusjr avatar Feb 13 '23 08:02 uranusjr

Yes, pretty much. Get the requirement into a string that can be compared with the same semantics as a requirement equality.

pradyunsg avatar Feb 13 '23 08:02 pradyunsg

Upon reading https://packaging.python.org/en/latest/specifications/core-metadata/#name:

For comparison purposes, the names should be normalized before comparing.

However,

>>> Requirement("A.B-C-D") == Requirement("A.B-C_D")
False

whereas

>>> pkg_resources.Requirement.parse("A.B-C-D") == pkg_resources.Requirement.parse("A.B-C_D")
True

Is this intended?

astrojuanlu avatar May 30 '23 08:05 astrojuanlu

Is this intended?

I think this issue is about trying to come with a good solution to this to rectify that kind of problem.

brettcannon avatar May 31 '23 20:05 brettcannon

I’d say preserving the original format for str() but normalising for hash() and == makes the most sense, similar to how case-preserving filesystems compare paths. And it probably wouldn’t be too disruptive since I’m not sure it makes too much practice sense for non-normalised forms to compare differently.

uranusjr avatar Jun 01 '23 08:06 uranusjr

I'm personally happy with making equality handle the normalization and not care about the string representation. And if we construct a tuple for everything in the end then we can reuse that tuple for hashing.

brettcannon avatar Jun 01 '23 19:06 brettcannon

Opened a pull request https://github.com/pypa/packaging/pull/696 let me know what you think.

astrojuanlu avatar Jun 02 '23 11:06 astrojuanlu

IIUC #696 only canonicalises the name - shouldn't extras be canonicalised as well for comparison?

>>> Requirement('foo[foo_bar]') == Requirement('foo[foo-bar]')
False

The originally reported issue hasn't been addressed either - extra markers are canonicalised when the requirement is stringified but names aren't (nor are their extras):

>>> str(Requirement('scikit_learn; extra == "foo_bar"'))
'scikit_learn; extra == "foo-bar"'

Can we reopen this?

layday avatar Jul 07 '23 22:07 layday

@layday reopened.

brettcannon avatar Jul 07 '23 22:07 brettcannon