packaging-problems

Underscore (_) and hyphen (-) normalization

Open pganssle opened this issue 6 years ago • 8 comments

According to this:

Comparison of project names is case insensitive and treats arbitrarily-long runs of underscores, hyphens, and/or periods as equal. For example, if you register a project named cool-stuff, users will be able to download it or declare a dependency on it using any of the following spellings:

But I cannot find the part about treating underscores, hyphens and periods as equal anywhere in the standards. It appears that this was supposed to be addressed in some way according to #17, but I didn't necessarily want to hijack that thread, because I don't understand the context.

The problem I see is that the existing tools don't seem to be consistently doing the same thing, which causes minor compatibility problems. A related (but I now realize slightly different) problem is that apparently distutils normalizes hyphens to underscores in extras, but pip install .[extra-a] and pip install .[extra_a] are not considered equivalent. This is definitely a bug, but it's not clear where to fix it.

Is there an official spec for this? Does/should it apply to extras? Where should the normalization take place?

pganssle avatar Dec 19 '18 18:12 pganssle

PEP 503 says:

This PEP references the concept of a "normalized" project name. As per PEP 426 the only valid characters in a name are the ASCII alphabet, ASCII numbers, ., -, and _. The name should be lowercased with all runs of the characters ., -, or _ replaced with a single - character.
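The rule quoted above is short enough to write out. A minimal Python sketch, following the regular-expression form that PEP 503 itself gives (the example names are made up for illustration):

```python
import re


def normalize(name: str) -> str:
    """Lowercase the name and collapse runs of '.', '-', '_' into one '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


# Illustrative inputs only; all of these spellings collapse to the same name.
print(normalize("Cool_Stuff"))    # cool-stuff
print(normalize("cool.-_stuff"))  # cool-stuff
print(normalize("COOL--stuff"))   # cool-stuff
```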

But this is just for package names, not for extras. AFAIK there is no true spec for extras.

It would make sense to me to normalize extras the same way we normalize package names.

di avatar Dec 19 '18 18:12 di

Filenames for wheels follow a strict format defined here: https://www.python.org/dev/peps/pep-0427/#file-name-convention

merwok avatar Dec 19 '18 20:12 merwok

Ideally, the normalisation rules for both project names and extras would have been defined in PEP 508: https://www.python.org/dev/peps/pep-0508/

Unfortunately, we were mainly focused on getting environment markers properly specified at the time, and missed that PEP 503 had the only spec for name normalisation, and that there isn't a specification for extras normalisation anywhere at all.

Given the normalisation rule in PEP 503, incorporating that into PEP 508 as well could be taken as a clarification rather than as a change - that would be @pfmoore's call.

Note that PEP 508 was complete enough at the time that https://packaging.python.org/specifications/dependency-specifiers/ is still just a link to it and doesn't have any content of its own.

ncoghlan avatar Dec 21 '18 10:12 ncoghlan

I'm inclined to treat adding the normalisation rules into PEP 508 as a clarification. My reasoning is that PEP 508 clearly states that project names and extras have the same syntax. Normalisation rules, while not syntax as such, are closely related and it's IMO reasonable to assume that two closely related items with the same syntax would have the same normalisation rules.

So I'd see this as a 2-part clarification:

  1. Copying the normalisation rules for project names from PEP 503 to PEP 508, simply to ensure that they are covered in the same place as the syntax.
  2. Clarifying that those normalisation rules apply to extras as well as project names.
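As a rough sketch of what this two-part clarification would mean in practice, assuming extras are compared only after applying the PEP 503 rule (the helper names `normalize` and `same_extra` are hypothetical and not taken from any existing tool):

```python
import re


def normalize(name: str) -> str:
    """PEP 503 rule: lowercase, runs of '.', '-', '_' become a single '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def same_extra(requested: str, declared: str) -> bool:
    """Hypothetical helper: treat two extras as equal if they normalise alike."""
    return normalize(requested) == normalize(declared)


print(same_extra("extra-a", "extra_a"))  # True
print(same_extra("Extra.A", "extra_a"))  # True
print(same_extra("extra-a", "extra-b"))  # False
```

Under this assumption, `pip install .[extra-a]` and `pip install .[extra_a]` would refer to the same extra.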

If anyone knows of a real-life tool or use case that would be significantly inconvenienced by this clarification, we can reconsider, but I think that's unlikely.

If someone wants to create a PR for PEP 508 on this basis, that would be great.

pfmoore avatar Dec 21 '18 10:12 pfmoore

I regret the "all runs" language, but we do have to get rid of the dashes to be able to split the parts of wheel filenames. The dots seem excessive. The rule was based on setuptools behavior, which must be copied for standardized extras treatment.
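To illustrate why dashes in the name get in the way, here is a minimal sketch of splitting a wheel filename on `-`, assuming the PEP 427 layout `{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl`. The function `split_wheel_filename` and the filenames are hypothetical, and this is not how any particular tool implements it.

```python
from typing import Optional


def split_wheel_filename(filename: str) -> dict:
    """Naively split a wheel filename into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename")
    parts = filename[: -len(".whl")].split("-")
    build: Optional[str] = None
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
    elif len(parts) == 6:
        # Six parts means an optional build tag sits after the version.
        name, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"expected 5 or 6 parts, got {len(parts)}")
    return {
        "name": name,
        "version": version,
        "build": build,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


# With the dash escaped to an underscore, the split is unambiguous.
print(split_wheel_filename("cool_stuff-1.0-py3-none-any.whl")["name"])  # cool_stuff

# With a raw dash in the name, "stuff" is mistaken for the version
# and "1.0" for the build tag.
print(split_wheel_filename("cool-stuff-1.0-py3-none-any.whl")["version"])  # stuff
```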

dholth avatar Dec 21 '18 13:12 dholth

If someone wants to create a PR for PEP 508 on this basis, that would be great.

I'll try to get to it by the end of the year.

pradyunsg avatar Dec 24 '18 10:12 pradyunsg

The docs page that’s the living version of PEP 508 does not mention normalization: https://packaging.python.org/en/latest/specifications/dependency-specifiers/

This other page exists: https://packaging.python.org/en/latest/specifications/name-normalization/

This recent discussion on the forums feels similar, but it references PEP 503, not 508: https://discuss.python.org/t/revisiting-distribution-name-normalization/1234

Status unclear!

merwok avatar Feb 11 '25 17:02 merwok

There's also PEP 625, which ratchets down the naming scheme for source distributions so that they match the normalization scheme used for wheels.

In effect, I think the rules for package name normalization are now driven by the Binary distribution format rules, which extend the "name normalization" rules to include "replace - with _".
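A minimal sketch of that extended rule, assuming it amounts to the PEP 503 normalization followed by replacing `-` with `_` (the function name `normalize_for_filename` is hypothetical):

```python
import re


def normalize_for_filename(name: str) -> str:
    """PEP 503 normalization, then '-' swapped for '_' for use in filenames."""
    return re.sub(r"[-_.]+", "-", name).lower().replace("-", "_")


print(normalize_for_filename("Cool-Stuff"))    # cool_stuff
print(normalize_for_filename("cool.-_stuff"))  # cool_stuff
```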

woodruffw avatar Feb 11 '25 17:02 woodruffw