`ultranormalize_name` function should strip trailing digits

Open samuelcolvin opened this issue 8 months ago • 4 comments

What's the problem this feature will solve?

Users are able to create packages with extremely similar names by just appending a number, e.g. pydantic2.

Describe the solution you'd like

I'd like the ultranormalize_name function

https://github.com/pypi/warehouse/blob/69f19a1bd3d198e266cfc2d1faf48908ffdda126/warehouse/migrations/versions/d18d443f89f0_ultranormalize_name_function.py#L29-L42

to be extended to strip trailing digits when comparing names, so something like `regexp_replace($1, '[0-9]+$', '')`.
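To make the proposal concrete, here is a minimal Python sketch of the comparison. It is not the Warehouse implementation: `comparison_key` is a hypothetical name, and the separator removal and look-alike folding (`l`/`i` to `1`, `o` to `0`) are an assumed approximation of what the linked SQL function does.

```python
import re


def comparison_key(name: str) -> str:
    """Hypothetical key used to decide whether two project names collide."""
    # Assumed existing behaviour: drop the separators '.', '_' and '-'.
    no_sep = re.sub(r"[._-]", "", name)
    # Proposed addition: drop a trailing run of ASCII digits.
    # Fall back to the unstripped name if nothing else would remain.
    stripped = re.sub(r"[0-9]+$", "", no_sep) or no_sep
    # Assumed existing behaviour: fold look-alike characters, then lowercase.
    folded = re.sub(r"[li]", "1", stripped, flags=re.IGNORECASE)
    folded = re.sub(r"o", "0", folded, flags=re.IGNORECASE)
    return folded.lower()


# Under these assumptions a numbered variant collides with the original name.
print(comparison_key("pydantic2") == comparison_key("pydantic"))
```

In this sketch the digits are stripped before the look-alike folding. Under the assumed folding rules, stripping afterwards would also remove a trailing `o`, `l` or `i` once it had been turned into a digit, so `foo` would collapse to `f`.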

Additional context

samuelcolvin avatar Mar 14 '25 09:03 samuelcolvin

@di any thoughts on this? It seems important to me.

samuelcolvin avatar Mar 24 '25 15:03 samuelcolvin

Bump. I think this is kind of important.

samuelcolvin avatar Apr 07 '25 20:04 samuelcolvin

I think a different approach might have to be considered, as although relatively uncommon, some organizations publish different PyPI packages on each major version bump. See for example elasticsearch:

  • https://pypi.org/project/elasticsearch/
  • https://pypi.org/project/elasticsearch8/
  • https://pypi.org/project/elasticsearch7/
  • https://pypi.org/project/elasticsearch6/
  • ...

Maybe creating packages with appended numbers should only be allowed if this is done by the owner of the original package name?
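A rough sketch of that ownership rule, in Python. The names `base_name` and `may_register` and the `owners` mapping are all hypothetical; Warehouse's real data model and permission checks are not shown in this thread.

```python
import re


def base_name(name: str) -> str:
    """Name with any trailing run of digits removed (hypothetical helper)."""
    return re.sub(r"[0-9]+$", "", name) or name


def may_register(new_name: str, requester: str, owners: dict[str, set[str]]) -> bool:
    """Allow a numbered variant only for an owner of the un-numbered project."""
    base = base_name(new_name)
    # No trailing digits, or no existing project with the base name: allow.
    if base == new_name or base not in owners:
        return True
    return requester in owners[base]


# Illustrative data only; the user names are made up.
owners = {"elasticsearch": {"elastic"}}
print(may_register("elasticsearch9", "elastic", owners))
print(may_register("elasticsearch9", "someone-else", owners))
```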

Viicos avatar Apr 08 '25 08:04 Viicos

> Maybe creating packages with appended numbers should only be allowed if this is done by the owner of the original package name?

It seems somewhat common for people to do this when taking over maintenance of an abandoned project that they don't have access to and whose original maintainers have long disappeared. For users, this is a convenient way to find a still-maintained version of the project they were looking for.

So, I'd suggest maybe only rejecting if the original name has had an upload in the last 2 years or something similar.
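As a sketch of how such a recency check might look in Python: the two-year window is taken from the suggestion above, while `blocks_numbered_variant` and its parameters are hypothetical and do not correspond to anything in Warehouse.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# "2 years or something similar": the exact window is an open question.
ACTIVE_WINDOW = timedelta(days=2 * 365)


def blocks_numbered_variant(
    last_upload: Optional[datetime],
    now: datetime,
    window: timedelta = ACTIVE_WINDOW,
) -> bool:
    """True if the original project is recent enough to block a numbered variant."""
    if last_upload is None:
        # The original project has never had an upload: treat it as inactive.
        return False
    return now - last_upload <= window


now = datetime(2025, 4, 8, tzinfo=timezone.utc)
print(blocks_numbered_variant(datetime(2024, 12, 1, tzinfo=timezone.utc), now))
print(blocks_numbered_variant(datetime(2019, 1, 1, tzinfo=timezone.utc), now))
```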

Dreamsorcerer avatar Apr 08 '25 13:04 Dreamsorcerer

@Dreamsorcerer I see where you're coming from, but that doesn't help us with cases like https://github.com/pypi/support/issues/6382 where they've already name-squatted pydantic2 and pydantic3.

Of course you can say we should have namesquatted this ourselves, but then:

  • how far do we go? Should we be namesquatting pydantic42?
  • it's going to be an arms race, and obviously we don't want every package squatting xyz0 through xyz9
  • if people started doing that, it would prevent the exact case you've pointed to

samuelcolvin avatar Jun 06 '25 00:06 samuelcolvin

> @Dreamsorcerer I see where you're coming from, but that doesn't help us with cases like pypi/support#6382 where they've already name-squatted pydantic2 and pydantic3.

My only suggestion was to restrict the rule you're proposing to currently maintained projects. So, it would apply to pydantic, as this has received updates in the past year, but wouldn't apply to old, abandoned projects.

Dreamsorcerer avatar Jun 06 '25 12:06 Dreamsorcerer

Oh I see, makes sense 👍.

samuelcolvin avatar Jun 08 '25 09:06 samuelcolvin