warehouse
`ultranormalize_name` function should strip trailing digits
What's the problem this feature will solve?
Users are able to create packages with extremely similar names by just appending a number, e.g. pydantic2.
Describe the solution you'd like
I'd like the ultranormalize_name function
https://github.com/pypi/warehouse/blob/69f19a1bd3d198e266cfc2d1faf48908ffdda126/warehouse/migrations/versions/d18d443f89f0_ultranormalize_name_function.py#L29-L42
to be extended to strip trailing digits when comparing names, so something like `regexp_replace($1, '[0-9]+$', '')`.
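To make the proposal concrete, here is a rough Python sketch of the comparison. It assumes the existing SQL function removes `.`, `_` and `-`, maps look-alike letters (`l`/`i` to `1`, `o` to `0`) and lowercases; the linked migration is the authoritative definition. The name `ultranormalize_name_stripping_digits` is hypothetical. The sketch strips digits before the look-alike mapping, since doing it afterwards would also strip letters that were mapped to digits (a trailing `o` or `l`).

```python
import re


def ultranormalize_name(name: str) -> str:
    """Approximation (assumed) of the existing SQL ultranormalize_name."""
    name = re.sub(r"[._-]", "", name)                        # drop separators
    name = re.sub(r"[li]", "1", name, flags=re.IGNORECASE)   # l/i look like 1
    name = re.sub(r"o", "0", name, flags=re.IGNORECASE)      # o looks like 0
    return name.lower()


def ultranormalize_name_stripping_digits(name: str) -> str:
    """Hypothetical variant: ignore trailing digits when comparing names."""
    # Strip real trailing digits first, then apply the look-alike mapping,
    # so that "hello" does not lose its final "o" after it becomes "0".
    without_digits = re.sub(r"[0-9]+$", "", name)
    return ultranormalize_name(without_digits)
```

With this variant, `pydantic` and `pydantic2` normalize to the same string and would be treated as a collision.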
Additional context
@di any thoughts on this? It seems important to me.
Bump. I think this is kind of important.
I think a different approach might have to be considered: although it is relatively uncommon, some organizations publish a separate PyPI package for each major version. See for example elasticsearch:
- https://pypi.org/project/elasticsearch/
- https://pypi.org/project/elasticsearch8/
- https://pypi.org/project/elasticsearch7/
- https://pypi.org/project/elasticsearch6/
- ...
Maybe creating packages with appended numbers should only be allowed if this is done by the owner of the original package name?
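A minimal sketch of what that owner check could look like, purely as an illustration. `may_register` and `owners_by_project` are hypothetical names, not Warehouse APIs, and names are assumed to be already normalized.

```python
import re


def may_register(new_name: str, user: str,
                 owners_by_project: dict[str, set[str]]) -> bool:
    """Allow 'name + digits' only for owners of the existing base name."""
    base = re.sub(r"[0-9]+$", "", new_name)
    # No trailing digits, or no existing project with the base name:
    # nothing to protect, so the registration is allowed.
    if base == new_name or base not in owners_by_project:
        return True
    # Otherwise only an owner of the base project may take the name.
    return user in owners_by_project[base]
```

Under this rule the owners of `elasticsearch` could still publish `elasticsearch8`, while anyone else would be refused.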
> Maybe creating packages with appended numbers should only be allowed if this is done by the owner of the original package name?
It seems somewhat common for people to do this when taking over maintenance of an abandoned project, which they don't have access to and the original maintainers have long disappeared. For users, this is a convenient way to find a still maintained version of the project they were looking for.
So, I'd suggest maybe only rejecting if the original name has had an upload in the last 2 years or something similar.
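Roughly, the check could look like the sketch below. `should_reject` and `last_upload_by_project` are hypothetical names for illustration, and the two-year window is just the figure floated above, not a settled value.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed window: "the last 2 years or something similar".
ACTIVITY_WINDOW = timedelta(days=2 * 365)


def should_reject(new_name: str,
                  last_upload_by_project: dict[str, datetime],
                  now: datetime) -> bool:
    """Reject 'name + digits' only if the base project is still active."""
    base = re.sub(r"[0-9]+$", "", new_name)
    if base == new_name:
        return False  # no trailing digits, rule does not apply
    last_upload = last_upload_by_project.get(base)
    if last_upload is None:
        return False  # no existing project with the base name
    # Active base project: its latest upload falls inside the window.
    return now - last_upload <= ACTIVITY_WINDOW
```

For example, with `now = datetime(2024, 1, 1, tzinfo=timezone.utc)`, a `pydantic` upload from December 2023 would block `pydantic2`, while a project last uploaded in 2015 would not block a numbered successor.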
@Dreamsorcerer I see where you're coming from, but that doesn't help us with cases like https://github.com/pypi/support/issues/6382 where they've already name-squatted pydantic2 and pydantic3.
Of course you can say we should have namesquatted this ourselves, but then:
- how far do we go, should we be namesquatting `pydantic42`?
- it's going to be an arms race and obviously we don't want every package squatting `xyz0-9`
- if people started doing that, it would prevent the exact case you've pointed to
> @Dreamsorcerer I see where you're coming from, but that doesn't help us with cases like pypi/support#6382 where they've already name-squatted `pydantic2` and `pydantic3`.
My only suggestion was to restrict the rule you're proposing to currently maintained projects. So, it would apply to pydantic, as this has received updates in the past year, but wouldn't apply to old, abandoned projects.
Oh I see, makes sense 👍