
Improve GitLab project name verification

Open • DarkaMaul opened this issue 1 year ago • 7 comments

This PR addresses https://github.com/pypi/warehouse/issues/15852

Notably, it prevents names from:

  • being in the reserved GitLab name list
  • ending with .atom or .git
  • ending with a character outside a-zA-Z0-9
  • containing consecutive special characters
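A minimal Python sketch of the four checks above, not the PR's actual code. It assumes `.`, `_`, and `-` are the "special" characters, and it takes the reserved names as a parameter instead of copying GitLab's list, since that list is exactly what the rest of this thread debates. `is_valid_gitlab_name` is a hypothetical helper.

```python
import re
from collections.abc import Collection

# Assumption: ".", "_" and "-" are the special characters of interest.
# Matches two or more of them in a row, e.g. "foo..bar" or "foo-_bar".
_CONSECUTIVE_SPECIAL = re.compile(r"[._-]{2,}")

# Suffixes a name must not end with.
_FORBIDDEN_SUFFIXES = (".atom", ".git")


def is_valid_gitlab_name(name: str, reserved_names: Collection[str] = ()) -> bool:
    """Hypothetical helper applying the four proposed checks."""
    if not name:
        return False
    # 1. Not in the reserved-name list (supplied by the caller, not GitLab's).
    if name in reserved_names:
        return False
    # 2. Must not end with .atom or .git.
    if name.endswith(_FORBIDDEN_SUFFIXES):
        return False
    # 3. The last character must be in a-zA-Z0-9 (ASCII only).
    last = name[-1]
    if not (last.isascii() and last.isalnum()):
        return False
    # 4. No consecutive special characters anywhere in the name.
    if _CONSECUTIVE_SPECIAL.search(name):
        return False
    return True
```

For example, `is_valid_gitlab_name("my-project")` passes, while `"my-project.git"`, `"my--project"`, and `"my-project-"` are rejected; `"example-reserved"` (a made-up name) is rejected only when it appears in the `reserved_names` argument.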

The reserved-name lists are extracted from GitLab's code. As noted by @facutuesca, this is mostly a cosmetic change, so having lists that fall out of sync with GitLab should not introduce a security risk.

Of note, we could slightly improve the regexes by bounding the number of repetitions they can match to a fixed upper limit, which would help prevent ReDoS (regular expression denial of service) attacks.
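One way to read the bounding suggestion, sketched in Python: reject over-long input before matching, and give the repetition an explicit `{0,N}` upper bound instead of an open-ended `*`. `MAX_NAME_LENGTH = 255` is a hypothetical limit, not one taken from GitLab or the PR, and the pattern again assumes `.`, `_`, and `-` are the special characters. It folds the "ends with a-zA-Z0-9" and "no consecutive special characters" rules into one expression; the suffix and reserved-name checks are out of scope here.

```python
import re

# Hypothetical maximum name length; the real limit may differ.
MAX_NAME_LENGTH = 255

# An optional leading special character, then alphanumerics each optionally
# followed by a single special character, then a final alphanumeric. This
# forbids consecutive special characters and requires an alphanumeric last
# character. Each loop iteration consumes at least one character and the
# final alphanumeric consumes one more, so MAX_NAME_LENGTH - 1 iterations
# are enough for any name that passes the length guard below.
_BOUNDED_NAME = re.compile(
    r"[._-]?(?:[a-zA-Z0-9][._-]?){0,%d}[a-zA-Z0-9]" % (MAX_NAME_LENGTH - 1)
)


def matches_bounded_pattern(name: str) -> bool:
    # Length guard: the regex engine never sees more than MAX_NAME_LENGTH
    # characters, which caps the work any single match attempt can do.
    if len(name) > MAX_NAME_LENGTH:
        return False
    return _BOUNDED_NAME.fullmatch(name) is not None
```

The explicit length check is what actually bounds the input; the `{0,N}` quantifier documents the same limit inside the pattern itself.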

DarkaMaul • Jul 11 '24 11:07

Thanks @DarkaMaul! I want to pare this back a bit to avoid depending on assumptions internal to GitLab's code, but adding the consecutive-character check makes a lot of sense to me.

woodruffw • Jul 11 '24 15:07

Is this ready for review?

di • Jul 19 '24 04:07

I think we are still unsure whether we want to introduce a dependency on GitLab's internal code (for the list of forbidden project names).

I can refactor the PR to include only the following rules:

  • no consecutive special characters
  • no special character at the end of the name

We can leave the other two out of this PR:

  • not a reserved name
  • does not end with .git or .atom

/cc @woodruffw @facutuesca

DarkaMaul • Jul 19 '24 10:07

IMO having those lists of forbidden names is fine, since they are documented in GitLab's docs. For me, the important question is whether such exhaustive checking is worth it. @di, what do you think?

facutuesca • Jul 19 '24 11:07

The purpose of these validators is to prevent likely typos or misunderstandings about what the value of this field should be.

I think it's unlikely that users would frequently try to use these forbidden names, and even if they did, it would be fairly clear why validation was failing.

TL;DR: I don't think being so exhaustive is necessary or worth it.

di • Jul 19 '24 11:07

> does not end with .git or .atom

I think this could happen frequently enough that it's worth including.

di • Jul 19 '24 11:07

> > does not end with .git or .atom
>
> I think this could happen frequently enough that it's worth including.

Agreed! Let's cover those two suffixes but leave the reserved names out.

woodruffw • Jul 19 '24 14:07
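The rule set agreed on in this thread (no consecutive special characters, an alphanumeric last character, and no `.git`/`.atom` suffix, with GitLab's reserved names left out) could look roughly like the following. This is a hypothetical validator, not the merged warehouse code: the function name, the messages, and the `.`, `_`, `-` special-character set are all assumptions. It returns a human-readable reason so that a rejection explains itself.

```python
import re
from typing import Optional

# Assumption: ".", "_" and "-" are the special characters of interest.
_CONSECUTIVE_SPECIAL = re.compile(r"[._-]{2,}")
# A letter or digit at the very end of the string (\Z, unlike $, does not
# also match just before a trailing newline).
_ALNUM_END = re.compile(r"[a-zA-Z0-9]\Z")


def gitlab_name_error(name: str) -> Optional[str]:
    """Return why `name` is rejected, or None if it passes (hypothetical)."""
    # Checked first: "foo.git" ends in a letter, so the end-character rule
    # alone would not catch it.
    if name.endswith((".git", ".atom")):
        return "name must not end with .git or .atom"
    if not _ALNUM_END.search(name):
        return "name must end with a letter or digit"
    if _CONSECUTIVE_SPECIAL.search(name):
        return "name must not contain consecutive special characters"
    return None
```

Ordering the suffix check first means `my-project.git` reports the suffix rule rather than passing silently; an empty name falls through to the end-character message.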