twine icon indicating copy to clipboard operation
twine copied to clipboard

Integrating `pypitoken` to restrict tokens

Open ewjoachim opened this issue 3 years ago • 3 comments

The ticket may be early-stage, but I'd love to start the discussion.

As you may or may not know, PyPI API tokens have this very cool property that the bearer is autonomous to create restricted versions of their token. For example, if someone uses a "user" pypi token, they can generate a token that only works for one specific project. Doing this before used to be a little complicated because the restriction format was not very well documented.

Enter pypitoken. It's a new library (disclaimer: I'm the author) that aims to be the source of truth regarding PyPI tokens. I've made a PR for Warehouse (pypi) to start using this library to generate and check the tokens, and I've launched a discussion for the lib to become an official PyPA tool.

Now, whether any of this happens or not, I believe we could already start seeing if we would be able to automatically restrict tokens in Twine, if it's something you folks would be interested in.

As of today, the only type of restriction supported by PyPI is projects restriction, but other types of restrictions are planned for the future (such as version number, filename and file hash).

The gain in security for twine would be mainly that tokens stolen in transit (MITM, ...) would be much less susceptible to be used to upload malicious distributions.

In terms of implementation, this is probably something we would want to kick-in only when the target is pypi (or testpypi) and the username in __token__. We could analyze existing restrictions on the token and add the relevant ones.

PS: I suggested this to Poetry too, for poetry publish.

ewjoachim avatar Mar 20 '21 22:03 ewjoachim

The gain in security for twine would be mainly that tokens stolen in transit (MITM, ...) would be much less susceptible to be used to upload malicious distributions.

Given how extremely unlikely it is for twine to be MITM'd. Does this deflate your whole article?

What you're proposing is presently completely orthogonal to how Twine operates. Twine picks up credentials the users have specified and does not generate them. We'd need to add either a new sub-command to generate for a given project a new token (via pypitoken - or perhaps through pymacaroons) or we'd need to do it for each upload, which doesn't seem to be giving the user much value.

This doesn't improve CI situations for users of twine, they still need to store a secret somewhere and generating a new token from that doesn't invalidate the fact that the root token still exists outside their control and could be compromised to upload arbitrarily to any package that token has access to.

I'll also say that the FUD above about being able to MITM twine talking to PyPI has already biased me heavily against this, so I'm trying to be fair in my questions.

sigmavirus24 avatar Mar 21 '21 11:03 sigmavirus24

Thanks for your answer. I'm a bit surprised that you read the part on the MITM as FUD, and I'm sorry that I phrased it this way.

I totally get your point, and at the same time, I read opposite opinions in the original ticket implementation:

I could even potentially see something like Twine just always mint a new macaroon scoped very specifically to what it is about to upload, with a very short expiration. Since that doesn't require talking to Warehouse to do, it would be very fast and very secure, allowing Twine to limit the capabilities of the token they actually send on the wire. While we generally trust TLS to protect these credentials, automatic scope limitation like that is basically zero cost and provides defense in depth so that in the case someone is able to look into the TLS stream (for example, a company MITM proxy) the credentials they get are practically useless once they've been used once.

(from @dstufft here)

In that ticket, it was also discussed the idea of implementing a 2FA caveat. PyPI would refuse uploads of packages using a token that has a 2FA caveat if a 2FA token isn't sent alongside. Twine would introspect the token and ask for the 2FA token when relevant. In this case, the security gain is totally different, but also interesting (to me).

Also, regarding CI, the ticket mentioned that at the time, Travis was interested into not exposing the full token to the CI script but adding restrictions beforehand. This would actually be a move that's totally independent from twine.

(via pypitoken - or perhaps through pymacaroons)

Yep, pypitoken is just a small layer over pymacaroon, but the main goal of this layer is to avoid reimplementing the macaroon caveats format everywhere, especially given it's treated as an implementation detail and not a part of an agreed specification. We could also totally make a PEP for this.

What you're proposing is presently completely orthogonal to how Twine operates.

Yes, and also not exactly how I see it. It all depends whether we want to include this in Twine's scope or not. I agree that we can do the same without twine being involved. The discussion really is on what we want to do more than what we can do. Twine is in a position where it sees the normalized project name, version, distribution filename, hash, and controls what token is sent, so it's only natural to explore the idea of having twine do the restriction operation.

I thought that given how the Twine argument had been discussed when deciding the PyPI token format, it was considered a good idea. Now if it's not the case, I'm perfectly happy to leave it as-is, and at least the discussion will have been explicit.

ewjoachim avatar Mar 21 '21 15:03 ewjoachim

Without diving into whether or not Twine should do this, I just wanted to note that it feels related to other items in our Repository configuration project.

bhrutledge avatar Mar 27 '21 13:03 bhrutledge