crates.io token scopes
This RFC proposes implementing scopes for crates.io tokens, allowing users to choose which endpoints the token is allowed to call and which crates it's allowed to affect.
For @rust-bus I would love to have ability to add a co-owner of a crate who does not have permission to kick out other owners. This is how crates-io currently makes github team accounts behave, but teams cannot be invited (and conversely it would also make sense for a github team to be able to fully own a crate with all permissions).
Tying the change-owners permission only to a token is insufficient for the @rust-bus-owner case, because people co-own crates using accounts, not tokens.
@kornelski I feel like user permissions are out of scope for this RFC: my goal here is mostly to improve the security for automated processes or CI environments.
I don't think there will be conflicts between token scopes and user roles/permissions, as token scopes are an additional restriction on top of the normal authorization process. With this RFC I will be able to create a token with the crates scope of serde, but that doesn't mean I will be able to actually publish that crate unless I'm invited as an owner.
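A minimal sketch of the "scopes are an additional restriction" idea described above. The `Token` type, the `CRATE_OWNERS` table, and the account names are hypothetical and only illustrate the order of the checks; they are not the crates.io implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Token:
    owner: str                  # account that created the token
    endpoint_scopes: frozenset  # allowed actions; empty means "no restriction" (assumption)
    crate_scopes: frozenset     # allowed crate names; empty means "no restriction" (assumption)

# Hypothetical ownership data: crate name -> accounts that own it.
CRATE_OWNERS = {"serde": {"serde-maintainer"}, "my-crate": {"alice"}}

def is_authorized(token, action, crate):
    # Step 1: the normal authorization check, unchanged by scopes.
    if token.owner not in CRATE_OWNERS.get(crate, set()):
        return False
    # Step 2: scopes can only narrow what step 1 already allowed.
    if token.endpoint_scopes and action not in token.endpoint_scopes:
        return False
    if token.crate_scopes and crate not in token.crate_scopes:
        return False
    return True

# A token scoped to "serde" does not help alice, who is not an owner of serde.
scoped_to_serde = Token("alice", frozenset({"publish-update"}), frozenset({"serde"}))
print(is_authorized(scoped_to_serde, "publish-update", "serde"))
```

In this sketch a scope never grants anything by itself: removing step 2 gives back today's behaviour, and adding it can only turn an allowed request into a denied one.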
For CI publishing, adding limits to tokens reduces the damage they can do, but I still don't feel safe about using them in CI, because scopes don't do anything to prevent tokens from being stolen and misused within their scope.
There's no downside to having scopes as an option, but I feel they're not enough for making CI secure. If you were to add only one thing, I'd prefer a second factor auth for publishing, rather than scopes.
For example, I'd like the ability to only upload a new package from CI, which would not be published until I confirm that in some more secure way. It could be as simple as sending an e-mail to crate owners with a confirmation link that has to be clicked to make the new version public. That would give a chance to prevent damage from a stolen token (not just limit the scope of the damage), and make misuse visible.
Should there be separate scopes for publishing a new crate and a version of an existing crate, instead of the single publish scope?
Yes, please. I'd like to have that separation, as a self-protection measure.
@lu-zero @joshtriplett sounds good, changed the text to split publish into publish-new and publish-existing.
I second the sentiment that we will need a permissions model for crate owners as well in the long run. While I appreciate that this model is out of scope for this RFC, I think it would be worthwhile to think about how the token scopes can be generalized to user permissions in the long run, to avoid having two different models for closely related purposes, and to reduce the overall implementation effort and codebase complexity in crates.io. I don't see any problem with the approach in this regard, so I'm not asking for any specific changes.
It could be as simple as sending an e-mail to crate owners with a confirmation link that has to be clicked to make the new version public.
This would definitely be great to have. I think it could be implemented as an extension to the scopes model in this RFC, by simply adding a boolean flag that any action taken via this token requires email confirmation. Maybe we leave this as a follow-up as well, rather than including it in this RFC?
@kornelski
For CI publishing, adding limits to tokens reduces the damage they can do, but I still don't feel safe about using them in CI, because scopes don't do anything to prevent tokens from being stolen and misused within their scope.
There's no downside to having scopes as an option, but I feel they're not enough for making CI secure. If you were to add only one thing, I'd prefer a second factor auth for publishing, rather than scopes.
Of course scopes aren't the only thing that will improve the security of the crates.io ecosystem, but they will considerably improve the security for large projects and organizations, and I think are worth pursuing in the near term. For example, in the rust-lang org we have a bunch of repositories and crates we might want to setup publish automation for, but to get a token we have to either:
- Create a GitHub account for each token we need to create, and add that GitHub account as owner. This has the downsides of allowing a compromise to transfer the crate away (instead of just publishing bad versions we can yank), and you might run into GitHub's abuse detection systems if you create a bunch of accounts.
- Use the token of a project member's personal account, which will become a problem when the person leaves the project and extends a possible compromise to crates unrelated to the org.
With token scopes we could have a central bot account owner of all the rust-lang crates, who also owns all the tokens used around the organization. This would reduce maintenance and make security response easier in case of a breach (as we don't have to chase who owns which token).
For example, I'd like the ability to only upload a new package from CI, which would not be published until I confirm that in some more secure way. It could be as simple as sending an e-mail to crate owners with a confirmation link that has to be clicked to make the new version public. That would give a chance to prevent damage from a stolen token (not just limit the scope of the damage), and make misuse visible.
There is already work underway to send email notifications every time a new version is published: the backend and frontend are done, and the only thing left is actually sending the emails. While this is not exactly what you propose, it would still allow crate authors to notice misuse and revoke the tokens / yank the crates.
Second factor for publishes might be implemented on top of that feature, and I don't think it's worth blocking this RFC on it.
Great proposal. I think defining token scopes more tightly will be really valuable. A couple of thoughts:
It seems like the main use case is defining low-privilege tokens for use in CI. It seems like those tokens will only want "publish-update." Can the proposal be simplified into two scopes: "publish-update" and "manage" (which would include what's currently called "publish-new", "yank", and "change-owners")? It's much easier to add more granular scopes later than it is to remove scopes if they turn out to be non-useful.
Instead of treating the regex-ish minilanguage as part of the security properties, I would recommend modelling the relationship between token and crates explicitly - for instance with a 1:many table of token IDs to crate IDs. Using regex to enforce security properties is often a source of bugs, and so are ad-hoc syntaxes. I think it's possible to achieve similar ergonomics wins by implementing a pattern matcher in the JS frontend, and using that to quickly select groups of crates. The disadvantage would be that a user who creates a new crate after creating their scope-limited token would need to explicitly add that crate to the token; but I think that is actually a bit of an advantage, in that it makes the permission explicit. And if it turns out to be highly burdensome for users, it's possible to add automation after the fact that achieves the same goal.
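A minimal sketch of the explicit token-to-crate mapping suggested above, using an in-memory SQLite database. The table name `token_crates`, its column names, and the helper functions are assumptions made for illustration; the thread does not specify a schema.

```python
import sqlite3

# Hypothetical schema: one row per (token, crate) pair the token may act on.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE token_crates ("
    " token_id INTEGER NOT NULL,"
    " crate_id INTEGER NOT NULL,"
    " PRIMARY KEY (token_id, crate_id))"
)

def grant(token_id, crate_id):
    # Explicitly add one crate to a token's scope.
    conn.execute(
        "INSERT OR IGNORE INTO token_crates VALUES (?, ?)", (token_id, crate_id)
    )

def token_may_touch(token_id, crate_id):
    # The security check is an exact lookup; no pattern matching is involved.
    row = conn.execute(
        "SELECT 1 FROM token_crates WHERE token_id = ? AND crate_id = ?",
        (token_id, crate_id),
    ).fetchone()
    return row is not None

grant(1, 101)
print(token_may_touch(1, 101))  # True
print(token_may_touch(1, 102))  # False
```

With this shape, any pattern matching would live only in the frontend that helps a user pick rows to insert, which is the separation the comment argues for.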
It seems like the main use case is defining low-privilege tokens for use in CI. It seems like those tokens will only want "publish-update." Can the proposal be simplified into two scopes: "publish-update" and "manage" (which would include what's currently called "publish-new", "yank", and "change-owners")? It's much easier to add more granular scopes later than it is to remove scopes if they turn out to be non-useful.
I think publish-new is also something that would be useful in CI, especially for big projects or monorepos, otherwise when a new crate is added a person would have to publish it manually. As others pointed out it's useful to split publish-new and publish-update as not everyone wants this feature, but I can see it being used (examples of such projects are rustc-auto-publish which will need to publish new crates when they're added to rustc, or rusoto which contains more than 100 auto-generated crates).
I agree with you that yank and change-owners are way less likely to be used on CI, and I could see them being merged into a manage scope. I'm wondering though if from a security point of view it makes sense to merge them: yanking is a reversible action and it can be easily rolled back if a malicious actor took control over the token, but changing ownership will lead to you losing control over the crate if someone abused the token.
I could see myself using a token without the change-owners scope on my workstation, and generating-then-revoking a temporary one if I happened to transfer a crate.
Instead of treating the regex-ish minilanguage as part of the security properties, I would recommend modelling the relationship between token and crates explicitly - for instance with a 1:many table of token IDs to crate IDs. Using regex to enforce security properties is often a source of bugs, and so are ad-hoc syntaxes. I think it's possible to achieve similar ergonomics wins by implementing a pattern matcher in the JS frontend, and using that to quickly select groups of crates. The disadvantage would be that a user who creates a new crate after creating their scope-limited token would need to explicitly add that crate to the token; but I think that is actually a bit of an advantage, in that it makes the permission explicit. And if it turns out to be highly burdensome for users, it's possible to add automation after the fact that achieves the same goal.
Yep, that was the other possible implementation I thought of for this. Your proposal makes complete sense, but it's indeed burdensome for large projects. The thing that prompted me to write this RFC was creating a token to allow the rust-lang/chalk repo to publish their multiple crates from CI: being able to just allow chalk-* would be way better than having the Chalk developers ping someone on infra every time they add a new crate.
I'm wondering if a middle ground could be to still have users input the pattern, but also have an off-by-default "Allow future crates matching this pattern" checkbox. If the box isn't checked the backend will resolve the pattern and replace it with the list of matched crates without any wildcard. For example, chalk-* would result in the following pattern if the box was not checked:
chalk-derive,chalk-engine,chalk-ir,chalk-recursive,chalk-solve
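The resolution step described above could look roughly like this. It is a sketch under stated assumptions: the only wildcard handled is a single trailing `*` treated as a prefix match, and the function names and the `allow_future` parameter are hypothetical.

```python
def matches(pattern, crate):
    # Assumption: a trailing "*" means "any crate name with this prefix";
    # anything else must match the crate name exactly.
    if pattern.endswith("*"):
        return crate.startswith(pattern[:-1])
    return crate == pattern

def resolve_pattern(pattern, existing_crates, allow_future):
    # With the checkbox ticked, the wildcard is stored as written.
    if allow_future:
        return pattern
    # Otherwise the wildcard is expanded once, at token creation time,
    # into a fixed comma-separated list of the crates that exist right now.
    return ",".join(sorted(c for c in existing_crates if matches(pattern, c)))

existing = ["serde", "chalk-solve", "chalk-ir", "chalk-derive",
            "chalk-engine", "chalk-recursive"]
print(resolve_pattern("chalk-*", existing, allow_future=False))
```

A crate named chalk-new published after the token was created would then be covered only if the box had been checked.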
Thinking about this some more, it feels like this can be simplified further. The proposal aims to express two different, but related, concepts:
1. Authorization for a specific subset of crates, and
2. Authorization for a specific subset of actions.
If we can split up those concepts, I believe we'll wind up with a solution that is simpler and more orthogonal.
How about using teams to express (1)?
Proposal: Members of a team should be able to create a "team API token" that is authorized for the set of crates that team is authorized for. Additionally, for (2), all tokens (user and team) should have a scope that allows "manage" or "publish" access, but tokens are not scoped by crate name - that role would be played by teams.
This has a few advantages. First, crates.io already has the notion of grouping packages by authorization (teams), so we wouldn't need to develop a parallel system for managing tokens. Also, the system for teams allows adding and removing crates from a team's ownership, which solves the "allow future crates" issue we've been discussing. As crates are added to a team, that team's token would automatically have authorization for them.
Second, this allows management of the tokens as a team. If there is a compromise of a token in CI, anyone on the team could revoke the team token, rather than having to wake up whoever created the CI token on their individual user.
Third, this makes it easier to avoid a potentially subtle CI security problem. In both GitHub and Travis CI, anyone with write access to the repo can read out the secrets. For a single repo, that's no big deal. Someone with write access could just use that access to publish a malicious version anyhow. However, if crates A and B are part of the same project but are developed in separate repos, these can be out of sync. Imagine that crates A and B are using the same API token to publish from CI. Someone who has write access to crate A's repo can extract the token from CI and use it to publish new versions of crate B. Under my proposal, we can give straightforward advice to avoid this problem: "When using a team token to publish from CI, make sure only that team has write access to the relevant repos."
Some currently-existing teams might not express the exact set of crate ownerships that a given group of crates wants, but team creation is free.
How does this work for an individual user who wants to publish from CI, and also silo off groups of their personally-maintained crates? They can create an organization in GitHub (also free) and then make teams under that organization.
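A rough sketch of the team-token idea from this comment. Everything here is hypothetical: the `TeamToken` type, the team and member names, and in particular the assumption that "manage" is a superset of "publish" (the comment does not say which actions each scope contains).

```python
from dataclasses import dataclass

# Hypothetical data: which crates a team owns, and who is on the team.
TEAM_CRATES = {"chalk-team": {"chalk-ir", "chalk-solve"}}
TEAM_MEMBERS = {"chalk-team": {"alice", "bob"}}

# Assumption: "manage" includes everything "publish" allows.
ACTIONS_BY_SCOPE = {
    "publish": {"publish"},
    "manage": {"publish", "yank", "change-owners"},
}

@dataclass
class TeamToken:
    team: str
    scope: str          # "publish" or "manage"
    revoked: bool = False

def team_token_allows(token, action, crate):
    if token.revoked:
        return False
    # The crate set is looked up at request time, so crates added to the
    # team later are covered without touching the token.
    if crate not in TEAM_CRATES.get(token.team, set()):
        return False
    return action in ACTIONS_BY_SCOPE[token.scope]

def revoke(token, user):
    # Any member of the owning team may revoke, not only the creator.
    if user not in TEAM_MEMBERS.get(token.team, set()):
        raise PermissionError(f"{user} is not a member of {token.team}")
    token.revoked = True
```

The two properties argued for above show up directly: the crate set is not stored on the token, and revocation is keyed on team membership rather than on who created the token.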
I agree with you that yank and change-owners are way less likely to be used on CI, and I could see them being merged into a manage scope. I'm wondering though if from a security point of view it makes sense to merge them: yanking is a reversible action and it can be easily rolled back if a malicious actor took control over the token
This is a good point! However, splitting these into separate scopes only makes sense if you expect people to generate low privilege yank-only tokens. I can't think of a scenario where that's useful.
being able to just allow chalk-* would be way better than having the Chalk developers ping someone on infra every time they add a new crate.
I didn't fully understand this sentence. Why do Chalk developers need to ping the infra team when they add a new crate?
I could see myself using a token without the change-owners scope on my workstation, and generating-then-revoking a temporary one if I happened to transfer a crate.
Alternately you could use the "Manage Owners" UI on the website version for this purpose. If the website gained UI for yank / unyank, people could then use publish-only tokens on their workstations and wouldn't need such fine-grained scopes.
This is a good point! However, splitting these into separate scopes only makes sense if you expect people to generate low privilege yank-only tokens. I can't think of a scenario where that's useful.
I would be pretty annoyed if yanking wouldn't be separate, because then I couldn't give out full "manage releases" permissions on a crate without also potentially giving up ownership on that crate. Being able to give out (temporary) release + release management rights is something I'd love to be able to do natively here, instead of proxying actions through a "bot" that manages those fine-grained permissions.
I would be pretty annoyed if yanking wouldn't be separate, because then I couldn't give out full "manage releases" permissions on a crate without also potentially giving up ownership on that crate.
This is well said, and it echoes a request @kornelski had up-thread:
For @rust-bus I would love to have ability to add a co-owner of a crate who does not have permission to kick out other owners.
It seems there's a high demand for this type of ability, and one can mostly achieve it with token scopes as currently proposed, but it might be more effective, as @pietroalbini said, to address that as part of a "user roles" change. A disadvantage of using token scopes to express the co-owner relationship: Someone who co-owns a crate by virtue of holding a token that you generated for them isn't listed on the crate owners page. Relatedly: such a co-owner should be able to self-revoke the token in case they accidentally disclose it. This is actually possible via the API today, if the co-owner happens to know the token id, but it might be good to offer a way to do it without knowing the token id:
curl -X DELETE https://crates.io/api/v1/me/tokens/:id -H 'Authorization: cioAbCdEfGhiJKLmnop'
All that said, I see how a separate yank scope adds a valuable ability now, without having to wait for a more complex "user roles" change. And the additional complexity of a separate yank scope is fairly small.
To add on and contrast with kornel's usecase as in the above comment, I specifically would not want "releasers" to automatically have "ownership" — these are two different concepts, and best left to the (human) organisation to figure out rather than tied together at the api level
Even if you don't call it "ownership", I think it's very important to publicly disclose who can publish new code for a crate.
This is important for evaluation of risk and trust. I may trust people listed as owners on crates-io, but actually not trust some other person who has been given a token. I realize technically anyone could give their login token to anyone else, but that isn't considered a sensible practice. If scoped tokens are considered safe to give to other people, then that makes crate "owners" officially an incomplete set of people who can publish.
So I would prefer a user-bound visible "releaser" role for giving limited access to other people, and scoped tokens should remain in control of the person who created them.
That's an interesting proposal! I still kinda prefer some kind of pattern, but I feel like using teams could work. There are some issues that would need to be ironed out first:
- While crates.io only supports authentication with GitHub right now, there is an open issue to implement support for other OAuth providers. I'm a bit worried this approach would either lock us into only supporting OAuth providers supporting some kinds of teams, or force users to create a GitHub account even if they log in with other OAuth providers.
- I think this clashes with the current implementation of crate ownership: when a crate is published with a team token, who's going to be the owner of it? If it's only going to be owned by the team then nobody will ever be able to add new owners to it, as team members can't change the owners.
- For the rust-lang organization specifically this would be kind of a hassle to manage: if we create a token for CI we'd like to only allow CI to push, not the individual team members: avoiding granting access to the whole team would decrease the likelihood of someone publishing a new version from their local computer by mistake. So, in practice this would mean creating a bunch of dummy teams with just our bot account in it to properly scope crates, which is kinda annoying.
I didn't fully understand this sentence. Why do Chalk developers need to ping the infra team when they add a new crate?
Chalk's CI currently uses a token from a bot account controlled by the infra team to publish releases. If we don't implement wildcards they'll have to ping the infra team every time they add a new crate, asking us to log into the bot account and manually authorize the new crate in the token.
Relatedly: such a co-owner should be able to self-revoke the token in case they accidentally disclose it. This is actually possible via the API today, if the co-owner happens to know the token id, but it might be good to offer a way to do it without knowing the token id
That's a great idea, and I could see DELETE /me/tokens/current being implemented. I don't think it requires a full RFC, maybe open an issue on the crates.io repo for it? (please cc me there)
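For illustration, a client-side sketch of what calling such an endpoint might look like. The `/me/tokens/current` path is only the idea floated in this comment, not an existing route. The base URL and the `Authorization` header shape are taken from the curl example earlier in the thread. The request is only built here, not sent.

```python
import urllib.request

API_BASE = "https://crates.io/api/v1"

def build_self_revoke_request(token):
    # Hypothetical endpoint: revoke whichever token authenticates this request,
    # so the holder does not need to know the token id.
    return urllib.request.Request(
        f"{API_BASE}/me/tokens/current",
        method="DELETE",
        headers={"Authorization": token},
    )

req = build_self_revoke_request("cioAbCdEfGhiJKLmnop")
print(req.get_method(), req.full_url)
```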
While crates.io only supports authentication with GitHub right now, there is an open issue to implement support for other OAuth providers. I'm a bit worried this approach would either lock us into only supporting OAuth providers supporting some kinds of teams, or force users to create a GitHub account even if they log in with other OAuth providers.
I don't think this would lock us into GitHub any more than the current Teams feature does. For users that need this particular feature (tokens restricted to a set of crates), they would need to be using an OAuth provider that supports teams, but I think that's an acceptable tradeoff for the increased expressiveness we get by taking advantage of teams.
I think this clashes with the current implementation of crate ownership: when a crate is published with a team token, who's going to be the owner of it? If it's only going to be owned by the team then nobody will ever be able to add new owners to it, as team members can't change the owners.
Good point. Two possibilities:
- The metadata for a new crate publish request includes an "initial_owner" field. For crates published by team tokens, this has to be a member of the team. Or:
- When a team member adds a crate to a project for the first time, they do the first publish from their workstation (this fits in with the manual work of creating the Cargo.toml etc), and then immediately add the team as an owner. They can do this without pinging infra because they are a member of the team themselves. Once the team is an owner, the team token automatically has authorization to publish that crate.
if we create a token for CI we'd like to only allow CI to push, not the individual team members:
Ah, I hadn't fully absorbed your previous comment:
With token scopes we could have a central bot account owner of all the rust-lang crates
In the RFC you say 'Finally an alternative could be to do nothing, and encourage users to create "machine accounts" for each set of crates they own.', so I was thinking that the goal was to reduce dependence on bot accounts.
It seems like this is an additional semantic we'd like to express, so adding it alongside the others:
1. This token is only authorized to act on these crates.
2. This token is only authorized to perform these actions.
3. This crate may only be published by these (tokens | users).
#3 seems valuable! It sounds like the current plan is to express it via bot accounts, but it would be nice to be able to express it more directly. For instance, maybe a crate could have a "team tokens only" setting? I feel like there's got to be a nice clean way to express this but it's not coming to me at the moment.
I don't think this would lock us into GitHub any more than the current Teams feature does. For users that need this particular feature (tokens restricted to a set of crates), they would need to be using an OAuth provider that supports teams, but I think that's an acceptable tradeoff for the increased expressiveness we get by taking advantage of teams.
My worry is that we'd rely on it to provide security rather than convenience (not having to manually add team members on crates.io). I'm kinda ok requiring some OAuth providers to get all the quality-of-life improvements, but security features should be available to everyone.
It seems like this is an additional semantic we'd like to express, so adding it alongside the others:
1. This token is only authorized to act on these crates.
2. This token is only authorized to perform these actions.
3. This crate may only be published by these (tokens | users).
#3 seems valuable! It sounds like the current plan is to express it via bot accounts, but it would be nice to be able to express it more directly. For instance, maybe a crate could have a "team tokens only" setting? I feel like there's got to be a nice clean way to express this but it's not coming to me at the moment.
I think the way to address #3 is to rework the ownership system with "roles" assignable to users and teams, which would allow a team to have full privileges and potentially be the sole owner of the crate.
In the RFC you say 'Finally an alternative could be to do nothing, and encourage users to create "machine accounts" for each set of crates they own.', so I was thinking that the goal was to reduce dependence on bot accounts.
Yes! With this RFC we'd go from having a bot account for each project we want to scope, to a single bot account with tokens scoped for each project. That bot would be controlled by a subset of the infra team.
I can definitely see how the next step would be to change crates.io to treat teams like people, allowing us to give ownership to the infra-admins team and generate scoped team tokens for it. I feel like it's another big change, potentially bigger than this RFC, so I think it can be addressed in a future RFC.
Note: I'm currently really busy with other stuff, I hope to address all feedback in the coming weeks.
Ok, I believe I addressed all the feedback:
- Fixed an error with the regex (@8573)
- Added an explicit legacy scope that grants access to all the endpoints except for creating a new token (@jtgeibel)
- Clarified that only non-alphanumeric chars will be quoted in crate scopes (@jtgeibel)
- Added "tokens owned by a team" as a future possibility (@jsha)
Thanks for adding the note about tokens owned by a team. I had another idea for how to avoid the need for regexes, while still serving the use case of "token automatically has access to certain newly-created crates." Crate scopes would be represented by an explicit one-to-many table mapping token id to crate id. When a scoped token publishes a new crate, that crate automatically gets added to the token's scope.
I think this maps well to the underlying concept being expressed: One repository often has multiple crates in it, but usually the CI for that repository will share one API token. If CI is responsible for publishing new crates as they are added to the repository, the scope of that API token automatically expands as the repository expands.
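A small sketch of the "scope grows when the token publishes a new crate" behaviour described above. The `ScopedToken` class, the `publish` function, and the in-memory `registry` set are hypothetical stand-ins for the token table and the crate index.

```python
class ScopedToken:
    def __init__(self, can_publish_new, crates=()):
        self.can_publish_new = can_publish_new  # holds the publish-new scope
        self.crates = set(crates)               # explicit crate scope

def publish(token, crate, registry):
    """Publish `crate`; `registry` is the set of crate names that already exist."""
    if crate in registry:
        # Existing crate: must already be in the token's explicit scope.
        if crate not in token.crates:
            raise PermissionError(f"token is not scoped to {crate}")
    else:
        # New crate: requires publish-new, and the new crate is then added
        # to the token's scope automatically.
        if not token.can_publish_new:
            raise PermissionError("token lacks the publish-new scope")
        registry.add(crate)
        token.crates.add(crate)

registry = {"chalk-ir", "serde"}
ci_token = ScopedToken(can_publish_new=True, crates={"chalk-ir"})
publish(ci_token, "chalk-solve", registry)   # first publish of a new crate
publish(ci_token, "chalk-solve", registry)   # later versions now pass too
print(sorted(ci_token.crates))
```

The sketch also makes the trade-off raised in the replies visible: nothing in the new-crate branch constrains the crate's name.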
I don't see how that works for tokens scoped to non-new-publish permissions, if those are a thing (as of this rfc or in the future)
Also that means publish-new-crate tokens can now publish any name, instead of being limited to whatever the associated pattern is.
I don't see how that works for tokens scoped to non-new-publish permissions, if those are a thing (as of this rfc or in the future)
Correct, I think this makes sense for tokens that can publish new crates. That seems to match the desired use case for "token automatically has access to certain newly-created crates:" when a new crate is added to a repository, CI automatically publishes it. The situation where this would not apply is where someone manually publishes the first version of a crate, and wants their CI to automatically publish subsequent versions, without modifying the scope of the CI token.
Also that means publish-new-crate tokens can now publish any name, instead of being limited to whatever the associated pattern is.
This is true. Above we discussed how some actions are low-risk because they don't cause immediate problems and because they are easily reversible. For instance, "yank" is easily reversible. Similarly, publishing a new crate with an unauthorized name is low-risk (because no one will depend on it right away), and is easily reversible by yanking it. If you think of it from the perspective of an attacker who has stolen a token, they would much rather publish a malicious version of a crate that already exists than publish a new crate with an inappropriate name. So, IMO, it is not worth spending complexity to further constrain the publish-new scope to "publish-new only with names matching this pattern."
Alright, good points, I can see the logic in them.
Should we fcp this?
Sorry for showing up late to the party here! Credential design is very much something I'm interested in, and I proposed a feature along these lines at the Bay Area Rust Meetup in 2017.
My original proposal was based on Macaroons, but these days I'd strongly consider looking at @geal's biscuit format which is close to hitting 1.0 and whose primary implementation is in Rust.
Biscuits support this sort of attribute-based access control, but also support attenuation/offline delegation, and even have a Datalog-like language for specifying floating access control policies. So they could support everything proposed in this RFC as well as significantly more flexibility, expressive power, and use cases.
Biscuit would indeed fit well there. You can use server side authorization policies to define the basic behaviour (ownership, teams, etc) and then create tokens with either general rights like "this token has this user's rights", or restricted ones like "this token can only publish an update for this specific crate". These restrictions can be encoded as datalog queries and can be quite flexible, depending on multiple conditions, like expiration dates, origin IP, etc. A token can then be attenuated offline, which means that a user can create a new, valid token from an existing one, by restricting its rights. This would allow users to encode more complex policies for their CI, that would be hard to support in the crates.io backend or UX. I can help in designing the rights system and providing tooling around tokens.
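To make "attenuated offline" concrete, here is a sketch using Macaroon-style HMAC chaining, which was the basis of the original proposal mentioned above. This is not the Biscuit format: Biscuit uses public-key signatures and Datalog checks. The `key=value` caveat syntax and all function names are assumptions made for illustration.

```python
import hashlib
import hmac

def _chain(key: bytes, message: str) -> bytes:
    return hmac.new(key, message.encode(), hashlib.sha256).digest()

def mint(root_key: bytes, token_id: str) -> dict:
    # Only the server knows root_key.
    return {"id": token_id, "caveats": [], "sig": _chain(root_key, token_id)}

def attenuate(token: dict, caveat: str) -> dict:
    # Offline step: the holder needs only the current signature, not root_key.
    return {
        "id": token["id"],
        "caveats": token["caveats"] + [caveat],
        "sig": _chain(token["sig"], caveat),
    }

def check_caveat(caveat: str, request: dict) -> bool:
    # Hypothetical caveat syntax "key=value"; unknown keys fail closed.
    key, _, value = caveat.partition("=")
    return request.get(key) == value

def verify(root_key: bytes, token: dict, request: dict) -> bool:
    # The server recomputes the chain, so caveats cannot be removed or edited.
    sig = _chain(root_key, token["id"])
    for caveat in token["caveats"]:
        sig = _chain(sig, caveat)
    if not hmac.compare_digest(sig, token["sig"]):
        return False
    return all(check_caveat(c, request) for c in token["caveats"])

root = b"server-side-secret"
full = mint(root, "token-1")
ci = attenuate(attenuate(full, "action=publish-update"), "crate=chalk-ir")
print(verify(root, ci, {"action": "publish-update", "crate": "chalk-ir"}))  # True
print(verify(root, ci, {"action": "yank", "crate": "chalk-ir"}))            # False
```

Each added caveat can only narrow what the token allows, and dropping a caveat invalidates the signature, which is the property that lets the restriction happen without contacting the server.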
Separately, it'd be good to know the plan for how this feature would interact with stateful sessions were they to be implemented: https://github.com/rust-lang/crates.io/issues/2630
Cross linking the suggestion https://github.com/rust-lang/rfcs/pull/3231#issuecomment-1069587274. It suggests that tokens come in the Biscuit format. This would allow a user to attenuate a token to any arbitrarily small set of actions. However, this would happen entirely on the client side; if I understand it correctly, server-side guaranteed scopes would need to be done separately. One question I have for people following this discussion: do we need server-side scoping if users can create their own client-side attenuated tokens?
It could be a consideration to have some kind of scoping or restriction being able to be set on submitted public keys, but as it stands (without this RFC) a public key linked to an unattenuated Biscuit token would be no more powerful than a current API token. I see little benefit to implementing server side scoping on top of that: it would need to be done by restricting what a given submitted public key could do; I just don't see how it brings any useful property to the table, so it would only be added complexity.
With Biscuit, your RFC would imo entirely supersede this one.