Add scenarios for key discovery and prioritized registries.
I added some scenarios based on the discussion during the last meeting.
Updates to scenario 11 LGTM. It does a good job focusing on the goal without the implementation (e.g. we don't say the chain must be stored in the registry).
As discussed, I moved the first scenario to #96, and added a discussion of scoping to the scenario in this PR.
The example might benefit from a diagram of possible uses of multiple roots, though I'm not sure whether that fits best in this document or elsewhere. I'll work on creating something either way.
cc @sudo-bmitch @gokarnm @mtrmac
Revisiting the priority ordering discussion, it might help for me to write out a scenario for why I don't see the value of priorities, and perhaps @mnm678 can describe one where they are needed.
Let's assume two clusters, dev and prod. In dev, I require one key from the following scoped roots:
- Organization root that is scoped to the world. Anything they sign, we trust.
- Dev root, also scoped to the world.
- Docker Library root, scoped to a mirror of Docker official images.
- Wordpress root, scoped to a local mirror of the wordpress repo.
And prod would not include that dev root, but is otherwise the same:
- Organization root that is scoped to the world. Anything they sign, we trust.
- Docker Library root, scoped to a mirror of Docker official images.
- Wordpress root, scoped to a local mirror of the wordpress repo.
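To make the two policies concrete, here's a minimal sketch of how they could be laid out as data. The structure, root names, and mirror paths are all hypothetical, not an actual Notary v2 configuration:

```python
# Hypothetical policy layout for the dev and prod clusters described above.
# Root names, scope strings, and the mirror registry path are illustrative.

DEV_POLICY = [
    {"root": "organization",   "scope": "*"},                  # trusted everywhere
    {"root": "dev",            "scope": "*"},                  # trusted everywhere, dev cluster only
    {"root": "docker-library", "scope": "mirror/library/"},    # Docker official images mirror
    {"root": "wordpress",      "scope": "mirror/wordpress/"},  # local wordpress mirror
]

# Prod is the same policy minus the dev root.
PROD_POLICY = [entry for entry in DEV_POLICY if entry["root"] != "dev"]
```

The only difference between the two clusters is the presence of the dev root, which is what lets CI-built images run in dev but not in prod.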
For developers building stuff in CI, they get a key under the dev root to sign all their local work and run anything they want. And they can run anything the parent organization has signed. If they try to run an unmodified wordpress image from their local mirror, they really don't care which of the three roots has signed the image (organization, dev, or wordpress); each is equally valid for that image in my mind. That gives them flexibility to extend the wordpress image to fix some bug and push it to the same repo with a different tag and still deploy it using their dev key. However, if the wordpress key is ever the only key signing the mirror of library/alpine, the policy would reject that, since the wordpress key isn't scoped for that repo.
Similarly in production, if they verify an image that has a dev signature attached, that signature is ignored and they search for one from the organization (or one of the other trusted keys in their defined scope). Seeing the dev key, or lack thereof, doesn't impact the approval; they just continue searching until a key that does satisfy the policy is found. And similar to dev, if they deploy the alpine image, it doesn't matter if they find the signature for the Organization or the Docker Library, both would be accepted by the policy, so I can't come up with a reason where priority would matter.
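The check I have in mind is order-independent: a signature satisfies the policy iff its root is trusted and that root's scope covers the repository. A self-contained sketch, with hypothetical root names and scope strings:

```python
# Sketch of the "any matching trusted key" check described above.
# Ordering of policy entries or signatures never affects the result.

def scope_matches(scope, repo):
    """A scope of "*" covers everything; otherwise it is a repo prefix."""
    return scope == "*" or repo.startswith(scope)

def verify(policy, repo, signing_roots):
    """Return True if ANY signature comes from a trusted, in-scope root."""
    return any(
        entry["root"] == root and scope_matches(entry["scope"], repo)
        for root in signing_roots
        for entry in policy
    )

PROD_POLICY = [
    {"root": "organization",   "scope": "*"},
    {"root": "docker-library", "scope": "mirror/library/"},
    {"root": "wordpress",      "scope": "mirror/wordpress/"},
]

# A dev signature alone is simply ignored in prod...
assert not verify(PROD_POLICY, "mirror/wordpress/wordpress", ["dev"])
# ...the search continues past it to a key that satisfies the policy.
assert verify(PROD_POLICY, "mirror/wordpress/wordpress", ["dev", "organization"])
# The wordpress key is not scoped for the alpine mirror, so it is rejected there.
assert not verify(PROD_POLICY, "mirror/library/alpine", ["wordpress"])
```

Because the result is a plain "any match" over the set of signatures, assigning priorities to the roots would not change any outcome in this model.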
Note that Notary v2 is only verifying that a given image with a digest, and possibly the tag, is signed by a trusted entity. When Notary is called, the admission controller has already been told which image, and images are referred to with an explicit registry name, so we've been fortunate to not be subject to the dependency confusion attacks (with the downside that people hitting Docker Hub rate limits have to modify their deployment to point to a different registry hostname to use a mirror).
My question is only on the priority part, so if I'm missing something, please let me know. Other than that, I'd like to see the rest of this approved since scoping of keys is important to me, and I believe to many others too.
@sudo-bmitch Thanks for the example, I think this makes it clearer what we're talking about.
In this scenario, is it possible for different trusted parties, say dev and wordpress to sign different artifacts for the same tag? This is the case where priority is important, so that the user can have a deterministic resolution of the verification. However, I think you're saying that the tag resolution is out of scope? If so, I guess this problem would pass to the admission controller, which would then be in charge of resolving the priority.
@mnm678 at any one time, a tag will only point to a single digest from the registry. (It gets more complicated than that with multi-platform images, but I don't think that's a factor for this.) The tag is typically mutable, so another manifest could be pushed to replace the tag, e.g. the dev team extends the upstream image and pushes their own version of wordpress:latest to their mirror, but that replaces the tag rather than giving multiple things the tag could return. From the perspective of the verifier, they'll only see a single digest when they check the tag, possibly signed multiple times. They'd then query the registry to see if that digest (and possibly the associated tag) is valid according to the policy.
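A toy model of that point (tag names and digests are made up): a registry maps each tag to exactly one digest at a time, and pushing replaces the mapping rather than adding an alternative, so the verifier never has a set of candidates to prioritize among.

```python
# Minimal model: a registry tag resolves to exactly one digest at a time.
tags = {"mirror/wordpress/wordpress:latest": "sha256:aaa111"}

# The dev team pushes their extended image to the same tag:
# the mapping is *replaced*, not duplicated.
tags["mirror/wordpress/wordpress:latest"] = "sha256:bbb222"

# The verifier sees a single digest, then checks signatures on it.
digest = tags["mirror/wordpress/wordpress:latest"]
```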
I need to think a lot more on whether there's a clean way to inject notary between the tag to digest resolution process, and whether we want it to. Docker Content Trust did this with nv1, but that worked because it was tightly integrated with the client doing the pulling, and there was a secondary database of tag to digests maintained by the notary server. We're actually seeing an issue from that now because signing of official images hasn't happened since late 2020. So people with nv1 enabled are getting stale images from last year instead of more current/patched images, and there's no UX to tell them that's happening.
At least right now with the current design, we've ditched the external notary server, and left the tag to digest resolution happening with the registry, so we'll only ever get one value from that, which means there's nothing to prioritize.
In that case, I think we'll need to update some of the requirements and scenarios to make it clear that Notary will no longer support tag signing. However, I worry that removing that will make it impossible to ensure many of the security guarantees that Notary is aiming for, including signature revocation and protection from rollback/freeze attacks.
But if Notary is just checking the signatures on a hash, then I agree that the priority attacks wouldn't apply.
At least with the next release, I don't think there will be any tag signing guarantees (which makes me push back if we try to call it GA). When we do get tag signing, I suspect it will be a much weaker guarantee than we have with nv1. When we do get tag signing, I believe it will be on the other side of the request, verifying that the tag is valid for a digest, rather than asking a notary server what digest should be pulled for a tag.
A potential implementation could have the signature include the descriptor that has the digest, but also add an annotation with an array of tags that the signer claims may be valid for that digest. It means multiple digests could all be valid for a common tag, unless those old signatures are revoked as new images are built. That may be a desirable quality for someone deploying with a pinned digest, or redeploying an image they pulled some time in the past when a scaling event is triggered.
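One possible shape for such a signature payload, sketched below. The annotation key, digest values, and helper are all hypothetical; only the descriptor fields follow the usual OCI descriptor layout:

```python
# Hypothetical signature payload: the signed descriptor pins the digest,
# and an annotation lists tags the signer claims may be valid for it.

signature_payload = {
    "descriptor": {
        "mediaType": "application/vnd.oci.image.manifest.v1+json",
        "digest": "sha256:aaa111",  # made-up digest
        "size": 1024,
    },
    "annotations": {
        # Hypothetical annotation key; not a real, standardized name.
        "io.example.signed-tags": ["1.0", "latest"],
    },
}

def tag_claimed(payload, tag):
    """True if the signer claims this tag may be valid for the signed digest."""
    return tag in payload["annotations"]["io.example.signed-tags"]

# Note: a second, unrevoked signature could claim "latest" for a different
# digest, so multiple digests may all be "valid" for the same tag.
```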
There are other implementations that could improve the integrity (only one tag to digest mapping is valid), at the cost of availability (e.g. an ill timed scaling event finds the previously downloaded version of the image is no longer valid and a download is triggered, delaying the start of the container to handle the flood of requests). They each have tradeoffs, and I think it would be a good question to put to the community.
Independent of how tag signing is implemented, I think a freshness guarantee for tag signing makes sense with registries, while priorities just don't fit our model, because we're not prioritizing pulling images from different registries.
> And similar to dev, if they deploy the alpine image, it doesn't matter if they find the signature for the Organization or the Docker Library, both would be accepted by the policy, so I can't come up with a reason where priority would matter.
Consider an important vendor hosting their official images at docker.io/vendor; the consumer has a generic policy to require docker.io-hosted images to be signed with a Docker “correctly uploaded by a valid docker.io user” key, but the consumer also has a direct relationship with that vendor, and knows the right public key for that vendor. The consumer does want to accept only the known vendor’s key, not either of the vendor’s key and the generic docker.io key.
This is possible to express as “vendor’s key scoped to docker.io/vendor + Docker key scoped to docker.io” if scopes are exclusive, and the docker.io scope doesn’t apply to docker.io/vendor; if all matching scopes are merged and treated as equivalently trusted, we would need priorities — but I’d prefer to have exclusive scopes, because that allows much simpler analysis of the configuration.
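The exclusive-scope behavior can be sketched as a longest-prefix match: only the most specific matching scope applies, so the generic docker.io key can never satisfy the policy for docker.io/vendor. Key names are illustrative:

```python
# Sketch of exclusive scopes: the most specific matching scope wins,
# and broader scopes are excluded rather than merged.

POLICY = {
    "docker.io/vendor": {"vendor-key"},          # direct relationship: only this key
    "docker.io":        {"docker-generic-key"},  # generic "valid docker.io user" key
}

def allowed_keys(repo):
    """Return the key set of the longest (most specific) matching scope."""
    matches = [s for s in POLICY if repo == s or repo.startswith(s + "/")]
    return POLICY[max(matches, key=len)] if matches else set()

# Only the vendor's known key is accepted for the vendor's repo.
assert allowed_keys("docker.io/vendor") == {"vendor-key"}
# Other docker.io repos fall back to the generic key.
assert allowed_keys("docker.io/library/alpine") == {"docker-generic-key"}
```

With merged scopes both keys would be returned for docker.io/vendor, and priorities would be needed to recover this behavior; exclusivity makes the configuration analyzable scope by scope.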
> When we do get tag signing, I suspect it will be a much weaker guarantee than we have with nv1. When we do get tag signing, I believe it will be on the other side of the request, verifying that the tag is valid for a digest, rather than asking a notary server what digest should be pulled for a tag.
>
> A potential implementation could have the signature include the descriptor that has the digest, but also add an annotation with an array of tags that the signer claims may be valid for that digest. It means multiple digests could all be valid for a common tag, unless those old signatures are revoked as new images are built. That may be a desirable quality for someone deploying with a pinned digest, or redeploying an image they pulled some time in the past and a scaling event was triggered.
Yes, I think this is the right trade-off. The TUF freshness/rollback-protection guarantees are just too costly to operate (signing is not a one-time deployment action but a product that needs to continuously run and re-sign freshness guarantees to avoid downtime), and only really relevant for users deploying :latest or similar moving tags, which is generally problematic in enterprise test/production deployments for many reasons anyway.
If version tags are not moved, and the image signer institutes a policy of never signing two different digests with the same version tag, we don’t need the freshness/rollback-protection guarantees and we can have a much simpler design.
@mnm678 Would you mind closing this PR, since there has been no activity for more than a year? You can create a new issue to describe the problem if needed. Thanks.