Allow registries to reject non-existent subjects in manifests
During conformance testing it was found that registries which require strong references between manifests and blobs fail conformance because of MUST language in the spec requiring acceptance of a manifest that references a non-existent subject manifest. While the subject field may be described as a weak reference, listing and querying it at large scale may require a strong reference (such as a foreign key in a database), or implementations may simply inherit the data model used in 1.0, in which referenced objects (as viewed from the Merkle DAG) were always uploaded first.
The arguments for the MUST language were to (1) support registries which may have reference-only repositories, storing content elsewhere, and (2) ensure referrers exist at manifest pull time, since there is no atomic way to upload referrers with manifests.
For (1) the burden will be on the client to handle this case on upload, as a registry is not required to support such repositories.
For (2), clients can retry or check for freshness when validation is a requirement, or clients can ensure tags are only updated once all content is available. Similar issues have occurred in the past with multi-platform images: if an image was uploaded before all platforms were available, clients could see a race condition between the platform they need being built and the image they pull having that platform available. The same solution could apply here: push by digest or use a temporary tag when pushing manifests that should not yet be considered fully available, and "tag" the manifest once complete (by uploading it with a tag reference).
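A minimal in-memory sketch of the push-by-digest-then-tag workflow described above (all function and variable names here are hypothetical, not part of the spec):

```python
import hashlib

# Toy in-memory registry: a content-addressed store plus a tag -> digest map.
store = {}   # digest -> manifest bytes
tags = {}    # tag name -> digest

def push_by_digest(manifest: bytes) -> str:
    """Push a manifest by digest; it is stored but not yet discoverable via tags."""
    digest = "sha256:" + hashlib.sha256(manifest).hexdigest()
    store[digest] = manifest
    return digest

def tag(name: str, digest: str) -> None:
    """'Publish' the manifest: only now do consumers resolving the tag see it."""
    assert digest in store, "tag must point at already-uploaded content"
    tags[name] = digest

# Push order: referrers (e.g. a signature with subject -> image) go in by
# digest first, then the image manifest, and the tag is updated last.
image = b'{"schemaVersion": 2}'      # placeholder payloads
signature = b'{"subject": "..."}'
push_by_digest(signature)
image_digest = push_by_digest(image)
tag("latest", image_digest)          # the image is only now "fully available"
```

Until the final `tag` call, nothing is reachable through the tag listing, so consumers resolving `latest` never observe a half-pushed image.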
Changing the MUST language to MAY makes the most sense here. Additionally, we can add guidance in the spec on how to perform manifest uploads more transactionally. In the future we could consider a more explicit way to create and manage transactions.
Related to https://github.com/opencontainers/distribution-spec/issues/340 https://github.com/opencontainers/distribution-spec/pull/341
If clients treat the referrers to a manifest the same as other descriptors in the manifest, then it makes sense to push all content for a given manifest, including referrers, before the manifest itself. It's the client equivalent of a registry ensuring all manifests exist before allowing the index to be pushed. This helps with various client-side issues, including:
- Ensuring metadata (like signatures) are available when the manifest is pushed for end users (eliminating failures in admission controllers, and the associated logs/alerts).
- Reducing the workload on some mirroring solutions that want to cache the referrers response. If they know that upstream will always push a complete image and metadata, they can use longer cache refresh times.
- Allowing registries to enforce policies on their content (e.g. a policy requiring all images in a repository to be signed).
- More efficient repository synchronization tooling. They can skip the deep inspection of a manifest when the target manifest exists. If a synchronization can be interrupted with the manifest pushed but not the referrers, then every manifest copy needs to check every child manifest and referrer to ensure all referrers exist, and for efficiency they'll want to cache that between polling cycles for every repo/manifest being synchronized.
I don't think clients would implement both workflows if we change the spec to say MAY; doing a full recursive copy with concurrency is complex enough without part of the copy sometimes moving after a wait lock. So this change would result in all clients switching to copying the manifest before the referrers.
Even if clients defer tagging the manifest (pushing by digest, pushing all referrers, and then pushing the tag), I think scenarios 2, 3, and 4 from the list above are still impacted.
One of the questions that came up when we discussed this before is whether it's appropriate for OCI 1.1 to reject content that was accepted according to OCI 1.0 specs. This change could create a scenario where existing content cannot be included in a transition to 1.1. What would the OCI guidance be for that?
Thinking about registries that reject an index if there is a missing child manifest, in my experience, it's possible to delete one of those child manifests after the index is pushed. Taking a similar scenario to referrers, what happens if a manifest is pushed, referrers are pushed, the manifest is deleted, and later pushed again? Are the referrers immediately deleted, do they get reattached, or can the referrers be pulled by their individual digest but not appear in the referrers API?
To put some numbers to my concerns, I go back to a repository mirroring tool. If it knows the source always includes all metadata before pushing an image (ideally that would become a best practice), then it can do a simple HEAD request on each tag to compare the digest of the index.
Without that guarantee, that check becomes recursive, checking every manifest for a referrer list. To take what I'd hope is a not-too-distant example, consider an image with support for 10 platforms (golang has that today), where each platform has 5 referrers (2 SBOMs, a SLSA attestation, and 2 signatures). That's 1 (index) + 10 (images) + 50 (artifacts) = 61 manifests to check for referrers (an artifact could have its own referrer, e.g. a signed SBOM). I'd also need to do a full GET on each of the manifests from the source registry.
So that's 61 manifest GETs plus 61 referrer lists vs 1 manifest HEAD. Multiply that by the number of tags being copied. And the issue gets worse with more tags, more platforms, and more artifacts per platform.
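The arithmetic above can be checked directly, using the numbers from the golang example:

```python
platforms = 10           # images in the multi-platform index
referrers_per_image = 5  # 2 SBOMs, 1 SLSA attestation, 2 signatures

# Every manifest in the DAG must be checked for referrers, including the
# artifacts themselves (an SBOM may carry its own signature).
manifests = 1 + platforms + platforms * referrers_per_image
requests_without_guarantee = manifests * 2  # one GET + one referrers list each
requests_with_guarantee = 1                 # a single HEAD on the tag

print(manifests)                    # 61
print(requests_without_guarantee)   # 122
```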
This discussion seems to stem from the fact that the artifact reference points the "wrong" way. Digests must exist before the manifest because the manifest points at the digests. Following the same logic, artifacts point at the manifest, so the manifest should exist before the artifact.
There are a number of reasons behind the decision to have artifacts function the way they do. But by keeping the MUST here, it comes across like we want artifacts to act like dependencies of the manifest despite functionally existing the other way around. The increase in requests is a downside, but also a consequence of the initial design.
The direction depends on the perspective. From the server walking the descriptors, the manifest is a child of the artifact. From the client following the referrers API, the artifact is a child of the manifest.
> The direction depends on the perspective.
Images and registries are designed around a Merkle tree. The reference artifact ends up being the top hash in the Merkle tree (even if it may not be the first hash clients request). Requiring a registry to accept the top hash alone breaks the ability of a registry to verify the Merkle tree before accepting content. In terms of the "wrong" or "right" way, the reference design has good reasons for this direction. The MUST language breaks existing registry design and best practice, though.
Treating the subject as a descriptor that must be followed in the Merkle tree is a design decision of registries that I think is worth deeper consideration. Clients typically will not pull the artifact and then use that to find the image; typically it will be the reverse.
Perhaps it would have been better to have defined the association using an HTTP header rather than a field in the manifest, to make the distinction between what's in and out of the Merkle tree more distinct. But I believe we are past the point in the release cycle where a design change like that would be worth considering.
At least in the signature case, one would have to push the manifest (by digest) first, push the signature, and then push the tag. However, would the signature be able to "cover" the tag a priori?
While I would prefer, from a client perspective, to keep this MUST so that clients retain the option of push order, there was never an intention to break 1.0 image registry storage systems with this MUST requirement. Loosening it now seems the only appropriate course of action.
Maybe this becomes a SHOULD.
However, the following is a legitimate use case (cc: @sajayantony)
> Ensuring metadata (like signatures) are available when the manifest is pushed for end users (eliminating failures in admission controllers, and the associated logs/alerts).
How does one achieve this? https://github.com/opencontainers/distribution-spec/issues/459#issuecomment-1701509750?
If nothing in the manifest indicates there even is a signature (since signature points to manifest, not the other way around), isn't this already a concern with the current design? There would be no validation on the registry side, it's only enforced on the client side. So the client should be able to handle it either way.
Yes, it may not be too bad in reality. Will let @sajayantony @sudo-bmitch weigh in on this one.
Thanks for running the conformance tests, btw.
> How does one achieve this? https://github.com/opencontainers/distribution-spec/issues/459#issuecomment-1701509750?
This was the bit about "transactionality" discussed in a prior OCI call; the thought was that the workflow you describe could be used (as mentioned in the issue body):
> The same solution could apply here, use push by digest or a temporary tag when pushing manifests that should not be considered fully available and "tag" it once complete (via upload of the manifest using tag reference).
Past that, a proper "transactional upload" API seems like obvious future work, to better support existing (e.g. referrers and manifest list) and future use-cases involving any sort of reference.
> If nothing in the manifest indicates there even is a signature (since signature points to manifest, not the other way around), isn't this already a concern with the current design? There would be no validation on the registry side, it's only enforced on the client side. So the client should be able to handle it either way.
It is possible for a client to upload the manifest before the signature, which is something admission controllers would block and alert on. But the concern being raised is that it would now be possible for the registry to block that, so even a well-behaved client would not be able to push content in its preferred order.
> The same solution could apply here, use push by digest or a temporary tag when pushing manifests that should not be considered fully available and "tag" it once complete (via upload of the manifest using tag reference).
This may help a single use case with signed images and admission controllers. But for the more general use case, this still has issues when images are copied by digest, including child images in a multi-platform list.
My concern continues to be that a well behaved client, with cooperation of other well behaved clients, would not be able to ensure content is complete without performing a fully recursive query on every manifest. An upload could be interrupted at any point, triggering a future retry. When that happens with multi-platform images, clients know where they left off because of which manifests are already pushed, and can assume that child manifests and blobs have been pushed too. But with referrers, that assumption is no longer valid and every child manifest must also be checked for missing referrers on every single copy.
As a result, running a mirror will become either a very expensive operation or an extremely error-prone one: either every image is checked recursively on every update of the mirror, or tooling assumes that an image is complete if the manifest already exists, leaving the mirror in a broken state after any referrer copy fails. And anyone mirroring from Docker Hub with a recursive check will hit rate limits very quickly unless they are very careful to ensure the recursive queries are not done against the remote registry. The rate limit is also one of the things that would trigger a broken copy if it occurs after the manifest is copied but before the artifacts are copied.
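The recursive check described above can be sketched as follows; this is a hypothetical illustration of the mirror's logic, with all function names invented for the example:

```python
# Sketch of the recursive completeness check a mirror is forced into when
# referrers may arrive after their subject manifest.

def is_copy_complete(digest, get_manifest, list_referrers, exists_locally):
    """Return True only if the manifest, its child manifests, and every
    referrer (recursively) are present in the mirror.

    get_manifest(digest)   -> dict with an optional "manifests" child list
    list_referrers(digest) -> list of referrer digests from the source
    exists_locally(digest) -> bool, checked against the mirror's own store
    """
    if not exists_locally(digest):
        return False
    manifest = get_manifest(digest)
    children = [m["digest"] for m in manifest.get("manifests", [])]
    # Referrers are not listed in the manifest itself, so the existence of
    # the manifest proves nothing about them: a separate query is required
    # for every node in the DAG.
    referrers = list_referrers(digest)
    return all(is_copy_complete(d, get_manifest, list_referrers, exists_locally)
               for d in children + referrers)
```

Note that `list_referrers` must be called once per manifest, which is exactly the per-node query cost the request-count example earlier in the thread quantifies.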
From this week's meeting discussion: this is a downgrade for users that are losing a capability they have with OCI v1.0 registries.
Referrers were designed the way they were intentionally: we can add as many referrers as we want for the same digest, and we can modify them, all without modifying the manifest. The manifest has no dependency on these things. Signatures are a valid use case, but they are a client concern for what conditions it wants to apply and when. Even with a MUST here, signatures are built on social contracts, without guarantees. What is an admission controller to do with a client that isn't "well behaved" and adds the signature after? Won't it already need retry functionality built in? If we specifically want to handle the signature use case, it needs to be specifically accounted for in a manifest. Since referrers were designed as an optional add-on, they have the side effects that come with being an optional add-on.
It does sound like there could be a use case for transactional uploads, but this is more of a general concept that could be useful for many use cases.
> a well behaved client, with cooperation of other well behaved clients
I'm probably newer to the OCI spec than most here, but from what I've seen, the number of "well behaved clients" is approximately 0, because everyone has a different definition of what "well behaved" means. This isn't bad, clients interpret things differently depending on different use cases. If we want something to be handled consistently, it needs to be codified in the spec.
Rate limits on Docker Hub are a separate concern, one that can be addressed based on changing customer needs and usage patterns, and they shouldn't be a deciding factor when trying to decide between different paths.
I keep thinking about this problem, and I think we're honestly really close to uploads that look transactional today. I'll try to explain further, but hopefully this all makes sense. :crossed_fingers:
For context, I'll start with the main use case I think is really important for the issue at hand: being able to make sure a signature is available before users try to pull an image (so that policy can be enforced accurately). Without this, we'll have frequent "brown outs" of image pulls as users race the push of the manifest vs the signature, and the frequency of users hitting those edge cases will increase with the number of users (cue my DOI maintainer hat where we've experienced exactly this with previous incarnations of our multi-architecture support and the angry users that generated).
Now my proposed (small, incremental) solution!
Uploads of all objects can occur entirely by digest, including manifests. The only way users can discover a manifest to pull (using only official OCI distribution-spec APIs) is by:
- querying a tag
- querying the tag listing API, then querying a tag from it
- already knowing the digest (in which case they very likely either have the object already, or looked up the digest via one of the previous means)
So if I push an object by digest, the chances of someone trying to pull it before I'm "ready" are very, very low. Thus, it's only tagging that needs to be transactional, right? The only issue I see there is that "tagging" is not a direct action we can perform -- it's a side effect we can achieve by uploading a manifest by name instead of by digest (and manifests might not be small -- possibly as much as 4194304 bytes, which is potentially a heavy upload just to update a pointer).
In other (shorter) words, I'm proposing a new (or updated) API endpoint for lightweight tagging of an existing uploaded manifest without having to upload the entire manifest contents again, and I believe this satisfies the need for a transactional API, hopefully in a way that's easy for existing registries to implement.
In the signatures example, that means this could be our extended "transactional" flow:
- upload blobs (inc. config)
- upload image manifest (by digest)
- upload image signature (subject -> image manifest, by digest)
- update/add tag to point to new digest, signalling that the image is "ready" for use
(but not having to re-upload potentially 4MiB of useless extra data to accomplish this flow)
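The registry side of the proposed lightweight tagging endpoint could be as simple as the following sketch. The endpoint shape (`PUT /v2/<name>/tags/<tag>` with a digest in the body) and all names here are hypothetical, purely to illustrate the semantics:

```python
# Hypothetical handler for a lightweight tagging endpoint: it retags an
# already-uploaded manifest without re-transferring its (up to 4 MiB) bytes.

store = {"sha256:feedface": b"<manifest bytes already uploaded by digest>"}
tags = {}

def put_tag(tag: str, digest: str) -> int:
    """Point `tag` at `digest`; fail if the manifest was never uploaded."""
    if digest not in store:
        return 404  # the registry never accepted this content
    tags[tag] = digest
    return 201

print(put_tag("latest", "sha256:feedface"))  # 201
print(put_tag("latest", "sha256:missing"))   # 404
```

Because the tag update is the only discoverability-changing step, it is the only operation that needs to behave transactionally; everything before it is invisible to tag-resolving consumers.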
FWIW, dropping the MUST here will (if I understand correctly), make it impossible for signatures to be pushed to a repository separate from the thing they reference. This was explicitly marked as a requirement in the user stories at the start of this effort: https://github.com/opencontainers/wg-reference-types/blob/main/docs/REQUIREMENTS.md#content-management
> As a user, I want to push an artifact that references another artifact that doesn't exist on the registry yet.
Dropping support for a user-facing requirement because of implementation challenges is understandable but disappointing this late in the game.
@tianon
> - upload image signature (subject -> image manifest, by digest)
> - update/add tag to point to new digest, signalling that the image is "ready" for use
>
> (but not having to re-upload potentially 4MiB of useless extra data to accomplish this flow)
^ the implication is that one cannot "sign" the tag?
Correct; how would you sign the tag with the existing data model today?
I've opened https://github.com/opencontainers/distribution-spec/issues/483 to discuss/vote on paths forward; I'd like to keep this issue for discussion/clarification of the problem itself, and #483 for discussion of solutions/consensus on what the path forward is.
> My concern continues to be that a well behaved client, with cooperation of other well behaved clients, would not be able to ensure content is complete without performing a fully recursive query on every manifest. An upload could be interrupted at any point, triggering a future retry. When that happens with multi-platform images, clients know where they left off because of which manifests are already pushed, and can assume that child manifests and blobs have been pushed too. But with referrers, that assumption is no longer valid and every child manifest must also be checked for missing referrers on every single copy.
@sudo-bmitch that is always going to be the case with referrers, as they were created in order to be able to add references at any time in the future, in order to support workflows such as signatures for later approvals, versus build time. This does make mirroring essentially very difficult without an additional API to give a feed of referrer changes, as there is no real definition of "complete" anyway.
> @sudo-bmitch that is always going to be the case with referrers, as they were created in order to be able to add references at any time in the future, in order to support workflows such as signatures for later approvals, versus build time. This does make mirroring essentially very difficult without an additional API to give a feed of referrer changes, as there is no real definition of "complete" anyway.
It requires cooperation from the producer to implement, and the registry to support. But I don't believe it's going to be such a rare case. The model fits tightly with existing usage models of registries, where layers are pushed before the manifest and clients stop following the DAG when they reach content they've already seen. And the model fits closely with the current model of producing images, where a CI pipeline creates and pushes an image once, without frequent updates (a change would involve running a new pipeline and pushing a new image). In that model, the changes happening to referrers are unlikely to come from upstream, and instead come from the data pipelines ingesting images from upstream registries, scanning and attaching SBOMs and internal approvals, before pushing to an on-prem registry. And the pinned digest on that index is an important signal to those internal consumers that the image is complete and that it matches the upstream image.
Fundamentally, I believe the disagreement is from the two sides wanting consistency guarantees, but having a different view on what that consistency looks like. Registry operators are defining the DAG with the signature as a root node, where consumers would pull the signature and walk that to the signed image. Consumers are drawing this as two separate DAGs where the image DAG is pulled, and the signature is queried from the referrers API to pull that separate DAG.
I think each side is looking for a way to enable the other to get their result while retaining our own consistent model. So the proposal from registries is for clients to depend on load bearing tags. And the proposal from clients is for registries to implement a transactional API. Each of these would be a compromise resulting in non-trivial issues.
I believe the issue is more that we're trying to enforce social guarantees on a system that was explicitly designed to not have guarantees. If we build systems that expect certain social contracts to be enforced based on the current use cases, we will run into issues when other use cases come up that require different social contracts. What if "ReadMes" or "examples" become artifacts, and they frequently change after the manifest is pushed? Maybe we need a different class of referrers that can only be pushed before their subjects.
> If we build systems that expect certain social contracts to be enforced based on the current use cases
Not enforced, permitted. We're giving clients the ability to build these workflows. If and when they use that would be up to them.
We aren't forcing image producers to push the signature first, we're permitting them to develop that workflow if they want it. If other users want to modify their image later, they have that flexibility. And if the source does the latter, and you want to copy those changes, then you'll need a recursive copy. But if the source does the former, then you have the choice to dramatically simplify the copy of an unchanged image.
The core issue is related to consistency: (1) has the full list been uploaded to the registry, and (2) are all the dependencies correct and available? I believe the deviations from the core Merkle DAG data model are the cause of these issues. When the referrers list is an uploaded index (such as in compatibility mode), there is no ambiguity over whether the referrers list is a complete set, since the availability of the referrers index is determined by the client and it is also the root of a Merkle DAG. We shouldn't change or break the data model for referrers; it is unnecessary and causes more problems than it solves.
I propose these changes for 1.1.
- Define `PUT` for `/v2/<name>/referrers/<digest>`.
- Remove filtering such as `GET /v2/<name>/referrers/<digest>?artifactType=application/vnd.example.sbom.v1`. This should be done by clients, and the list of referrers should not be so unreasonably large that a client could not easily filter it.
- Add: registries SHOULD treat `GET /v2/<name>/referrers/<digest>` and `GET /v2/<name>/manifests/<alg>-<ref>` as the same resource.
- Add: a registry MAY generate the referrers index, but include the caveats around consistency and lack of content addressability.
- Add: a registry SHOULD support `If-Match` on `PUT /v2/<name>/referrers/<digest>`. This solves the atomicity issue around lost updates and makes transactions unnecessary, since the atomic update is to the root of the Merkle DAG. There is already a MAY for this in the compatibility approach, but since this is a new endpoint we can use stricter language in supporting it.
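The lost-update protection that `If-Match` provides can be sketched with a simple compare-and-swap; the handler below is a hypothetical illustration, not a spec-defined interface:

```python
# Sketch of If-Match on a referrers-index PUT: the write succeeds only if
# the caller's ETag matches the current version (compare-and-swap).

referrers_index = {"etag": "v1", "body": ["sha256:sig-a"]}

def put_referrers(new_body, if_match):
    """Replace the referrers index only if the caller saw the current version."""
    if if_match != referrers_index["etag"]:
        return 412  # Precondition Failed: someone else updated it first
    referrers_index["body"] = new_body
    referrers_index["etag"] = "v" + str(int(referrers_index["etag"][1:]) + 1)
    return 200

# Two clients both read ETag "v1"; only the first write wins, and the loser
# must re-read and merge before retrying, so no referrer is silently dropped.
print(put_referrers(["sha256:sig-a", "sha256:sbom-b"], if_match="v1"))  # 200
print(put_referrers(["sha256:sig-a", "sha256:sig-c"], if_match="v1"))   # 412
```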
I believe this solves a number of issues with the referrers API including:
- Adding the explicit `PUT` simplifies clients, which already MUST implement the compatibility flow. The compatibility flow becomes much simpler: a client may try the `PUT` to the referrers endpoint, then just fall back to uploading via `PUT` on the manifests endpoint.
- Reduces the risk to clients of getting an unreasonably large number of referrers, since any pulled referrers index would have been generated by a client.
- Removes the artifact filtering on the registry side which was already optional and problematic to implement
- Allows a natural ordering of uploads to allow registries to verify the Merkle DAG for every piece of content
- Makes mirroring much simpler, since the set of referrers can be compared against the hash of the index; the hash is much more meaningful when it corresponds to client-uploaded content.
- No need for transaction API changes
- No reliance on tags; an external or upstream registry can still be used as the trusted tag source
- Allows signing of sets of referrers via attached signatures or notary v1-like tag store
Looking over the suggestion, for myself this doesn't offer any value over option 2, so my vote on the issue is unchanged. Concerns I have include:
- This is a major breaking change to all of the existing 1.1-rc* implementations and would require clients and registries to do a rewrite, along with recreating the conformance tests and returning to the community for feedback. It feels worse than the 999 change to me.
- Registries could still reject the push of content with a subject pointing to a missing manifest, so the differing opinions on the DAG discussion are unresolved by this.
- The `OCI-Subject` header wouldn't be used, so every push of a manifest with a subject requires the full fallback processing, adding client overhead and additional API round trips.
- Registries with an eventually consistent backend cannot support conditional requests, so there's no possibility of an eventually consistent referrers response to a referrers race if this is implemented on the client side, where that could be done with a server-side solution.
- Clients that work with a single `artifactType` were requesting the filter as an efficiency, and servers have the option of whether they want to support it. We had discussed in the working group whether this should be only a client-side feature, and I don't want to restart previously settled debates over something a registry can opt out of.
Given that, I'm opposed to the proposal. I think it would have been worth considering during the working group, but this late in the release cycle I feel it's too disruptive to the community that has already written so much code, both in the working group and now against the RC releases. A registry can decide not to implement the 1.1 spec and stick with 1.0, where the subject field is not defined and not part of the DAG, and where clients push the fallback tag as an index. I think this gives an identical result to the proposal (content addressability, no separate API, no filtering, all client-managed) while allowing clients to push content in either order.
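The fallback tag mentioned above maps a subject digest to the `<alg>-<ref>` form, since `:` is not a legal character in a tag name. A simplified sketch (a complete implementation would also need to respect tag length limits):

```python
def fallback_tag(digest: str) -> str:
    """Map a subject digest like 'sha256:<hex>' to the '<alg>-<ref>' fallback
    tag form by replacing the ':' separator, which is not valid in tags."""
    alg, ref = digest.split(":", 1)
    return f"{alg}-{ref}"

print(fallback_tag("sha256:0f38"))  # sha256-0f38
```

A client pushes an index of referrers under this tag, so the referrers set stays content-addressable and can be compared by digest when mirroring.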
Since we have had a lengthy discussion on this, had a vote that's been open for several weeks, and the vote is leaning against this request, my suggestion is to close this issue and move on. I say that with a lot of hesitancy because I'd much rather find a solution where everyone meets in the middle. But in this case, it's been very contentious because there is no middle option that we've been able to find.
FWIW, putting some thoughts down ...
- Related artifacts may be built and pushed from different CI pipelines and even different cloud providers (looking for best-in-class). Strong consistency requirements mean choreography across the CI pipelines.
- There are considerations beyond create time, too. What if products are EOL'ed but certain artifacts, such as SBOMs and CVE lists, are required to be retained long after the images are deleted? Maybe solve this with "tombstone" images, for example?
> Since we have had a lengthy discussion on this, had a vote that's been open for several weeks, and the vote is leaning against this request, my suggestion is to close this issue and move on. I say that with a lot of hesitancy because I'd much rather find a solution where everyone meets in the middle. But in this case, it's been very contentious because there is no middle option that we've been able to find.
+1 here, I want to point out again that this wasn't a simple oversight or miss, it was a conscious decision to enable a common scenario that is in use today by real workloads - storing signatures/attestations/sboms/etc in a repository separate from the image.
This was captured as a core requirement in the earliest stages of this WG and the MUST was placed there for a reason, to support that scenario.
> storing signatures/attestations/sboms/etc in a repository separate from the image.
I must point out once again that changing a MUST to a MAY doesn't forbid anything. The distribution-spec doesn't disallow "sparse images" (index without all referenced content) today; it merely doesn't mandate them.
> I must point out once again that changing a MUST to a MAY doesn't forbid anything. The distribution-spec doesn't disallow "sparse images" (index without all referenced content) today; it merely doesn't mandate them.
I think we'll have to agree to disagree here, the purpose of this specification is to encourage interoperability, and making this detail optional would have the opposite effect IMO. As @sudo-bmitch says, support for this version of the specification is also optional.
My attempt here is to find some common ground that allows registries and clients to preserve their data models while supporting the widest range of use cases. I also don't think we have a quorum among maintainers for forcing this data model change. I agree it would have been better for this to take place inside the working group, and I have said before that I felt the working group disbanded too early. When making changes to existing specs, there needs to be consensus among the maintainers of those specs, and we can expect drafts to change. If simply changing the MUST to a MAY breaks this feature and makes it less interoperable, then we need to address some of the core design before we finalize and push it out. My point is that sticking with the proven data model actually solves these issues, and it's not too late to have this discussion. I think we can all agree we want to work together constructively and avoid a contentious or failed release vote.
> This is a major breaking change to all of the existing 1.1-rc* implementations and would require clients and registries to do a rewrite, along with recreating the conformance tests, and returning to the community for feedback. It feels worse than the 999 change to me.
My proposal is targeted at simplifying the 1.1 changes and aligning them more closely with what exists in 1.0, which makes them much easier to support. Changes between RCs have been very common in OCI, and that's why we tend to have a longer release timeline. I was hoping the working groups would solve some of this, but in the end the changes were handed off from the working group to the specs, and we are iterating with a focus on the specific specs now. This was the first working group, and I think we learned quite a bit.
> Registries could still reject the push of content with a subject pointing to a missing manifest, so the differing opinions on the DAG discussion are unresolved by this.
Yes, that is the data model in 1.0 and it is well-proven. If the attempt is to change the data model, we should have a more focused discussion on that.
> The `OCI-Subject` header wouldn't be used and so every push of a manifest with a subject requires the full fallback processing, adding to client overhead and additional API round trips.
We can improve this proposal if needed. The client is in a much better position as the owner of the content to make decisions about which referrers are included. Additional API round trips would always be necessary to support any of the “transactional” API ideas. This would only add a single request at the end.
> Registries with an eventually consistent backend cannot support conditional requests, so there's no possibility of an eventually consistent referrers response to a referrers race if this is implemented on the client side, where that could be done with a server side solution.
Support for conditional requests is optional, and clients solve this just fine today with existing indexes. There is no additional concern here compared to the previous approach, as the eventually consistent nature could also cause referrers to be missing from any auto-generated response. I would argue that is worse, since you could get a mix of different referrers in that case without realizing it is not a complete set.
> Clients that work with a single `artifactType` were requesting the filter as an efficiency, and servers have the option of whether they want to support it. We had discussed whether this should be only a client side feature in the working group, and I don't want to restart previously settled debates for something that a registry can opt out of.
It is only necessary in the auto-generated case, because the list of referrers can easily grow unbounded there. This was my biggest sticking point when I was looking into adding support for this on the containerd side. I found the compatibility approach to be much more reliable, and it aligns better with existing logic. I don't think the filter will be necessary anymore and it would only add confusion, but I don't think it's a big deal to include it as optional alongside the auto-generated option.
> I think we'll have to agree to disagree here, the purpose of this specification is to encourage interoperability, and making this detail optional would have the opposite effect IMO.
We agree to disagree in the spec by not baking in opinions about implementations or use cases. This is especially true for newer endpoints which haven't had wide adoption; the specification should be as un-opinionated as possible. It is my opinion that we should not move away from the Merkle tree model: it is well-proven, secure, and scalable. If some registries want to auto-generate indexes and skip content checks, that is their right and the spec should allow it.