image-spec

`subject` field ought to reference any digest

Open vbatts opened this issue 2 years ago • 12 comments

(creating an issue from https://github.com/opencontainers/image-spec/pull/999#issuecomment-1419626599) https://github.com/opencontainers/image-spec/blob/a7ac485/manifest.md?plain=1#L70

Currently, the `subject` field in ./artifact.md and ./manifest.md says it can only point to "another manifest".

This seems unnecessarily limiting.

I'm sure there was conversation around this. I know I was in a call where I voiced strongly in favor of allowing pointing to any object.

As an ISV or content producer, I may want to publish an artifact containing a signature, attestation, or similar for, say, a specific layer in an image. That way anyone can build FROM that original image's set of layers and not lose the reference to one of the layers, even once the original image manifest is no longer relevant. Hypothetical example:

  • Red Hat publishes their base RHEL (or UBI) image
  • Some ISV publishes their database on this certified base layer
  • Some customer uses this database

If the referencing subject can only point at a manifest, then after the first FROM, end-user deployments cannot easily discover those references without traversing prior manifests or doing something similarly complicated. Whereas allowing a publisher to point their signature at the layer digests themselves would let users naturally discover the stack of referenced objects for all the layers/objects.

vbatts, Feb 06 '23 19:02

How would a client know what API to use to request and parse the response from an arbitrary digest? Do clients need to maintain a list of manifest media types (what happens to old clients when a new type is added)?

The big issue is if I can request referrers to any blob, that also means it's possible to create logical loops with an image/artifact manifest that has both a subject and layer pointing to the same blob. An image has a layer, the layer has referrers to the image, the image has a layer, repeat.

This also scales up the number of API calls I need to make in the common use cases. When looking for referrers to an image, I would need to check both the manifest and every layer and the config blob. When recursively deep-copying an image, I would need to check every blob in addition to every manifest for referrers.
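A rough Python sketch of that fan-out, for one image. The helper name `referrer_lookup_targets`, the `allow_blob_subjects` flag, and the digests are all hypothetical placeholders; it only lists which digests a client would have to ask about.

```python
def referrer_lookup_targets(manifest_digest, manifest, allow_blob_subjects):
    """List the digests a client would query for referrers of one image.

    Hypothetical helper: with manifest-only subjects, the manifest digest
    is the only possible target. If a subject could be any digest, the
    config blob and every layer become possible targets as well.
    """
    targets = [manifest_digest]
    if allow_blob_subjects:
        targets.append(manifest["config"]["digest"])
        targets.extend(layer["digest"] for layer in manifest.get("layers", []))
    return targets


# Placeholder digests, not real content.
manifest = {
    "config": {"digest": "sha256:config"},
    "layers": [{"digest": "sha256:layer1"}, {"digest": "sha256:layer2"}],
}

print(len(referrer_lookup_targets("sha256:manifest", manifest, False)))  # 1
print(len(referrer_lookup_targets("sha256:manifest", manifest, True)))   # 4
```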

sudo-bmitch, Feb 06 '23 19:02

On 06/02/23 11:59 -0800, Brandon Mitchell wrote:

> How would a client know what API to use to request and parse the response from an arbitrary digest? Do clients need to maintain a list of manifest media types (what happens to old clients when a new type is added)?
>
> The big issue is if I can request referrers to any blob, that also means it's possible to create logical loops with an image/artifact manifest that has both a subject and layer pointing to the same blob. An image has a layer, the layer has referrers to the image, the image has a layer, repeat.

You couldn't create this loop without a random guess of the hash. Unless you're meaning something else entirely. This does not sound like a real issue.

> This also scales up the number of API calls I need to make in the common use cases. When looking for referrers to an image, I would need to check both the manifest and every layer and the config blob. When recursively deep-copying an image, I would need to check every blob in addition to every manifest for referrers.

I would venture the opposite. If clients had to traverse to discover which prior manifests (those that included the layers in question) had something referring to them, that would be much more expensive than fetching a manifest and then doing a quick check of what "refers to" the blobs in the list. That sounds straightforward.

vbatts, Feb 06 '23 20:02

Very cool. This issue has been part of the focus of my work for the past year and I'm excited to see this issue being raised.

If we are saying any object, then the content could exist in any registry or any namespace within a registry, which brings up needing a standard for cross-namespace references.

I'm hoping to see this addressed from the work here: https://github.com/oras-project/artifacts-spec/issues/72

afflom, Feb 06 '23 20:02

> if the referencing subject can only point at a manifest, then after the first FROM, the end user deployments can not easily discover them without traversing or something complicated.

Base image annotations can help here. An image FROM some.signed.image can retain enough information about some.signed.image to discover that base image's signatures, etc., without having to be able to sign its individual layers. Indeed, these annotations are already in use today to enforce that images are built from signed base images, for example.
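As a minimal sketch of that lookup in Python: the two annotation keys are the base-image annotations from the image-spec, and the referrers path follows the distribution-spec v1.1 endpoint. The helper names, the example reference, the repository `base/os`, and the all-zero digest are hypothetical placeholders.

```python
# Annotation keys defined in the image-spec annotations document.
BASE_DIGEST = "org.opencontainers.image.base.digest"
BASE_NAME = "org.opencontainers.image.base.name"


def base_image_hint(manifest):
    """Return (name, digest) of the base image recorded in annotations, or None."""
    annotations = manifest.get("annotations") or {}
    name = annotations.get(BASE_NAME)
    digest = annotations.get(BASE_DIGEST)
    if not name or not digest:
        return None
    return name, digest


def referrers_path(repository, digest):
    """Build the referrers endpoint path (distribution-spec v1.1)."""
    return f"/v2/{repository}/referrers/{digest}"


# Hypothetical derived image; the reference and digest are placeholders.
derived = {
    "schemaVersion": 2,
    "annotations": {
        BASE_NAME: "registry.example.com/base/os:1.0",
        BASE_DIGEST: "sha256:" + "0" * 64,
    },
}

hint = base_image_hint(derived)
if hint is not None:
    name, digest = hint
    # Assumption: the repository part of the reference above is "base/os".
    print(referrers_path("base/os", digest))
```

The point of the sketch is that the derived image only needs to carry the base manifest's digest; the signatures themselves stay attached to the base image's manifest.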

Also potentially complicating this issue, the subject doesn't have to refer to a manifest that exists -- we were very clear and all agreed that you should be able to push a signature for an image manifest that hasn't been pushed yet, or retain the signatures for images that have been deleted/GCed. So subject referring to "a manifest" doesn't mean "the registry must make sure that manifest exists", just that it's intended to refer to a manifest.

Signing individual blobs didn't come up at all as a use case in the WG, AFAIK. If that's something we're interested in supporting, I think we should discuss it more, but I don't think it should be considered a blocker for v1.1. AIUI if we wanted to let subjects point to manifests or blobs or other, that would be a fairly limited change to the spec, but one that we should discuss a lot more before adopting, mainly due to @sudo-bmitch's cycle concerns.

imjasonh, Feb 06 '23 21:02

> You couldn't create this loop without a random guess of the hash. Unless you're meaning something else entirely. This does not sound like a real issue.

@vbatts Here's a logical loop, no digest guessing required:

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "config": {
    "mediaType": "application/vnd.oci.image.config.v1+json",
    "size": 3101,
    "digest": "sha256:c621799bcec256bf9be20c1998aa087fcdb0bb7fff5c10a3df968eeda987906c"
  },
  "layers": [
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "size": 85,
      "digest": "sha256:3d67ddc212ffba510628b93c0936f90dabcab9993f095cc1899fb1bcbe86b42a"
    }
  ],
  "subject": {
    "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
    "size": 85,
    "digest": "sha256:3d67ddc212ffba510628b93c0936f90dabcab9993f095cc1899fb1bcbe86b42a"
  }
}

Walk the manifest, to the layers, to all referrers to the layer (effectively walking the subject link in reverse), back to the manifest, and repeat.
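That walk can be sketched in Python. The config and layer digests are the ones from the manifest above; the manifest's own digest is a placeholder (it is not given here), and `find_cycle` and the two lookup tables are hypothetical names for illustration.

```python
CONFIG = "sha256:c621799bcec256bf9be20c1998aa087fcdb0bb7fff5c10a3df968eeda987906c"
LAYER = "sha256:3d67ddc212ffba510628b93c0936f90dabcab9993f095cc1899fb1bcbe86b42a"
MANIFEST = "sha256:manifest-placeholder"  # placeholder, not a real digest

# Forward links: a manifest lists its config and layer blobs.
children = {MANIFEST: [CONFIG, LAYER]}
# Reverse links: subject digest -> manifests whose subject points at it.
referrers = {LAYER: [MANIFEST]}


def find_cycle(digest, path=()):
    """Depth-first walk over children and referrers; return the first loop found."""
    if digest in path:
        return path + (digest,)
    for nxt in children.get(digest, []) + referrers.get(digest, []):
        cycle = find_cycle(nxt, path + (digest,))
        if cycle:
            return cycle
    return None


cycle = find_cycle(MANIFEST)
print(cycle == (MANIFEST, LAYER, MANIFEST))  # True: manifest -> layer -> manifest
```

A walker without the `digest in path` check would recurse forever on this input, which is the concern for anything that follows referrers recursively.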

sudo-bmitch, Feb 06 '23 21:02

Wouldn't this concern only be an implementation concern? We could easily say "the subject MAY NOT refer to a digest that exists in the layers/blobs" or similar.

vbatts, Feb 06 '23 22:02

> Wouldn't this concern only be an implementation concern? We could easily say "the subject MAY NOT refer to a digest that exists in the layers/blobs" or similar.

This can also involve multiple or cross references, because you don't need to guess the manifest digest to predict blob digests.

  • blobA: sha256:aaaa
  • blobB: sha256:bbbb

  • Image 1: layer with blobA, subject with blobB
  • Image 2: layer with blobB, subject with blobA
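A small Python sketch of why a per-manifest rule doesn't catch this. The short digests are the illustrative ones above; `passes_self_reference_rule` and `has_cycle` are hypothetical helpers, with the first modeling the proposed "subject may not appear in the manifest's own layers" check.

```python
images = {
    "image1": {"layers": ["sha256:aaaa"], "subject": "sha256:bbbb"},
    "image2": {"layers": ["sha256:bbbb"], "subject": "sha256:aaaa"},
}


def passes_self_reference_rule(image):
    """Proposed per-manifest check: the subject is not one of its own layers."""
    return image["subject"] not in image["layers"]


def has_cycle(images):
    """Follow image -> layer and blob -> referring image links, looking for a loop."""
    edges = {}
    for name, image in images.items():
        edges.setdefault(name, []).extend(image["layers"])
        edges.setdefault(image["subject"], []).append(name)

    def visit(node, path):
        if node in path:
            return True
        return any(visit(nxt, path | {node}) for nxt in edges.get(node, []))

    return any(visit(node, frozenset()) for node in list(edges))


print(all(passes_self_reference_rule(i) for i in images.values()))  # True
print(has_cycle(images))  # True: image1 -> aaaa -> image2 -> bbbb -> image1
```

Both manifests pass the local check, yet together they still form a loop, so detecting it requires looking across manifests.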

sudo-bmitch, Feb 06 '23 23:02

I will continue to remain skeptical of the GC cycle argument until we stop drawing the arrows backwards.

jonjohnsonjr, Feb 07 '23 00:02

GC is one thing that probably doesn't like cycles, but this also applies to anything that recursively walks the graph. The purpose of referrers is to find artifacts that refer to a manifest and treat them as a child of the manifest. That can be anything performing a deep copy, a UI showing the multi-platform image with associated artifacts in a filesystem-like tree, and probably a bunch of use cases I haven't considered.

sudo-bmitch, Feb 07 '23 00:02

I still haven't heard a strong use case for attaching manifests to blobs, and one didn't come up in the entire WG discussion about references.

I'd like to reiterate my position that we can punt on attaching to blobs until demand arises, and keep the scope of v1.1 at its current size. We retain the flexibility to allow references to blobs in a future release, with the benefit of experience about how it's used in practice for manifests in v1.1.

imjasonh, Feb 07 '23 15:02

On 07/02/23 07:42 -0800, Jason Hall wrote:

> I still haven't heard a strong use case for attaching manifests to blobs, and one didn't come up in the entire WG discussion about references.

Ah ok. I feel like it was one of the few things that I gave input on. :-\

> I'd like to reiterate my position that we can punt on attaching to blobs until demand arises, and keep the scope of v1.1 at its current size. We retain the flexibility to allow references to blobs in a future release, with the benefit of experience about how it's used in practice for manifests in v1.1.

Sure. We can roll like this, but changing it later still carries the risk that some folks will support only the 1.1 style and not later versions.

vbatts, Feb 07 '23 18:02

Can we close this as unplanned? We've managed to add logical loops into other parts of the spec, so that objection is no longer valid. But my question of how a client would know which registry API to query remains. In other parts of the spec, a descriptor reference is either a blob or a manifest, but not both.

Other concerns I have include the lack of a use case showing a real need for the change, and the API overhead to perform a deep copy of a manifest and all the referrers. As an example, for an image with 7 platforms, and 3 referrers per platform, I'm already up to 29 referrers API calls to copy the image (1 index, 7 images, 21 artifacts). If each image had 10 layers, that would add another 70 API calls to the registry to copy the image even if not a single layer had any referrers.
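The arithmetic can be written out as a short Python sketch. The function name and parameters are hypothetical; it counts one referrers call per manifest (index, images, artifacts) and, if blobs could be subjects, one more per image layer, matching the numbers above (config blobs are left out, as in the example).

```python
def referrers_api_calls(platforms, referrers_per_platform, layers_per_image):
    """Return (calls for manifests, extra calls for image layers) for a deep copy."""
    # 1 index + one image per platform + the artifacts attached to each image.
    manifest_calls = 1 + platforms + platforms * referrers_per_platform
    # Extra calls only needed if a subject could point at a layer blob.
    layer_calls = platforms * layers_per_image
    return manifest_calls, layer_calls


print(referrers_api_calls(7, 3, 10))  # (29, 70)
```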

One final consideration is whether the changes to the image manifest spec, allowing it to be used for a single layer, cover the use cases being considered here.

sudo-bmitch, Jun 29 '23 14:06

Chatting about this in today's meeting, we decided to close this until there's an identified need that doesn't have a workaround with the current implementation.

sudo-bmitch, Aug 01 '24 17:08