Proposal: Add References

Open dlorenc opened this issue 3 years ago • 66 comments

OCI References

New March 27th 2021: The Testing section below now shows some validation to attempt to prove that this does not break existing clients.

New March 28th 2021: The Backwards Compatibility section contains how I'm defining backwards-compatibility for the purposes of this proposal.

Background

This document contains two high level proposals that allow for artifacts in a registry to be linked together. They are meant to be mostly equivalent to the linking portion of the proposed OCI artifact manifest changes.

This document takes a different approach in a few areas:

  • Backwards compatibility with existing types. We allow for the existing types (Image, Index, Descriptor) to reference each other and to be referenced, rather than only supporting the new ArtifactManifest type. Backwards compatibility is defined in more detail below.
  • No other changes are included here (renames, reorganization). These proposals represent the bare minimum format and API changes required to enable linking of objects, including in a registry. While some of the renaming/reorganization may be desirable, it is also less critical and may be a point of contention that could further delay this work.

Additionally, this document contains further design and discussion for the "query" portion of the API. This should be compatible with the OCI artifact manifest changes, and can serve as inspiration or as an addition to that proposal if we decide to move forward with that change (either instead of or in addition to this one).

We start out with the proposed API and format changes, then discuss the requirements they were designed against and the CUJs they address.

Proposed Design

This proposal accompanies a pull request which contains the actual proposed changes to the specifications and types in this repository.

We propose adding a new field and a new GET API (in the distribution-spec). Each change is described below.

Image Spec

Refer to the linked PR for the full changes.

We propose adding a new field (reference) to the Image Manifest, Index and Descriptor types. This field will contain a Descriptor that points to the linked object.

Here is an example:

{
  "mediaType": "application/vnd.example.signature+json",
  "size": 3514,
  "digest": "sha256:19387f68117dbe07daeef0d99e018f7bbf7a660158d24949ea47bc12a3e4ba17",
  "reference": {
    "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
    "size": 1201,
    "digest": "sha256:b4f9e18267eb98998f6130342baacaeb9553f136142d40959a1b46d6401f0f2b"
  }
}
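
As a rough illustration of how such a descriptor could be assembled, the Python sketch below computes size and digest from the raw bytes of each object and then attaches the proposed reference field. The helper names (describe, with_reference) and the placeholder payloads are hypothetical and are not part of the proposal or of any library.

```python
import hashlib
import json


def describe(content: bytes, media_type: str) -> dict:
    """Build a descriptor (mediaType, size, digest) for raw content."""
    return {
        "mediaType": media_type,
        "size": len(content),
        "digest": "sha256:" + hashlib.sha256(content).hexdigest(),
    }


def with_reference(descriptor: dict, target: dict) -> dict:
    """Return a copy of `descriptor` carrying the proposed `reference` field."""
    linked = dict(descriptor)
    linked["reference"] = target
    return linked


# Placeholder payloads, for illustration only (assumptions, not real objects).
index_bytes = b'{"schemaVersion": 2, "manifests": []}'
signature_bytes = b'{"signature": "placeholder"}'

target = describe(
    index_bytes, "application/vnd.docker.distribution.manifest.list.v2+json"
)
signature = with_reference(
    describe(signature_bytes, "application/vnd.example.signature+json"), target
)
print(json.dumps(signature, indent=2))
```

The original descriptor is copied rather than modified, which mirrors the requirement that attaching a reference does not mutate the object being referenced.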

Distribution Spec

We propose adding a new, read-only API to the Distribution Spec to query the registry for references to a given object.

The API will look like:

GET /v2/<name>/manifests/<ref>/references

The response from this API will be a list of Descriptors that matches the existing Manifest Index specification:

{
  "manifests": [
    {
      "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
      "size": 1201,
      "digest": "sha256:b4f9e18267eb98998f6130342baacaeb9553f136142d40959a1b46d6401f0f2b"
    }
  ]
}
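
A client-side sketch of this query, assuming the path shape shown above and an index-shaped JSON body, might look like the following. The function names are hypothetical, and the sketch deliberately leaves out the HTTP call and authentication.

```python
import json
from urllib.parse import quote


def references_path(name: str, ref: str) -> str:
    """Build the proposed path: /v2/<name>/manifests/<ref>/references."""
    # Keep "/" in repository names and ":" in digests such as sha256:...
    return f"/v2/{quote(name)}/manifests/{quote(ref, safe=':')}/references"


def parse_references(body: str) -> list:
    """Extract the list of descriptors from an index-shaped response body."""
    return json.loads(body).get("manifests", [])


# Example response body, copied from the shape described in the proposal.
example_body = json.dumps({
    "manifests": [
        {
            "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
            "size": 1201,
            "digest": "sha256:b4f9e18267eb98998f6130342baacaeb9553f136142d40959a1b46d6401f0f2b",
        }
    ]
})

print(references_path("library/ubuntu", "latest"))
for descriptor in parse_references(example_body):
    print(descriptor["mediaType"], descriptor["digest"])
```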

There will be an accompanying PR to the distribution-spec with more details on this API if we decide to move forward. It is currently located here to keep the entire proposal in one place.

Requirements

The proposal above was designed with the following requirements in mind. There was also an overarching constraint to make these API changes as small as possible, but no smaller:

  • Able to attach multiple notes/references to an existing object
  • Able to query the registry for all notes/references attached to a specific object
  • Attaching notes/references DOES NOT mutate the initial object
  • Notes/references should be able to be copied with an image between repositories
  • Simple to understand
  • Simple to implement

Backwards Compatibility

Updated March 28th

There is no formal definition for backwards-compatible changes in this repo. Here's how I'm thinking about it:

  • A reference-aware client can push objects with the reference field to reference-aware servers.
  • A reference-aware client can retrieve objects that have the reference field set from reference-aware servers.
  • A reference-aware client should be able to push objects with the reference field to non-reference-aware servers as per https://github.com/opencontainers/image-spec/blob/master/considerations.md#extensibility, but it is still fine if servers fail or return an error here instead.
  • A non-reference-aware client can pull objects that have the reference field set from reference-aware servers without erroring. Such clients should not break, and should simply ignore the field.

The final bullet here is the most important one. If clients break because of a new field being present, this is not backwards-compatible.
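
To make the last bullet concrete, here is a minimal sketch of a non-reference-aware parser that keeps only the fields it knows about and silently drops everything else. The field set and function name are assumptions for illustration; real clients have their own parsing code.

```python
import json

# Fields a hypothetical pre-existing client understands (an assumption).
KNOWN_MANIFEST_FIELDS = {"schemaVersion", "mediaType", "config", "layers", "annotations"}


def parse_manifest(raw: str) -> dict:
    """Parse a manifest, ignoring any property the client does not know."""
    doc = json.loads(raw)
    return {key: value for key, value in doc.items() if key in KNOWN_MANIFEST_FIELDS}


plain = {"schemaVersion": 2, "layers": []}
linked = dict(plain, reference={"mediaType": "example", "size": 1, "digest": "sha256:00"})

# Both manifests look identical to the non-reference-aware client.
print(parse_manifest(json.dumps(plain)) == parse_manifest(json.dumps(linked)))
```

A client that instead rejects unknown properties would fail on the second manifest, which is the breakage the testing below tries to rule out.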

I've verified the following clients so far:

  • [X] Oras v0.11.1
  • [X] Docker 20.10.5
  • [X] Crane v0.4.1

Please suggest others!

Testing

New March 27th 2021

I've implemented this in a patch to distribution as a testbed to try to validate whether or not clients get broken with this. My patch is here: https://github.com/dlorenc/distribution/tree/references, and a registry image is available here: gcr.io/dlorenc-vmtest2/registry:references

You can pull and run that with:

docker run -p 5000:5000 gcr.io/dlorenc-vmtest2/registry:references

I made a client to test this with. It's available in a patch to ggcr here: https://github.com/dlorenc/go-containerregistry/tree/references

There's a simple command line tool to create an object that refers to another object. With that code checked out, and the registry running, you can do:

# Copy an image into your local registry for testing
$ go run ./cmd/crane/ cp ubuntu localhost:5000/ubuntu

# Add a reference to it!
$ go run ./cmd/ref localhost:5000/ubuntu localhost:5000/ref-to-ubuntu
Creating a reference to image: localhost:5000/ubuntu at: localhost:5000/ref-to-ubuntu

# Look at the glorious, referring manifest (the object here is just ubuntu as well, so we can run it)
$ go run ./cmd/crane manifest localhost:5000/ref-to-ubuntu | jq .
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
  "config": {
    "mediaType": "application/vnd.docker.container.image.v1+json",
    "size": 2724,
    "digest": "sha256:01b9e61b93fb6b71b7278bc7a7e8a21417536169cc7c8df18f07c0774539fd69"
  },
  "layers": [
    <hidden for brevity>
  ],
  "reference": {
    "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
    "size": 943,
    "digest": "sha256:c65d2b75a62135c95e2c595822af9b6f6cf0f32c11bcd4a38368d7b7c36b66f5",
    "platform": {
      "architecture": "amd64",
      "os": "linux"
    }
  }
}

# Now run it!

$ docker run -it localhost:5000/ref-to-ubuntu
Unable to find image 'localhost:5000/ref-to-ubuntu:latest' locally
latest: Pulling from ref-to-ubuntu
Digest: sha256:011b3069012baf7f4a8a7f7eea5b0a545d6cd98127e1a3651513ba88e983b246
Status: Downloaded newer image for localhost:5000/ref-to-ubuntu:latest
root@8876f9f8c13f:/# It works!

Example Use Cases

Signing and verifying an image

Signing:

  1. User creates some payload representing the image.
  2. User signs this payload.
  3. User creates an Image containing this signature, with a meaningful mediaType for this signature. This signature must contain a protected reference to the subject.
  4. User uploads this via the normal upload flow.

Verification:

  1. User locates the object they wish to verify signatures for.
  2. User queries the new GET API for references
  3. User iterates through this looking for signature objects with the expected mediaType
  4. User verifies the attached signature (and included protected reference)
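
Steps 3 and 4 of the verification flow could be sketched as follows. The cryptographic check itself is out of scope here, so it is passed in as a caller-supplied verify function; that callback and the function name verified_signatures are hypothetical.

```python
from typing import Callable, List


def verified_signatures(
    references: List[dict],
    expected_media_type: str,
    verify: Callable[[dict], bool],
) -> List[dict]:
    """Return descriptors of the expected type that pass the caller's check."""
    matches = []
    for descriptor in references:
        # Step 3: skip anything that is not a signature of the expected type.
        if descriptor.get("mediaType") != expected_media_type:
            continue
        # Step 4: delegate signature and protected-reference checks to the caller.
        if verify(descriptor):
            matches.append(descriptor)
    return matches


references = [
    {"mediaType": "application/vnd.example.signature+json", "digest": "sha256:aa"},
    {"mediaType": "application/vnd.example.sbom+json", "digest": "sha256:bb"},
]
# A stand-in check that accepts everything; a real one would verify the signature.
accepted = verified_signatures(
    references, "application/vnd.example.signature+json", lambda d: True
)
print([d["digest"] for d in accepted])
```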

Attaching SBOMs to an artifact

to be filled in

More

Add more here!

Registry Implications

Registries may need to maintain a reverse index to efficiently satisfy queries for references to a given object. Registries will need to parse and understand reference fields in order to support this.
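
One possible shape for such a reverse index, shown as an in-memory sketch, is given below. The class and method names are hypothetical, and a real registry would persist this and handle deletion; the sketch only covers the top-level reference field of a pushed manifest.

```python
from collections import defaultdict


class ReverseIndex:
    """Maps a target digest to the descriptors of objects that reference it."""

    def __init__(self) -> None:
        # target digest -> {referrer digest: referrer descriptor}
        self._referrers = defaultdict(dict)

    def record_push(self, referrer: dict, manifest: dict) -> None:
        """Index a pushed manifest if it carries a top-level reference."""
        target = manifest.get("reference")
        if target:
            self._referrers[target["digest"]][referrer["digest"]] = referrer

    def references_to(self, digest: str) -> dict:
        """Build an index-shaped response listing everything pointing at digest."""
        return {"manifests": list(self._referrers.get(digest, {}).values())}


index = ReverseIndex()
sig = {"mediaType": "application/vnd.example.signature+json", "size": 10, "digest": "sha256:aa"}
index.record_push(sig, {"reference": {"digest": "sha256:bb"}})
print(index.references_to("sha256:bb"))
```

Keying the inner map by referrer digest means that pushing the same referring object twice does not produce duplicate entries in the response.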

Registries are free to implement garbage collection of referenced objects as they see fit.

Alternatives Considered

Registry only changes

We also considered proposing a design where references are NOT included on the existing types. It would have looked like:

GET /v2/<name>/manifests/<ref>/notes
POST /v2/<name>/manifests/<ref>/notes
PUT /v2/<name>/manifests/<ref>/notes
DELETE /v2/<name>/manifests/<ref>/notes

Pros

This would not require changes to the image spec

Cons

  • Requires more changes to the registry API and registries
  • Only works in a registry context, does not work on a filesystem or OCI layout
  • Unclear how this would work when copying objects between registries, which is a critical requirement

dlorenc avatar Mar 17 '21 19:03 dlorenc

Why include the size of the reference?

sargun avatar Mar 17 '21 20:03 sargun

Why include the size of the reference?

size is a REQUIRED descriptor property, so we need to include it, but it's the size of the thing being referenced.

Looking at this example:

{
  "mediaType": "application/vnd.example.signature+json",
  "size": 3514,
  "digest": "sha256:19387f68117dbe07daeef0d99e018f7bbf7a660158d24949ea47bc12a3e4ba17",
  "reference": {
    "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
    "size": 1201,
    "digest": "sha256:b4f9e18267eb98998f6130342baacaeb9553f136142d40959a1b46d6401f0f2b"
  }
}

The outer descriptor is pointing to a signature of size 3514 and the "subject" of that signature is the manifest list that reference points to, which is 1201 bytes.

(For posterity), the thing being referenced happens to be ubuntu:latest currently:

$ crane manifest ubuntu | sha256sum 
b4f9e18267eb98998f6130342baacaeb9553f136142d40959a1b46d6401f0f2b  -

$ crane manifest ubuntu | wc -c
1201

jonjohnsonjr avatar Mar 17 '21 22:03 jonjohnsonjr

Some things I'm personally unsure about for this proposal:

plurality

The current proposal adds a singular reference to descriptors and image/index objects. I think this is cleaner in most cases, but it's not as concise as a list of references in certain situations.

With a singular reference, it's somewhat cumbersome to express "this manifest references M targets".

naming

I like reference (or references) but it's pretty abstract. If we had something slightly more concrete like subject, I think it's easier to understand as "the subject descriptor points to the subject of this content", whereas reference could mean anything. On the other hand, subject might be limiting because it's too specific, whereas reference could mean anything, which makes it potentially more useful.

Certainly open to ideas here.

jonjohnsonjr avatar Mar 18 '21 15:03 jonjohnsonjr

The current proposal adds a singular reference to descriptors and image/index objects. I think this is cleaner in most cases, but it's not as concise as a list of references in certain situations.

With a singular reference, it's somewhat cumbersome to express "this manifest references M targets".

I agree - but resisted making it plural because I couldn't think of a single concrete use-case for it. If anyone has any we can definitely make this plural.

dlorenc avatar Mar 18 '21 16:03 dlorenc

Why include the size of the reference?

In addition to @jonjohnsonjr's answer, including the size can avoid DoS attacks on services that will chase these references.

A service can look at the descriptor upfront and say "that size is too big, I'm going to stop here." Or if it decides to read the referenced blob and it turns out to be bigger than it said it would be, the service can say "this blob is bigger than it said it would be, I'm going to stop here" and avoid large responses exhausting resources.
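
Both checks can be sketched in a few lines of Python. The function name read_bounded, the exception type, and the 4 MiB default limit are all assumptions for illustration; the stream can be any object with a read method.

```python
import hashlib
import io


class DescriptorMismatch(Exception):
    """Raised when content does not match what its descriptor promised."""


def read_bounded(stream, descriptor: dict, max_size: int = 4 * 1024 * 1024) -> bytes:
    """Read content for a descriptor, refusing oversized or mismatched data."""
    expected = descriptor["size"]
    # "That size is too big, I'm going to stop here."
    if expected > max_size:
        raise DescriptorMismatch(f"declared size {expected} exceeds limit {max_size}")

    buf = bytearray()
    # Ask for at most one byte beyond the declared size, so an oversized
    # body is detected without reading it in full.
    while len(buf) <= expected:
        chunk = stream.read(expected + 1 - len(buf))
        if not chunk:
            break
        buf.extend(chunk)

    # "This blob is bigger than it said it would be" (or smaller).
    if len(buf) != expected:
        raise DescriptorMismatch("content length does not match descriptor size")
    if "sha256:" + hashlib.sha256(buf).hexdigest() != descriptor["digest"]:
        raise DescriptorMismatch("content digest does not match descriptor digest")
    return bytes(buf)


payload = b"hello"
descriptor = {"size": len(payload), "digest": "sha256:" + hashlib.sha256(payload).hexdigest()}
print(read_bounded(io.BytesIO(payload), descriptor))
```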

imjasonh avatar Mar 18 '21 16:03 imjasonh

can avoid DoS attacks

Exactly. Here's a scenario I would imagine is prevented by this:

You've been compromised such that there is a MITM between your client and the registry. Fortunately for you, you are smart and deploy everything by digest, so the attacker can't serve you arbitrary images (say, a bitcoin miner). One obvious thing the attacker can still do is just not let you pull the images, but this would likely be easily detected because your deployments would fail. Similarly, they could act as a proxy registry and just serve you images really slowly. This would be annoying, but eventually pulls would succeed or time out. If they wanted to be even more disruptive, they could try to take down not just one service but everything on the node by flooding your disk with garbage data. If the size of the content you're fetching is unknown, the attacker could set an arbitrarily large Content-Length in the response and feed your client random data until you run out of disk space. By including the size in the descriptor, we guarantee that clients know exactly how much data they should expect to fetch.

jonjohnsonjr avatar Mar 18 '21 17:03 jonjohnsonjr

I agree - but resisted making it plural because I couldn't think of a single concrete use-case for it. If anyone has any we can definitely make this plural.

A few that I can think of based around the image signing work.

  1. A key nears its expiration, a new key is generated, and the new key needs to re-sign all previously signed images. Rather than pushing a signature per manifest, we could allow a set of manifests to be signed at once for the repository. This would reduce load on the registries, particularly if clients pull more than one image from the repository and can reuse cached signature data from a different manifest pull.
  2. Signing multi-platform images should sign each platform's digest (enabling pull by digest to a single platform), and hopefully the tag that points to the manifest list (OCI index).
  3. We're considering ways to integrate the TUF logic with image signing, and that includes a targets metadata that is a list of signed digests and tag pointers in the repository. This would be similar to 1 above.

Note that with a plurality of references, GC should only prune the signature artifact when all of those references no longer exist, and not if just one of them is deleted, since the signing data for other images is still valid and useful.
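
That pruning rule is small enough to sketch directly. The plural references list and the function name can_prune are hypothetical here, since the proposal as written only has a singular reference.

```python
def can_prune(references: list, live_digests: set) -> bool:
    """True only when none of the referenced targets still exists."""
    return not any(ref["digest"] in live_digests for ref in references)


# A signature artifact covering two manifests (hypothetical plural form).
references = [{"digest": "sha256:aa"}, {"digest": "sha256:bb"}]

print(can_prune(references, {"sha256:aa", "sha256:bb"}))  # both targets alive
print(can_prune(references, {"sha256:bb"}))               # one target deleted
print(can_prune(references, set()))                       # all targets deleted
```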

There have also been discussions on Helm artifacts, where one Helm chart may include references to multiple images. I'm still not sold on this particular example because you're including a fixed reference to a potentially mutable template, but the logical grouping of a single artifact pointing to multiple manifests applies to a larger set of use cases.

sudo-bmitch avatar Mar 22 '21 10:03 sudo-bmitch

Image Spec

Refer to the linked PR for the full changes.

We propose adding a new field (reference) to the Image Manifest, Index and Descriptor types. This field will contain a Descriptor that points to the linked object.

It feels a little forced to add this to the Descriptor object, requiring an extra blob to be uploaded to create a reference. One of the advantages of the Artifact proposals is the ability to upload an artifact that is just references and perhaps some annotations, allowing the entire artifact to be pulled with a single request, rather than moving the annotation data into a separate config blob that gets pushed separately. Logically, it feels like the wrong level. You want the reference on the artifact, not on a blob or config object shipped with the artifact.

Distribution Spec

We propose adding a new, read-only API to the Distribution Spec to query the registry for references to a given object.

The API will look like:

GET /v2/<name>/manifests/<ref>/references

The response from this API will be a list of Descriptors that matches the existing Manifest Index specification:

{
  "manifests": [
    {
      "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
      "size": 1201,
      "digest": "sha256:b4f9e18267eb98998f6130342baacaeb9553f136142d40959a1b46d6401f0f2b"
    }
  ]
}

One of the controversial changes I had pushed Steve to add to the Artifact spec was the ability to inline the artifacts as part of the query response. The reason for inlining results, rather than just returning a list of manifests, is that the number of signatures on any one image can grow. It may be periodically re-signed, and it could be signed by many entities. Those entities may be different organizations (ACME Rockets re-signs the image they pulled from Wabbit Networks, to use the Notary example). However, signatures could also serve as proof that an approval was received in a larger organization, showing that the image has passed one of several security checks from things like an image scanning tool. Most clients won't care about all of these signatures; they are looking for a single one from a known entity, which can hopefully be identified by the annotations on the signing artifact. And by inlining the results, we can turn a worst case of n x 2 pulls for signatures (where n is the number of signatures) down to a fixed 2 pulls (one for the inlined list of artifacts, one for the signature blob).

If we decide not to allow inline results, then allowing a query that filters on media type and annotations could allow this to be done on the server rather than the client sorting through the results for the desired annotations, giving us a fixed 3 pulls (one for the query, one for the signature artifact, and a final one for the signature blob).
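
A server-side filter of that kind could look roughly like the sketch below. The function name, its parameters, and the example.signer annotation key are hypothetical; no query syntax is being proposed here.

```python
from typing import Dict, List, Optional


def filter_references(
    descriptors: List[dict],
    media_type: Optional[str] = None,
    annotations: Optional[Dict[str, str]] = None,
) -> List[dict]:
    """Keep descriptors matching the media type and every requested annotation."""
    wanted = annotations or {}
    result = []
    for descriptor in descriptors:
        if media_type is not None and descriptor.get("mediaType") != media_type:
            continue
        have = descriptor.get("annotations", {})
        if all(have.get(key) == value for key, value in wanted.items()):
            result.append(descriptor)
    return result


signature_type = "application/vnd.example.signature+json"
stored = [
    {"mediaType": signature_type, "digest": "sha256:aa",
     "annotations": {"example.signer": "acme-rockets"}},
    {"mediaType": signature_type, "digest": "sha256:bb",
     "annotations": {"example.signer": "wabbit-networks"}},
    {"mediaType": "application/vnd.example.sbom+json", "digest": "sha256:cc"},
]
hits = filter_references(stored, signature_type, {"example.signer": "acme-rockets"})
print([d["digest"] for d in hits])
```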

sudo-bmitch avatar Mar 22 '21 11:03 sudo-bmitch

A few that I can think of based around the image signing work.

  1. A key nears its expiration, a new key is generated, and the new key needs to re-sign all previously signed images. Rather than pushing a signature per manifest, we could allow a set of manifests to be signed at once for the repository. This would reduce load on the registries, particularly if clients pull more than one image from the repository and can reuse cached signature data from a different manifest pull.

So this would sort of turn the conceptual model from "here is a list of signatures all tied to one object" over to "here is a list of signatures tied to many objects". That could make sense if you had a set of signatures that referenced a large set of images, but the signatures all had a similar lifecycle tied to each other or the key, rather than to the images they reference, like you suggested.

I'm still not convinced though - allowing this option forces people to decide up front how their signatures will be managed later. A list gives flexibility, but flexibility could create confusion. Some can be updated by themselves, some are updated by doing a "read modify update" loop on a larger "multi-signature-set" object.

  • Signing multi-platform images should sign each platform's digest (enabling pull by digest to a single platform), and hopefully the tag that points to the manifest list (OCI index).

I think that would be handled by downloading the Index, signing and uploading a signature for each Manifest, then signing and uploading one for the Index itself (so the overall index is signed, and each platform's manifest is signed). I'm not sure I follow the issue. Signed tags are a separate problem.

It feels a little forced to add this to the Descriptor object, requiring an extra blob to be uploaded to create a reference. One of the advantages of the Artifact proposals is the ability to upload an artifact that is just references and perhaps some annotations,

I think this is solved by the #826 proposal - the signatures can be inlined directly in the Data field of the descriptor.

If we decide not to allow inline results, then allowing a query that filters on media type and annotations could allow this to be done on the server rather than the client sorting through the results for the desired annotations, giving us a fixed 3 pulls (one for the query, one for the signature artifact, and a final one for the signature blob).

I'd love both! Inlining signatures on the data field and better querying. We still don't have basic list support cross-registry, so I'd rather get something we can use first then push for better querying. Performance optimizations and the fields we'd like to query on feel premature to me still. We don't know how people will use these new fields in the real world, we can only guess.

One of the controversial changes I had pushed Steve to add to the Artifact spec

A final note: I do like the idea of the Artifact spec. It addresses a lot of issues in the registry today. The downside is that it addresses a lot of issues in the registry spec. It's going to take a LONG time to get accepted and supported by registries. I don't think we should have to wait for that to add improvements to the existing types.

dlorenc avatar Mar 22 '21 11:03 dlorenc

A final note: I do like the idea of the Artifact spec. It addresses a lot of issues in the registry today. The downside is that it addresses a lot of issues in the registry spec. It's going to take a LONG time to get accepted and supported by registries. I don't think we should have to wait for that to add improvements to the existing types.

An important part of both this and the artifact spec is the ability to query for artifacts linked to another artifact or manifest. That's a new API that needs to be added to make this work. As soon as we take that leap, this is no longer something we can shoehorn into existing registries, so we may as well come up with the solution that makes the most logical sense.

sudo-bmitch avatar Mar 22 '21 12:03 sudo-bmitch

this is no longer something we can shoehorn into existing registries, so we may as well come up with the solution that makes the most logical sense.

This proposal also includes an API for linking that works for all types. I don't think all new features will be equivalent in terms of cost to roll out. This proposal was specifically designed to minimize changes required to registries.

Nothing precludes more support for filtering/querying later on.

dlorenc avatar Mar 22 '21 13:03 dlorenc

Wait - I think I misunderstood what you're saying about the "list attached objects" API. To make sure I understand:

My proposal currently returns a list of "pointers" to attached objects, where the pointers are descriptors. You're asking for an option to instead "dereference" those pointers, where the returned list would be the actual full objects themselves.

Is that roughly correct?

dlorenc avatar Mar 22 '21 14:03 dlorenc

My proposal currently returns a list of "pointers" to attached objects, where the pointers are descriptors. You're asking for an option to instead "dereference" those pointers, where the returned list would be the actual full objects themselves.

I'm looking to make the queries faster, so inlining the content a descriptor points to with a dereference. If #826 allows the server to build a descriptor with the content dereferenced into the data field, then that solves my issue. But it's not clear whether this is a server (on a query) or client (on a push) generated field.

sudo-bmitch avatar Mar 22 '21 14:03 sudo-bmitch

Right - I think the two are actually complementary. The only challenge I can think of here with inlining data in the list response will be clients not knowing what the data types are ahead of time. It works in the artifact manifests API because referencing objects can only be one type: the artifact type.

dlorenc avatar Mar 22 '21 14:03 dlorenc

The only challenge I can think of here with inlining data in the list response will be clients not knowing what the data types are ahead of time.

If you have the descriptor, you'll have the media type and can parse the base64 encoded bytes based on that media type. Whether or not the content/data is inlined, I always want the descriptor with those details to both verify the decoded bytes, and to allow individual blob pulls later (e.g. head request to verify blob still exists for mirror updates).
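
A sketch of that decode-and-verify step is below. It assumes the inlined content sits base64 encoded in a data field on the descriptor, as discussed for #826; the function name and the example payload are hypothetical.

```python
import base64
import hashlib


def decode_inlined(descriptor: dict) -> bytes:
    """Decode inlined content and verify it against the descriptor."""
    raw = base64.b64decode(descriptor["data"])
    if len(raw) != descriptor["size"]:
        raise ValueError("inlined content does not match descriptor size")
    if "sha256:" + hashlib.sha256(raw).hexdigest() != descriptor["digest"]:
        raise ValueError("inlined content does not match descriptor digest")
    return raw


content = b'{"signature": "placeholder"}'
descriptor = {
    "mediaType": "application/vnd.example.signature+json",
    "size": len(content),
    "digest": "sha256:" + hashlib.sha256(content).hexdigest(),
    "data": base64.b64encode(content).decode("ascii"),
}
print(decode_inlined(descriptor))
```

Because size and digest stay on the descriptor, the same entry still works for a later blob pull or HEAD request when the data field is absent.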

sudo-bmitch avatar Mar 22 '21 15:03 sudo-bmitch

The current proposal adds a singular reference to descriptors and image/index objects. I think this is cleaner in most cases, but it's not as concise as a list of references in certain situations. With a singular reference, it's somewhat cumbersome to express "this manifest references M targets".

I agree - but resisted making it plural because I couldn't think of a single concrete use-case for it. If anyone has any we can definitely make this plural.

One may need to reference multiple descriptors from one descriptor. Two use cases I can think of are multiple SBoMs and multiple signatures establishing a chain of trust.

nishakm avatar Mar 23 '21 14:03 nishakm

One may need to reference multiple descriptors from one descriptor. Two use cases I can think of are multiple SBoMs and multiple signatures establishing a chain of trust.

I think that's fine - this linkage/field is actually the reverse direction. So each SBOM would reference one artifact via this link. Then the artifact has multiple SBOMs pointing back at it. Same for signatures. The question would be if you need one signature to reference multiple artifacts.

dlorenc avatar Mar 23 '21 16:03 dlorenc

I'm not following the logic to artificially constrain future expansion of this based on the use cases we see today, particularly since some of those use cases would have a value to having a list of external references rather than a single one (Helm charts, TUF snapshots, and merging the signature for multiple objects into a single signature blob). However we could likely hack this solution even more to allow multiple references, even if it's not a list, by putting a different reference on each blob in a manifest. It could even be the same blob digest repeated with different references. (Yes, that's horrible. No, I don't want to do that. But people will do this without better options.)

That discussion aside, this still appears to be at the wrong level. We're making a reference from a blob within manifest A to manifest B, rather than manifest A to manifest B. And since a manifest has multiple blobs, including multiple child manifests in an index, or a config object and layers in an image manifest, there are lots of potential ways to link blobs within manifest A to manifest B. That link needs to be conveyed to the manifest A itself, or are we suggesting that another index is uploaded that includes a descriptor to A, and we include the reference on that index descriptor, rather than on one of the blobs within manifest A itself?

sudo-bmitch avatar Mar 23 '21 17:03 sudo-bmitch

I'm not following the logic to artificially constrain future expansion of this based on the use cases we see today,

I'd flip this around. Concrete use-cases don't add artificial constraints, they add concrete ones. I'm happy to expand the constraints if we can come up with real future use-cases.

dlorenc avatar Mar 23 '21 17:03 dlorenc

I think that's fine - this linkage/field is actually the reverse direction. So each SBOM would reference one artifact via this link. Then the artifact has multiple SBOMs pointing back at it.

Are you expecting the proposed artifact manifest to solve the problem of top-down linking? I wonder then why one would need the "bottom-up" linking in this proposal.

nishakm avatar Mar 23 '21 18:03 nishakm

Are you expecting the proposed artifact manifest to solve the problem of top-down linking? I wonder then why one would need the "bottom-up" linking in this proposal.

I'm not sure I understand the question. This proposal is largely independent of the artifacts proposal. What's the "top-down" linking problem?

The concrete use-case I need this field for is to be able to upload an image, then to later upload a signature for that image. The signature will use this field to reference the image. That's not possible today without linking in this direction AFAIK.

dlorenc avatar Mar 23 '21 21:03 dlorenc

If #826 allows the server to build a descriptor with the content dereferenced into the data field, then that solves my issue.

This is exactly the intention of that proposal. It should complement this proposal very well.

That discussion aside, this still appears to be at the wrong level. We're making a reference from a blob within manifest A to manifest B, rather than manifest A to manifest B.

Aren't we also doing that? See the diffs for manifest and image index.

And since a manifest has multiple blobs, including multiple child manifests in an index, or a config object and layers in an image manifest, there are lots of potential ways to link blobs within manifest A to manifest B. That link needs to be conveyed to the manifest A itself, or are we suggesting that another index is uploaded that includes a descriptor to A, and we include the reference on that index descriptor, rather than on one of the blobs within manifest A itself?

I'm not sure I understand what you're saying here, but I'll try to take a stab at answering it, because I have my own issues with the proposal that I think are related.

If a blob within manifest A has a reference to manifest B, I think the correct thing for a registry to do is include a descriptor pointing to manifest A when you ask it "what references manifest B?". I can see this being potentially confusing, though, because only part of manifest A references manifest B, and not the whole thing. While I like the ability to associate part of a manifest with something, perhaps this is enough of a footgun to just remove it from descriptor?
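
To show where the ambiguity comes from, here is a sketch of how a registry might collect every reference carried by a manifest, tagging each one with where it was found. The function name and the "manifest"/"descriptor" labels are hypothetical.

```python
from typing import List, Tuple


def collect_references(manifest: dict) -> List[Tuple[str, str]]:
    """List (where, target digest) pairs for every reference in a manifest."""
    found = []
    top = manifest.get("reference")
    if top:
        # The whole manifest points at the target.
        found.append(("manifest", top["digest"]))
    nested = [manifest.get("config")]
    nested += manifest.get("layers", [])
    nested += manifest.get("manifests", [])
    for descriptor in nested:
        if descriptor and "reference" in descriptor:
            # Only one part of the manifest points at the target.
            found.append(("descriptor", descriptor["reference"]["digest"]))
    return found


manifest_a = {
    "config": {"digest": "sha256:c0"},
    "layers": [{"digest": "sha256:11", "reference": {"digest": "sha256:bb"}}],
}
print(collect_references(manifest_a))
```

In both cases the registry would answer "what references manifest B?" with a descriptor for manifest A, so the caller cannot tell from the response alone whether all of A or only part of it is linked.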

I'm not following the logic to artificially constrain future expansion of this based on the use cases we see today

I don't see where that's happening -- can you quote what you're replying to? (Not because I disagree, but because the thread is long and my attention span is short.)

By allowing reference to be on any descriptor, we're allowing arbitrary association between arbitrary things. Descriptors are often used for container-related stuff, but they're not limited to it. I see that as very powerful and not artificially constraining anything, but you might be referring to something else? I'm okay with multiple references as well, FWIW, but I'd like to work through the ramifications.

jonjohnsonjr avatar Mar 23 '21 21:03 jonjohnsonjr

That discussion aside, this still appears to be at the wrong level. We're making a reference from a blob within manifest A to manifest B, rather than manifest A to manifest B.

Aren't we also doing that? See the diffs for manifest and image index.

You're right, I missed that part thinking it was just updating the existing digests, rather than adding a new field. Will a reference descriptor in these objects have no descriptor level media type, digest, or size, only the fields within the reference? e.g.

{
  "config": ...,
  "layers": [ ... ],
  "reference": {
    "reference": {
      "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
      "size": 1201,
      "digest": "sha256:b4f9e18267eb98998f6130342baacaeb9553f136142d40959a1b46d6401f0f2b"
    }
  }
}

sudo-bmitch avatar Mar 23 '21 23:03 sudo-bmitch

I'm not sure I understand the question. This proposal is largely independent of the artifacts proposal. What's the "top-down" linking problem?

Per this proposal you want the ability to link Image Manifest, Image Index, and Descriptor to reference/link to each other. It seems to me what you are proposing may end up having an image manifest that looks like this:

{
  "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
  "size": 1201,
  "digest": "sha256:b4f9e18267eb98998f6130342baacaeb9553f136142d40959a1b46d6401f0f2b",
  "layers": [
    ...
  ],
  "reference": {
    "mediaType": "application/vnd.example.signature+json",
    "size": 3514,
    "digest": "sha256:19387f68117dbe07daeef0d99e018f7bbf7a660158d24949ea47bc12a3e4ba17",
    "reference": {
      "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
      "size": 1201,
      "digest": "sha256:b4f9e18267eb98998f6130342baacaeb9553f136142d40959a1b46d6401f0f2b"
    }
  }
}

Or bigger circular dependencies. Rather than do that, it would be easier to follow the regular Merkle DAG format that the image manifest uses. You could start from the image index:

{
  "schemaVersion": 2,
  "manifests": [
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "size": 7143,
      "digest": "sha256:e692418e4cbaf90ca69d05a66403747baa33ee08806650b51fab815ad7fc331f"
    },
    {
      "mediaType": "application/vnd.example.signature+json",
      "size": 3514,
      "digest": "sha256:19387f68117dbe07daeef0d99e018f7bbf7a660158d24949ea47bc12a3e4ba17",
      "reference": {
        "mediaType": "application/vnd.oci.image.manifest.v1+json",
        "size": 1201,
        "digest": "sha256:b4f9e18267eb98998f6130342baacaeb9553f136142d40959a1b46d6401f0f2b"
      }
    }
  ]
}

This would mean restricting descriptors to the Image Index and Content Descriptors only.

Then again, I am not sure if this is how Content Descriptors are supposed to be used. Let me know if I misunderstood.

nishakm avatar Mar 24 '21 00:03 nishakm

It seems to me what you are proposing may end up having an image manifest that looks like this:

The example given isn't really a valid image manifest as they're used today -- are you just using the top-level mediaType, size, and digest fields as shorthand?

Or bigger circular dependencies

We can't have circular dependencies, which is a nice property of the DAG, but this helps me understand your objection, I think. This isn't what's being proposed.

As an example, imagine we have two artifacts.

First, (linux/amd64) debian:

GET /v2/library/debian/manifests/sha256:a4e852392000434b7c50b26dcf6a659a037521b17df69dd2ace5c2368efba38b
...
HTTP/1.1 200 OK
Content-Length: 529
Content-Type: application/vnd.docker.distribution.manifest.v2+json
Date: Wed, 24 Mar 2021 16:14:20 GMT
Docker-Content-Digest: sha256:a4e852392000434b7c50b26dcf6a659a037521b17df69dd2ace5c2368efba38b
{
   "schemaVersion": 2,
   "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
   "config": {
      "mediaType": "application/vnd.docker.container.image.v1+json",
      "size": 1463,
      "digest": "sha256:dc2eddc158255ea75b9774d29924a700e95d988bcb7612abbda29baddb291670"
   },
   "layers": [
      {
         "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
         "size": 50400353,
         "digest": "sha256:e22122b926a1a853d61887fa35c3fe53e05ee7dc0f2f488936dc9838bd0e230d"
      }
   ]
}

Second, something that references debian, let's use an image manifest that contains a signature:

{
   "schemaVersion": 2,
   "config": {},
   "layers": [
      {
         "mediaType": "application/vnd.example.signature+json",
         "size": 7143,
         "digest": "sha256:19387f68117dbe07daeef0d99e018f7bbf7a660158d24949ea47bc12a3e4ba17",
         "reference": {
            "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
            "size": 529,
            "digest": "sha256:a4e852392000434b7c50b26dcf6a659a037521b17df69dd2ace5c2368efba38b"
         }
      }
   ]
}

That layers[0].reference descriptor points to debian:

$ crane manifest debian | jq .manifests[0]
{
  "digest": "sha256:a4e852392000434b7c50b26dcf6a659a037521b17df69dd2ace5c2368efba38b",
  "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
  "platform": {
    "architecture": "amd64",
    "os": "linux"
  },
  "size": 529
}

So we're saying that sha256:19387f68117dbe07daeef0d99e018f7bbf7a660158d24949ea47bc12a3e4ba17 is a signature, and it references the image sha256:a4e852392000434b7c50b26dcf6a659a037521b17df69dd2ace5c2368efba38b (which happens to be the current linux/amd64 debian image).

Because there's only one signature here and one thing being referenced, this would be more or less equivalent to:

{
   "schemaVersion": 2,
   "config": {},
   "layers": [
      {
         "mediaType": "application/vnd.example.signature+json",
         "size": 7143,
         "digest": "sha256:19387f68117dbe07daeef0d99e018f7bbf7a660158d24949ea47bc12a3e4ba17"
      }
   ],
   "reference": {
      "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
      "size": 529,
      "digest": "sha256:a4e852392000434b7c50b26dcf6a659a037521b17df69dd2ace5c2368efba38b"
   }
}
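To make the "more or less equivalent" claim concrete, here is a minimal sketch of how a client could treat both layouts the same way. The helper name `reference_targets` is hypothetical and not part of any spec or library; the descriptors are copied from the examples in this comment.

```python
DEBIAN = {
    "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
    "size": 529,
    "digest": "sha256:a4e852392000434b7c50b26dcf6a659a037521b17df69dd2ace5c2368efba38b",
}
SIGNATURE = {
    "mediaType": "application/vnd.example.signature+json",
    "size": 7143,
    "digest": "sha256:19387f68117dbe07daeef0d99e018f7bbf7a660158d24949ea47bc12a3e4ba17",
}

# Form 1: the reference hangs off the signature's layer descriptor.
per_layer = {
    "schemaVersion": 2,
    "config": {},
    "layers": [dict(SIGNATURE, reference=DEBIAN)],
}

# Form 2: the reference sits at the top level of the manifest.
top_level = {
    "schemaVersion": 2,
    "config": {},
    "layers": [SIGNATURE],
    "reference": DEBIAN,
}


def reference_targets(manifest):
    """Collect the digests named by `reference` descriptors (hypothetical helper)."""
    # Look at the manifest itself plus every descriptor it lists.
    candidates = [manifest] + manifest.get("layers", []) + manifest.get("manifests", [])
    return {c["reference"]["digest"] for c in candidates if c.get("reference")}


# Both layouts name the same single target: the debian image.
print(reference_targets(per_layer) == reference_targets(top_level))  # True
```

The two forms only collapse like this because there is one signature and one target; with several layers pointing at different things, the per-descriptor form carries information the top-level form cannot.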

Either way you represent this "signature image", the idea is that you can ask the registry what references debian and get back a useful answer:

GET /v2/library/debian/manifests/sha256:a4e852392000434b7c50b26dcf6a659a037521b17df69dd2ace5c2368efba38b/references

{
  "manifests": [
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "size": 498,
      "digest": "sha256:712c33cae31ac010025eba0df101a866ed2d8171906c767a5094944500f0c609"
    },
    {
      "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
      "size": 1854,
      "digest": "sha256:9d4ab94af82b2567c272c7f47fa1204cd9b40914704213f1c257c44042f82aac"
    }
  ]
}

That sha256:712c33cae31ac010025eba0df101a866ed2d8171906c767a5094944500f0c609 would be the "signature image", which contains a reference to the debian image. The sha256:9d4ab94af82b2567c272c7f47fa1204cd9b40914704213f1c257c44042f82aac manifest list is the multi-platform image that references the linux/amd64 debian image.

Things can get more interesting with more complex examples, but this example is the core of the idea. I can associate one artifact with another using a novel kind of relationship (the "reference"), then ask the registry about relationships between artifacts.

It's very possible that someone has attached a signature to the manifest list rather than the specific linux/amd64 image. How can we know about that? Clients can walk the graph backwards by asking about references to the manifest list...

GET /v2/library/debian/manifests/sha256:9d4ab94af82b2567c272c7f47fa1204cd9b40914704213f1c257c44042f82aac/references

{
  "manifests": []
}

And discover that there is in fact no signature referencing the manifest list. This is really getting into the distribution side of things, which is not as relevant to this proposal; however, I think it's useful to see how these things would tie together.
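A rough sketch of what the registry side of that lookup could look like, using a plain dict as a stand-in for a registry. Everything here is an assumption for illustration: the names `REGISTRY`, `points_at`, and `referrers` are made up, the manifests are trimmed to the fields that matter, and I'm assuming (based on the example response above) that an index entry counts as a link alongside an explicit `reference`.

```python
DEBIAN = "sha256:a4e852392000434b7c50b26dcf6a659a037521b17df69dd2ace5c2368efba38b"
SIG_IMAGE = "sha256:712c33cae31ac010025eba0df101a866ed2d8171906c767a5094944500f0c609"
MANIFEST_LIST = "sha256:9d4ab94af82b2567c272c7f47fa1204cd9b40914704213f1c257c44042f82aac"

# Toy stand-in for a registry: digest -> parsed (and heavily trimmed) manifest.
REGISTRY = {
    DEBIAN: {"schemaVersion": 2, "config": {}, "layers": []},
    SIG_IMAGE: {"schemaVersion": 2, "config": {}, "layers": [], "reference": {"digest": DEBIAN}},
    MANIFEST_LIST: {"schemaVersion": 2, "manifests": [{"digest": DEBIAN}]},
}


def points_at(manifest, target):
    """True if the manifest links to `target` (hypothetical helper)."""
    # Explicit `reference` fields: top level, then per descriptor.
    descs = [manifest.get("reference")]
    for key in ("layers", "manifests"):
        descs.extend(d.get("reference") for d in manifest.get(key, []))
    # Assumption: the child manifests of an index count as links too.
    descs.extend(manifest.get("manifests", []))
    return any(d and d.get("digest") == target for d in descs)


def referrers(registry, target):
    """Build a response shaped like the /references examples above."""
    hits = [d for d in sorted(registry) if points_at(registry[d], target)]
    return {"manifests": [{"digest": d} for d in hits]}


print(referrers(REGISTRY, DEBIAN))         # the signature image and the manifest list
print(referrers(REGISTRY, MANIFEST_LIST))  # {'manifests': []}
```

A real registry would presumably index these links at push time rather than scanning every manifest per request; the linear scan just keeps the sketch short.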

jonjohnsonjr avatar Mar 24 '21 16:03 jonjohnsonjr

Second, something that references debian, let's use an image manifest that contains a signature:

This part is where I am confused. AIUI the image-spec says it's reserved for only image manifest compatible mediaTypes.

{
    "schemaVersion": 2,
    "mediaType": (?) 
    "config": {},
    "layers": [
       {
          "mediaType": "application/vnd.example.signature+json",
          "size": 7143,
          "digest": "sha256:19387f68117dbe07daeef0d99e018f7bbf7a660158d24949ea47bc12a3e4ba17",
          "reference": {
             "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
             "size": 529,
             "digest": "sha256:a4e852392000434b7c50b26dcf6a659a037521b17df69dd2ace5c2368efba38b"
          }
       }
    ]
 }

I had asked about this a while ago when I was first learning about the spec and my understanding was that we couldn't use image manifests for other things because client tools may not handle this well (backwards compatibility thing). Perhaps this should be addressed first? I got the gist of what references can do. However, without some constraints I think you can end up with something like:

{
   "schemaVersion": 2,
   "mediaType": "application/vnd.docker.distribution.manifest.v2+json", <-- this is an image manifest
   "config": {
      "mediaType": "application/vnd.docker.container.image.v1+json",
      "size": 1463,
      "digest": "sha256:dc2eddc158255ea75b9774d29924a700e95d988bcb7612abbda29baddb291670"
   },
   "layers": [
      {
         "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", <-- this is a layer for the image
         "size": 50400353,
         "digest": "sha256:e22122b926a1a853d61887fa35c3fe53e05ee7dc0f2f488936dc9838bd0e230d"
      },
      {
         "mediaType": "application/vnd.example.signature+json", <-- this is a layer for the signature of the image manifest
         "size": 7143,
         "digest": "sha256:19387f68117dbe07daeef0d99e018f7bbf7a660158d24949ea47bc12a3e4ba17",
         "reference": {
            "mediaType": "application/vnd.docker.distribution.manifest.v2+json", <-- this is a reference back to the top level image manifest
            "size": 529,
            "digest": "sha256:a4e852392000434b7c50b26dcf6a659a037521b17df69dd2ace5c2368efba38b"
         }
      }
   ]
}

nishakm avatar Mar 24 '21 17:03 nishakm

I had asked about this a while ago when I was first learning about the spec and my understanding was that we couldn't use image manifests for other things because client tools may not handle this well (backwards compatibility thing).

I remember this concern being brought up, but I don't really share it, because things generally just work if you use manifests in this way.

If I remember correctly, @SteveLasker doesn't want someone to be able to use docker to run something that isn't a container image. Registries don't really care about the content of the blobs (and shouldn't). This is only a concern if clients are just pulling and running arbitrary things instead of being instructed to run specific things. Software that scans images could get confused as well, but that's a bug in its implementation (IMO) because clients should consider the mediaType of the content before assuming it's a changeset.

If you look at the spec for an image manifest, the language appears intentionally non-limiting to allow for future extension (emphasis mine):

Implementations MUST support at least the following media types:

application/vnd.oci.image.layer.v1.tar
application/vnd.oci.image.layer.v1.tar+gzip
application/vnd.oci.image.layer.nondistributable.v1.tar
application/vnd.oci.image.layer.nondistributable.v1.tar+gzip

... Manifests concerned with portability SHOULD use one of the above media types. An encountered mediaType that is unknown to the implementation MUST be ignored.

It doesn't forbid other mediaTypes, but certainly my example is a departure from most implementation expectations.
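For what it's worth, a client that follows the "unknown mediaType MUST be ignored" language could look roughly like this. This is only a sketch: `KNOWN_LAYER_TYPES` and `filesystem_layers` are names I made up, and including the Docker layer type in the set is my assumption so that the debian manifest above still works.

```python
KNOWN_LAYER_TYPES = {
    "application/vnd.oci.image.layer.v1.tar",
    "application/vnd.oci.image.layer.v1.tar+gzip",
    "application/vnd.oci.image.layer.nondistributable.v1.tar",
    "application/vnd.oci.image.layer.nondistributable.v1.tar+gzip",
    # Assumption: also accept the Docker layer type used by the debian manifest above.
    "application/vnd.docker.image.rootfs.diff.tar.gzip",
}


def filesystem_layers(manifest):
    """Return only the layers this client knows how to unpack (hypothetical helper)."""
    return [
        layer
        for layer in manifest.get("layers", [])
        if layer.get("mediaType") in KNOWN_LAYER_TYPES
    ]


mixed = {
    "schemaVersion": 2,
    "layers": [
        {
            "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
            "size": 50400353,
            "digest": "sha256:e22122b926a1a853d61887fa35c3fe53e05ee7dc0f2f488936dc9838bd0e230d",
        },
        {
            "mediaType": "application/vnd.example.signature+json",
            "size": 7143,
            "digest": "sha256:19387f68117dbe07daeef0d99e018f7bbf7a660158d24949ea47bc12a3e4ba17",
        },
    ],
}

# Only the rootfs layer survives; the signature blob is skipped, not unpacked.
for layer in filesystem_layers(mixed):
    print(layer["mediaType"])
```

A scanner or runtime written this way never tries to untar the signature blob, which is the behavior I'd expect from a client that checks mediaType before assuming a changeset.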

<-- this is a reference back to the top level image manifest

You cannot do this without cracking sha256, so I don't think it's a problem. If you tried to do this, you would create a new image that references the original version of itself, but you wouldn't create any kind of cycle, you'd just have two images with the second containing a lot of redundant information. The second image would contain everything from the first image, but it would also contain a signature pointing at the first image. This is fine, and in line with the proposal, but it's not necessary.
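Here's a small sketch of why the cycle can't happen. The `digest` helper is hypothetical and hashes a canonical JSON encoding for simplicity; a real registry hashes the exact bytes that were uploaded, but the argument is the same.

```python
import hashlib
import json


def digest(manifest):
    """sha256 over a canonical JSON encoding (simplifying assumption for this sketch)."""
    data = json.dumps(manifest, sort_keys=True).encode()
    return "sha256:" + hashlib.sha256(data).hexdigest()


# A trimmed-down stand-in for the original image manifest.
original = {"schemaVersion": 2, "config": {}, "layers": []}
original_digest = digest(original)

# Try to make the manifest "reference itself" by embedding its own digest.
second = dict(original, reference={"digest": original_digest})
second_digest = digest(second)

# Adding the field changed the content, so the digest changed too. The result
# is a second manifest pointing at the first, not a manifest pointing at itself.
print(second_digest != original_digest)                  # True
print(second["reference"]["digest"] == original_digest)  # True
```

To get a true self-reference you would need a manifest whose content contains its own hash, which is the sha256 fixed point I'm saying nobody can find in practice.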

jonjohnsonjr avatar Mar 24 '21 18:03 jonjohnsonjr

I remember this concern being brought up, but I don't really share it, because things generally just work if you use manifests in this way.

+1, I think everybody agrees the naming is unfortunate and it would be great to change, but that's going to be a massive, multi-year effort from all the registries :(

dlorenc avatar Mar 24 '21 19:03 dlorenc

@dlorenc I would certainly appreciate including @jonjohnsonjr's example of how this would work in the spec, with the clarification on "Image Manifest" applications.

nishakm avatar Mar 24 '21 19:03 nishakm

@dlorenc I would certainly appreciate including @jonjohnsonjr's example of how this would work in the spec, with the clarification on "Image Manifest" applications.

Thanks for the feedback! We'll get this updated shortly.

dlorenc avatar Mar 27 '21 22:03 dlorenc