image-spec icon indicating copy to clipboard operation
image-spec copied to clipboard

index.json: make org.opencontainers.image.ref.name unique?

Open cyphar opened this issue 8 years ago • 46 comments

While trying to implement index.json parsing in umoci I've come to a bit of a worrying realisation. There isn't really any explanation of how cases where org.opencontainers.ref.name is not unique should be handled. While I understand the case where you have multiple application/vnd.oci.image.manifest.v1+json descriptors with different platform entries (though I think that doesn't make sense in light of the fact that application/vnd.oci.image.index.v1+json has platform entries too), it doesn't really make sense to me in the general case to allow org.opencontainers.ref.names to be duplicated in index.json.

So here's two things that would be nice if we could make org.opencontainers.ref.names unique. This includes the multiple-platform case I mention above because it starts to get a bit hairy -- and honestly you should just be using application/vnd.oci.image.index.v1+json for that.

Sorry to stir this pot again, I've only just hit this issue trying to fix up the oci/cas implementation in umoci.

#533

cyphar avatar Feb 23 '17 07:02 cyphar

While I understand the case where you have multiple application/vnd.oci.image.manifest.v1+json descriptors with different platform entries (though I think that doesn't make sense in light of the fact that application/vnd.oci.image.index.v1+json has platform entries too).

Do we have platform in application/vnd.oci.image.index.v1+json itself? I think we only have it in application/vnd.oci.image.manifest.v1+json descriptor.

qianzhangxa avatar Feb 23 '17 08:02 qianzhangxa

On Wed, Feb 22, 2017 at 11:36:15PM -0800, Aleksa Sarai wrote:

While I understand the case where you have multiple application/vnd.oci.image.manifest.v1+json descriptors with different platform entries (though I think that doesn't make sense in light of the fact that application/vnd.oci.image.index.v1+json has platform entries too), it doesn't really make sense to me in the general case to allow org.opencontainers.ref.names to be duplicated in index.json.

In #533, @stevvooe and @vbatts explicitly discussed multiple entries with one name but different platform information 1, so you can have:

index.json → platform-appropriate-manifest

With unique names, you'd need:

index.json → multi-platform-index → platform-appropriate-manifest

Of course, without clear platform matching (or other metadata-based matching) to break ties, things get a bit awkward. Consumers putting a new platform-agnostic descriptor should have options for “clobber any pre-existing, platform-agnostic refs” or “add a new platform-agnostic ref, even if there is a pre-existing, platform-agnostic ref”. Consumers getting a descriptor can error out with “you asked for $NAME, but there are multiple descriptors matching that name with equivalently appropriate platform information”.

To @qianzhangxa's question, here's platform in the index JSON 2, so you can declare a platform for everything your reference from the index.

wking avatar Feb 23 '17 18:02 wking

There should be no validation of annotation data.

Either way, the platform dispatch case is why you would want manifests with different names.

@cyphar I may be misunderstanding something. Could you provide an example of the data structures you think should be invalid? application/vnd.oci.image.manifest.v1+json should not have multiple manifests.

stevvooe avatar Feb 23 '17 18:02 stevvooe

@stevvooe Can you explain how a tool like umoci should handle the following index.json?

{
  "schemaVersion": 2,
  "manifests": [
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "size": 7143,
      "digest": "sha256:e692418e4cbaf90ca69d05a66403747baa33ee08806650b51fab815ad7fc331f",
      "annotations": {
        "org.opencontainers.ref.name": "some-tag"
      }
    },
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "size": 8119,
      "digest": "sha256:b3d63d132d21c3ff4c35a061adf23cf43da8ae054247e32faa95494d904a007e",
      "annotations": {
        "org.opencontainers.ref.name": "some-tag"
      }
    }
  ]
}

When a user asks to do something with the some-tag tag?

Even if you say "well, that's not defined by the spec" then how is something like this meant to be handled -- which is something I'm quite concerned about because it starts combining the index.json layer with the ManifestList (now ImageIndex) layer -- combining tagging and multi-platform manifests:

{
  "schemaVersion": 2,
  "manifests": [
    {
      "mediaType": "application/vnd.oci.image.index.v1+json",
      "size": 7143,
      "digest": "sha256:0228f90e926ba6b96e4f39cf294b2586d38fbb5a1e385c05cd1ee40ea54fe7fd",
      "annotations": {
        "org.opencontainers.ref.name": "stable-release"
      }
    },
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "size": 8119,
      "digest": "sha256:b3d63d132d21c3ff4c35a061adf23cf43da8ae054247e32faa95494d904a007e",
      "platform": {
        "architecture": "ppc64le",
        "os": "linux"
      },
      "annotations": {
        "org.opencontainers.ref.name": "stable-release"
      }
    }
  ]
}

And even if you have multiple tags that are all unique platforms (once you parse the tree) -- you're now violating some of the layer separation -- how can you sanely separate the reference evaluation from the ManifestList/Manifest parsing?

cyphar avatar Feb 23 '17 23:02 cyphar

What about we just remove image index from image layout and only have manifest descriptors in index.json? I think the main use case of image index is multi-platform support, but essentially index.json is an image index which already gives us the ability to support multi-platform, so I do not think two-levels image index is necessary. And then I think in an index.json, each org.opencontainers.ref.name + platform must be unique, if there is a manifest descriptor has no platform, it can be just ignored.

qianzhangxa avatar Feb 24 '17 05:02 qianzhangxa

@qianzhangxa Two levels of indirection give you many benefits -- the most obvious of which is that you can have multiple tags that reference the same multi-architecture image. And the multi-architecture image is part of the CAS, which gives you a lot of other benefits.

In my opinion, index.json should be as dumb of a store as is physically possible. While I understand the need for ManifestDescriptors in ManifestList, it doesn't make sense for index.json -- there's no one clear way for how multi-platform images should be implemented.

Not to mention that you've added all of this ambiguity and complicated logic to the reference resolver -- which doesn't make sense from my point of view. It seems to me to be quite a rampant layer violation, and from the umoci side it will make the code much more ugly and frustrating to implement (not to mention that the oci/cas interface and implementation will get pretty ugly too).

The current scheme means that GetReference and PutReference now need to pass a bunch of other information to the CAS store than just the "tag name".

cyphar avatar Feb 24 '17 08:02 cyphar

@cyphar Can you please elaborate a bit more about "multiple tags that reference the same multi-architecture image."? Did you mean the case that two tags (e.g., latest and v2.0) reference the same image index?

qianzhangxa avatar Feb 24 '17 14:02 qianzhangxa

@qianzhangxa Sorry, I misspoke. What I meant was (and if you look at my example you'll see what I mean) was

the most obvious of which is that you can have multiple identical tags with the same name that reference different multi-architecture images.

Basically, the problem with the current state of org.opencontainers.ref.name is that there is no single unambiguous way of deciding what manifest is being referenced by a given tag. Which means that as it stands, tagging is effectively broken.

cyphar avatar Feb 25 '17 00:02 cyphar

@cyphar In general, the proposal to validate the contents of annotation field is a non-starter. Annotations are functionally meaningless from the point of the view of the specification. It is up to the consumer decide how to present the data. If the tool doesn't reflect the properties of the data structure, then the tool is broken.

umoci can make its own determination about how to handle this case. One approach is to linearize the descriptors that match some-tag with an in-order traversal and then run the standard platform matching algorithm on that ordered set. On duplicates, you take the first entry that matches. Another approach would be to error out on duplicates.

You'll actually find that the algorithm proposed above is both sufficient and deterministic in resolving multi-platform images, even cases in where duplicate or ambiguous tags are present.

stevvooe avatar Feb 25 '17 01:02 stevvooe

@stevvooe But we do define what are valid values for annotations. Annotations in the org.opencontainers.* namespace are reserved by us and we have the right to specify how they should be used. Nothing is stopping a third party from making their own tagging system with annotations.

One approach is to linearize the descriptors that match some-tag with an in-order traversal and then run the standard platform matching algorithm on that ordered set.

What is the "standard platform matching algorithm" -- and how am I meant to run said algorithm if umoci is not running on the platform which I'm extracting (do I have to copy the entire contents of /proc/cpuinfo in order for it to be made clear)? The current scheme combines two very separate components of CAS operations -- reference resolution and blob/manifest parsing.

But all of this is besides the point -- why are we allowing multiple tags in this instance when you can just create a single tag which points to an ImageIndex that then shards up the multi-platform stuff. Is there some benefit I'm missing, because the current system strikes me to be significantly less elegant than if we'd just taken the refs directory and mapped it directly to json (as opposed to overloading the use of ManifestList to become ImageIndex).

You'll actually find that the algorithm proposed above is both sufficient and deterministic in resolving multi-platform images, even cases in where duplicate or ambiguous tags are present.

It might be both sufficient and deterministic, but that doesn't make it sensible when you could just drop this ambiguity in the first place.

Also, the algorithm you propose is not defined in the spec. So how is a user meant to know how their image will be understood by an image-spec implementation?

cyphar avatar Feb 25 '17 03:02 cyphar

And to make the point more clear, the current CAS interface for umoci is this:

// Engine is an interface that provides methods for accessing and modifying an
// OCI image, namely allowing access to reference descriptors and blobs.
type Engine interface {
	// PutReference adds a new reference descriptor blob to the image. This is
	// idempotent; a nil error means that "the descriptor is stored at NAME"
	// without implying "because of this PutReference() call". ErrClobber is
	// returned if there is already a descriptor stored at NAME, but does not
	// match the descriptor requested to be stored.
	PutReference(ctx context.Context, name string, descriptor ispec.Descriptor) (err error)

	// GetReference returns a reference from the image. Returns os.ErrNotExist
	// if the name was not found.
	GetReference(ctx context.Context, name string) (descriptor ispec.Descriptor, err error)

	// DeleteReference removes a reference from the image. This is idempotent;
	// a nil error means "the content is not in the store" without implying
	// "because of this DeleteReference() call".
	DeleteReference(ctx context.Context, name string) (err error)

	// ListReferences returns the set of reference names stored in the image.
	ListReferences(ctx context.Context) (names []string, err error)

	// ... other methods omitted for brevity ...

How do you propose that I change the ListReferences, PutReference, DeleteReference and GetReference interfaces to facilitate this ambiguity and platform matching code? I have no problem with switching away from Descriptors to ManifestDescriptors (though I also don't agree with the need for that change either, but let's just side-step that issue for now), but the required changes to now pass around some opaque concept of "what platform I'm using" doesn't make sense to me.

cyphar avatar Feb 25 '17 03:02 cyphar

the most obvious of which is that you can have multiple identical tags with the same name that reference different multi-architecture images.

@cyphar But why is this a benefit? Won't cause ambiguity? E.g., if there are two same tags in the index.json reference two different image index, then for the runtime to pull the image based on that tag, which one should it pull?

qianzhangxa avatar Feb 25 '17 13:02 qianzhangxa

@qianzhangxa It isn't a benefit, my point is that the current system allows this and it shouldn't. That's what this issue is about, removing this ambiguity.

cyphar avatar Feb 26 '17 05:02 cyphar

@cyphar I agree. I guess I misunderstood your previous comment.

Two levels of indirection give you many benefits -- the most obvious of which is that you can have multiple tags that reference the same multi-architecture image.

qianzhangxa avatar Feb 26 '17 12:02 qianzhangxa

@qianzhangxa I misspoke in that comment, and I corrected it here. Sorry for any confusion.

cyphar avatar Feb 27 '17 09:02 cyphar

@cyphar

How do you propose that I change the ListReferences, PutReference, DeleteReference and GetReference interfaces to facilitate this ambiguity and platform matching code?

From what I said:

umoci can make its own determination about how to handle this case.

In general, it looks like the interface doesn't represent the underlying data structure. For your interface, it expects unique references. Upon encountering duplicates, it should error out. If you want it to be more flexible, return multiple descriptors for each name and don't error out. This is your choice as an application developer.

What is the "standard platform matching algorithm"[?]

It is somewhat implied by the given fields. Look at the platform. If you can process it or run it, do so. They are properties of the targeted thing that try to provide meaning for that thing without traversing further.

To be honest, I find this so straightforward that I am not understanding the confusion.

the required changes to now pass around some opaque concept of "what platform I'm using" doesn't make sense to me.

Where are you getting this requirement from? Collect the references and re-process them. Or, use the visitor pattern and pass in a function that does what you need.

But we do define what are valid values for annotations. Annotations in the org.opencontainers.* namespace are reserved by us and we have the right to specify how they should be used. Nothing is stopping a third party from making their own tagging system with annotations.

Avoid confounding the definition the contents of annotations vs the relationships of annotations. The relationships for the fields of a given type belong with the type, not field of a type, thus annotations' relationships cannot be defined by the annotation itself without creating serious issues.

It might be both sufficient and deterministic, but that doesn't make it sensible when you could just drop this ambiguity in the first place.

I think the problem here is that you have constructed an interface based on your expectation of how the data is structured. Now that you have studied the data structure, that interface seems to be insufficient (or, perhaps, may impose requirements). In general, I think this approach to a monolithic tag style interface is broken. You need to approach this more like a tree of references that you visit, collect and process.

I have no problem with switching away from Descriptors to ManifestDescriptors

This type split is indeed unfortunate. There were never meant to be two descriptor types (it would be like having 16-bit and 64-bit pointers in the same hardware) and complicates things. We could move the platform description to annotations to address this.

stevvooe avatar Feb 27 '17 22:02 stevvooe

@stevvooe

Where are you getting this requirement from? Collect the references and re-process them. Or, use the visitor pattern and pass in a function that does what you need.

My problem with this is that now you're effectively punting reference resolution to the caller of the CAS, because of this ambiguity and mixing of different restrictions of whether it is valid to return a particular descriptor (such as the platform). Which all means now that different implementations will have incompatible implementations of reference resolution and the original "feature" of having this free-for-all is not going to work out.

Specifications should not intentionally be introducing ambiguity like this, because it makes implementing something to match the specification difficult. "The algorithm is trivial" doesn't really justify the fact that currently there is no unambiguous standard-compliant way of parsing org.opencontainers.ref.name (without just asking the user which entry in the manifests array they want).

In general, I think this approach to a monolithic tag style interface is broken. You need to approach this more like a tree of references that you visit, collect and process.

As above, this effectively means that it is simply not possible for a user of a generic OCI library to be able to know what org.opencontainers.ref.name annotations will do or how they will be parsed. They have to implement all of the reference resolution themselves, because the library can't know what platform or whatever other decisions the caller wants to make.

So now a user has to go find out how skopeo, umoci, docker and whatever else implement the handling of references. They don't have any guarantees from the spec how the entrypoint (tag) to the image manifests is going to be parsed, and as a developer they have no information on how cases like the ones I've outlined above should be implemented either.

cyphar avatar Feb 28 '17 13:02 cyphar

I agree with @cyphar, currently I am working on OCI image support in Mesos, I need to parse and pull OCI image, basically I am not sure what is the best/standard way to handle the case that there are duplicated org.opencontainers.ref.name in index.json, I think we need to make it clear in the spec.

qianzhangxa avatar Feb 28 '17 15:02 qianzhangxa

@cyphar Calling this "punting" or "incompatible" is absolutely ridiculous. You are making up issues where there are none. It is not a free-for-all and the incompatibilities you are speaking to are based on assumption over fact. As I have said, the ability to have the same tag pointing to multiple things is a feature, not a bug.

Please avoid the histrionics and actually try the approach I suggested.

There are literally two choices here:

  1. Either return multiple descriptors.
  2. Error out on duplicates.

Have a little imagination and figure it out.

stevvooe avatar Feb 28 '17 21:02 stevvooe

On Tue, Feb 28, 2017 at 01:35:10PM -0800, Stephen Day wrote:

There are literally two choices here:

  1. Either return multiple descriptors.
  2. Error out on duplicates.

Have a little imagination and figure it out.

While this is fine for an API, @cyphar made a good point about compat issues 1 that you are not addressing here. Say @cyphar's imagination takes him down (1) for his index API and umoci, but the Docker devs' imagination takes them down (2). Now an innocent user builds an image with umoci that has multiple descriptors with the same org.opencontainers.ref.name and equivalent platform information in their index JSON. They feed that OCI-compliant image into Docker, and the OCI-compliant Docker ingestor chokes and dies on the repeated name. Who's fault is the breakage? The spec is not clear.

If multiple descriptors with the same name/platform are allowed, I think the spec should either:

a. Unambiguously require some MUST level support for them. b. SHOULD users away from using them, on the grounds that OCI-compliant handlers are not required to support them.

My current feeling is that (1) and (b) are the the conservative courses, but that isn't a healthy ecosystem for promoting the “same tag pointing to multiple things” feature. If you want this feature to be portable (and I don't have an opinion on that myself), I'd recommend the spec do something along the lines of (a). If you don't want the feature to be portable, then dropping the feature (like #582) makes the most sense to me.

wking avatar Feb 28 '17 22:02 wking

How is this even a concern of the spec? If ref name was important to the spec it wouldn't be in annotations. I thought across the board, annotations are opaque to the specs( runtime and image ) so why would you even encode anything in there for general consumption?

Whats the point in having a type safe spec and scheme if you are just going to add things in a generic object?

crosbymichael avatar Feb 28 '17 22:02 crosbymichael

@stevvooe I still don't see why this dereference walk:

ImageIndex[with same tag pointing to multiple manifests] -> Manifest -> ...

Is a good feature when this would be possible if refs.name was unqiue (and was possible pre-index.json):

ImageIndex[unique tag pointing to index] -> ImageIndex -> Manifest -> ...

Maybe you can explain to me what feature you get out of the first walk that you don't get in the second one? I'm sure there was a good reason for this, but

Please avoid the histrionics and actually try the approach I suggested.

I already have a branch in umoci (not pushed yet) with the approach you suggested. I have tried it. I don't agree with it, and that's why I'm trying to have a meaningful discussion on the issue.

Sure, I can implement whatever reference resolution algorithm I like and just force users to deal with it. But without also reading how skopeo does it, how docker is going to do it, and so on then I will be creating an incompatible UX. So we're going to have to come to some agreement between implementations anyway.

To be clear tools that I've seen (skopeo, umoci and the current state of docker/docker#26369) don't implement platform handling at all (effectively ImageManifest was something that we didn't touch) -- because we (@runc0m, myself and others) weren't entirely sure how things are meant to work in cases where you want to operate on an image that is different from the platform you're running on.

Now this concern we had is now extended into reference resolution, with the additional kick that now reference resolution also has to implement some form of platform handling in order for things to be done automatically -- or we have to (as you suggested) make some parts of the UX require the user to clarify what tag they want.

So again, the issue I have with your approach is that umoci (and the other tools I've mentioned) have to now either decide to just not implement this edge-case of the spec (which makes them non-compliant) or they have to come up with an out-of-spec way of requesting the user to clarify what tag they really meant.

It is not a free-for-all

I don't know what your definition of a "free-for-all" is, but in my book the following line from the spec indicates to me that it is indeed a free-for-all:

No semantic restriction is given for the "org.opencontainers.ref.name" annotation of descriptors.

cyphar avatar Mar 01 '17 02:03 cyphar

@crosbymichael

How is this even a concern of the spec? If ref name was important to the spec it wouldn't be in annotations.

While that's all well and good, users need to be able to reference things inside an OCI image. It's just silly to say that users are on their own if they want to be able to talk about what thing inside an OCI image they are referring to.

Whats the point in having a type safe spec and scheme if you are just going to add things in a generic object?

We have restrictions on other annotations (since they're JSON strings but the content is meant to be a different type). I'm not sure I understand the argument that we are not allowed to validate or otherwise touch annotations that we define in the spec.

cyphar avatar Mar 01 '17 02:03 cyphar

https://github.com/opencontainers/image-spec/blob/master/annotations.md

That makes it clear to me that they are arbitrary fields and values and its up to the consumers to handle the keys/values appropriately.

If there is a possibility that users can have duplicate values then just have the consuming code handle that and don't make a naive assumption that the values of the field will be unique. It only states that keys are unique. Starting to add all these rules around "arbitrary" values is a bad position to be in.

crosbymichael avatar Mar 01 '17 18:03 crosbymichael

Maybe this seems like a dumb idea and I hope it's ok if I jump in, but what's the reason that the name couldn't be made first-class?

On Wed, Mar 1, 2017 at 10:56 AM, Michael Crosby [email protected] wrote:

https://github.com/opencontainers/image-spec/blob/master/annotations.md

That makes it clear to me that they are arbitrary fields and values and its up to the consumers to handle the keys/values appropriately.

If there is a possibility that users can have duplicate values then just have the consuming code handle that and don't make a naive assumption that the values of the field will be unique. It only states that keys are unique. Starting to add all these rules around "arbitrary" values is a bad position to be in.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/opencontainers/image-spec/issues/581#issuecomment-283434330, or mute the thread https://github.com/notifications/unsubscribe-auth/AABJ66f5vwhx3MS0HQVGsV9UXKq_7b6Lks5rhb9xgaJpZM4MJo3y .

erikh avatar Mar 01 '17 22:03 erikh

Maybe this seems like a dumb idea and I hope it's ok if I jump in, but what's the reason that the name couldn't be made first-class?

This was a mistake made in Docker schema1 and we'd like to avoid repeating history.

This really isn't as complicated as @cyphar is making it out to be. The problem is that his code doesn't match the properties of the data structure. This problem is going to come up whether or not we make the ref.name unique. The correct solution is to make our programs model the problem appropriately.

stevvooe avatar Mar 01 '17 22:03 stevvooe

Per discussion on the OCI call, we are going to follow this up with further implementation notes detailing the degrees of freedom in the datastructure and implications of UX design on compatibility between tools.

@cyphar Please open an issue, assigning it to me, for what you are looking for and we'll close this when we have agreed on the body of work.

stevvooe avatar Mar 01 '17 23:03 stevvooe

Partially related, as per -rc5 (after https://github.com/opencontainers/image-spec/pull/561) the following are all valid refs:

  • "" (empty string in the value field, also different from a nil value or missing ref.name annotation)
  • ../
  • <marquee>
  • $(foo)
  • ' --
  • \n
  • :
  • an arbitrarily-long string

If an "implementation notes" doc is in the work, this may get some words covering the processing and compatibility side of using such things as an index.

lucab avatar Mar 07 '17 16:03 lucab

As of rc6, the correct annotation key is "org.opencontainers.image.ref.name"! https://github.com/opencontainers/image-spec/blob/v1.0.0-rc6/image-layout.md Anyone please fix the issue title so that people won't copy-paste by mistake? :sweat_smile:

AkihiroSuda avatar May 26 '17 10:05 AkihiroSuda

On this issue, I can not think of reason why one would want to have duplicate ref.name's, but I am no where close to convinced that they should be strictly unique.

Some of @lucab concerns have been addressed in https://github.com/opencontainers/image-spec/pull/671

I'm inclined to close this issue for now.

vbatts avatar May 26 '17 13:05 vbatts