coreos-assembler formatting of "oscontainer" entries in the schema

When reviewing the changes in #3111 I started to look deeper at our oscontainer entries in our schema. The old style oscontainer looks something like this (this example happens to be from OpenShift/RHCOS 4.9):

"oscontainer": {                                                                        
    "digest": "sha256:b2e3b0ef40b7ad82b7e4107c1283baca71397b757d3e429ceefc6b1514e19848",
    "image": "quay.io/openshift-release-dev/ocp-v4.0-art-dev"                           
}

the new style oscontainer looks like this (source):

"base-oscontainer": {                                                                                              
    "image": "quay.io/fedora/fedora-coreos@sha256:08b4835b5701e87699a98a6a6102a92c2c404abca8d6716425a3cf1c4f4e0e90"
}

Was this by design? Do we want to continue with this difference or do we want to change something?

Oct 12 '22 17:10 dustymabe

Honestly I don't think it was intentional. Going back and looking at https://github.com/coreos/coreos-assembler/pull/2907 it seems very likely I did it that way just because it was the most convenient thing to do. The schema just accepted a string for "image" and so I stuffed it in there.

What isn't clear to me now is the consequences of trying to change it. It seems like we should certainly be consistent here. I think this may impact code in e.g. https://github.com/openshift/doozer/blob/3e755229ca4e00c4fd34e968a56518102ab32b6b/tests/test_rhcos.py#L105 but I'm not sure. We should run this by ART.

Oct 12 '22 18:10 cgwalters

I guess there are a few options here:

switch new style to have both digest and image fields
switch old style to just image (with digest embedded)
leave old style the old way and leave new style the new way

@cgwalters from your comment it looks like you were exploring option 2.?

Oct 12 '22 19:10 dustymabe

I don't have a strong opinion; ultimately the main consumer of this information is ART, so their call is probably most important. If I had to pick I'd say 1 just to be conservative and avoid risk for <= 4.11. But I don't know.

Oct 12 '22 21:10 cgwalters

I'd also vote for 1. I asked ART internally about consequences.

Oct 13 '22 17:10 jlebon

I was linked to https://github.com/openshift/doozer/blob/bb4085a515dedc0234d5ef3ca15fecf843c74875/doozerlib/rhcos.py#L80-L90, which would handle this gracefully, so I think we're good to do 1. As a courtesy, I'd suggest also filing a draft PR against openshift/doozer simplifying that code.

Oct 13 '22 19:10 jlebon

is doozer the only consumer of either of these? If that's the case it looks like doozer can already handle either (am I understanding correctly)?

If that's true we could go with whatever one we prefer.

Oct 13 '22 20:10 dustymabe

I believe doozer is the only consumer, yes.

Oct 13 '22 20:10 cgwalters

OK so actually there is another consideration here.

With the pipeline changes I'm working on, all the container images we push will be manifest lists to Quay.io. This brings back the discussion we had in https://github.com/coreos/fedora-coreos-pipeline/pull/515#discussion_r854225766 around storing the multi-arch digest vs the arch-specific one in meta.json.

Now, ART wants arch-specific digests in meta.json. I think we can do that, but that's not what cosa push-container-manifest does today. Does it make sense to make that configurable? I'd rather FCOS and RHCOS match behaviour here. Do we need to store the manifest list digest at all in meta.json?

Also for FCOS, I think we want the stream metadata to use tags there instead of digests. To get that though, the tag information needs to be in meta.json in the first place. This is missing today. We can't add a tag to the existing image field because that'll throw off code that currently assumes it is tagless.

So my proposal is to enhance the schema like this:

"base-oscontainer": {
    "digest": "sha256:b2e3b0ef40b7ad82b7e4107c1283baca71397b757d3e429ceefc6b1514e19848",
    "image": "quay.io/fedora/fedora-coreos",
    "tags": ["stable"],
}

where digest is the arch-specific digest for the given meta.json, and tags is the list of tags that were passed to cosa push-container-manifest (see second commit in https://github.com/coreos/coreos-assembler/pull/3129). That way, applications like doozer will keep working without modifications and the stream translator can use e.g. {image}:{tags[0]}.

Thoughts?

Oct 20 '22 21:10 jlebon

My opinion is:

Stop caring about this metadata for FCOS pipeline today - nothing reads it (right?), so we can just stop writing it. And I hope I've convinced everyone that "bespoke json metadata stored in s3 pointing to container images" has everything backwards.
Change the code to store arch-digested images only for RHCOS. Something like push-container --arch-digest or so

(But, this isn't really strongly held, if you prefer to have something more FCOS-y with tags, I'm OK with that too I think, I just really don't think it should be load-bearing in the future)

Oct 20 '22 21:10 cgwalters

My opinion is:

Stop caring about this metadata for FCOS pipeline today - nothing reads it (right?), so we can just stop writing it.

It's not in the stream metadata today, but it should be. I guess we could add sugar in the metadata translator to automatically assume that the stream name is a tag to the image, but it'd be much cleaner to be explicit and have the tag be in the metadata like everything else.

Change the code to store arch-digested images only for RHCOS. Something like push-container --arch-digest or so

If FCOS isn't going to use the manifest list digest at all (and I don't think it will), then I think it'd be cleaner to not differ at all.

Oct 20 '22 21:10 jlebon

OK so actually there is another consideration here.

With the pipeline changes I'm working on, all the container images we push will be manifest lists to Quay.io. This brings back the discussion we had in coreos/fedora-coreos-pipeline#515 (comment) around storing the multi-arch digest vs the arch-specific one in meta.json.

Now, ART wants arch-specific digests in meta.json. I think we can do that, but that's not what cosa push-container-manifest does today. Does it make sense to make that configurable? I'd rather FCOS and RHCOS match behaviour here. Do we need to store the manifest list digest at all in meta.json?

I think it's fine to store the arch specific digest in the arch specific meta.json files for FCOS too.

When I was writing the code originally it was just easier to grab and use the digest of the final pushed manifest listed container.

Also for FCOS, I think we want the stream metadata to use tags there instead of digests. To get that though, the tag information needs to be in meta.json in the first place. This is missing today. We can't add a tag to the existing image field because that'll throw off code that currently assumes it is tagless.

So my proposal is to enhance the schema like this:
"base-oscontainer": {
    "digest": "sha256:b2e3b0ef40b7ad82b7e4107c1283baca71397b757d3e429ceefc6b1514e19848",
    "image": "quay.io/fedora/fedora-coreos",
    "tags": ["stable"],
}
where digest is the arch-specific digest for the given meta.json, and tags is the list of tags that were passed to cosa push-container-manifest (see second commit in #3129). That way, applications like doozer will keep working without modifications and the stream translator can use e.g. {image}:{tags[0]}.

Not so sure about this part. I don't mind putting tags in the schema, but if the digest and repo are there then any consumer will be able to pull it, right?

Shouldn't the tag be the stream name?

Again, I'm good for putting this in the schema. I think where it gets confusing is when there is more than one tag. What do consumers do with that information?

Oct 20 '22 22:10 dustymabe

I think it's fine to store the arch specific digest in the arch specific meta.json files for FCOS too.

OK, sounds like we're all in agreement on this. Conceptually today, meta.json are arch specific anyways, so having the image digest be arch specific actually makes sense too.

Anyone who wants the container native flow can use the container-native way to find it - via a registry.

Oct 20 '22 23:10 cgwalters

OK, sounds like we're all in agreement on this. Conceptually today, meta.json are arch specific anyways, so having the image digest be arch specific actually makes sense too.

OK cool, I've updated https://github.com/coreos/coreos-assembler/pull/3129 to do this now!

Not so sure about this part. I don't mind putting tags in the schema, but if the digest and repo are there then any consumer will be able to pull it, right?

Yeah, but I think from previous discussions on this topic, we landed on the stream metadata referencing a tag and not a digest. So by default, FCOS users wouldn't actually see the digest. The main reason for adding a tag is so we can surface it in the stream metadata.

Shouldn't the tag be the stream name?

That's a convention set up by the pipeline. Do we do this kind of assumption anywhere in the stream metadata generator? ISTM like it'd be much clearer to have it be in the metadata proper. But also, it would only be true for the base oscontainer. E.g. the extensions and legacy containers have suffixes.

Again, I'm good for putting this in the schema. I think where it gets confusing is when there is more than one tag. What do consumers do with that information?

My reason for sticking all of them is that I think it's good practice to try to reference from meta.json all external resources that were pushed. E.g. usually tags prevent any GC from happening. A GC process could look at all the tags that were pushed and know which ones were unique to a build.

But anyway, we don't need that bit right now, so I don't want to block on it. I've split it into a separate PR: https://github.com/coreos/coreos-assembler/pull/3132

Oct 21 '22 13:10 jlebon

coreos-assembler coreos-assembler copied to clipboard

formatting of "oscontainer" entries in the schema

coreos-assembler
coreos-assembler copied to clipboard