nydus icon indicating copy to clipboard operation
nydus copied to clipboard

nydus.remoteimage.v1 in image index causes push to fail with ECR

Open Fricounet opened this issue 6 months ago • 7 comments

Hello folks, I was trying to fix #1691 when I stumbled on a similar issue for multi-arch images

Basically, when attempting to convert a multi-arch image, I can't push it to ECR and it seems that this part of the manifest is causing issue because when I remove it, it just works:

        "os.features": [
          "nydus.remoteimage.v1"
        ],

Reproducer:

nydusify convert --source nginx:latest --target my-ecr/my-image:nginx --platform "linux/amd64,linux/arm64"
INFO[2025-05-16T18:42:22+02:00] pulling image docker.io/library/nginx:latest  module=converter
INFO[2025-05-16T18:42:40+02:00] pulled image docker.io/library/nginx:latest , elapse 17.986182294s  module=converter
INFO[2025-05-16T18:42:40+02:00] converting image docker.io/library/nginx:latest  module=converter
INFO[2025-05-16T18:42:42+02:00] converted image my-ecr/my-image:nginx , elapse 2.482347474s  module=converter
INFO[2025-05-16T18:42:42+02:00] pushing image my-ecr/my-image:nginx  module=converter
FATA[2025-05-16T18:42:45+02:00] push image: failed commit on ref "index-sha256:74b970f44b893767940d6a292b5eacf981fd011eea4ce9e5dcfcb973859d8098": unexpected status from PUT request to https://my-ecr/v2/my-image/manifests/nginx: 405Method Not Allowed

Or, if you want to get the manifest manually:

nydusify convert --source nginx:latest --target localhost:5000/oci:latest --platform "linux/amd64,linux/arm64"
crane manifest localhost:5000/oci:latest | jq > nydus_manifest.json
aws ecr put-image --repository-name my-repository --image-tag nginx --image-manifest file://nydus_manifest.json
content of nydus_manifest.json { "schemaVersion": 2, "mediaType": "application/vnd.oci.image.index.v1+json", "manifests": [ { "mediaType": "application/vnd.oci.image.manifest.v1+json", "digest": "sha256:fd2da0d77212b6acca1a6abfdfe3dbae1c4bdc7e176a5caabc1c5aa41a7d569b", "size": 4443, "annotations": { "com.docker.official-images.bashbrew.arch": "amd64", "org.opencontainers.image.base.digest": "sha256:5accafaaf0f2c0a3ee5f2dcd9a5f2ef7ed3089fe4ac6a9fc9b1cf16396571322", "org.opencontainers.image.base.name": "debian:bookworm-slim", "org.opencontainers.image.created": "2025-04-28T21:42:34Z", "org.opencontainers.image.revision": "eaf8875a1967d24cea6ed8b37109075e39ed9e43", "org.opencontainers.image.source": "https://github.com/nginx/docker-nginx.git#eaf8875a1967d24cea6ed8b37109075e39ed9e43:mainline/debian", "org.opencontainers.image.url": "https://hub.docker.com/_/nginx", "org.opencontainers.image.version": "1.27.5" }, "platform": { "architecture": "amd64", "os": "linux", "os.features": [ "nydus.remoteimage.v1" ] } }, { "mediaType": "application/vnd.oci.image.manifest.v1+json", "digest": "sha256:79b251ad8717a8fd7c70ab46b0f6f4369da385549078578513c46da5062ac626", "size": 4447, "annotations": { "com.docker.official-images.bashbrew.arch": "arm64v8", "org.opencontainers.image.base.digest": "sha256:f098d2b31ad6f5cbfa8fc974eb69f9f18dcff506c3984eae54deb8b89731bc00", "org.opencontainers.image.base.name": "debian:bookworm-slim", "org.opencontainers.image.created": "2025-04-28T22:38:21Z", "org.opencontainers.image.revision": "eaf8875a1967d24cea6ed8b37109075e39ed9e43", "org.opencontainers.image.source": "https://github.com/nginx/docker-nginx.git#eaf8875a1967d24cea6ed8b37109075e39ed9e43:mainline/debian", "org.opencontainers.image.url": "https://hub.docker.com/_/nginx", "org.opencontainers.image.version": "1.27.5" }, "platform": { "architecture": "arm64", "os": "linux", "os.features": [ "nydus.remoteimage.v1" ], "variant": "v8" } } ] }

When looking a bit, I found that this field was introduced in this PR. I'm wondering if that is really needed for index manifests and if it can be omitted?

Fricounet avatar May 16 '25 16:05 Fricounet

Hi @Fricounet, the nydus.remoteimage.v1 field is used differentiate to OCI v1 and nydus image, the reason for merging OCI v1 and nydus manifests into a single manifest index is historical, the feature requires support from image runtimes like containerd or podman. In practice, this field is currently unused, to ensure compatibility with ECR, I think it can be removed.

imeoer avatar May 19 '25 02:05 imeoer

The related code: https://github.com/containerd/nydus-snapshotter/blob/88a72d945ce529053463e1095651402bd4b3822b/pkg/converter/convert_unix.go#L952 (nydusify imported the nydus-snapshotter/pkg/converter package).

imeoer avatar May 19 '25 02:05 imeoer

@imeoer thanks for the answer, I'll work on this then too.

the feature requires support from image runtimes like containerd or podman

If I understand correctly, the goal would be to have containerd automatically chose the nydus image inside the index if the it has the nydus snapshotter enabled right? Would be quite neat indeed.

I just wonder, is it even possible currently to build a manifest like the one you linked? Using nydusify or buildkit, I don't quite see how you can end up with both in the same manifest? (as nydusify creates a new manifest in --target but with only the nydus image and buildkit can only have 1 output specified, so only nydus or targz for instance)

Fricounet avatar May 19 '25 07:05 Fricounet

@Fricounet Try the nydusify convert --merge-platform --source $OCI_IMAGE --target $NYDUS_IMAGE command. :)

imeoer avatar May 19 '25 07:05 imeoer

Ah I see ! Thank you !

Fricounet avatar May 19 '25 07:05 Fricounet

@Fricounet I think we should enable the nydus.remoteimage.v1 flag only for --merge-platform, instead of index conversion.

imeoer avatar May 19 '25 07:05 imeoer

Yes! That was my thoughts too. I'll work on doing this

Fricounet avatar May 19 '25 07:05 Fricounet

This issue is stale because it has been open 60 days with no activity.

github-actions[bot] avatar Jul 26 '25 00:07 github-actions[bot]

Hey @imeoer I wanted to get your opinion on something you mentioned higher in the conversation. Namely that the os.features: nydus.remoteimage.v1 is basically useless for now because it need the container runtime to implement something to understand this. I've seen in the soci snapshotter manifest that they do a similar thing as your --merge-platform where they put both the regular OCI image and the soci one in the same index manifest:

crane manifest localhost:5000/test:soci | jq
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.index.v1+json",
  "manifests": [
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "digest": "sha256:04888253b77b5aff33606bb1338ef1e05661e94dd9b3bc8d5f2ba61a7831461f",
      "size": 2015,
      "annotations": {
        "com.amazon.soci.index-digest": "sha256:c1df1fa989c00fa503613dcd0f19a24a2a2fbbf67d33b94cd035a2df43d6e286",
        "target": "prod"
      },
      "platform": {
        "architecture": "amd64",
        "os": "linux"
      }
    },
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "digest": "sha256:c1df1fa989c00fa503613dcd0f19a24a2a2fbbf67d33b94cd035a2df43d6e286",
      "size": 2110,
      "annotations": {
        "com.amazon.soci.image-manifest-digest": "sha256:04888253b77b5aff33606bb1338ef1e05661e94dd9b3bc8d5f2ba61a7831461f"
      },
      "platform": {
        "architecture": "amd64",
        "os": "linux"
      },
      "artifactType": "application/vnd.amazon.soci.index.v2+json"
    }
  ]
}

and then they use the annotations present in the manifest to be able to find the soci image and leverage lazy pulling this way. Have you considered this approach? I think that's an interesting approach because this way you can have a single manifest with both images and still be able to pull efficiently if the local snapshotter supports nydus while still being able to pull the image normally on devices which don't have nydus installed (since containerd will just pull the regular image instead). All of this without needing to implement anything in the container runtime. What's do you think?

Fricounet avatar Aug 01 '25 15:08 Fricounet

Hey @imeoer @BraveY, regarding my last comment, I did implement the feature in my fork to auto-detect the nydus manifest in the image index. Would you like me to submit a PR for it? https://github.com/DataDog/nydus-snapshotter/pull/3

Fricounet avatar Aug 27 '25 15:08 Fricounet

@Fricounet Thanks, the PR looks good to me, feel free to submit it upstream, maybe we can only add some usage doc. :)

imeoer avatar Aug 28 '25 03:08 imeoer

@Fricounet If a node has the nydus snapshotter installed and configured, it will run the nydus image, otherwise it will run the OCI v1 image, is that correct?

imeoer avatar Aug 28 '25 03:08 imeoer

@imeoer

If a node has the nydus snapshotter installed and configured, it will run the nydus image, otherwise it will run the OCI v1 image, is that correct?

Yes exactly, that's based on the fact that the OCI images comes first in the list so containerd will pick it up first. When the image goes through a regular snapshotter, the OCI image is handled as usual but when it goes through the nydus-snapshotter, the snapshotter will discover the other image in the manifest and prepare it instead. Just like what the current referrer_detect feature does.

feel free to submit it upstream, maybe we can only add some usage doc. :)

Will do with some usage docs added

However, I just realized that the initial reason for this issue (ecr not supporting the os.features: nydus.remoteimage.v1 field and rejecting push events) is still a thing. Which makes this feature impossible to use there 🤦 I'll try to open an issue with them for this. Unless you have some ideas on how we could go around this limitation?

Fricounet avatar Aug 28 '25 07:08 Fricounet

@Fricounet How about making it compatible like:

{
  "schemaVersion": 2,
  "manifests": [
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "digest": "sha256:4e4bc990609ed865e07afc8427c30ffdddca5153fd4e82c20d8f0783a291e241",
      "size": 943,
      "platform": {
        "architecture": "amd64",
        "os": "linux"
      }
    },
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "digest": "sha256:db30bb2870067ed3e0e73c7448d9f0b529169da8295b5b5155b417624d861d81",
      "size": 1367,
      "platform": {
        "architecture": "amd64",
        "os": "linux"
      },
      "artifactType": "application/vnd.nydus.image.manifest.v1+json"
    }
  ]
}

Note: the newly added "artifactType": "application/vnd.nydus.image.manifest.v1+json" field.

imeoer avatar Aug 28 '25 07:08 imeoer

@imeoer yes I think that would work. That's actually what soci is doing on their end too and from my tests, this manifest does seem to pass ECR validation

{
  "schemaVersion": 2,
  "manifests": [
    {
      "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
      "digest": "sha256:a63dfddecc661e4ad05896418b2f3774022269c3bf5b7e01baaa6e851a3a4a23",
      "size": 2320,
      "platform": {
        "architecture": "amd64",
        "os": "linux"
      }
    },
    {
      "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
      "digest": "sha256:cb559143b6929cc3fbaf0dfb0e5b166ac5c7f3421fe71c4ded2499b38512e4ce",
      "size": 2320,
      "platform": {
        "architecture": "arm64",
        "os": "linux"
      }
    },
    {
      "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
      "digest": "sha256:7fca51156823f77b71e9d5e78c69aac7cd2d3154c6be8d6101bb4483ea383184",
      "size": 3251,
      "platform": {
        "architecture": "amd64",
        "os": "linux"
      },
      "artifactType": "application/vnd.nydus.image.manifest.v1+json"
    },
    {
      "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
      "digest": "sha256:dd72e2d01ac83c559243ef1ec8c1d96b912e66a8a3dbafacb608a7065c886475",
      "size": 3251,
      "platform": {
        "architecture": "arm64",
        "os": "linux"
      },
      "artifactType": "application/vnd.nydus.image.manifest.v1+json"
    }
  ]
}

This would require a fix in the acceleration-service I believe since the converter package is located there. And then we'd need to bump the package there but we'll encounter the same issue I had trying to bump the snapshotter in #1699. We really need to try to bump nydusify for containerdv2 to support this then.

How would you like to see this implemented?

  • Do you prefer to completely get rid of the os.features: nydus.remoteimage.v1 logic in the acceleration service and add the artefactType instead?
  • or add a separate flag chain in nydusify and the acceleration service to add this artifactType and keep the existing os.features: nydus.remoteimage.v1 logic as it is right now

Fricounet avatar Aug 28 '25 09:08 Fricounet

@Fricounet Currently no runtime (snapshotter) supports os.features: nydus.remoteimage.v1 except for build tools (e.g., nydusify), so I suggest changing it to artifactType.

imeoer avatar Aug 28 '25 09:08 imeoer

Sounds good, I can do this

Fricounet avatar Aug 28 '25 09:08 Fricounet