Tag-to-digest resolution fails for private registries despite imagePullSecrets being configured

Open itay-nvn-nv opened this issue 3 weeks ago • 6 comments

/area networking

What version of Knative?

Knative Serving v1.17, v1.18, v1.19, v1.20.0

Also confirmed on main branch: knative-v1.20.0-52-gabbe514be

Expected Behavior

When creating a Knative Service with a private Docker Hub image using a tag (e.g., docker.io/username/private-image:latest) and properly configured imagePullSecrets in the Service spec with valid credentials in a kubernetes.io/dockerconfigjson secret, the Revision controller should:

  • Use the imagePullSecrets to authenticate during tag-to-digest resolution
  • Successfully resolve the tag to a digest
  • Create the pod and pull the image

Actual Behavior

The Revision fails with a 401 Unauthorized error during tag-to-digest resolution:

Unable to fetch image "docker.io/username/private-image:v0.1": 
failed to resolve image to digest: 
HEAD https://index.docker.io/v2/username/private-image/manifests/v0.1: 
unexpected status code 401 Unauthorized (HEAD responses have no body, use GET for details)

The imagePullSecrets are not being used for authentication during the HTTP HEAD request. The pod is never created because digest resolution fails before pod creation.

Key observations:

  • The secret is correctly formatted (kubernetes.io/dockerconfigjson)
  • The secret is properly referenced in spec.template.spec.imagePullSecrets
  • The same credentials work for standard Kubernetes Pods (Deployments, StatefulSets)
  • Using image digest format directly (image@sha256:...) works as a workaround

Attempted workarounds:

  1. Adding docker.io to registries-skipping-tag-resolving in config-deployment ConfigMap - still fails with 401
  2. Using image digests directly - ✅ works but breaks CI/CD pipelines

Steps to Reproduce the Problem

1. Create a Docker Hub private registry secret:

kubectl create secret docker-registry my-docker-secret \
  --docker-server=docker.io \
  --docker-username=<username> \
  --docker-password=<password> \
  -n testing-zone

2. Create a Knative Service with imagePullSecrets:

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: test-private-image
  namespace: testing-zone
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/min-scale: "1"
        autoscaling.knative.dev/max-scale: "1"
    spec:
      imagePullSecrets:
      - name: my-docker-secret
      containers:
      - image: docker.io/username/private-repo:latest
        ports:
        - containerPort: 8080

3. Observe the failure:

kubectl get revision -n testing-zone
# Output: READY=False, REASON=ContainerMissing

kubectl get revision <revision-name> -n testing-zone -o jsonpath='{.status.conditions[?(@.type=="ContainerHealthy")].message}'
# Output: 401 Unauthorized error

4. Verify credentials work with standard Kubernetes Pod:

apiVersion: v1
kind: Pod
metadata:
  name: test-pod
  namespace: testing-zone
spec:
  imagePullSecrets:
  - name: my-docker-secret
  containers:
  - name: test
    image: docker.io/username/private-repo:latest

Result: The standard Pod successfully pulls the image using the same credentials.

Root Cause

The issue appears to be in pkg/reconciler/revision/resolve.go:116 where remote.Head() is called with remote.WithAuthFromKeychain(kc). Despite imagePullSecrets being passed to k8schain.Options in revision.go:90-99, the authentication fails during the HTTP HEAD request to Docker Hub.

The registries-skipping-tag-resolving workaround (originally added in PR #1390 for local dev registries) doesn't help because it returns an empty digest at resolve.go:112-114, preventing pod creation entirely.

Impact

This affects all users deploying private images from Docker Hub or other authenticated registries. The only workaround (using image digests) breaks CI/CD pipelines that rely on tags.

itay-nvn-nv Dec 01 '25 20:12

Hi @itay-nvn-nv ,

The correct domain to use is index.docker.io. I just verified that tag resolution both works with this domain and can be skipped using it. The mapping seems to be hardcoded somewhere in a library, as the docker and podman CLIs also both know to use this hostname when docker.io is specified.

Hope that helps 🤞

linkvt Dec 02 '25 10:12

Thanks for the help mate :) This actually also happened with nvcr.io (also a private registry). I suspect there might be a pattern here, where Knative expects the index (?) URL of the registry while asking for the registry URL. Perhaps that's something worth fixing globally, i.e., with a function that finds the index URL for the registry and passes it on to the next function in the call chain. Just a thought.

itay-nvn-nv Dec 02 '25 10:12

I just checked the source code of the library that resolves index.docker.io from docker.io, there is no general rule for adding index. to anything else except docker.io: https://github.com/google/go-containerregistry/blob/e075f209120b2467fd1b7d24727f1890a0edb74a/pkg/name/registry.go#L134-L138

index.nvcr.io also does not resolve, so I guess that there was another issue when you tried to access nvcr?

Besides that it's IMO valid to expect the credentials for docker.io etc to also work with index.docker.io but I think that's something the more experienced maintainers would need to decide.
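
To make the aliasing rule concrete, here is a minimal Go sketch of the behavior described above: only the literal docker.io (or an empty registry) is rewritten to index.docker.io, and every other host is left alone. It is a simplified stand-in and not the go-containerregistry source; the function name normalizeRegistry and both constants are assumptions chosen for illustration.

```go
package main

import "fmt"

const (
	// Assumed names for illustration; the linked registry.go is authoritative.
	defaultRegistry      = "index.docker.io"
	defaultRegistryAlias = "docker.io"
)

// normalizeRegistry applies the single special case discussed in this thread:
// "docker.io" (or no registry at all) becomes "index.docker.io".
// No "index." prefix is added to any other host.
func normalizeRegistry(host string) string {
	if host == "" || host == defaultRegistryAlias {
		return defaultRegistry
	}
	return host
}

func main() {
	for _, host := range []string{"docker.io", "nvcr.io", "index.docker.io", ""} {
		fmt.Printf("%q -> %q\n", host, normalizeRegistry(host))
	}
}
```

Under this rule nvcr.io stays nvcr.io, which is consistent with index.nvcr.io not being a name anything would look up.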

linkvt Dec 02 '25 11:12

Update from Original Report: After extensive debugging, this seems to be not a bug but rather a documentation gap and a potential enhancement opportunity regarding Docker Hub authentication during tag-to-digest resolution.

The Issue

Knative Serving's tag resolution fails with 401 Unauthorized for private Docker Hub images when the imagePullSecret uses docker.io as the registry URL, even though:

  • The credentials are valid
  • The same secret works for standard Kubernetes Deployments

Root Cause

  1. go-containerregistry internally aliases docker.io → index.docker.io (registry.go#L127-131)

  2. k8schain performs strict URL part-count matching when looking up credentials (keychain.go#L300-302):

    if len(globURLParts) != len(targetURLParts) {
        return false, nil  // No match
    }
    
  3. This causes a mismatch:

    • Secret has: docker.io (2 parts)
    • Image resolves to: index.docker.io (3 parts)
    • Result: Credentials not matched → 401 Unauthorized
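
The mismatch in step 3 can be reproduced in isolation. The Go sketch below is a simplified, host-only take on the part-count check quoted in step 2: it splits both hosts on dots, rejects them outright when the counts differ, and otherwise glob-matches part by part. The function hostMatches is hypothetical; the real keychain code also deals with ports and URL paths, which are ignored here.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// hostMatches reports whether target matches the glob host pattern.
// Hypothetical, host-only simplification of the matching described above.
func hostMatches(glob, target string) (bool, error) {
	globParts := strings.Split(glob, ".")
	targetParts := strings.Split(target, ".")
	// Strict part-count check: "docker.io" (2 parts) can never match
	// "index.docker.io" (3 parts).
	if len(globParts) != len(targetParts) {
		return false, nil
	}
	for i, g := range globParts {
		ok, err := path.Match(g, targetParts[i])
		if err != nil {
			return false, err
		}
		if !ok {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	target := "index.docker.io" // what docker.io images resolve to
	for _, secretHost := range []string{"docker.io", "index.docker.io", "*.docker.io"} {
		ok, err := hostMatches(secretHost, target)
		fmt.Printf("secret host %q vs %q: match=%v err=%v\n", secretHost, target, ok, err)
	}
}
```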

Why Kubernetes Deployments Work

The kubelet's credential matching is more lenient and handles the docker.io → index.docker.io aliasing internally.

Reproduction

# This FAILS (401 Unauthorized during tag resolution)
kubectl create secret docker-registry my-secret \
  --docker-server=docker.io \
  --docker-username=xxx --docker-password=xxx

# This WORKS
kubectl create secret docker-registry my-secret \
  --docker-server=https://index.docker.io/v1/ \
  --docker-username=xxx --docker-password=xxx
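
For reference, the only difference between the two secrets is the key under "auths" in the .dockerconfigjson payload. The Go sketch below builds that payload for a given server string. The helper buildDockerConfig and the struct names are hypothetical, and the layout (auths, then server, then username, password, and a base64 "user:password" auth field) reflects the usual Docker config format rather than anything taken from the Knative source.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// authEntry is one credential entry in a Docker config file.
type authEntry struct {
	Username string `json:"username"`
	Password string `json:"password"`
	Auth     string `json:"auth"` // base64("username:password")
}

// dockerConfig is the top-level structure stored under .dockerconfigjson.
type dockerConfig struct {
	Auths map[string]authEntry `json:"auths"`
}

// buildDockerConfig returns the JSON payload for a single registry server.
// Hypothetical helper for illustration only.
func buildDockerConfig(server, user, pass string) ([]byte, error) {
	auth := base64.StdEncoding.EncodeToString([]byte(user + ":" + pass))
	cfg := dockerConfig{Auths: map[string]authEntry{
		server: {Username: user, Password: pass, Auth: auth},
	}}
	return json.Marshal(cfg)
}

func main() {
	// The server string becomes the lookup key that credential matching sees.
	for _, server := range []string{"docker.io", "https://index.docker.io/v1/"} {
		out, err := buildDockerConfig(server, "xxx", "xxx")
		if err != nil {
			panic(err)
		}
		fmt.Println(string(out))
	}
}
```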

Verified Test Results

Secret Auth URL              | Kubernetes Deployment | Knative Service
docker.io                    | ✅ Pulls successfully | ❌ 401 Unauthorized
https://index.docker.io/v1/  | ✅ Pulls successfully | ✅ Pulls successfully

Suggested Actions

  • Option 1 - Documentation:
    • Add a note in Knative docs that Docker Hub secrets must use https://index.docker.io/v1/ as the server URL.
  • Option 2 - Enhancement:
    • Consider handling the docker.io → index.docker.io aliasing in k8schain's URL matching logic for parity with kubelet behavior.

itay-nvn-nv Dec 08 '25 15:12

Thanks for the background info but this is basically what I described in my comment above? index.docker.io is used by the library for docker.io registry and afterwards the server used in the credentials has to match the registry.

Maybe you can let the LLM run another analysis for the nvcr.io registry as we currently don't know why the issue happens for that case, thanks!

linkvt Dec 09 '25 08:12

@linkvt Thanks! Quick note on nvcr.io: initially it seemed broken, but after careful testing it works fine. I probably just made a mistake during my first attempt.

The problem is specific to docker.io as you noted.

Documenting it in the official docs would be best, but leaving this issue open is also fine, whatever makes it searchable for the next person who runs into this.

itay-nvn-nv Dec 09 '25 18:12