serving icon indicating copy to clipboard operation
serving copied to clipboard

RevisionFailed when ServiceAccount references a non-existing imagePullSecret

Open maschmid opened this issue 4 years ago • 10 comments

Having a default ServiceAccount in a namespace referencing an imagePullSecrets that doesn't exist anymore in the namespace, creating a ksvc gets its Revision failed:

Revision "helloworld-go-00001" failed with message: Unable to fetch image "gcr.io/knative-samples/helloworld-go": failed to resolve image to digest: failed to initialize authentication: secrets "non-existing-pull-secret" not found.

Deploying the same image as an ordinary pod works fine with the same ServiceAccount.

What version of Knative?

0.20.0

Expected Behavior

Creating a Ksvc should work even if some of the image pull secrets don't exist, if they are not actually needed to pull the image.

Actual Behavior

A Revision fails.

Steps to Reproduce the Problem

Modify the default SA in a namespace to include a non-existing imagePullSecret:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: default
  namespace: default
secrets:
- name: default-token-6trqg
imagePullSecrets:
- name: non-existing-pull-secret

Create a Ksvc:

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
spec:
  template:
    spec:
      containers:
      - image: gcr.io/knative-samples/helloworld-go

Notice the ksvc revision fails:

Revision "helloworld-go-00001" failed with message: Unable to fetch image "gcr.io/knative-samples/helloworld-go": failed to resolve image to digest: failed to initialize authentication: secrets "non-existing-pull-secret" not found.

Such behaviour seems inconsistent with how k8s works, as the same image with the same ServiceAccount works fine as an ordinary Pod:

apiVersion: v1
kind: Pod
metadata:
  name: hello
spec:
  containers:
  - image: gcr.io/knative-samples/helloworld-go
    name: hello

maschmid avatar Jan 18 '21 17:01 maschmid

/area API

Such behaviour seems inconsistent with how k8s works, as the same image with the same ServiceAccount works fine as an ordinary Pod:

I'm sort of in-different to this statement. Although we differ with K8s, their behaviour may not be intentional/defined. You could argue Knative's fail-fast helps more than in hinders.

dprotaso avatar Jan 18 '21 20:01 dprotaso

Hmm, I tend to agree with @dprotaso here. Things might go breaking in unexpected ways using this I think, as some images will work and some won't :thinking:. Not a strong opinion either though.

markusthoemmes avatar Jan 20 '21 08:01 markusthoemmes

Would it make sense to handle "missing imagePullSecret" as equivalent to "imagePullSecret not specified"?

/kind enhancement /good-first-issue /triage needs-user-input

evankanderson avatar Mar 16 '21 01:03 evankanderson

@evankanderson: This request has been marked as suitable for new contributors.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-good-first-issue command.

In response to this:

Would it make sense to handle "missing imagePullSecret" as equivalent to "imagePullSecret not specified"?

/kind enhancement /good-first-issue /triage needs-user-input

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

knative-prow-robot avatar Mar 16 '21 01:03 knative-prow-robot

Going to close this out as I haven't seen a compelling enough reason to change the behaviour

dprotaso avatar Jun 12 '21 02:06 dprotaso

We encountered this problem recently in production. We had a service account with 2 pull secrets, one for the dev registry the other for the prod registry. Both exist in our dev clusters but only the prod one exists in production... you can guess what happened next.

Our fault for not testing properly, but it would be nice to have the flexibility of defining secrets that might/might not exist. Other teams using the same clusters don't see an issue because as mentioned an ordinary pod does not behave this way.

SeanEmac avatar Nov 17 '21 18:11 SeanEmac

This surfaced again with a user using the following tooling: http://external-secrets.io/v0.5.6/

From slack: https://knative.slack.com/archives/C0186KU7STW/p1655389675950159

I have an image that uses imagePullSecrets and I have an ExternalSecret that pulls it from an (azure) keyvault entry, like this:

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: frontend
spec:
  template:
    spec:
      containers:
      - name: frontend
        image: frontend:v1
      imagePullSecrets:
      - name: regcred
--- 
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: secretstore-keyvault-secret-regcred
  namespace: rental
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: SecretStore
    name: secretstore-keyvault
  target:
    name: regcred
    template:
      type: kubernetes.io/dockerconfigjson
    creationPolicy: Owner
  data:
  - secretKey: .dockerconfigjson
    remoteRef:
      key: DockerRegCred

Problem is, when first loading this, the secret has not been loaded yet at the time it's trying to pull the revision so it doesn't exist. Is there a way to say "retry after secret is available" for images? (This is only a problem for the "initial setup", when pushing new revisions it's perfectly fine).

dprotaso avatar Jun 16 '22 17:06 dprotaso

/lifecycle frozen /triage accepted

I believe this will require changes to k8schain https://github.com/google/go-containerregistry/tree/main/pkg/authn/k8schain

And we don't want to tight loop reconciliation if the secret isn't present - so we probably want an exponential fallback until the revision progress deadline passes

dprotaso avatar Jun 16 '22 17:06 dprotaso

I think there may be two different issues here:

  1. If an imagePullSecret is missing from the service account, but the remaining secrets are sufficient to resolve the digest, a polite warning / info condition about missing secrets seems sufficient.
  2. If an imagePullSecret is missing from the service account and we're unable to resolve the tag to a digest, we may want to enter a long-term "awaiting" condition rather than timing out more aggressively.

evankanderson avatar Jun 16 '22 19:06 evankanderson

Note that in my use case (https://github.com/knative/serving/issues/10575#issuecomment-1157924093) I don't use a service account, ExternalSercret pulls azure keyvault secrets and places them directly into a secret:


apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: secretstore-keyvault-secret-regcred
  namespace: rental
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: SecretStore
    name: secretstore-keyvault
  target:
    name: regcred
    template:
      type: kubernetes.io/dockerconfigjson
    creationPolicy: Owner
  data:
  - secretKey: .dockerconfigjson
    remoteRef:
      key: DockerRegCred

From what I can see, the secret just doesn't exist at all when the revision gets pulled and gets updated a while after that (I have a fair amount of secrets, can take several seconds). Not sure if that's important for the case.

carlokok avatar Jun 17 '22 06:06 carlokok

see https://github.com/google/go-containerregistry/pull/1472

k8schain will ignore non-existing imagepullsecrets now. Upgrade github.com/google/go-containerregistry to latest version in go.mod could fix this.

tydra-wang avatar Jan 16 '23 09:01 tydra-wang

@dprotaso Hi! I am new to knative and I would like to contribute to this issue, is this still open?

vishal-chdhry avatar Jan 26 '23 04:01 vishal-chdhry

We haven't bumped the dependency so this issue is up for grabs.

dprotaso avatar Feb 13 '23 23:02 dprotaso

/assign @Bisht13

Bisht13 avatar Feb 14 '23 01:02 Bisht13

I have made a PR #13701, could you please review it? @dprotaso ✨

Bisht13 avatar Feb 14 '23 18:02 Bisht13