RevisionFailed when ServiceAccount references a non-existing imagePullSecret
If the default ServiceAccount in a namespace references an imagePullSecret that no longer exists in that namespace, creating a ksvc causes its Revision to fail:
Revision "helloworld-go-00001" failed with message: Unable to fetch image "gcr.io/knative-samples/helloworld-go": failed to resolve image to digest: failed to initialize authentication: secrets "non-existing-pull-secret" not found.
Deploying the same image as an ordinary Pod with the same ServiceAccount works fine.
What version of Knative?
0.20.0
Expected Behavior
Creating a Ksvc should work even if some of the image pull secrets don't exist, if they are not actually needed to pull the image.
Actual Behavior
A Revision fails.
Steps to Reproduce the Problem
Modify the default SA in a namespace to include a non-existing imagePullSecret:
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: default
  namespace: default
secrets:
- name: default-token-6trqg
imagePullSecrets:
- name: non-existing-pull-secret
```
Create a Ksvc:
```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
spec:
  template:
    spec:
      containers:
      - image: gcr.io/knative-samples/helloworld-go
```
Notice the ksvc revision fails:
Revision "helloworld-go-00001" failed with message: Unable to fetch image "gcr.io/knative-samples/helloworld-go": failed to resolve image to digest: failed to initialize authentication: secrets "non-existing-pull-secret" not found.
Such behaviour seems inconsistent with how k8s works, as the same image with the same ServiceAccount works fine as an ordinary Pod:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hello
spec:
  containers:
  - image: gcr.io/knative-samples/helloworld-go
    name: hello
```
/area API
> Such behaviour seems inconsistent with how k8s works, as the same image with the same ServiceAccount works fine as an ordinary Pod:
I'm sort of indifferent to this statement. Although we differ from K8s, their behaviour may not be intentional/defined. You could argue Knative's fail-fast helps more than it hinders.
Hmm, I tend to agree with @dprotaso here. Things might break in unexpected ways with this, I think, as some images will work and some won't :thinking:. Not a strong opinion either, though.
Would it make sense to handle "missing `imagePullSecret`" as equivalent to "`imagePullSecret` not specified"?
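Concretely, the tag-to-digest resolver could drop dangling references before building the keychain. A minimal sketch using client-go; the helper name `filterExistingPullSecrets` is made up for illustration and is not Knative's actual code:

```go
package resolver

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// filterExistingPullSecrets drops imagePullSecret references that do not
// resolve to an existing Secret, so a dangling reference behaves the same
// as no reference at all.
func filterExistingPullSecrets(ctx context.Context, kc kubernetes.Interface,
	namespace string, refs []corev1.LocalObjectReference) ([]corev1.LocalObjectReference, error) {
	var existing []corev1.LocalObjectReference
	for _, ref := range refs {
		_, err := kc.CoreV1().Secrets(namespace).Get(ctx, ref.Name, metav1.GetOptions{})
		switch {
		case apierrors.IsNotFound(err):
			// Treat a missing imagePullSecret as if it were not specified.
			continue
		case err != nil:
			return nil, err
		default:
			existing = append(existing, ref)
		}
	}
	return existing, nil
}
```

A dangling reference would then degrade to a no-op (ideally with a warning event) instead of failing digest resolution outright.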
/kind enhancement
/good-first-issue
/triage needs-user-input
@evankanderson: This request has been marked as suitable for new contributors.
Please ensure the request meets the requirements listed here.
If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-good-first-issue command.
In response to this:
> Would it make sense to handle "missing `imagePullSecret`" as equivalent to "`imagePullSecret` not specified"?
>
> /kind enhancement
> /good-first-issue
> /triage needs-user-input
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Going to close this out as I haven't seen a compelling enough reason to change the behaviour
We encountered this problem recently in production. We had a service account with two pull secrets: one for the dev registry, the other for the prod registry. Both exist in our dev clusters, but only the prod one exists in production... you can guess what happened next.
Our fault for not testing properly, but it would be nice to have the flexibility of defining secrets that may or may not exist. Other teams using the same clusters don't see an issue because, as mentioned, an ordinary Pod does not behave this way.
This surfaced again with a user using the following tooling: http://external-secrets.io/v0.5.6/
From slack: https://knative.slack.com/archives/C0186KU7STW/p1655389675950159
I have an image that uses imagePullSecrets and I have an ExternalSecret that pulls it from an (azure) keyvault entry, like this:
```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: frontend
spec:
  template:
    spec:
      containers:
      - name: frontend
        image: frontend:v1
      imagePullSecrets:
      - name: regcred
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: secretstore-keyvault-secret-regcred
  namespace: rental
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: SecretStore
    name: secretstore-keyvault
  target:
    name: regcred
    template:
      type: kubernetes.io/dockerconfigjson
    creationPolicy: Owner
  data:
  - secretKey: .dockerconfigjson
    remoteRef:
      key: DockerRegCred
```
Problem is, when first loading this, the secret has not been loaded yet at the time it's trying to pull the revision, so it doesn't exist. Is there a way to say "retry after the secret is available" for images? (This is only a problem for the initial setup; when pushing new revisions it's perfectly fine.)
/lifecycle frozen
/triage accepted
I believe this will require changes to k8schain https://github.com/google/go-containerregistry/tree/main/pkg/authn/k8schain
And we don't want to tight-loop reconciliation if the secret isn't present, so we probably want an exponential backoff until the revision's progress deadline passes.
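As a sketch of that idea, assuming a fixed base delay; the helper below is illustrative and not Knative's actual reconciler API:

```go
package resolver

import "time"

// nextRequeueDelay returns how long to wait before retrying digest
// resolution, doubling the delay on each attempt and giving up once the
// revision's progress deadline has elapsed.
func nextRequeueDelay(attempt int, base, progressDeadline, elapsed time.Duration) (time.Duration, bool) {
	if elapsed >= progressDeadline {
		return 0, false // deadline passed: stop retrying, mark the revision failed
	}
	if attempt > 16 {
		attempt = 16 // cap the shift so the delay can't overflow
	}
	delay := base << uint(attempt) // base * 2^attempt
	if remaining := progressDeadline - elapsed; delay > remaining {
		delay = remaining // never sleep past the deadline
	}
	return delay, true
}
```

A caller would requeue the revision after the returned delay and give up once the second return value is false.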
I think there may be two different issues here:
- If an `imagePullSecret` is missing from the service account, but the remaining secrets are sufficient to resolve the digest, a polite warning / info condition about missing secrets seems sufficient.
- If an `imagePullSecret` is missing from the service account and we're unable to resolve the tag to a digest, we may want to enter a long-term "awaiting" condition rather than timing out more aggressively.
Note that in my use case (https://github.com/knative/serving/issues/10575#issuecomment-1157924093) I don't use a service account; ExternalSecret pulls Azure keyvault secrets and places them directly into a Secret:
```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: secretstore-keyvault-secret-regcred
  namespace: rental
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: SecretStore
    name: secretstore-keyvault
  target:
    name: regcred
    template:
      type: kubernetes.io/dockerconfigjson
    creationPolicy: Owner
  data:
  - secretKey: .dockerconfigjson
    remoteRef:
      key: DockerRegCred
```
From what I can see, the secret just doesn't exist at all when the revision gets pulled, and it gets updated a while after that (I have a fair number of secrets; it can take several seconds). Not sure if that's important for the case.
see https://github.com/google/go-containerregistry/pull/1472
k8schain will ignore non-existing imagePullSecrets now. Upgrading github.com/google/go-containerregistry to the latest version in go.mod should fix this.
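Assuming nothing else pins the module, the bump itself would be the standard Go toolchain steps:

```sh
go get github.com/google/go-containerregistry@latest
go mod tidy
```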
@dprotaso Hi! I am new to Knative and I would like to contribute to this issue. Is it still open?
We haven't bumped the dependency so this issue is up for grabs.
/assign @Bisht13
I have made PR #13701; could you please review it? @dprotaso ✨