401 unauthorised on image pull in upgrade to 1.23
/kind bug
1. What kops version are you running? The command kops version will display this information.
Version 1.23.0
2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.4", GitCommit:"e6c093d87ea4cbb530a7b2ae91e54c0842d8308a", GitTreeState:"clean", BuildDate:"2022-02-16T12:30:48Z", GoVersion:"go1.17.6", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.4", GitCommit:"e6c093d87ea4cbb530a7b2ae91e54c0842d8308a", GitTreeState:"clean", BuildDate:"2022-02-16T12:32:02Z", GoVersion:"go1.17.7", Compiler:"gc", Platform:"linux/amd64"}
3. What cloud provider are you using? AWS
4. What commands did you run? What is the simplest way to reproduce this issue? kops upgrade to upgrade from 1.21 -> 1.23
5. What happened after the commands executed? Restarted the master nodes and I found this in the syslog on the masters:
Mar 16 23:17:24 ip-10-0-12-208 kubelet[4438]: E0316 23:17:24.881707 4438 kuberuntime_manager.go:790] "CreatePodSandbox for pod failed" err="rpc error: code = Unknown desc = failed to get sandbox image \"elmosoftware.jfrog.io/elmo-docker/pause:3.6\": failed to pull image \"artfiactory.jfrog.io/elmo-docker/pause:3.6\": failed to pull and unpack image \"artfiactory.jfrog.io/elmo-docker/pause:3.6\": failed to resolve reference \"artfiactory.jfrog.io/elmo-docker/pause:3.6\": failed to authorize: failed to fetch anonymous token: unexpected status: 401 Unauthorized" pod="kube-system/kube-scheduler-ip-10-0-12-208.ap-southeast-2.compute.internal"
9. Anything else do we need to know? We use Artifactory as our containerProxy:
assets:
  containerProxy: artifactory.jfrog.io/docker
We have always had a docker secret set using kops and it has always worked, but now all of a sudden it's not working. Is the docker config no longer working for authentication? I can see the config.json file exists on the EC2 instances.
EDIT:
I can see this in the /etc/containerd/config-kops.toml:
sandbox_image = "artfiactory.jfrog.io/elmo-docker/pause:3.6"
This doesn't seem to be present in my other clusters running 1.21.
The docker credentials are used by kubelet. They do not work when containerd fetches the images directly, which seems to be the case above.
/remove-kind bug
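For context, containerd's CRI plugin can be given per-registry credentials directly in its own configuration, which is roughly what a manual workaround would involve. A minimal sketch, assuming a placeholder registry host and credentials (nodeup manages the containerd config on kops nodes, so hand edits may be overwritten):

    # containerd config fragment -- registry host, username, and password are placeholders
    [plugins."io.containerd.grpc.v1.cri".registry.configs."registry.example.com".auth]
      username = "example-user"
      password = "example-password"

containerd needs a restart (e.g. systemctl restart containerd) after editing its config for the credentials to take effect.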
Thanks @olemarkus,
Is there any way to configure containerd auth through kops? We need to use the proxy, otherwise we run into Docker Hub rate limits.
You may want to configure containerd to use registryMirrors only for Docker Hub: https://kops.sigs.k8s.io/cluster_spec/#registry-mirrors
Meanwhile, I'll see if I can have nodeup write docker credentials to containerd config rather than writing the docker.json file only used by kubelet. That will take some time though.
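For reference, a minimal sketch of what the registryMirrors field looks like in the cluster spec, with a placeholder mirror URL:

    spec:
      containerd:
        registryMirrors:
          docker.io:
          - https://registry.example.com

This only redirects pulls of Docker Hub (docker.io) images to the mirror; images referenced from other registries are still fetched directly.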
Okay, thanks @olemarkus,
Let me look into registry mirrors.
Also appreciate you looking into the nodeup config.
Looks like I couldn't get the mirror working; it was still saying 401 Unauthorized :(
For now I have just manually changed the containerd config on the masters and it seems to be working. Would be good to get a permanent solution though.
Never mind, it looks like every EC2 instance has this set:
[plugins."io.containerd.grpc.v1.cri"]
  sandbox_image = "artifactory.jfrog.io/docker/pause:3.6"
This also fails with 401 because the credentials aren't set for containerd. This is blocking us from upgrading at the moment, sadly.
I assume you still have spec.assets set. Please remove that part of the cluster spec and rely only on spec.containerd.registryMirrors.
Hi @olemarkus,
I was actually able to resolve this by setting the following config in kops:
cluster.Spec.Kubelet.PodInfraContainerImage
I set this to our non-Artifactory image and it works. Would still be nice to get auth sorted though.
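For anyone hitting the same problem, a minimal sketch of that setting in the cluster spec YAML; the image below is just an example value, not necessarily the one used here:

    spec:
      kubelet:
        podInfraContainerImage: registry.k8s.io/pause:3.6

Note this works around the sandbox image pull specifically; containerd registry authentication itself is still unresolved at this point in the thread.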
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue or PR with /reopen
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close
@k8s-triage-robot: Closing this issue.
In response to this:
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue or PR with /reopen
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.