
401 Unauthorized on image pull when upgrading to 1.23

pjaak opened this issue · 11 comments

/kind bug

1. What kops version are you running? The command kops version will display this information. Version 1.23.0

2. What Kubernetes version are you running? kubectl version will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag.

Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.4", GitCommit:"e6c093d87ea4cbb530a7b2ae91e54c0842d8308a", GitTreeState:"clean", BuildDate:"2022-02-16T12:30:48Z", GoVersion:"go1.17.6", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.4", GitCommit:"e6c093d87ea4cbb530a7b2ae91e54c0842d8308a", GitTreeState:"clean", BuildDate:"2022-02-16T12:32:02Z", GoVersion:"go1.17.7", Compiler:"gc", Platform:"linux/amd64"}

3. What cloud provider are you using? AWS

4. What commands did you run? What is the simplest way to reproduce this issue? kops upgrade to upgrade from 1.21 -> 1.23

5. What happened after the commands executed? I restarted the master nodes and found this in the syslog on the masters:

Mar 16 23:17:24 ip-10-0-12-208 kubelet[4438]: E0316 23:17:24.881707 4438 kuberuntime_manager.go:790] "CreatePodSandbox for pod failed" err="rpc error: code = Unknown desc = failed to get sandbox image \"elmosoftware.jfrog.io/elmo-docker/pause:3.6\": failed to pull image \"artfiactory.jfrog.io/elmo-docker/pause:3.6\": failed to pull and unpack image \"artfiactory.jfrog.io/elmo-docker/pause:3.6\": failed to resolve reference \"artfiactory.jfrog.io/elmo-docker/pause:3.6\": failed to authorize: failed to fetch anonymous token: unexpected status: 401 Unauthorized" pod="kube-system/kube-scheduler-ip-10-0-12-208.ap-southeast-2.compute.internal"

9. Anything else we need to know? We use Artifactory as our containerProxy:

    assets:
      containerProxy: artifactory.jfrog.io/docker

We have always had a Docker secret set using kops and it has always worked, but now all of a sudden it isn't working. Is the Docker config no longer used for authentication? I can see the config.json file exists on the EC2 instances.

EDIT: I can see this in /etc/containerd/config-kops.toml:

    sandbox_image = "artfiactory.jfrog.io/elmo-docker/pause:3.6"

This doesn't seem to be present in my other clusters with 1.21

pjaak avatar Mar 16 '22 23:03 pjaak

The docker credentials are used by kubelet. They do not work when containerd fetches the images directly, which seems to be the case above.

olemarkus avatar Mar 17 '22 20:03 olemarkus

/remove-kind bug

olemarkus avatar Mar 17 '22 20:03 olemarkus

Thanks @olemarkus,

Is there any way to configure containerd auth through kops? We need to use the proxy, otherwise we run into Docker Hub rate limits.

pjaak avatar Mar 17 '22 21:03 pjaak

You may want to configure containerd to use registryMirrors only for Docker Hub: https://kops.sigs.k8s.io/cluster_spec/#registry-mirrors
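
A minimal sketch of that part of the cluster spec, based on the linked docs (the mirror endpoint below is a placeholder, not a value from this issue; swap in your own pull-through cache or Artifactory remote endpoint):

    spec:
      containerd:
        registryMirrors:
          # Only docker.io pulls are routed through the mirror; other registries are pulled directly.
          docker.io:
          - https://mirror.example.com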

Meanwhile, I'll see if I can have nodeup write docker credentials to containerd config rather than writing the docker.json file only used by kubelet. That will take some time though.

olemarkus avatar Mar 18 '22 06:03 olemarkus

Okay thanks @olemarkus ,

Let me look into registry mirrors.

Also appreciate you looking into the nodeup config.

pjaak avatar Mar 20 '22 22:03 pjaak

Looks like I couldn't get the mirror working; it was still saying 401 Unauthorized :(

For now I have just manually changed the containerd config on the masters and it seems to be working. It would be good to get a permanent solution, though.

pjaak avatar Mar 21 '22 02:03 pjaak

Never mind, it looks like every EC2 instance has this set:

    [plugins."io.containerd.grpc.v1.cri"]
      sandbox_image = "artifactory.jfrog.io/docker/pause:3.6"

This also fails with 401 because the credentials aren't set for containerd. Sadly, this is blocking our upgrade at the moment.

pjaak avatar Mar 21 '22 03:03 pjaak

I assume you still have spec.assets set. Please remove that part of the cluster spec and rely only on spec.containerd.registryMirrors.

olemarkus avatar Mar 26 '22 07:03 olemarkus

Hi @olemarkus ,

I was actually able to resolve this by setting the following config in kops:

cluster.Spec.Kubelet.PodInfraContainerImage

I set this to our non-Artifactory image and it works. It would still be nice to get auth sorted, though.
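
For reference, a minimal sketch of the equivalent cluster spec YAML, assuming the standard field mapping for cluster.Spec.Kubelet.PodInfraContainerImage (the image below is a placeholder for whichever pause image your nodes can pull without credentials):

    spec:
      kubelet:
        # Sandbox/pause image fetched outside the Artifactory proxy, as described above.
        podInfraContainerImage: k8s.gcr.io/pause:3.6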

pjaak avatar Mar 27 '22 23:03 pjaak

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jun 25 '22 23:06 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Jul 25 '22 23:07 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

k8s-triage-robot avatar Aug 25 '22 00:08 k8s-triage-robot

@k8s-triage-robot: Closing this issue.

In response to this:


/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Aug 25 '22 00:08 k8s-ci-robot