
401 Unauthorized on image pull when upgrading to 1.23

pjaak opened this issue · 11 comments

/kind bug

1. What kops version are you running? The command kops version will display this information. Version 1.23.0

2. What Kubernetes version are you running? kubectl version will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag.

Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.4", GitCommit:"e6c093d87ea4cbb530a7b2ae91e54c0842d8308a", GitTreeState:"clean", BuildDate:"2022-02-16T12:30:48Z", GoVersion:"go1.17.6", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.4", GitCommit:"e6c093d87ea4cbb530a7b2ae91e54c0842d8308a", GitTreeState:"clean", BuildDate:"2022-02-16T12:32:02Z", GoVersion:"go1.17.7", Compiler:"gc", Platform:"linux/amd64"}

3. What cloud provider are you using? AWS

4. What commands did you run? What is the simplest way to reproduce this issue? kops upgrade to upgrade from 1.21 -> 1.23

5. What happened after the commands executed? I restarted the master nodes and found this in the syslog on the masters:

Mar 16 23:17:24 ip-10-0-12-208 kubelet[4438]: E0316 23:17:24.881707 4438 kuberuntime_manager.go:790] "CreatePodSandbox for pod failed" err="rpc error: code = Unknown desc = failed to get sandbox image \"elmosoftware.jfrog.io/elmo-docker/pause:3.6\": failed to pull image \"artfiactory.jfrog.io/elmo-docker/pause:3.6\": failed to pull and unpack image \"artfiactory.jfrog.io/elmo-docker/pause:3.6\": failed to resolve reference \"artfiactory.jfrog.io/elmo-docker/pause:3.6\": failed to authorize: failed to fetch anonymous token: unexpected status: 401 Unauthorized" pod="kube-system/kube-scheduler-ip-10-0-12-208.ap-southeast-2.compute.internal"

9. Anything else we need to know? We use Artifactory as our containerProxy:

    assets:
      containerProxy: artifactory.jfrog.io/docker

We have always had a Docker secret set using kops and it has always worked, but now all of a sudden it isn't working. Is the Docker config no longer used for authentication? I can see the config.json file exists on the EC2 instances.

EDIT: I can see this in /etc/containerd/config-kops.toml:

    sandbox_image = "artfiactory.jfrog.io/elmo-docker/pause:3.6"

This doesn't seem to be present in my other clusters with 1.21

pjaak avatar Mar 16 '22 23:03 pjaak

The docker credentials are used by kubelet. They do not work when containerd fetches the images directly, which seems to be the case above.

olemarkus avatar Mar 17 '22 20:03 olemarkus

/remove-kind bug

olemarkus avatar Mar 17 '22 20:03 olemarkus

Thanks @olemarkus,

Is there any way to configure containerd auth through kops? We need to use the proxy, otherwise we run into Docker Hub rate limits.

pjaak avatar Mar 17 '22 21:03 pjaak

You may want to configure containerd to use registryMirrors only for Docker Hub: https://kops.sigs.k8s.io/cluster_spec/#registry-mirrors
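
A minimal sketch of that part of the cluster spec, based on the linked docs (the mirror endpoint below is a placeholder, not a value from this issue; swap in your own pull-through cache or Artifactory remote endpoint):

    spec:
      containerd:
        registryMirrors:
          # Only docker.io pulls are routed through the mirror; other registries are pulled directly.
          docker.io:
          - https://mirror.example.com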

Meanwhile, I'll see if I can have nodeup write docker credentials to containerd config rather than writing the docker.json file only used by kubelet. That will take some time though.

olemarkus avatar Mar 18 '22 06:03 olemarkus

Okay thanks @olemarkus ,

Let me look into registry mirrors.

Also appreciate you looking into the nodeup config.

pjaak avatar Mar 20 '22 22:03 pjaak

Looks like I couldn't get the mirror working; it was still saying 401 Unauthorized :(

For now I have just manually changed the containerd config on the masters and it seems to be working. It would be good to get a permanent solution, though.

pjaak avatar Mar 21 '22 02:03 pjaak

Never mind, it looks like every EC2 instance has this set:

    [plugins."io.containerd.grpc.v1.cri"]
      sandbox_image = "artifactory.jfrog.io/docker/pause:3.6"

This also fails with 401 because the credentials aren't set for containerd. Sadly, this is blocking our upgrade at the moment.

pjaak avatar Mar 21 '22 03:03 pjaak

I assume you still have spec.assets set. Please remove that part of the cluster spec and rely only on spec.containerd.registryMirrors.

olemarkus avatar Mar 26 '22 07:03 olemarkus

Hi @olemarkus ,

I was actually able to resolve this by setting the following config in kops:

cluster.Spec.Kubelet.PodInfraContainerImage

I set this to our non-Artifactory image and it works. It would still be nice to get auth sorted, though.
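
For reference, a minimal sketch of the equivalent cluster spec YAML, assuming the standard field mapping for cluster.Spec.Kubelet.PodInfraContainerImage (the image below is a placeholder for whichever pause image your nodes can pull without credentials):

    spec:
      kubelet:
        # Sandbox/pause image fetched outside the Artifactory proxy, as described above.
        podInfraContainerImage: k8s.gcr.io/pause:3.6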

pjaak avatar Mar 27 '22 23:03 pjaak

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jun 25 '22 23:06 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Jul 25 '22 23:07 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

k8s-triage-robot avatar Aug 25 '22 00:08 k8s-triage-robot

@k8s-triage-robot: Closing this issue.

In response to this:


/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Aug 25 '22 00:08 k8s-ci-robot