Failed to set up pod network because token expired due to bound service account token

Open snowmansora opened this issue 2 years ago • 13 comments

What happened: The pod is stuck at ContainerCreating, and the following Unauthorized error is shown when describing it:

  Warning  FailedCreatePodSandBox  9m38s                   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "779db7fa6529acbdf3db228bc81a1f11e070e31c1d15354645e79f04631cb663" network for pod "ops-centos-8ff66846f-hff26": networkPlugin cni failed to set up pod "ops-centos-8ff66846f-hff26_default" network: Multus: [default/ops-centos-8ff66846f-hff26]: error getting pod: Unauthorized, failed to clean up sandbox container "779db7fa6529acbdf3db228bc81a1f11e070e31c1d15354645e79f04631cb663" network for pod "ops-centos-8ff66846f-hff26": networkPlugin cni failed to teardown pod "ops-centos-8ff66846f-hff26_default" network: Multus: [default/ops-centos-8ff66846f-hff26]: error getting pod: Unauthorized]

What you expected to happen: Multus set up the pod network successfully and pod can run.

Anything else we need to know?: Bound service account tokens are enabled by default in K8s 1.21: https://github.com/kubernetes/enhancements/blob/master/keps/sig-auth/1205-bound-service-account-tokens/README.md In previous versions of K8s, service account tokens did not expire. In K8s 1.21, the token expires after 1 year, or after 1 hour if service-account-extend-token-expiration is false. Looking at the Multus source code, I think the token is never updated after the Multus pod finishes its initial setup in entrypoint.sh. Restarting the Multus pod fixes the problem, because a new token is then used.
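To confirm that an expired token is the cause, one can inspect the token's `exp` claim. The following is a minimal sketch, assuming the token is a standard three-segment JWT, as bound service account tokens are. The helper names `token_expiry` and `is_expired` are hypothetical, and the signature is deliberately not verified:

```python
import base64
import json
import time
from typing import Optional


def token_expiry(token: str) -> Optional[int]:
    """Return the `exp` claim (Unix seconds) of a JWT, or None if absent.

    The signature is NOT verified; this only inspects the payload.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("not a JWT: expected three dot-separated segments")
    payload_b64 = parts[1]
    # JWT segments are base64url without padding; restore the padding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return payload.get("exp")


def is_expired(token: str, now: Optional[float] = None) -> bool:
    """Return True if the token carries an `exp` claim that has passed."""
    exp = token_expiry(token)
    if exp is None:
        # Tokens without an `exp` claim never expire.
        return False
    return (time.time() if now is None else now) >= exp
```

Running `is_expired` on the token embedded in the kubeconfig that Multus uses would show whether the Unauthorized error matches the expiry time.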

How to reproduce it (as minimally and precisely as possible): Use a K8s cluster with the API server argument service-account-extend-token-expiration set to false. After the Multus pod has been running for an hour, create a new pod; it will be stuck at ContainerCreating.

Environment:

  • Multus version: ghcr.io/k8snetworkplumbingwg/multus-cni:v3.8
  • Kubernetes version (use kubectl version): 1.21.5
  • Primary CNI for Kubernetes cluster: Flannel
  • OS (e.g. from /etc/os-release): CentOS 7

snowmansora avatar May 25 '22 22:05 snowmansora

Also, I noticed there was an attempt to refresh the token, https://github.com/k8snetworkplumbingwg/multus-cni/pull/686/files, but it was never completed or merged to master.

I am wondering whether there is any plan to fix this issue, since the token now needs to be refreshed; otherwise Multus can fail after an hour of running when service-account-extend-token-expiration is set to false.
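For illustration only, a refresh could periodically regenerate the on-disk kubeconfig with the current token. This is a sketch of one possible shape, not the approach taken in the linked PR. The function names, the cluster, user, and context names, and the choice of JSON output (which is also valid YAML) are all assumptions:

```python
import json
import os
import tempfile


def render_kubeconfig(server: str, ca_path: str, token: str) -> dict:
    """Build a single-user kubeconfig structure embedding `token`."""
    return {
        "apiVersion": "v1",
        "kind": "Config",
        # The names "local", "multus" and "multus-context" are hypothetical.
        "clusters": [
            {"name": "local",
             "cluster": {"server": server, "certificate-authority": ca_path}}
        ],
        "users": [{"name": "multus", "user": {"token": token}}],
        "contexts": [
            {"name": "multus-context",
             "context": {"cluster": "local", "user": "multus"}}
        ],
        "current-context": "multus-context",
    }


def write_atomically(path: str, config: dict) -> None:
    """Write to a temp file in the same directory, then rename over `path`.

    A reader therefore sees either the old file or the new one,
    never a half-written kubeconfig.
    """
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(config, f, indent=2)
        os.replace(tmp, path)
    except BaseException:
        # Clean up the temp file if writing or renaming failed.
        os.unlink(tmp)
        raise
```

Calling `write_atomically` on a timer shorter than the token lifetime, with the token freshly read from the pod's service account mount, would keep the file current without restarting the pod.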

Thanks.

snowmansora avatar May 25 '22 22:05 snowmansora

https://github.com/aws/amazon-vpc-cni-k8s/issues/1868#issuecomment-1126344830

We've also experienced this and it caused much confusion for a while.

Alan01252 avatar Jun 17 '22 21:06 Alan01252

Yeah, this is definitely an issue that can occur!

Let me try to resurrect #686 and see if I can get that in there.

In the feature/multus-4.0 branch, we have the "thick plugin" architecture which will account for this with the in-pod kube auth, but we should have it work in the current version when certs are rotated.

dougbtv avatar Jun 23 '22 14:06 dougbtv

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] avatar Sep 22 '22 03:09 github-actions[bot]

I see that this PR, https://github.com/k8snetworkplumbingwg/multus-cni/pull/686, has been closed. Do you plan to merge it to master? For the time being, Multus 4.0 has no released version. @dougbtv

JornShen avatar Oct 12 '22 02:10 JornShen

@dougbtv - Any update on which version has this fix? This is also affecting whereabouts.

infinitydon avatar Nov 15 '22 15:11 infinitydon

Bump... any ideas why this PR was closed? https://github.com/k8snetworkplumbingwg/multus-cni/pull/686

xagent003 avatar Dec 19 '22 21:12 xagent003

Yeah, it should be open, it just went auto stale.

dougbtv avatar Dec 21 '22 13:12 dougbtv

Also, for what it's worth, I think with the thick plugin architecture, it shouldn't be such a big deal, that is, it should be using the service account token in the pod, and not a generated kubeconfig that resides on disk, so... It should just use the updated token.
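The difference can be sketched as follows: instead of copying the token once into a kubeconfig, a long-running process re-reads the mounted token file whenever it changes. This is a minimal illustration, not Multus's actual implementation. The class name `FileTokenSource` is hypothetical, and the default path is the standard in-pod service account mount:

```python
import os
from typing import Optional

# Standard in-pod mount path for the service account token.
TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"


class FileTokenSource:
    """Return the current token, re-reading the file whenever it changes."""

    def __init__(self, path: str = TOKEN_PATH) -> None:
        self._path = path
        self._mtime_ns: Optional[int] = None
        self._token: Optional[str] = None

    def token(self) -> str:
        # os.stat follows symlinks, so a swapped symlink target is noticed too.
        mtime_ns = os.stat(self._path).st_mtime_ns
        if self._token is None or mtime_ns != self._mtime_ns:
            with open(self._path, encoding="utf-8") as f:
                self._token = f.read().strip()
            self._mtime_ns = mtime_ns
        return self._token
```

A client that calls `token()` before each API request picks up a rotated token without a restart, which is the behaviour the static kubeconfig lacks.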

dougbtv avatar Dec 21 '22 13:12 dougbtv

Has anybody tried the thick plugin architecture on EKS already?

caribbeantiger avatar Dec 21 '22 20:12 caribbeantiger

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] avatar Mar 23 '23 02:03 github-actions[bot]

Bumping this again since it seems like it went stale. Any updates?

MarkSpencerTan avatar Mar 07 '24 22:03 MarkSpencerTan

For EKS, the thick plugin is now available. So far, the token issue seems to be resolved when using it.

https://github.com/aws/amazon-vpc-cni-k8s/blob/master/config/multus/v4.0.2-eksbuild.1/multus-daemonset-thick.yml

infinitydon avatar Mar 07 '24 22:03 infinitydon