kubeadm

migrate users away from CRI socket paths that don't have URL scheme

neolit123 opened this issue · 11 comments

summary of the problem:

  • the kubeadm default socket paths on Linux don't have the unix:// prefix.
  • kubeadm socket detection only checks for socket files on disk; it does not dial the socket and does not prepend unix://
  • the kubelet has long deprecated paths without unix:// and it might stop supporting them in the future
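To make the first two points concrete, here is a minimal shell sketch (not kubeadm's actual code, which is written in Go) of detection that only checks for socket files with `test -S` and never dials them, plus the missing step of prepending unix://. The function names and the convention of passing candidate paths as arguments are hypothetical.

```sh
# Hypothetical helper, not part of kubeadm: return a CRI endpoint with a URL scheme.
# A bare absolute path is assumed to be a Unix socket and gets "unix://" prepended;
# an endpoint that already has a scheme (unix://, npipe://, ...) is returned as-is.
normalize_cri_endpoint() {
  case "$1" in
    *://*) printf '%s\n' "$1" ;;
    /*)    printf 'unix://%s\n' "$1" ;;
    *)     echo "unsupported CRI endpoint: $1" >&2; return 1 ;;
  esac
}

# Hypothetical file-based detection: like the behavior described above, it only
# checks whether each candidate path exists as a socket file (test -S) and never
# dials it, but unlike that behavior it returns the path with "unix://" prepended.
# Pass candidate paths as arguments, e.g. /run/containerd/containerd.sock.
detect_cri_endpoint() {
  for sock in "$@"; do
    if [ -S "$sock" ]; then
      normalize_cri_endpoint "$sock"
      return   # propagate normalize_cri_endpoint's exit status
    fi
  done
  echo "no CRI socket found among: $*" >&2
  return 1
}
```

For example, `detect_cri_endpoint /run/containerd/containerd.sock /var/run/crio/crio.sock` prints the first existing socket as a unix:// URL.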

what we should do:

  • we should tell the user that paths without unix:// are deprecated and will cause an error in 3 or 4 releases (GA)
  • for new kubeadm clusters we should start showing a warning if the user doesn't have unix:// in the path.
  • during kubeadm upgrade, we should iterate over all nodes in the cluster and patch their "kubeadm.alpha.kubernetes.io/cri-socket" annotation to add unix://
  • in X releases turn the warning into an error.
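The warn-then-error rollout could look roughly like the following shell sketch. `check_cri_socket` and its second "strict" argument are hypothetical; the argument stands in for whichever future release turns the warning into an error, and the suggested fix assumes a Linux node using a Unix socket.

```sh
# Hypothetical validation step. While the deprecation period lasts (strict unset or
# "false"), a scheme-less path only prints a warning and succeeds; with strict set
# to "true" the same input is rejected.
check_cri_socket() {
  endpoint=$1
  strict=${2:-false}
  case "$endpoint" in
    *://*) return 0 ;;   # already has a URL scheme: nothing to report
  esac
  if [ "$strict" = "true" ]; then
    echo "error: CRI socket \"$endpoint\" has no URL scheme; use unix://$endpoint" >&2
    return 1
  fi
  echo "warning: CRI socket \"$endpoint\" has no URL scheme; this is deprecated, use unix://$endpoint" >&2
  return 0
}
```

Keeping the warning and the error in one code path means the later switch is a one-line default change rather than new validation logic.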

1.24 action items:

  • [x] make sure init/join/upgrade use URL endpoints https://github.com/kubernetes/kubernetes/pull/107295

1.25 action items:

  • [x] cleanup TODOs added in 1.24: https://github.com/kubernetes/kubernetes/pull/109356
  • [x] add e2e test to ensure URL scheme is present on sockets. https://github.com/kubernetes/kubernetes/pull/110287

1.26 action items:

  • [x] remove de-dup code (bugfix) https://github.com/kubernetes/kubernetes/pull/112005

1.27 action items?

  • turn warnings into errors?

neolit123, Mar 31 '21

kubernetes/kubernetes#100578 is not handling the upgrade case.

pacoxu, Apr 01 '21

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

fejta-bot, Jun 30 '21

/remove-lifecycle stale

neolit123, Jul 26 '21

While testing 1.22, I found that the kubeadm.alpha.kubernetes.io/cri-socket annotation is missing from my 1.22.0-beta.0 node.

[root@daocloud ~]# kubeadm upgrade plan
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[upgrade/config] FATAL: failed to get node registration: node daocloud doesn't have kubeadm.alpha.kubernetes.io/cri-socket annotation
To see the stack trace of this error execute with --v=5 or higher

I got this error when upgrading from v1.22.0-beta.1 to v1.22.0 with kubelet 1.21.1.

I will look into the problem here, although I am not familiar with the history of this annotation and need to do some digging. One of my cluster's nodes was mistakenly removed with kubectl delete node xxx, and I restarted the kubelet to recover it, but the re-registered node has none of the kubeadm labels or annotations. That is why we should either ignore the error here or add some enhancement to auto-detect the socket.

pacoxu, Aug 05 '21

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot, Nov 03 '21

i can try tackling the upgrade problem for 1.24 as part of the dockershim refactors. https://github.com/kubernetes/kubeadm/issues/2626

  • kubeadm init/join can auto prepend the unix:// scheme if missing for new clusters. (your PR is here https://github.com/kubernetes/kubernetes/pull/100578 @pacoxu)
  • kubeadm upgrade apply/node can mutate the Node object and kubelet-flags file on disk. (TODO)
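A rough shell sketch of the two upgrade-time mutations, assuming the kubelet flags file is /var/lib/kubelet/kubeadm-flags.env and that it passes the endpoint as --container-runtime-endpoint=<path>. Both function names are hypothetical; this only illustrates the idea and is not what `kubeadm upgrade` itself runs.

```sh
# Hypothetical: add "unix://" to a scheme-less --container-runtime-endpoint value.
# Reads the flags file on stdin and writes the patched content to stdout. Values
# that already carry a scheme do not match "=/" and are left unchanged.
add_scheme_to_kubelet_flags() {
  sed 's#--container-runtime-endpoint=/#--container-runtime-endpoint=unix:///#g'
}

# Hypothetical: rewrite one node's kubeadm.alpha.kubernetes.io/cri-socket
# annotation if its value has no scheme. Requires kubectl access to the cluster.
patch_node_cri_annotation() {
  node=$1
  current=$(kubectl get node "$node" \
    -o jsonpath='{.metadata.annotations.kubeadm\.alpha\.kubernetes\.io/cri-socket}') || return 1
  case "$current" in
    "" | *://*) return 0 ;;   # annotation missing, or already has a scheme
  esac
  kubectl annotate node "$node" --overwrite \
    "kubeadm.alpha.kubernetes.io/cri-socket=unix://$current"
}
```

For example, `add_scheme_to_kubelet_flags < /var/lib/kubelet/kubeadm-flags.env > /tmp/kubeadm-flags.env` writes a patched copy to review before replacing the original and restarting the kubelet. `patch_node_cri_annotation` deliberately skips nodes that have no annotation at all, since guessing a socket for them is riskier than rewriting an existing path.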

it might be a good idea to do this soon given the kubelet endpoint flag is no longer "experimental".

neolit123, Dec 21 '21

updated PR is here: https://github.com/kubernetes/kubernetes/pull/107295

neolit123, Jan 03 '22

the 1.25 cleanup is ready to merge: https://github.com/kubernetes/kubernetes/pull/109356

about this:

add e2e test to ensure URL scheme is present on sockets.

maybe we can add an e2e test in https://github.com/kubernetes/kubernetes/blob/master/test/e2e_kubeadm/nodes_test.go
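The real e2e test would be Go code in test/e2e_kubeadm/nodes_test.go; as a rough stand-in, this shell sketch checks the same property from the command line. `find_nodes_without_scheme` is hypothetical and expects one "name<TAB>annotation" line per node, which `kubectl get nodes` can produce with a jsonpath template (see the comment at the end of the block).

```sh
# Hypothetical check: reads "<node-name><TAB><cri-socket annotation>" lines on stdin,
# prints every node whose annotation lacks a URL scheme (an empty annotation counts
# as missing), and returns non-zero if any such node was found.
find_nodes_without_scheme() {
  bad=0
  while IFS="$(printf '\t')" read -r name socket; do
    case "$socket" in
      *://*) ;;   # has a scheme: fine
      *) printf '%s\t%s\n' "$name" "$socket"; bad=1 ;;
    esac
  done
  return "$bad"
}

# Example (requires cluster access):
#   kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.kubeadm\.alpha\.kubernetes\.io/cri-socket}{"\n"}{end}' \
#     | find_nodes_without_scheme
```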

neolit123, May 17 '22

Is v1.26 too early to turn the warnings into errors for users? Either v1.26 or v1.27 is OK with me, though I prefer v1.27 or later.

pacoxu, Aug 24 '22

1.27 sounds better but we might want to do it only after the kubelet starts doing it, if that ever happens.

neolit123, Aug 24 '22

As a note for people (like me) who ended up at this issue via search engines, if your kubeadm'ed cluster is throwing this error when running kubeadm upgrade node:

[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
error execution phase kubelet-config: could not retrieve the node registration options for this node:
  node your-node-goes-here doesn't have kubeadm.alpha.kubernetes.io/cri-socket annotation

You can fix this by manually doing:

kubectl edit node <nodename>

and in the annotations section add this:

kubeadm.alpha.kubernetes.io/cri-socket: unix:///run/containerd/containerd.sock

Make sure this is the correct socket path for your node: check what a control plane node has set, or use the same endpoint you use for crictl on that node.
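If you would rather not edit the Node object interactively, the following shell sketch builds an equivalent `kubectl annotate` command from the runtime-endpoint set in a crictl config file (assumed to be /etc/crictl.yaml). It only prints the command so you can double-check the socket before running it; the function name is hypothetical, and a bare path is assumed to be a Unix socket.

```sh
# Hypothetical helper: print (not run) a kubectl command that adds the missing
# kubeadm.alpha.kubernetes.io/cri-socket annotation, using the first
# "runtime-endpoint:" value found in a crictl config file.
suggest_cri_annotation_fix() {
  node=$1
  config=${2:-/etc/crictl.yaml}
  endpoint=$(sed -n 's/^runtime-endpoint:[[:space:]]*//p' "$config" | head -n 1 | tr -d '"')
  if [ -z "$endpoint" ]; then
    echo "no runtime-endpoint found in $config" >&2
    return 1
  fi
  case "$endpoint" in
    *://*) ;;                          # already has a scheme
    *) endpoint="unix://$endpoint" ;;  # bare path: assume a Unix socket
  esac
  printf 'kubectl annotate node %s kubeadm.alpha.kubernetes.io/cri-socket=%s\n' "$node" "$endpoint"
}
```

The printed command omits `--overwrite`, so `kubectl annotate` refuses to change an annotation that already exists instead of silently replacing it.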

Foritus, Sep 17 '22

turn warnings into errors?

@pacoxu i think we can put that on hold until the kubelet decides to error out.

or perhaps just wait at least one more release.

neolit123, Dec 16 '22

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot, Jan 15 '23

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot, Apr 15 '23

moved to "Next" milestone, lowered priority and frozen

neolit123, Apr 19 '23

let's close this for now. in a future release if the kubelet drops support, we can start erroring on the kubeadm side before a kubelet is deployed.

neolit123, Nov 08 '23