migrate users away from CRI socket paths that don't have URL scheme
summary of the problem:
- the kubeadm default socket paths on Linux don't have the unix:// prefix.
- kubeadm socket detection checks files on disk; it does not dial the socket and does not prepend unix://
- the kubelet has long deprecated paths without unix:// and it might stop supporting them in the future
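For concreteness, the difference is only the scheme prefix; containerd's default socket is used here purely as an example:
  /run/containerd/containerd.sock          <- scheme-less (deprecated by the kubelet)
  unix:///run/containerd/containerd.sock   <- with the unix:// scheme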
what we should do:
- we should tell the user that paths without unix:// are deprecated and will cause an error in 3 or 4 releases (GA)
- for new kubeadm clusters we should start showing a warning if the user doesn't have unix:// in the path.
- during kubeadm upgrade, we should iterate over all nodes in the cluster and patch "kubeadm.alpha.kubernetes.io/cri-socket" (see the sketch after this list)
- in X releases turn the warning into an error.
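For illustration, a rough sketch of what that upgrade-time patching could look like from the outside; the annotation key is the real one, but the loop itself is hypothetical and not kubeadm's actual implementation:
# Hypothetical sketch: add the unix:// scheme to the cri-socket annotation on
# every node that is missing it.
for node in $(kubectl get nodes -o name); do
  sock=$(kubectl get "$node" \
    -o jsonpath='{.metadata.annotations.kubeadm\.alpha\.kubernetes\.io/cri-socket}')
  case "$sock" in
    ""|unix://*) ;;  # annotation absent or scheme already present: nothing to do
    *) kubectl annotate --overwrite "$node" \
         "kubeadm.alpha.kubernetes.io/cri-socket=unix://$sock" ;;
  esac
done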
1.24 action items:
- [x] make sure init/join/upgrade use URL endpoints https://github.com/kubernetes/kubernetes/pull/107295
1.25 action items:
- [x] cleanup TODOs added in 1.24: https://github.com/kubernetes/kubernetes/pull/109356
- [x] add e2e test to ensure URL scheme is present on sockets. https://github.com/kubernetes/kubernetes/pull/110287
1.26 action items:
- [x] remove de-dup code (bugfix) https://github.com/kubernetes/kubernetes/pull/112005
1.27 action items (?):
- turn warnings into errors?
kubernetes/kubernetes#100578 is not handling the upgrade case.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
Testing 1.22, I find that the kubeadm.alpha.kubernetes.io/cri-socket annotation is not on my 1.22.0-beta.0 node.
[root@daocloud ~]# kubeadm upgrade plan
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[upgrade/config] FATAL: failed to get node registration: node daocloud doesn't have kubeadm.alpha.kubernetes.io/cri-socket annotation
To see the stack trace of this error execute with --v=5 or higher
I got the error when upgrading from v1.22.0-beta.1 to v1.22.0 with kubelet 1.21.1.
I will look into the problem here. (I am not familiar with the history of this annotation and need to do some digging.)
[One of my cluster nodes was accidentally removed with kubectl delete node xxx, and I restarted the kubelet to recover it, but the recreated node had none of its labels/annotations. That is the cause here, so we can either keep surfacing the error or add an enhancement to auto-detect the socket.]
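A quick way to check whether the annotation is present on a node (daocloud is the node from the output above; empty output means the annotation is missing):
kubectl get node daocloud \
  -o jsonpath='{.metadata.annotations.kubeadm\.alpha\.kubernetes\.io/cri-socket}'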
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
i can try tackling the upgrade problem for 1.24 as part of the dockershim refactors. https://github.com/kubernetes/kubeadm/issues/2626
- kubeadm init/join can auto-prepend the unix:// scheme if missing for new clusters. (your PR is here: https://github.com/kubernetes/kubernetes/pull/100578 @pacoxu)
- kubeadm upgrade apply/node can mutate the Node object and the kubelet-flags file on disk. (TODO; see the rough illustration below)
it might be a good idea to do this soon given the kubelet endpoint flag is no longer "experimental".
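to make the TODO concrete, here is a purely illustrative sketch of what the mutation amounts to for the two places kubeadm records the endpoint. this is not kubeadm's actual code path, and the containerd socket path and the /var/lib/kubelet/kubeadm-flags.env location are assumptions that may differ per setup:
# Illustration only: patch the Node annotation...
kubectl annotate --overwrite node <nodename> \
  "kubeadm.alpha.kubernetes.io/cri-socket=unix:///run/containerd/containerd.sock"
# ...and prepend the scheme in the kubelet flags file written by kubeadm.
sudo sed -i 's|--container-runtime-endpoint=/|--container-runtime-endpoint=unix:///|' \
  /var/lib/kubelet/kubeadm-flags.env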
updated PR is here: https://github.com/kubernetes/kubernetes/pull/107295
the 1.25 cleanup is ready to merge: https://github.com/kubernetes/kubernetes/pull/109356
about this:
add e2e test to ensure URL scheme is present on sockets.
maybe we can add an e2e test in https://github.com/kubernetes/kubernetes/blob/master/test/e2e_kubeadm/nodes_test.go
Is v1.26 too early for users to turn the warnings into errors? Either v1.26 or v1.27 is OK for me; I prefer v1.27 or later.
1.27 sounds better but we might want to do it only after the kubelet starts doing it, if that ever happens.
As a note for people (like me) who ended up at this issue via search engines, if your kubeadm'ed cluster is throwing this error when running kubeadm upgrade node:
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
error execution phase kubelet-config: could not retrieve the node registration options for this node:
node your-node-goes-here doesn't have kubeadm.alpha.kubernetes.io/cri-socket annotation
You can fix this manually by running kubectl edit node <nodename> and adding this in the annotations section:
kubeadm.alpha.kubernetes.io/cri-socket: unix:///run/containerd/containerd.sock
Make sure to verify this is the correct socket path (snoop a control plane node to see what they have set, or just use the same one you use for crictl on the command line on that given node).
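If you prefer a one-liner over kubectl edit, something like this should be equivalent (the containerd socket path shown is just the common default; adjust it to what your node actually uses):
kubectl annotate node <nodename> \
  kubeadm.alpha.kubernetes.io/cri-socket=unix:///run/containerd/containerd.sock
# If crictl is configured on that node, its config shows the endpoint in use:
grep runtime-endpoint /etc/crictl.yaml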
turn warnings into errors?
@pacoxu i think we can put that on hold until the kubelet decides to error out.
or perhaps just wait a minimum of one more release.
/lifecycle rotten
/lifecycle stale
moved to "Next" milestone, lowered priority and frozen
let's close this for now. in a future release if the kubelet drops support, we can start erroring on the kubeadm side before a kubelet is deployed.