k3s
Nodes not in Ready state after certificate renewal on master node
I was not able to connect to the k3s cluster.
root:/etc/rancher/k3s# kubectl get pods --kubeconfig k3s.yaml
error: You must be logged in to the server (Unauthorized)
I then checked the certificates and renewed them following https://www.ibm.com/support/pages/node/6444205, which resolved the cluster access issue.
After the certificate renewal, however, the nodes are in NotReady state.
root@jump:~# kubectl get nodes --kubeconfig .kube/k3s-stg-config
NAME        STATUS     ROLES                  AGE     VERSION
stg-vgw-2   NotReady   <none>                 2y48d   v1.21.1+k3s1
stg-vgw-3   NotReady   <none>                 2y48d   v1.21.1+k3s1
stg-vgw-1   Ready      control-plane,master   2y48d   v1.21.1+k3s1
Output of kubectl describe node for one of the NotReady workers:
Conditions:
  Type                 Status    LastHeartbeatTime                 LastTransitionTime                Reason             Message
  ----                 ------    -----------------                 ------------------                ------             -------
  NetworkUnavailable   False     Mon, 07 Jun 2021 06:46:40 +0000   Mon, 07 Jun 2021 06:46:40 +0000   FlannelIsUp        Flannel is running on this node
  MemoryPressure       Unknown   Tue, 07 Jun 2022 06:42:06 +0000   Thu, 14 Jul 2022 06:46:14 +0000   NodeStatusUnknown  Kubelet stopped posting node status.
  DiskPressure         Unknown   Tue, 07 Jun 2022 06:42:06 +0000   Thu, 14 Jul 2022 06:46:14 +0000   NodeStatusUnknown  Kubelet stopped posting node status.
  PIDPressure          Unknown   Tue, 07 Jun 2022 06:42:06 +0000   Thu, 14 Jul 2022 06:46:14 +0000   NodeStatusUnknown  Kubelet stopped posting node status.
  Ready                Unknown   Tue, 07 Jun 2022 06:42:06 +0000   Thu, 14 Jul 2022 06:46:14 +0000   NodeStatusUnknown  Kubelet stopped posting node status.
Addresses:
Need help in bringing the nodes to Ready state.
Just showing that there's a problem doesn't give us much to work with; some diagnostic information would be helpful. For example, what do the k3s-agent service logs on the NotReady nodes show? If a quick examination doesn't reveal anything useful, can you attach them to this issue?
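On a systemd-based install those can be collected with journalctl; the standard install script names the unit k3s-agent on workers and k3s on the server, so roughly:

journalctl -u k3s-agent --no-pager > k3s-agent.log   # on each NotReady worker
journalctl -u k3s --no-pager > k3s-server.log        # on the server node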
Also, those 3rd party instructions that you followed to rotate the cert appear to be for k3s v1.18; those steps should no longer be necessary and are not related to the problem you were experiencing.
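For reference, current k3s releases automatically rotate any certificates that are expired or within 90 days of expiry when the service starts, so a renewal is normally just a restart of the server (a minimal sketch, assuming a systemd install):

systemctl restart k3s   # on startup, the server rotates certs nearing expiry
# then re-copy /etc/rancher/k3s/k3s.yaml if you use that kubeconfig remotely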
I see similar behavior (worker "NotReady") with the following version:
Client Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.3+k3s1", GitCommit:"990ba0e88c90f8ed8b50e0ccd375937b841b176e", GitTreeState:"clean", BuildDate:"2022-07-19T01:10:03Z", GoVersion:"go1.18.1", Compiler:"gc", Platform:"linux/amd64"}
Each day, the worker loses connectivity with the master node. Could it be certificate renewal? How can I check?
> Each day, the worker loses connectivity with the master node. Could it be certificate renewal?
@mihaigalos certificate renewal only happens when k3s is starting, not daily, so I would doubt that's related, unless you're restarting the k3s process every day.
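If you want to check, the server's certificates live under the data dir and openssl will print their expiry dates (paths below assume the default data dir, /var/lib/rancher/k3s; adjust if you set --data-dir):

for crt in /var/lib/rancher/k3s/server/tls/*.crt; do
  echo "$crt: $(openssl x509 -noout -enddate -in "$crt")"
done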
Just showing what version you're running doesn't give us much to work with. Can you open a new issue, and attach k3s/k3s-agent journald logs from the nodes in question?
I realize that now, sorry. As soon as I can reproduce it, I'll create an issue with logs. The cluster looks stable at the moment.
This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 180 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.