[Question] Kubernetes won't start after removing first node
Hello everyone,
I recently had to remove the first node I added to our Harvester cluster because of a hardware failure. I was able to put the node into maintenance mode before removing it from the dashboard.
After removing the node in the dashboard, the cluster went down and the VIP address became unavailable.
When I log on to the remaining nodes, it looks like Kubernetes is down there as well.
systemctl status rke2-server.service
● rke2-server.service - Rancher Kubernetes Engine v2 (server)
Loaded: loaded (/usr/local/lib/systemd/system/rke2-server.service; disabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/rke2-server.service.d
└─override.conf
Active: activating (auto-restart) (Result: exit-code) since Wed 2022-07-27 15:15:57 UTC; 742ms ago
Docs: https://github.com/rancher/rke2#readme
Process: 24598 ExecStartPre=/bin/sh -xc ! /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service (code=exited>
Process: 24607 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
Process: 24608 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
Process: 24609 ExecStartPre=/usr/sbin/harv-update-rke2-server-url server (code=exited, status=0/SUCCESS)
Process: 24611 ExecStart=/usr/local/bin/rke2 server (code=exited, status=1/FAILURE)
Process: 24632 ExecStopPost=/bin/sh -c systemd-cgls /system.slice/rke2-server.service | grep -Eo '[0-9]+ (container>
Main PID: 24611 (code=exited, status=1/FAILURE)
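The unit status alone does not show why the rke2 server process keeps exiting; the actual error should be in the journal. Assuming journald is capturing the unit's output, something like this pulls up the last attempts:
journalctl -u rke2-server.service -n 100 --no-pager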
kubectl get vm -n harvester-system
W0727 15:15:34.367736 24316 loader.go:221] Config not found: /etc/rancher/rke2/rke2.yaml
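Side note on that warning: on an RKE2 server node the admin kubeconfig is normally written to /etc/rancher/rke2/rke2.yaml once rke2-server has come up, so kubectl can only work after the service is healthy. Assuming the default RKE2 paths, pointing kubectl at it explicitly looks roughly like this:
export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
/var/lib/rancher/rke2/bin/kubectl get nodes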
What is the best way to debug this? I'm a little bit stuck here.
Thanks
@Iliasb how many nodes did you have in your cluster before you removed the first node?
4 Nodes
Found the issue.
etcdserver/api/etcdhttp: /health error; no leader (status code 503)
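That message usually means etcd itself is running but has lost quorum, so it cannot elect a leader and the Kubernetes API server cannot come up. For reference, a health report like that can be pulled on a surviving node by querying the local etcd endpoint directly; a sketch, assuming the default RKE2 certificate paths (the file names may differ on your install):
curl -s --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key https://127.0.0.1:2379/health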
How can I select another node as master?
Hi @Iliasb, thanks for filing an issue here. Do you remember whether your cluster had 3 control plane nodes? If yes, you may be hitting a known issue, #2191. You can try the workaround in this thread: https://github.com/harvester/harvester/issues/2191#issuecomment-1115794201. Thank you.
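For anyone landing here later: the linked comment has the actual workaround, but the general RKE2 recovery path when etcd has lost its leader is to reset etcd to a single member on a surviving server node and then rejoin the others. Very roughly, and not necessarily the exact steps from that thread (back up /var/lib/rancher/rke2/server/db first):
systemctl stop rke2-server
rke2 server --cluster-reset
systemctl start rke2-server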
This issue has not been updated or reported again recently, and the most likely root cause was identified and fixed. Closing now.
Feel free to reopen, thanks.