eks-anywhere
After upgrading the Kubernetes version of a bare-metal EKS Anywhere cluster, the last workflow is stuck at STATE_RUNNING
What happened:
After I upgraded the Kubernetes version of a bare-metal EKS Anywhere cluster with 4 CP nodes (3 in use + 1 idle), the workflow for the last upgraded node has been stuck at STATE_RUNNING for almost 80 minutes:
armada@admin-machine2:~/eksa/mgmt02$ kubectl get workflow -A -o wide
NAMESPACE     NAME                                                TEMPLATE                                            STATE
eksa-system   mgmt02-control-plane-template-1710166848788-hgl8k   mgmt02-control-plane-template-1710166848788-hgl8k   STATE_SUCCESS
eksa-system   mgmt02-control-plane-template-1710166848788-qsvr6   mgmt02-control-plane-template-1710166848788-qsvr6   STATE_SUCCESS
eksa-system   mgmt02-control-plane-template-1710166848788-vdqj9   mgmt02-control-plane-template-1710166848788-vdqj9   STATE_RUNNING
armada@admin-machine2:~/eksa/mgmt02$ kubectl get node -o wide
NAME              STATUS   ROLES           AGE    VERSION    INTERNAL-IP    EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
eksa-control-02   Ready    control-plane   53m    v1.27.11   10.20.22.224   <none>        Ubuntu 20.04.6 LTS   5.4.0-173-generic   containerd://1.7.10
eksa-control-03   Ready    control-plane   79m    v1.27.11   10.20.22.227   <none>        Ubuntu 20.04.6 LTS   5.4.0-173-generic   containerd://1.7.10
eksa-control-04   Ready    control-plane   102m   v1.27.11   10.20.22.226   <none>        Ubuntu 20.04.6 LTS   5.4.0-173-generic   containerd://1.7.10
armada@admin-machine2:~/eksa/mgmt02$ kubectl get machines.cluster.x-k8s.io -A -o wide
NAMESPACE     NAME           CLUSTER   NODENAME          PROVIDERID                                 PHASE     AGE    VERSION
eksa-system   mgmt02-4vwpn   mgmt02    eksa-control-04   tinkerbell://eksa-system/eksa-control-04   Running   127m   v1.27.7-eks-1-27-15
eksa-system   mgmt02-q278f   mgmt02    eksa-control-02   tinkerbell://eksa-system/eksa-control-02   Running   78m    v1.27.7-eks-1-27-15
eksa-system   mgmt02-z9j6w   mgmt02    eksa-control-03   tinkerbell://eksa-system/eksa-control-03   Running   99m    v1.27.7-eks-1-27-15
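For anyone trying to narrow this down on their own cluster, here is a minimal sketch that picks the still-running workflows out of the table output shown above. The function name `running_workflows` is hypothetical, and it assumes STATE is the last column, as it is in the `kubectl get workflow -A -o wide` output in this report.

```shell
# Hypothetical helper: print "namespace/name" for every workflow row whose
# last column is STATE_RUNNING. Reads the plain-table output of
# `kubectl get workflow -A` on stdin; the header row never matches.
running_workflows() {
  awk '$NF == "STATE_RUNNING" { print $1 "/" $2 }'
}

# Usage on a live cluster (not executed here):
#   kubectl get workflow -A --no-headers | running_workflows
```

Once the workflow name is known, `kubectl get workflow <name> -n eksa-system -o yaml` dumps the full object. Assuming the Tinkerbell Workflow resource records per-action state under its status, that output should show which action is still marked as running.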
What you expected to happen:
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Environment:
- EKS Anywhere Release: v0.18.2
- EKS Distro Release: 1.26/1.27
I have the exact same issue. Is there any update? My nodes are upgraded, but the reboot action (the last action) in one node's workflow reports STATE_RUNNING, so the entire workflow for that node also reports STATE_RUNNING. Why would that happen when the node already shows as upgraded with the new image?