
After upgrading the Kubernetes version of an EKSA bare-metal cluster, the last workflow is stuck at STATE_RUNNING

Open • ygao-armada opened this issue 1 year ago • 1 comment

What happened:

After I upgraded the Kubernetes version of an EKSA bare-metal cluster with 4 control-plane (CP) nodes (3 in use + 1 idle), the workflow for the last upgraded node has been stuck at STATE_RUNNING (for almost 80 minutes already):

armada@admin-machine2:~/eksa/mgmt02$ kubectl get workflow -A -o wide
NAMESPACE     NAME                                                TEMPLATE                                            STATE
eksa-system   mgmt02-control-plane-template-1710166848788-hgl8k   mgmt02-control-plane-template-1710166848788-hgl8k   STATE_SUCCESS
eksa-system   mgmt02-control-plane-template-1710166848788-qsvr6   mgmt02-control-plane-template-1710166848788-qsvr6   STATE_SUCCESS
eksa-system   mgmt02-control-plane-template-1710166848788-vdqj9   mgmt02-control-plane-template-1710166848788-vdqj9   STATE_RUNNING

armada@admin-machine2:~/eksa/mgmt02$ kubectl get node -o wide
NAME              STATUS   ROLES           AGE    VERSION    INTERNAL-IP    EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
eksa-control-02   Ready    control-plane   53m    v1.27.11   10.20.22.224   <none>        Ubuntu 20.04.6 LTS   5.4.0-173-generic   containerd://1.7.10
eksa-control-03   Ready    control-plane   79m    v1.27.11   10.20.22.227   <none>        Ubuntu 20.04.6 LTS   5.4.0-173-generic   containerd://1.7.10
eksa-control-04   Ready    control-plane   102m   v1.27.11   10.20.22.226   <none>        Ubuntu 20.04.6 LTS   5.4.0-173-generic   containerd://1.7.10

armada@admin-machine2:~/eksa/mgmt02$ kubectl get machines.cluster.x-k8s.io -A -o wide
NAMESPACE     NAME           CLUSTER   NODENAME          PROVIDERID                                 PHASE     AGE    VERSION
eksa-system   mgmt02-4vwpn   mgmt02    eksa-control-04   tinkerbell://eksa-system/eksa-control-04   Running   127m   v1.27.7-eks-1-27-15
eksa-system   mgmt02-q278f   mgmt02    eksa-control-02   tinkerbell://eksa-system/eksa-control-02   Running   78m    v1.27.7-eks-1-27-15
eksa-system   mgmt02-z9j6w   mgmt02    eksa-control-03   tinkerbell://eksa-system/eksa-control-03   Running   99m    v1.27.7-eks-1-27-15
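To pick out the stuck workflow without scanning the table by eye, the `kubectl get workflow -A -o wide` output can be filtered for rows whose STATE is anything other than STATE_SUCCESS. This is a minimal sketch, assuming the four-column layout shown above (NAMESPACE, NAME, TEMPLATE, STATE); the function name `find_unfinished_workflows` is hypothetical and is not part of EKS Anywhere or Tinkerbell.

```shell
# Hypothetical helper: reads `kubectl get workflow -A -o wide` output on stdin
# and prints "namespace/name state" for every workflow not in STATE_SUCCESS.
# Assumes STATE is the 4th whitespace-separated column, as in the output above.
find_unfinished_workflows() {
  # NR > 1 skips the header row.
  awk 'NR > 1 && $4 != "STATE_SUCCESS" { print $1 "/" $2, $4 }'
}

# Usage against a live cluster:
#   kubectl get workflow -A -o wide | find_unfinished_workflows
```

If a different release adds printer columns, the `$4` index would need adjusting.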

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • EKS Anywhere Release: v0.18.2
  • EKS Distro Release: 1.26/1.27

ygao-armada, Mar 11 '24 18:03

I have the exact same issue. Is there any update? My nodes are upgraded, but the reboot action (the last action) in one of the nodes' workflows reports STATE_RUNNING, so the entire workflow for that node also reports STATE_RUNNING. Why would that happen when the node already shows as upgraded with the new image?

thecloudgarage, Aug 05 '24 14:08
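To confirm which action is holding a workflow open (reportedly the final reboot action here), the per-action status can be listed with a kubectl JSONPath query and then scanned for the first action that has not succeeded. This is a minimal sketch, assuming the Tinkerbell `Workflow` resource in this EKS-A release records action results under `.status.tasks[*].actions[*]` with `name` and `status` fields; check that against `kubectl get workflow <name> -n eksa-system -o yaml` on your own cluster. The helper `first_unfinished_action` is hypothetical.

```shell
# Hypothetical helper: reads "action<TAB>status" lines on stdin and prints the
# first action that has not reached STATE_SUCCESS (prints nothing if all did).
first_unfinished_action() {
  awk -F '\t' '$2 != "STATE_SUCCESS" { print $1, $2; exit }'
}

# Usage (assumes per-action status lives under .status.tasks[*].actions[*]):
#   kubectl get workflow mgmt02-control-plane-template-1710166848788-vdqj9 \
#     -n eksa-system \
#     -o jsonpath='{range .status.tasks[*].actions[*]}{.name}{"\t"}{.status}{"\n"}{end}' \
#     | first_unfinished_action
```

If the reported action is the reboot and the node has already rejoined the cluster on the new version, that suggests the action's final status was never reported back, rather than the reboot itself failing.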