
1st CP node always fails `MachineHealthCheck` when creating a Bottlerocket workload cluster

Open · abhinavmpandey08 opened this issue 3 years ago · 0 comments

What happened: When creating a Bottlerocket EKS Anywhere workload cluster with 3 control plane nodes from an existing management cluster, the first control plane node always fails its MachineHealthCheck and is deleted and re-provisioned. Once the node is re-provisioned, the cluster is in a properly functioning state, but it is still strange that the first control plane node always fails. Another strange thing is that this happens at roughly the same time EKS-A reports that the cluster was created successfully. We tried this several times and saw the same behavior every time.

How to reproduce it (as minimally and precisely as possible): Create a Bottlerocket management cluster. Using this management cluster, create a Bottlerocket workload cluster with 3 control plane nodes and wait until the CLI finishes. Once it's done, check the machine status on the management cluster; it should show one node as deleting. I also had AWS IAM Authenticator deployed on both the management and workload clusters for this test. I'm not sure whether this matters, but wanted to mention it.
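For anyone reproducing this, here is a minimal sketch of the last check (finding the control plane machine that is being deleted). It assumes the Cluster API `Machine` objects live in the `eksa-system` namespace on the management cluster, use the `cluster.x-k8s.io/v1beta1` API, and carry the standard `cluster.x-k8s.io/cluster-name` and `cluster.x-k8s.io/control-plane` labels; it also assumes the official `kubernetes` Python client is installed. Roughly the same information should be visible with `kubectl get machines -n eksa-system`. The helper names below are hypothetical and only for illustration.

```python
"""Sketch: list control plane Machines of a workload cluster that are being deleted.

Assumptions (not confirmed by EKS Anywhere docs here): Machines are Cluster API
v1beta1 objects in the "eksa-system" namespace of the management cluster.
"""
from typing import Any, Dict, List

CLUSTER_NAME_LABEL = "cluster.x-k8s.io/cluster-name"
CONTROL_PLANE_LABEL = "cluster.x-k8s.io/control-plane"


def deleting_control_plane_machines(
    machines: List[Dict[str, Any]], cluster_name: str
) -> List[str]:
    """Return names of control plane Machines of `cluster_name` whose phase is Deleting."""
    names = []
    for machine in machines:
        metadata = machine.get("metadata") or {}
        labels = metadata.get("labels") or {}
        # Skip Machines that belong to other clusters (e.g. the management cluster).
        if labels.get(CLUSTER_NAME_LABEL) != cluster_name:
            continue
        # Control plane Machines carry the control-plane label (its value is empty).
        if CONTROL_PLANE_LABEL not in labels:
            continue
        status = machine.get("status") or {}
        if status.get("phase") == "Deleting":
            names.append(metadata["name"])
    return names


def fetch_machines(namespace: str = "eksa-system") -> List[Dict[str, Any]]:
    """Fetch Machine objects from the cluster in the current kubeconfig context."""
    # Imported here so the filtering logic above runs without the client installed.
    from kubernetes import client, config

    config.load_kube_config()  # point this at the management cluster's kubeconfig
    api = client.CustomObjectsApi()
    response = api.list_namespaced_custom_object(
        group="cluster.x-k8s.io",
        version="v1beta1",
        namespace=namespace,
        plural="machines",
    )
    return response["items"]
```

With the management cluster's kubeconfig active, `deleting_control_plane_machines(fetch_machines(), "<workload-cluster-name>")` should return the name of the control plane machine that failed its health check, if the behavior described above reproduces.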

Environment:

  • EKS Anywhere Release: v0.11.1
  • EKS Distro Release: v1.23

abhinavmpandey08 · Sep 13 '22 22:09