
unhealthyMachineTimeout not working when VM is powered off (VM not deleted from disk)

Open saiteja313 opened this issue 1 year ago • 0 comments

What happened:

I created an EKSA cluster with the following configuration:

  1. unhealthyMachineTimeout set to 30 seconds (the minimum value) in the worker node section of the cluster config file [1]
  2. Autoscaling configuration enabled for the worker nodes in the cluster config file
  3. Cluster Autoscaler curated package installed on the cluster

After creating the cluster, I tested two scenarios:

  1. Scenario 1: In the VMware vSphere console, select one of the worker nodes, right-click, and choose Power Off
  2. Scenario 2: Select one of the worker nodes, right-click > Power Off, then right-click again > Delete from Disk

Scenario 1 fails every time. No new node is created within the configured timeout. The capv pod logs show no event indicating that the node is unhealthy for 4-5 minutes. After that, either the node is deleted and a new node is provisioned, or the node is powered back on.

Scenario 2 works every time. After the node is deleted, a new node is provisioned within 30 seconds.

[1] https://anywhere.eks.amazonaws.com/docs/getting-started/optional/healthchecks/#machinehealthcheckunhealthymachinetimeout-optional

What you expected to happen:

For scenario 1, capv should respect the unhealthyMachineTimeout value of 30 seconds. When unhealthyMachineTimeout is set to 5 minutes, capv takes around 20-40 minutes to detect that the node is powered off or not ready.

I am not sure whether we need something like the node termination handler that is available for Amazon EKS in the cloud.

How to reproduce it (as minimally and precisely as possible):

  • Configure the worker node section of the cluster config file as follows:
  workerNodeGroupConfigurations:
  - count: 1
    machineGroupRef:
      kind: VSphereMachineConfig
      name: demo-mgmt
    name: md-0
    autoscalingConfiguration:
      minCount: 1
      maxCount: 5
    machineHealthCheck:
      unhealthyMachineTimeout: 30s
      maxUnhealthy: 100%
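
For context, here is a minimal sketch of the Cluster API MachineHealthCheck that I would expect the settings above to translate into. This is an assumption on my side and was not copied from the affected cluster: the object name, the namespace, the cluster name, and the selector label value are all hypothetical, and the exact object EKS Anywhere generates may differ. As far as I understand, the timeout is counted from the moment the Node's Ready condition changes, so a powered-off VM would only be flagged once the control plane has marked its Node as not ready.

```yaml
# Hypothetical sketch, not taken from the affected cluster.
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: demo-mgmt-md-0-worker-unhealthy   # assumed name
  namespace: eksa-system                  # assumed namespace
spec:
  clusterName: demo-mgmt                  # assumed cluster name
  maxUnhealthy: 100%                      # from machineHealthCheck.maxUnhealthy
  selector:
    matchLabels:
      cluster.x-k8s.io/deployment-name: demo-mgmt-md-0   # assumed label value
  unhealthyConditions:
  # Assumed mapping: unhealthyMachineTimeout applied to both Ready states.
  - type: Ready
    status: Unknown
    timeout: 30s
  - type: Ready
    status: "False"
    timeout: 30s
```

Comparing the actual MachineHealthCheck object on the management cluster against this sketch could help confirm whether the 30s value is propagated at all, or whether the delay comes from how long the Node takes to be reported as not ready.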

Anything else we need to know?:

Environment: EKSA with vSphere

  • EKS Anywhere Release: 0.20
    Version: v0.20.4
    Release Manifest URL: https://anywhere-assets.eks.amazonaws.com/releases/eks-a/manifest.yaml
    Bundle Manifest URL: https://anywhere-assets.eks.amazonaws.com/releases/bundles/74/manifest.yaml
  • EKS Distro Release: not sure

saiteja313, Sep 17 '24 14:09