cluster-api-provider-aws icon indicating copy to clipboard operation
cluster-api-provider-aws copied to clipboard

MachinePool remains in WaitingForReplicasReady because CAPA does not reconcile node references after instance refresh

Open AndiDog opened this issue 2 years ago • 11 comments
trafficstars

/kind bug

What steps did you take and what happened:

Related to https://github.com/kubernetes-sigs/cluster-api/issues/8858, https://github.com/kubernetes-sigs/cluster-api-provider-aws/issues/4071

CAPA's AWSMachinePool reconciler unconditionally returns return ctrl.Result{}, r.reconcileNormal(ctx, machinePoolScope, infraScope, infraScope), i.e. does not schedule reconciliation of the ASG's EC2 instances into .Status.Instances at regular intervals.

I made a change where CAPA triggers an instance refresh (e.g. change of AMI IDs), rolling out new EC2 instances. The parent MachinePool object remained in non-ready state with reason WaitingForReplicasReady, with CAPI continuously logging NodeRefs != ReadyReplicas messages. Only the next, random reconciliation of my AWSMachinePool object solves this by checking which instances exist in the ASG.

What did you expect to happen:

CAPA should regularly reconcile in order to check the ASG for a changed set of instances. Particularly if it's expected because CAPA triggered an instance refresh.

Environment:

  • Cluster-api-provider-aws version: v2.2.4 + preliminary fixes from https://github.com/kubernetes-sigs/cluster-api/issues/8858 which trigger instance refresh on change of the bootstrap config reference
  • Kubernetes version: (use kubectl version): v1.24.14
  • OS (e.g. from /etc/os-release): Flatcar Linux

AndiDog avatar Nov 06 '23 18:11 AndiDog