cluster-api-provider-aws MachinePool remains in WaitingForReplicasReady because CAPA does not reconcile node references after instance refresh

MachinePool remains in WaitingForReplicasReady because CAPA does not reconcile node references after instance refresh

Open AndiDog opened this issue 2 years ago • 11 comments

trafficstars

/kind bug

What steps did you take and what happened:

Related to https://github.com/kubernetes-sigs/cluster-api/issues/8858, https://github.com/kubernetes-sigs/cluster-api-provider-aws/issues/4071

CAPA's AWSMachinePool reconciler unconditionally returns return ctrl.Result{}, r.reconcileNormal(ctx, machinePoolScope, infraScope, infraScope), i.e. does not schedule reconciliation of the ASG's EC2 instances into .Status.Instances at regular intervals.

I made a change where CAPA triggers an instance refresh (e.g. change of AMI IDs), rolling out new EC2 instances. The parent MachinePool object remained in non-ready state with reason WaitingForReplicasReady, with CAPI continuously logging NodeRefs != ReadyReplicas messages. Only the next, random reconciliation of my AWSMachinePool object solves this by checking which instances exist in the ASG.

What did you expect to happen:

CAPA should regularly reconcile in order to check the ASG for a changed set of instances. Particularly if it's expected because CAPA triggered an instance refresh.

Environment:

Cluster-api-provider-aws version: v2.2.4 + preliminary fixes from https://github.com/kubernetes-sigs/cluster-api/issues/8858 which trigger instance refresh on change of the bootstrap config reference
Kubernetes version: (use kubectl version): v1.24.14
OS (e.g. from /etc/os-release): Flatcar Linux

Nov 06 '23 18:11 AndiDog

cluster-api-provider-aws cluster-api-provider-aws copied to clipboard

MachinePool remains in WaitingForReplicasReady because CAPA does not reconcile node references after instance refresh

cluster-api-provider-aws
cluster-api-provider-aws copied to clipboard