cluster-api-provider-aws
cluster-api-provider-aws copied to clipboard
AWSMachine controller leaks EC2 instances when specifying a Name tag in `additionalTags`
/kind bug
What steps did you take and what happened:
After applying two infrastructure.cluster.x-k8s.io/v1beta2
AWSMachines with the following additionalTags:
spec:
additionalTags:
Name: sample-name-tag
124 EC2 instances were created in quick succession for these AWSMachines before .spec.providerID
was populated. All of the EC2 instances (leaked and not-leaked) do have the tags Name: sample-name-tag
❯ k get awsmachine
NAME CLUSTER STATE READY INSTANCEID MACHINE
sample-name-tag-251d8688-cjd7r ${ID} running true aws:///us-east-2a/i-019c51ee0ded3e51a sample-name-tag-workers-55b788fd4dx5n4zb-kn2h6
sample-name-tag-251d8688-hwhb4 ${ID} running true aws:///us-east-2a/i-0c9169fb1781d2a8e sample-name-tag-workers-55b788fd4dx5n4zb-8brbs
What did you expect to happen:
One EC2 instance to be created per AWSMachine, with the tag Name: sample-name-tag
Anything else you would like to add: I think this occurs because:
- We try to find an instance during a normal reconcile loop - https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/56c9a39dd834640ee4a027e679ad2a5757098dfd/controllers/awsmachine_controller.go#L483-L489
- We look for the instance with these tags, including the name of the AWSMachine https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/abd444c27083d5054c772c7bf5101266da10f76f/pkg/cloud/services/ec2/instances.go#L46-L53
- If
instance == nil
, then a new EC2 instance is created https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/56c9a39dd834640ee4a027e679ad2a5757098dfd/controllers/awsmachine_controller.go#L517C29-L522 - However, this only happens when the instance cannot be found by tags (or an AWS not found/permission error). So the bug is with a change in v1beta2 to allow overwriting the
Name
tag: https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/main/api/v1beta2/tags.go#L244-L264
vs v1beta1 https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/abd444c27083d5054c772c7bf5101266da10f76f/api/v1beta1/tags.go#L250-L269 where the AWSMachine's own .metadata.name
had precedence
So I think we need to either not allow overwriting the Name
tag or we need to be smarter about how we find the instance based off of tags. Even if we update the filtering logic to account for the additional Name tag, we don't want to find an instance that's already been claimed by another AWSMachine.
Environment:
- Cluster-api-provider-aws version: v2.0.2
- Kubernetes version: (use
kubectl version
):
Client Version: v1.28.3
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.27.6+f67aeb3
- OS (e.g. from
/etc/os-release
):