cluster-api-provider-aws
Deleting a cluster with unmanaged subnets doesn't clean up the tags on the Subnet AWS resource, resulting in clusters being unable to be provisioned
/kind bug
What steps did you take and what happened: I have a use case where I need to create and delete tens of clusters with unmanaged subnets every day. One day I noticed that my clusters could no longer be provisioned, failing with the following error:
E0111 15:03:44.323375 1 controller.go:317] controller/awsmachine "msg"="Reconciler error" "error"="failed to create AWSMachine instance: failed to run machine \"aws-v1.22-763b6751-c803-48bb-bb09-cc57b5c450f0-control-plamntff\", no subnets available in availability zone \"eu-west-2b\"" "name"="aws-v1.22-763b6751-c803-48bb-bb09-cc57b5c450f0-control-plamntff" "namespace"="default" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="AWSMachine"
That didn't make sense, because the subnets were all working fine. After further investigation I realized that CAPA was attempting to create tags on the Subnet resources, and that the request was failing because the limit of 50 tags per resource had been reached. I only had one cluster provisioned at the time, yet the tags on the Subnet included all the past clusters I had ever created. Here's an example from today:

The AWSCluster object also included the tags of the deleted clusters in the .spec.network.subnets stanza:
```yaml
- cidrBlock: ******
  id: ******
  isPublic: false
  routeTableId: ******
  tags:
    Environment: shared_services
    Management: Terraform
    Name: Testing Private Subnet 2
    Repo: shared-infrastructure
    Scope: Private
    TF_Directory: shared_services_network
    kubernetes.io/cluster/1.23-243ebba1-250e-4261-ae37-1453024e0c8c: shared
    kubernetes.io/cluster/02b3df93-c93e-4b39-a12b-75765383a5a9: shared
    kubernetes.io/cluster/aws-v1.22-1f2985bc-dfeb-4eb7-8e99-a0d3bbc400d0: shared
    kubernetes.io/cluster/aws-v1.22-7d2ebe0d-42e3-49f5-a74b-e1f3b222954d: shared
    kubernetes.io/cluster/aws-v1.22-9926f379-2c26-4840-aea3-9305beabfa72: shared
    kubernetes.io/cluster/fa3b783c-a611-40b0-95f9-7116e356e999: shared
    kubernetes.io/cluster/lewis-test-cluster: shared
    kubernetes.io/cluster/lewistest: shared
    kubernetes.io/cluster/lewistest2: shared
    kubernetes.io/cluster/lewistest3: shared
    kubernetes.io/cluster/lewistest4: shared
    kubernetes.io/role/internal-elb: "1"
```
My next thought was to delete the tags on the Subnets manually, but when "Reconciling subnets" was triggered by the capa-controller-manager, the tags were re-added to the Subnets. Deleting them from the AWSCluster object with kubectl edit didn't work either, because they were re-added the same way. The only way I managed to delete the tags was to remove them from both the Subnets and the AWSCluster object in quick succession, before "Reconciling subnets" was triggered again.
Here's my recreation steps:
- Create a cluster (cluster A) with unmanaged subnets.
- Without deleting cluster A, create a new one (cluster B).
- Cluster B's AWSCluster object now includes the subnet tags from both clusters, as do the Subnet AWS resources.
- Delete cluster A.
- Without deleting cluster B, create a new one (cluster C).
- Cluster C now includes subnet tags from clusters A, B, and C, despite A already having been deleted, as do the Subnet AWS resources.
- Repeat the pattern 47 more times, until no more clusters can be provisioned because the tag limit has been reached.
What did you expect to happen:
- A new AWSCluster object shouldn't sync another AWSCluster's subnet tags from the Subnet AWS resource. It should only sync the Subnet's generic tags and its own.
- When deleting a cluster, I expect its subnet tags to be deleted from the Subnet AWS resources as well.
Anything else you would like to add: My temporary workaround is a Lambda function that deletes the tags from the Subnet resources and the AWSCluster object in quick succession every few hours, but that is far from ideal.
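For reference, here is a minimal Go sketch of the AWS-side half of such a cleanup, using aws-sdk-go-v2. It assumes the subnet IDs and the set of still-live cluster names are supplied (subnetIDs and liveClusters below are placeholders); the matching keys still have to be removed from the AWSCluster object in quick succession, before the next reconcile re-adds them.

```go
package main

import (
	"context"
	"log"
	"strings"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/ec2"
	"github.com/aws/aws-sdk-go-v2/service/ec2/types"
)

const clusterTagPrefix = "kubernetes.io/cluster/"

func main() {
	ctx := context.Background()
	subnetIDs := []string{"subnet-aaaa", "subnet-bbbb"}        // placeholder IDs
	liveClusters := map[string]bool{"aws-v1.22-current": true} // placeholder names

	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		log.Fatal(err)
	}
	client := ec2.NewFromConfig(cfg)

	// Read the current tags on the subnets.
	out, err := client.DescribeSubnets(ctx, &ec2.DescribeSubnetsInput{SubnetIds: subnetIDs})
	if err != nil {
		log.Fatal(err)
	}

	for _, subnet := range out.Subnets {
		// Collect kubernetes.io/cluster/* tags that belong to clusters
		// which no longer exist.
		var stale []types.Tag
		for _, tag := range subnet.Tags {
			key := aws.ToString(tag.Key)
			name := strings.TrimPrefix(key, clusterTagPrefix)
			if strings.HasPrefix(key, clusterTagPrefix) && !liveClusters[name] {
				stale = append(stale, types.Tag{Key: tag.Key})
			}
		}
		if len(stale) == 0 {
			continue
		}
		// DeleteTags with only a Key set removes the tag regardless of value.
		_, err := client.DeleteTags(ctx, &ec2.DeleteTagsInput{
			Resources: []string{aws.ToString(subnet.SubnetId)},
			Tags:      stale,
		})
		if err != nil {
			log.Fatal(err)
		}
	}
}
```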
Environment:
- Cluster-api-provider-aws version: v1.1.0
- Kubernetes version (use kubectl version): v1.21.5
- OS (e.g. from /etc/os-release): Amazon Linux EKS
Strange, CAPA does not share tags between clusters. Can you provide the yamls used for creating cluster/lewistest and cluster/lewistest2? I just want to see how they are created to understand how this happened.
Hi @sedefsavas, thank you for looking into this!
I clarified my original post a bit. CAPA indeed doesn't share tags between clusters directly, but it syncs their tags with the Subnet AWS resource, which in turn shares them indirectly.
So when a cluster (A) is created, its cluster tag (A=shared) is added to the Subnet resource, and the cluster then syncs the rest of the Subnet's tags (tags like Scope=Private, etc.) when "Reconcile subnets" is triggered. When a new cluster (B) is created, its cluster tag (B=shared) is added to the Subnet resource as well, and when "Reconcile subnets" is triggered again, both clusters sync the Subnet's tags into their own, which at that point also include the other cluster's tag. The result is that the Subnet and both clusters carry both tags: A=shared and B=shared.
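To make the loop concrete, here is a minimal Go sketch of the merge semantics described above (an illustration, not CAPA's actual code): each reconcile takes the union of the Subnet's tags and the cluster's own, so the tag set can only ever grow.

```go
package main

import "fmt"

// reconcileSubnetTags models the faulty sync: the union of the tags already
// on the Subnet and the tags the cluster wants is written back to both the
// Subnet and the cluster's AWSCluster spec. Nothing is ever removed.
func reconcileSubnetTags(subnetTags, clusterTags map[string]string) map[string]string {
	merged := map[string]string{}
	for k, v := range subnetTags {
		merged[k] = v
	}
	for k, v := range clusterTags {
		merged[k] = v
	}
	return merged
}

func main() {
	subnet := map[string]string{"Scope": "Private"}

	// Cluster A is created and reconciles.
	subnet = reconcileSubnetTags(subnet, map[string]string{"kubernetes.io/cluster/A": "shared"})
	// Cluster B is created and reconciles; A's tag is now part of the union.
	subnet = reconcileSubnetTags(subnet, map[string]string{"kubernetes.io/cluster/B": "shared"})

	// Even after cluster A is deleted, nothing ever removes its tag.
	fmt.Println(subnet) // map[Scope:Private kubernetes.io/cluster/A:shared kubernetes.io/cluster/B:shared]
}
```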
I believe that when a cluster syncs tags with the Subnet resource, it should ignore tags from other CAPI-created clusters, to avoid this tag inheritance issue.
Then, when a cluster is deleted, it should remove its own tags from the Subnet resource.
If both of these behaviours are added, the bug would be fixed.
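A minimal sketch of both proposed behaviours, at the level of pure helper functions rather than CAPA's real reconciler (syncableTags and tagsToRemoveOnDelete are hypothetical names, not existing CAPA functions):

```go
package main

import (
	"fmt"
	"strings"
)

const clusterTagPrefix = "kubernetes.io/cluster/"

// syncableTags returns the subnet tags a cluster should copy into its own
// spec: everything except kubernetes.io/cluster/* tags that belong to
// other clusters.
func syncableTags(subnetTags map[string]string, clusterName string) map[string]string {
	out := map[string]string{}
	for k, v := range subnetTags {
		if strings.HasPrefix(k, clusterTagPrefix) && k != clusterTagPrefix+clusterName {
			continue // owned by another cluster; ignore it
		}
		out[k] = v
	}
	return out
}

// tagsToRemoveOnDelete returns the tag keys a cluster should strip from an
// unmanaged subnet when the cluster is deleted: only its own cluster tag.
func tagsToRemoveOnDelete(clusterName string) []string {
	return []string{clusterTagPrefix + clusterName}
}

func main() {
	subnet := map[string]string{
		"Scope":                   "Private",
		"kubernetes.io/cluster/A": "shared",
		"kubernetes.io/cluster/B": "shared",
	}
	fmt.Println(syncableTags(subnet, "B"))  // map[Scope:Private kubernetes.io/cluster/B:shared]
	fmt.Println(tagsToRemoveOnDelete("B")) // [kubernetes.io/cluster/B]
}
```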
The YAMLs are templated, so I will send you the template:
```yaml
# {{ .Uuid }}
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: aws-v1.22-{{ .Uuid }}
  namespace: {{ .Namespace }}
spec:
  clusterNetwork:
    pods:
      cidrBlocks:
        - ******
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: aws-v1.22-{{ .Uuid }}-control-plane
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: AWSCluster
    name: aws-v1.22-{{ .Uuid }}
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSCluster
metadata:
  name: aws-v1.22-{{ .Uuid }}
  namespace: {{ .Namespace }}
spec:
  network:
    vpc:
      id: *******
    subnets:
      - id: *******
      - id: *******
      - id: *******
    securityGroupOverrides:
      bastion: *******
      controlplane: *******
      apiserver-lb: *******
      node: *******
      lb: *******
  region: eu-west-2
  sshKeyName: *******
  controlPlaneLoadBalancer:
    scheme: internal
    subnets:
      - *******
      - *******
      - *******
---
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: aws-v1.22-{{ .Uuid }}-control-plane
  namespace: {{ .Namespace }}
spec:
  kubeadmConfigSpec:
    postKubeadmCommands:
      - sudo kubectl --kubeconfig=/etc/kubernetes/kubelet.conf apply -f https://docs.projectcalico.org/v3.20/manifests/calico.yaml
    clusterConfiguration:
      apiServer:
        extraArgs:
          cloud-provider: aws
      controllerManager:
        extraArgs:
          cloud-provider: aws
    initConfiguration:
      nodeRegistration:
        kubeletExtraArgs:
          cloud-provider: aws
        name: '{{"{{"}} ds.meta_data.local_hostname {{"}}"}}'
    joinConfiguration:
      nodeRegistration:
        kubeletExtraArgs:
          cloud-provider: aws
        name: '{{"{{"}} ds.meta_data.local_hostname {{"}}"}}'
  machineTemplate:
    infrastructureRef:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
      kind: AWSMachineTemplate
      name: aws-v1.22-{{ .Uuid }}-control-plane
  replicas: 3
  version: v1.22.5
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSMachineTemplate
metadata:
  name: aws-v1.22-{{ .Uuid }}-control-plane
  namespace: {{ .Namespace }}
spec:
  template:
    spec:
      iamInstanceProfile: *******
      instanceType: t3.large
      sshKeyName: *******
      failureDomain: "eu-west-2a"
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: aws-v1.22-{{ .Uuid }}-md-0
  namespace: {{ .Namespace }}
spec:
  clusterName: aws-v1.22-{{ .Uuid }}
  replicas: 3
  selector:
    matchLabels: null
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: aws-v1.22-{{ .Uuid }}-md-0
      clusterName: aws-v1.22-{{ .Uuid }}
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: AWSMachineTemplate
        name: aws-v1.22-{{ .Uuid }}-md-0
      version: v1.22.5
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSMachineTemplate
metadata:
  name: aws-v1.22-{{ .Uuid }}-md-0
  namespace: {{ .Namespace }}
spec:
  template:
    spec:
      iamInstanceProfile: *******
      instanceType: t3.large
      sshKeyName: *******
      failureDomain: "eu-west-2a"
---
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
metadata:
  name: aws-v1.22-{{ .Uuid }}-md-0
  namespace: {{ .Namespace }}
spec:
  template:
    spec:
      preKubeadmCommands:
        - sudo apt -y update
        - sudo apt -y install linux-modules-extra-$(uname -r)
      joinConfiguration:
        nodeRegistration:
          kubeletExtraArgs:
            cloud-provider: aws
          name: '{{"{{"}} ds.meta_data.local_hostname {{"}}"}}'
```
That makes sense now, thanks for the additional info. To summarize: when bringing your own VPC and creating multiple clusters on the same VPC, the subnet tags do not get cleaned up after the clusters are deleted.
/triage accepted
/priority important-soon
/assign
Waiting for #3123 to be merged.
Hi, is there any update on the above issue? Is it possible for me to help with the PR in any way? 🙂
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
This error is still relevant to my interests as well.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The issue has been marked as an important bug and triaged. Such issues are automatically marked as frozen when hitting the rotten state to avoid missing important bugs.
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle frozen
This issue is labeled with priority/important-soon but has not been updated in over 90 days, and should be re-triaged.
Important-soon issues must be staffed and worked on either currently, or very soon, ideally in time for the next release.
You can:
- Confirm that this issue is still relevant with /triage accepted (org members only)
- Deprioritize it with /priority important-longterm or /priority backlog
- Close this issue with /close
For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/
/remove-triage accepted