EKS VPC CNI cannot be disabled because AWS now installs via Helm
/kind bug
What steps did you take and what happened:
Creating an EKS cluster with a custom CNI (such as Cilium) is currently problematic because CAPA does not correctly remove the automatically preinstalled VPC CNI. This can be seen from the aws-node pods still running on the cluster.
CAPA has code to delete the AWS VPC CNI resources if AWSManagedControlPlane.spec.vpcCni.disable=true, but it only deletes them if they are not managed by Helm. I presume that is intentional, so that CAPA users can deploy the VPC CNI with Helm in their own way.
https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/3618d1c1567afe72781059cd9a3498e7ae44b3b5/pkg/cloud/services/awsnode/cni.go#L269-L293
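For illustration, the skip logic amounts to a label check along these lines (a minimal sketch; the helper name is mine, the actual implementation is at the link above):

```go
package main

import "fmt"

// Helm's standard ownership label, as seen on the DaemonSet output below.
const managedByLabel = "app.kubernetes.io/managed-by"

// isHelmManaged is an illustrative stand-in for CAPA's check: resources
// carrying the Helm ownership label are skipped during VPC CNI deletion.
func isHelmManaged(labels map[string]string) bool {
	return labels[managedByLabel] == "Helm"
}

func main() {
	dsLabels := map[string]string{managedByLabel: "Helm"}
	fmt.Println(isHelmManaged(dsLabels)) // true => CAPA leaves the resource alone
}
```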
Unfortunately, it seems AWS introduced a breaking change by switching its own automagic deployment method to Helm, including the relevant labels. This is what a newly created EKS cluster looks like (VPC CNI not disabled; cluster created by CAPA ~v2.3.0):
```
$ kubectl get ds -n kube-system aws-node -o yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  annotations:
    deprecated.daemonset.template.generation: "1"
  creationTimestamp: "2024-01-18T15:40:42Z"
  generation: 1
  labels:
    app.kubernetes.io/instance: aws-vpc-cni
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: aws-node
    app.kubernetes.io/version: v1.15.1
    helm.sh/chart: aws-vpc-cni-1.15.1
    k8s-app: aws-node
  name: aws-node
  namespace: kube-system
  # [...]
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      k8s-app: aws-node
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/instance: aws-vpc-cni
        app.kubernetes.io/name: aws-node
        k8s-app: aws-node
```
The deletion code must be fixed. Sadly, AWS does not add any extra label to denote that the deployment is AWS-managed. And this breaking change even applies to older Kubernetes versions such as 1.24.
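One possible direction, based purely on the output above (an assumption on my part, not a confirmed fix): a genuine `helm install` sets the `meta.helm.sh/release-name` and `meta.helm.sh/release-namespace` annotations on managed resources, while the AWS-preinstalled DaemonSet shown above carries only the chart labels. The deletion code could therefore treat a resource as user-managed only when those annotations are present:

```go
package main

import "fmt"

// Sketch of a possible check (not the actual CAPA patch): a resource is
// only considered user-managed if it carries the release annotation that
// `helm install` sets, which the AWS-preinstalled aws-node DaemonSet
// shown above lacks.
const releaseNameAnnotation = "meta.helm.sh/release-name"

func isUserHelmRelease(annotations map[string]string) bool {
	_, ok := annotations[releaseNameAnnotation]
	return ok
}

func main() {
	// Annotations from the AWS-preinstalled DaemonSet above:
	awsDefault := map[string]string{
		"deprecated.daemonset.template.generation": "1",
	}
	fmt.Println(isUserHelmRelease(awsDefault)) // false => safe to delete
}
```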
Related: an E2E test is wanted to cover this feature (issue)
Environment:
- Cluster-api-provider-aws version: ~v2.3.0 (fork with some backports)
- Kubernetes version (use `kubectl version`): 1.24