CFP: Update Cilium Helm install docs for EKS and the AWS VPC CNI
Cilium Feature Proposal
Is your proposed feature related to a problem?
The documentation for installing Cilium on EKS with Helm currently recommends patching the VPC CNI with kubectl so that Cilium, rather than the VPC CNI, manages ENIs. While this works, it adds a manual step that prevents bootstrapping a Cilium EKS cluster with Terraform or eksctl alone.
# Relevant code
kubectl -n kube-system patch daemonset aws-node --type='strategic' -p='{"spec":{"template":{"spec":{"nodeSelector":{"io.cilium/aws-node-enabled":"true"}}}}}'
Describe the feature you'd like
Please update the docs to instead recommend using addon configuration values to patch the vpc-cni addon at the time it is deployed. Note that nodeSelector is not a value that can be configured on the addon, so affinity must be used instead.
The VPC CNI can be configured to not run on Cilium managed nodes using the following configuration values:
{
  "affinity": {
    "nodeAffinity": {
      "requiredDuringSchedulingIgnoredDuringExecution": {
        "nodeSelectorTerms": [
          {
            "matchExpressions": [
              {
                "key": "io.cilium/aws-node-enabled",
                "operator": "In",
                "values": ["true"]
              }
            ]
          }
        ]
      }
    }
  }
}
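For eksctl users, the same affinity override can be passed through the addon's configurationValues in a ClusterConfig. This is a minimal sketch; the cluster name and region are illustrative assumptions, not taken from this thread:

```yaml
# Sketch: disable the VPC CNI on Cilium-managed nodes via addon
# configurationValues (cluster name/region are illustrative assumptions)
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster
  region: us-west-2
addons:
  - name: vpc-cni
    configurationValues: |
      {
        "affinity": {
          "nodeAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": {
              "nodeSelectorTerms": [
                {
                  "matchExpressions": [
                    {
                      "key": "io.cilium/aws-node-enabled",
                      "operator": "In",
                      "values": ["true"]
                    }
                  ]
                }
              ]
            }
          }
        }
      }
```

Because no node carries the io.cilium/aws-node-enabled label, the aws-node DaemonSet schedules no pods, leaving ENI management to Cilium.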
Sounds like this would be quite helpful; the next step would be creating a concrete PR proposal.
Hi @caleb-devops, thanks for the tip, but when I put this configuration in place prior to the Cilium install, the coredns addon doesn't start (obviously, because no CNI is found).
Hi @Smana. CoreDNS requires that a CNI is deployed, so with the vpc-cni configuration values in place, Cilium will need to be installed before CoreDNS can run. The recommended node taint should prevent other pods (like coredns) from being scheduled on a node until Cilium is deployed there:
taints:
- key: "node.cilium.io/agent-not-ready"
value: "true"
effect: "NoExecute"
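This taint is normally applied when the node group is created, so nodes come up tainted before any workload pods land on them. A minimal eksctl sketch (node group name and size are illustrative assumptions):

```yaml
# Sketch: an eksctl managed node group that starts tainted until the
# Cilium agent is ready (name/desiredCapacity are illustrative assumptions)
managedNodeGroups:
  - name: ng-cilium
    desiredCapacity: 2
    taints:
      - key: node.cilium.io/agent-not-ready
        value: "true"
        effect: NoExecute
```

Once the Cilium agent is running on a node, Cilium removes the taint and scheduling proceeds normally.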
Thanks @caleb-devops. Actually, I already have a toleration. However, the Cilium install only starts after the EKS module deployment is finished (including CoreDNS, which is an EKS addon).
@Smana you don't need to add the toleration to CoreDNS. Because CoreDNS relies on the CNI, it will need to be deployed after Cilium is installed. For the terraform-aws-modules/eks/aws module, try the following:
- Set vpc-cni configuration_values in the terraform-aws-modules/eks/aws module:

  cluster_addons = {
    vpc-cni = {
      most_recent    = true
      before_compute = true
      configuration_values = jsonencode({
        affinity = {
          nodeAffinity = {
            requiredDuringSchedulingIgnoredDuringExecution = {
              nodeSelectorTerms = [{
                matchExpressions = [{
                  key      = "io.cilium/aws-node-enabled"
                  operator = "In"
                  values   = ["true"]
                }]
              }]
            }
          }
        }
      })
    }
  }

- Install the Cilium Helm chart using the Terraform Helm provider
- Install remaining addons (I use the terraform-aws-eks-blueprints-addons module for this)
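For the Helm provider step, a minimal sketch of the release resource is below. The chart version and the ENI-mode values are assumptions for illustration, not from this thread; check the Cilium Helm chart reference for your release before using them:

```hcl
# Sketch: install Cilium in ENI mode via the Terraform Helm provider.
# Version and value names are assumptions -- verify against the Cilium
# Helm chart reference for your target release.
resource "helm_release" "cilium" {
  name       = "cilium"
  repository = "https://helm.cilium.io"
  chart      = "cilium"
  namespace  = "kube-system"

  # ENI IPAM mode, so Cilium manages ENIs in place of the VPC CNI
  set {
    name  = "eni.enabled"
    value = "true"
  }
  set {
    name  = "ipam.mode"
    value = "eni"
  }
  set {
    name  = "routingMode"
    value = "native"
  }
  set {
    name  = "egressMasqueradeInterfaces"
    value = "eth0"
  }
}
```

Ordering the helm_release after the EKS module (and before the remaining addons) is what lets CoreDNS find a working CNI when it finally starts.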
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
The AWS EKS team will be adding an option to initialize a bare EKS cluster (without any addons) through https://github.com/aws/containers-roadmap/issues/923. After they do, it should no longer be necessary to patch the VPC CNI to disable it.
EKS clusters can now be created without any addons: https://aws.amazon.com/about-aws/whats-new/2024/06/amazon-eks-cluster-creation-flexibility-networking-add-ons/
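With that launch, the vpc-cni affinity workaround should become unnecessary: the cluster can be created with no default addons at all and Cilium installed first. A hedged eksctl sketch (the addonsConfig field reflects eksctl's support for this feature; cluster name and region are illustrative, so verify against current eksctl docs):

```yaml
# Sketch: create an EKS cluster without the default self-managed addons
# (field per eksctl's support for this launch; name/region are illustrative)
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster
  region: us-west-2
addonsConfig:
  disableDefaultAddons: true
```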
@caleb-devops may I know your eventual script to setup eks together with cilium in one go?
@caleb-devops I am very interested in this too. I'm deploying a bare EKS cluster and there's some very strange ordering of events going on, with CoreDNS refusing to become healthy (and thus the nodes stall in a NotReady state).
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
This issue has not seen any activity since it was marked stale. Closing.