amazon-vpc-cni-k8s
Multus with VPC-CNI as secondary: failed to add default route: file exists
What happened: Attempting to use Multus with EKS, with Cilium as the primary CNI and VPC-CNI as a secondary CNI on a pod. The pod fails to start with errors indicating that VPC-CNI failed to set the default route because Cilium has already set that route.
Attach logs
Multus Logs:
2021-09-30T23:58:36Z [error] [kube-system/metrics-server-679f88554f-54kzx:aws-cni]: error adding container to network "aws-cni": add command: failed to setup network: setupNS network: failed to setup veth pair.: setupVeth network: failed to setup veth network: setup NS network: failed to add default route: file exists
2021-09-30T23:58:38Z [error] [kube-system/metrics-server-679f88554f-54kzx:aws-cni]: error adding container to network "aws-cni": add command: failed to setup network: setupNS network: failed to setup veth pair.: setupVeth network: failed to setup veth network: setup NS network: failed to add default route: file exists
2021-09-30T23:58:40Z [error] [kube-system/metrics-server-679f88554f-54kzx:aws-cni]: error adding container to network "aws-cni": add command: failed to setup network: setupNS network: failed to setup veth pair.: setupVeth network: failed to setup veth network: setup NS network: failed to add default route: file exists
2021-09-30T23:58:42Z [error] [kube-system/metrics-server-679f88554f-54kzx:aws-cni]: error adding container to network "aws-cni": add command: failed to setup network: setupNS network: failed to setup veth pair.: setupVeth network: failed to setup veth network: setup NS network: failed to add default route: file exists
2021-09-30T23:58:44Z [error] [kube-system/metrics-server-679f88554f-54kzx:aws-cni]: error adding container to network "aws-cni": add command: failed to setup network: setupNS network: failed to setup veth pair.: setupVeth network: failed to setup veth network: setup NS network: failed to add default route: file exists
2021-09-30T23:58:46Z [error] [kube-system/metrics-server-679f88554f-54kzx:aws-cni]: error adding container to network "aws-cni": add command: failed to setup network: setupNS network: failed to setup veth pair.: setupVeth network: failed to setup veth network: setup NS network: failed to add default route: file exists
2021-09-30T23:58:48Z [error] [kube-system/metrics-server-679f88554f-54kzx:aws-cni]: error adding container to network "aws-cni": add command: failed to setup network: setupNS network: failed to setup veth pair.: setupVeth network: failed to setup veth network: setup NS network: failed to add default route: file exists
VPC-CNI plugin Logs:
{"level":"info","ts":"2021-09-30T23:59:45.130Z","caller":"routed-eni-cni-plugin/cni.go:111","msg":"Received CNI add request: ContainerID(1204b38c5c88bd848a11138ec8ce0e91c2bdcc5fbb2e3d3a53ac7486b88f3f6e) Netns(/proc/909/ns/net) IfName(net1) Args(IgnoreUnknown=true;K8S_POD_NAMESPACE=kube-system;K8S_POD_NAME=metrics-server-679f88554f-54kzx;K8S_POD_INFRA_CONTAINER_ID=1204b38c5c88bd848a11138ec8ce0e91c2bdcc5fbb2e3d3a53ac7486b88f3f6e) Path(/opt/cni/bin:/opt/cni/bin) argsStdinData({\"cniVersion\":\"0.3.1\",\"mtu\":\"9001\",\"name\":\"aws-cni\",\"pluginLogFile\":\"/var/log/aws-routed-eni/plugin.log\",\"pluginLogLevel\":\"Debug\",\"type\":\"aws-cni\",\"vethPrefix\":\"eni\"})"}
{"level":"debug","ts":"2021-09-30T23:59:45.130Z","caller":"routed-eni-cni-plugin/cni.go:111","msg":"MTU value set is 9001:"}
{"level":"info","ts":"2021-09-30T23:59:45.133Z","caller":"routed-eni-cni-plugin/cni.go:111","msg":"Received add network response for container 1204b38c5c88bd848a11138ec8ce0e91c2bdcc5fbb2e3d3a53ac7486b88f3f6e interface net1: Success:true IPv4Addr:\"10.128.8.86\" DeviceNumber:1 VPCcidrs:\"10.128.8.0/22\" "}
{"level":"debug","ts":"2021-09-30T23:59:45.133Z","caller":"routed-eni-cni-plugin/cni.go:188","msg":"SetupNS: hostVethName=eniacb1d4b899f, contVethName=net1, netnsPath=/proc/909/ns/net, deviceNumber=1, mtu=9001"}
{"level":"error","ts":"2021-09-30T23:59:45.134Z","caller":"driver/driver.go:185","msg":"Failed to setup veth network setup NS network: failed to add default route: file exists"}
{"level":"error","ts":"2021-09-30T23:59:45.135Z","caller":"routed-eni-cni-plugin/cni.go:111","msg":"Failed SetupPodNetwork for container 1204b38c5c88bd848a11138ec8ce0e91c2bdcc5fbb2e3d3a53ac7486b88f3f6e: setupNS network: failed to setup veth pair.: setupVeth network: failed to setup veth network: setup NS network: failed to add default route: file exists"}
{"level":"info","ts":"2021-09-30T23:59:45.148Z","caller":"routed-eni-cni-plugin/cni.go:240","msg":"Received CNI del request: ContainerID(1204b38c5c88bd848a11138ec8ce0e91c2bdcc5fbb2e3d3a53ac7486b88f3f6e) Netns(/proc/909/ns/net) IfName(net1) Args(IgnoreUnknown=true;K8S_POD_NAMESPACE=kube-system;K8S_POD_NAME=metrics-server-679f88554f-54kzx;K8S_POD_INFRA_CONTAINER_ID=1204b38c5c88bd848a11138ec8ce0e91c2bdcc5fbb2e3d3a53ac7486b88f3f6e) Path(/opt/cni/bin:/opt/cni/bin) argsStdinData({\"cniVersion\":\"0.3.1\",\"mtu\":\"9001\",\"name\":\"aws-cni\",\"pluginLogFile\":\"/var/log/aws-routed-eni/plugin.log\",\"pluginLogLevel\":\"Debug\",\"type\":\"aws-cni\",\"vethPrefix\":\"eni\"})"}
{"level":"info","ts":"2021-09-30T23:59:45.150Z","caller":"routed-eni-cni-plugin/cni.go:240","msg":"Container 1204b38c5c88bd848a11138ec8ce0e91c2bdcc5fbb2e3d3a53ac7486b88f3f6e not found"}
{"level":"info","ts":"2021-09-30T23:59:45.279Z","caller":"routed-eni-cni-plugin/cni.go:240","msg":"Received CNI del request: ContainerID(1204b38c5c88bd848a11138ec8ce0e91c2bdcc5fbb2e3d3a53ac7486b88f3f6e) Netns(/proc/909/ns/net) IfName(net1) Args(IgnoreUnknown=true;K8S_POD_NAMESPACE=kube-system;K8S_POD_NAME=metrics-server-679f88554f-54kzx;K8S_POD_INFRA_CONTAINER_ID=1204b38c5c88bd848a11138ec8ce0e91c2bdcc5fbb2e3d3a53ac7486b88f3f6e) Path(/opt/cni/bin:/opt/cni/bin) argsStdinData({\"cniVersion\":\"0.3.1\",\"mtu\":\"9001\",\"name\":\"aws-cni\",\"pluginLogFile\":\"/var/log/aws-routed-eni/plugin.log\",\"pluginLogLevel\":\"Debug\",\"type\":\"aws-cni\",\"vethPrefix\":\"eni\"})"}
{"level":"info","ts":"2021-09-30T23:59:45.281Z","caller":"routed-eni-cni-plugin/cni.go:240","msg":"Container 1204b38c5c88bd848a11138ec8ce0e91c2bdcc5fbb2e3d3a53ac7486b88f3f6e not found"}
{"level":"info","ts":"2021-09-30T23:59:46.333Z","caller":"routed-eni-cni-plugin/cni.go:240","msg":"Received CNI del request: ContainerID(1204b38c5c88bd848a11138ec8ce0e91c2bdcc5fbb2e3d3a53ac7486b88f3f6e) Netns() IfName(net1) Args(IgnoreUnknown=true;K8S_POD_NAMESPACE=kube-system;K8S_POD_NAME=metrics-server-679f88554f-54kzx;K8S_POD_INFRA_CONTAINER_ID=1204b38c5c88bd848a11138ec8ce0e91c2bdcc5fbb2e3d3a53ac7486b88f3f6e) Path(/opt/cni/bin:/opt/cni/bin) argsStdinData({\"cniVersion\":\"0.3.1\",\"mtu\":\"9001\",\"name\":\"aws-cni\",\"pluginLogFile\":\"/var/log/aws-routed-eni/plugin.log\",\"pluginLogLevel\":\"Debug\",\"type\":\"aws-cni\",\"vethPrefix\":\"eni\"})"}
{"level":"info","ts":"2021-09-30T23:59:46.336Z","caller":"routed-eni-cni-plugin/cni.go:240","msg":"Container 1204b38c5c88bd848a11138ec8ce0e91c2bdcc5fbb2e3d3a53ac7486b88f3f6e not found"}
What you expected to happen: VPC-CNI should be able to run as the secondary CNI in a Multus configuration, especially since Multus support is advertised as a feature.
How to reproduce it (as minimally and precisely as possible): Install Cilium, Multus, and VPC-CNI. Multus should use the following args (a sketch of the resulting auto-generated Multus config follows the list):
"--multus-conf-file=auto",
"--cni-version=0.3.1",
"--multus-master-cni-file-name=05-cilium.conflist",
"--multus-log-level=error",
"--multus-log-file=/var/log/aws-routed-eni/multus.log"
Add a NetworkAttachmentDefinition with the following spec:
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: vpccni
  namespace: kube-system
spec:
  config: '{
      "cniVersion": "0.3.1",
      "name": "aws-cni",
      "plugins": [
        {
          "name": "aws-cni",
          "type": "aws-cni",
          "vethPrefix": "eni",
          "mtu": "9001",
          "pluginLogFile": "/var/log/aws-routed-eni/plugin.log",
          "pluginLogLevel": "Debug"
        },
        {
          "type": "portmap",
          "capabilities": {"portMappings": true},
          "snat": true
        }
      ]
    }'
This should result in pods running with Cilium as the default CNI and vpccni being available for additional interfaces.
Add the following annotation to a pod: k8s.v1.cni.cncf.io/networks: vpccni
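For example, a minimal pod manifest carrying that annotation could look like the following (pod name, image, and command are placeholders, not taken from the original report):

apiVersion: v1
kind: Pod
metadata:
  name: multus-test            # placeholder name
  namespace: kube-system
  annotations:
    k8s.v1.cni.cncf.io/networks: vpccni
spec:
  containers:
    - name: app
      image: busybox           # placeholder image
      command: ["sleep", "3600"]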
Anything else we need to know?: The problem appears to lie here: https://github.com/aws/amazon-vpc-cni-k8s/blob/v1.9.1/cmd/routed-eni-cni-plugin/driver/driver.go#L147. It appears that VPC-CNI cannot handle the case where the default route already exists. In general, a CNI plugin should handle this case.
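For illustration only, here is a minimal sketch of how a plugin could tolerate an already-existing default route, assuming the vishvananda/netlink package; the function name and error handling are hypothetical and this is not the project's actual code or fix:

package driverpatch

import (
	"errors"
	"fmt"
	"net"
	"syscall"

	"github.com/vishvananda/netlink"
)

// addDefaultRouteIfAbsent installs a default route via gw on link, but
// tolerates EEXIST so that a default route already installed by the
// cluster-wide default plugin (Cilium here) is left untouched.
// Hypothetical sketch; not the VPC-CNI implementation.
func addDefaultRouteIfAbsent(link netlink.Link, gw net.IP) error {
	_, defaultDst, _ := net.ParseCIDR("0.0.0.0/0")
	route := &netlink.Route{
		LinkIndex: link.Attrs().Index,
		Dst:       defaultDst,
		Gw:        gw,
	}
	if err := netlink.RouteAdd(route); err != nil {
		if errors.Is(err, syscall.EEXIST) {
			// A default route is already present; per the multi-net spec a
			// secondary attachment should not clobber it.
			return nil
		}
		return fmt.Errorf("failed to add default route: %w", err)
	}
	return nil
}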
Environment:
- EKS Version: 1.21
- Multus version: v3.8
- Cilium Version: 1.10.4
- VPC-CNI Version: v1.9.1-eksbuild.1 (installed via EKS Addons)
- OS (e.g. cat /etc/os-release): Amazon Linux 2
- Kernel (e.g. uname -a): 5.4.141-67.229.amzn2.x86_64
Hi @cryptk thanks for reporting, let me try to repro using the steps mentioned and will get back to you.
I was looking for some official guidance on how the default route should be handled when running a multi-net setup, and the official spec addresses this.
https://github.com/k8snetworkplumbingwg/multi-net-spec/tree/master/v1.2
Section 4.1.2.1.9
Typically, it’s assumed that the attachment for the default network will have the default route,
however, in some cases one may desire to specify which attachment will have the default route.
When “default-route” is set for an attachment other than the cluster-wide default network
attachment, it should be noted that the default route and gateway will be cleared from the
cluster-wide default network attachment.
So it looks like the VPC CNI should not replace the original default route, but rather only create it if it does not already exist. That spec had lots of other good information on how a well-behaved CNI should operate in a multi-net configuration.
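For illustration, if the goal were instead to move the default route onto the VPC-CNI attachment, Multus accepts the JSON form of the annotation with a default-route key per that spec (the gateway IP below is only a placeholder derived from the VPC CIDR in the logs above):

k8s.v1.cni.cncf.io/networks: '[{"name": "vpccni", "namespace": "kube-system", "default-route": ["10.128.8.1"]}]'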
@cgchinmay thanks for picking the issue up so fast! My repro steps are pretty minimal, so if you need any clarification or expansion, please let me know!
@cryptk There has been the exact same issue in the past (#596, #203) and it was fixed by #367. Will check on why you are hitting the same issue with this CNI version.
@cryptk - It would be great if you could share the use case that drives the requirement for VPC CNI as a secondary plugin.
AFAIK, VPC CNI as default delegate CNI was qualified with Multus.
https://docs.aws.amazon.com/eks/latest/userguide/pod-multiple-network-interfaces.html
Only the Amazon VPC CNI plugin is officially supported as the default delegate plugin. You need to modify the published Multus installation manifest to reconfigure the default delegate plugin to an alternate CNI if you choose not to use the Amazon VPC CNI plugin for primary networking.
The doc doesn't explicitly call out that VPC CNI as secondary is not supported. Will confirm internally whether this was intended to be supported and respond back.
+1, interested in the use case for using VPC CNI for a secondary interface, regardless of the issue reported.
Hi @cryptk, I was able to repro the issue as you described. However, fixing it is just one part of the problem. The current Multus support expects aws-vpc-cni to be used as the primary plugin. We will be updating our docs to explicitly call this out.
For now I will mark this as a Feature Request instead. It would help to know your use case for using aws-vpc-cni as the secondary plugin.
@sramabad1 @cgchinmay @jungy-aws sorry for the late response, the GitHub notification never seemed to reach me.
The problem I am trying to solve is that when running Cilium as the CNI (to benefit from eBPF as well as all of the other Cilium features) and using the Cilium overlay network, the EKS control plane can no longer talk to any of the pods to handle things like validating and mutating webhooks. A resolution for this would be to place those pods on the VPC network via vpc-cni.
Ideally this would involve having the pods still be primarily on the Cilium overlay network and just having a second interface on the VPC network which can then be used for the EKS control plane communications.
+1 for this functionality. We have the same use-case as @cryptk
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days
/not stale
@cryptk did you make any progress on this? I have the same use case.