amazon-vpc-cni-k8s
Unable to allocate IPs
We are using the CNI plugin as part of our Kubernetes deployment. We are seeing an issue where the VPC/ENI have adequate IP addresses, but the CNI is returning an error indicating that no IP addresses are available:
https://gist.github.com/DerekTBrown/9fd4f5032102afef66e26e18cd918105
Would you be able to help troubleshoot?
Could you run eks-log-collector and send your logs to [email protected]?
@DerekTBrown - Sorry for the delay. Do you have pod churn or short-lived pods in your cluster? I see you have WARM_IP_TARGET set to 2; with pod churn, IPs are added incrementally, which is not fast enough to keep up with the churn and leads to the IP address assignment error.
"msg":"AssignIPv4Address: IP address pool stats: total: 14, assigned 12"}
With pod churn, the recommendation is to use WARM_ENI_TARGET instead:
To be clear, only set WARM_IP_TARGET for small clusters, or clusters with very low pod churn. It's also advised to set MINIMUM_IP_TARGET slightly higher than the expected number of pods you plan to run on each node.
Ref: https://github.com/aws/amazon-vpc-cni-k8s/blob/master/docs/eni-and-ip-target.md
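The warm and minimum targets are just environment variables on the aws-node DaemonSet, so a quick way to experiment with them is something like the sketch below (the values are placeholders; tune them to your node size and churn):

```sh
# Sketch only: keep one spare ENI's worth of IPs attached for bursts of new pods.
kubectl -n kube-system set env daemonset/aws-node WARM_ENI_TARGET=1

# Or, for small / low-churn clusters, per the doc above (placeholder values):
kubectl -n kube-system set env daemonset/aws-node WARM_IP_TARGET=5 MINIMUM_IP_TARGET=20
```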
Please let me know if you need additional help and we can get on a call.
@jayanthvn would setting ENABLE_PREFIX_DELEGATION=True help us with this situation as well (assuming we are allocating a sufficient prefix)? I am trying to understand the tradeoffs between ENABLE_PREFIX_DELEGATION=False and ENABLE_PREFIX_DELEGATION=True.
@DerekTBrown Yes, enabling prefix delegation will mitigate it to some extent. For example, if WARM_IP_TARGET = 2 and a single /28 prefix is attached to the ENI with 14 of its IPs used, we will have 2 IPs free, which satisfies the WARM_IP_TARGET of 2.
Now, if there is pod churn and the ENI has room for more prefixes, 2 pods will get the free IPs, but the remaining pods will need a new prefix to be attached. That adds a slight delay, though the latency is still better than attaching a new ENI.
The reported issue can still occur, however, if the ENI is at full capacity and allocating a new prefix requires attaching a new ENI.
But as you mentioned ("assuming we are allocating a sufficient prefix"), prefix delegation with WARM_PREFIX_TARGET set will be faster, and pod density per node will be higher.
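For reference, a minimal sketch of turning it on (the values are illustrative, not a recommendation; prefix delegation also requires a Nitro-based instance type):

```sh
# Sketch: with prefix delegation each ENI slot hands out a /28 (16 IPs) instead of a single IP;
# WARM_PREFIX_TARGET=1 keeps one spare /28 attached for new pods.
kubectl -n kube-system set env daemonset/aws-node ENABLE_PREFIX_DELEGATION=true WARM_PREFIX_TARGET=1
```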
@jayanthvn related: we install the CNI via a Terraform aws_eks_addon resource. Is there a best-practice way to set environment variables (e.g. WARM_IP_TARGET) using it?
The approach I have been able to find involves a local command, which seems hacky:
https://github.com/terraform-aws-modules/terraform-aws-eks/issues/1385#issuecomment-938663671
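For reference, a rough sketch of doing this through the add-on's configuration values, assuming an add-on and tooling version that supports them (the cluster name and values below are placeholders; the same JSON string can be supplied to aws_eks_addon via its configuration_values argument):

```sh
# Sketch only: pass CNI environment variables as add-on configuration values.
aws eks update-addon \
  --cluster-name my-cluster \
  --addon-name vpc-cni \
  --configuration-values '{"env":{"WARM_IP_TARGET":"5","MINIMUM_IP_TARGET":"20"}}'
```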
@jayanthvn this doesn't appear to fix the issue. We still see failures to assign IPs even though there are plenty of available IPs:
{"level":"debug","ts":"2022-08-23T02:54:21.424Z","caller":"networkutils/network.go:280","msg":"Trying to find primary interface that has mac : 02:b8:0d:41:f4:0d"}
{"level":"debug","ts":"2022-08-23T02:54:21.425Z","caller":"networkutils/network.go:280","msg":"Discovered interface: lo, mac: "}
{"level":"debug","ts":"2022-08-23T02:54:21.425Z","caller":"networkutils/network.go:280","msg":"Discovered interface: eth0, mac: 02:b8:0d:41:f4:0d"}
"ipamd.log" 39753L, 8220650B
{"level":"debug","ts":"2022-08-23T03:12:15.395Z","caller":"datastore/data_store.go:790","msg":"Get free IP from prefix failed no free IP available in the prefix - 10.100.52.34/ffffffff"}
{"level":"debug","ts":"2022-08-23T03:12:15.395Z","caller":"datastore/data_store.go:713","msg":"Unable to get IP address from CIDR: no free IP available in the prefix - 10.100.52.34/ffffffff"}
{"level":"debug","ts":"2022-08-23T03:12:15.395Z","caller":"datastore/data_store.go:790","msg":"Get free IP from prefix failed no free IP available in the prefix - 10.100.35.87/ffffffff"}
{"level":"debug","ts":"2022-08-23T03:12:15.395Z","caller":"datastore/data_store.go:713","msg":"Unable to get IP address from CIDR: no free IP available in the prefix - 10.100.35.87/ffffffff"}
{"level":"debug","ts":"2022-08-23T03:12:15.395Z","caller":"datastore/data_store.go:790","msg":"Get free IP from prefix failed no free IP available in the prefix - 10.100.27.66/ffffffff"}
{"level":"debug","ts":"2022-08-23T03:12:15.395Z","caller":"datastore/data_store.go:713","msg":"Unable to get IP address from CIDR: no free IP available in the prefix - 10.100.27.66/ffffffff"}
{"level":"debug","ts":"2022-08-23T03:12:15.395Z","caller":"datastore/data_store.go:713","msg":"AssignPodIPv4Address: ENI eni-013da4447235cebaa does not have available addresses"}
{"level":"error","ts":"2022-08-23T03:12:15.395Z","caller":"datastore/data_store.go:713","msg":"DataStore has no available IP/Prefix addresses"}
{"level":"info","ts":"2022-08-23T03:12:15.395Z","caller":"rpc/rpc.pb.go:713","msg":"Send AddNetworkReply: IPv4Addr , IPv6Addr: , DeviceNumber: -1, err: assignPodIPv4AddressUnsafe: no available IP/Prefix addresses"}
{"level":"info","ts":"2022-08-23T03:12:15.404Z","caller":"rpc/rpc.pb.go:731","msg":"Received DelNetwork for Sandbox d7b37b1e8e0b6086a521050431320cc258517df3459927292d81b5222dee85db"}
{"level":"debug","ts":"2022-08-23T03:12:15.404Z","caller":"rpc/rpc.pb.go:731","msg":"DelNetworkRequest: K8S_POD_NAME:\"github-actions-runner-lacework-dev-custom-image-rghq5-5xqgk\" K8S_POD_NAMESPACE:\"lacework-dev-actions-runner-system\" K8S_POD_INFRA_CONTAINER_ID:\"d7b37b1e8e0b6086a521050431320cc258517df3459927292d81b5222dee85db\" Reason:\"PodDeleted\" ContainerID:\"d7b37b1e8e0b6086a521050431320cc258517df3459927292d81b5222dee85db\" IfName:\"eth0\" NetworkName:\"aws-cni\""}
{"level":"debug","ts":"2022-08-23T03:12:15.404Z","caller":"ipamd/rpc_handler.go:226","msg":"UnassignPodIPAddress: IP address pool stats: total:16, assigned 11, sandbox aws-cni/d7b37b1e8e0b6086a521050431320cc258517df3459927292d81b5222dee85db/eth0"}
@DerekTBrown Are you available on K8s Slack? We can set up a call and discuss the configs.
I think I have found the root cause. This little line probably should have stood out as odd:
{"level":"debug","ts":"2022-08-23T03:12:15.395Z","caller":"datastore/data_store.go:790","msg":"Get free IP from prefix failed no free IP available in the prefix - 10.100.27.66/ffffffff"}
Disabling IPv6 networking in the container seems to have fixed the issue.
@jayanthvn our EKS cluster is configured to be IPv4-only. Any ideas why this could be happening?
The container itself is just Ubuntu 22.
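For anyone else hitting this, one common way to disable IPv6 inside a container is via sysctls, roughly as in the sketch below (it assumes the pod is permitted to change these sysctls, which is not the default; the exact change used here may have differed):

```sh
# Sketch: turn off IPv6 in the container's network namespace at startup.
sysctl -w net.ipv6.conf.all.disable_ipv6=1
sysctl -w net.ipv6.conf.default.disable_ipv6=1
```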
I have pinged you on K8s Slack; we will set up a time to discuss.
This issue is stale because it has been open 60 days with no activity. Remove the stale label or comment, or this will be closed in 14 days.
Issue closed due to inactivity.