
Unable to allocate IPs

Open DerekTBrown opened this issue 3 years ago • 10 comments

We are using the CNI plugin as part of our Kubernetes deployment. We are seeing an issue where the VPC/ENI has adequate IP addresses, but the CNI returns an error indicating that no IP addresses are available:

https://gist.github.com/DerekTBrown/9fd4f5032102afef66e26e18cd918105

Would you be able to help troubleshoot?

DerekTBrown avatar Aug 04 '22 22:08 DerekTBrown

Could you run eks-log-collector and send your logs to [email protected]?

cgchinmay avatar Aug 04 '22 23:08 cgchinmay

@DerekTBrown - Sorry for the delay. Do you have pod churn or short-lived pods in your cluster? I see you have WARM_IP_TARGET set to 2. With pod churn, IPs are added to the pool incrementally, which is not fast enough to keep up with the churn and leads to the IP address assignment error.

"msg":"AssignIPv4Address: IP address pool stats: total: 14, assigned 12"}

With pod churn, the recommendation is to use WARM_ENI_TARGET. From the docs:

To be clear, only set WARM_IP_TARGET for small clusters, or clusters with very low pod churn. It's also advised to set MINIMUM_IP_TARGET slightly higher than the expected number of pods you plan to run on each node.

Ref : https://github.com/aws/amazon-vpc-cni-k8s/blob/master/docs/eni-and-ip-target.md.
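To make the churn problem concrete, here is a minimal sketch of the arithmetic, not the actual ipamd logic. It assumes a simplified model in which the pool aims for `assigned + WARM_IP_TARGET` addresses, with MINIMUM_IP_TARGET acting as a floor on the total. The function names are hypothetical.

```python
def desired_total_ips(assigned: int, warm_ip_target: int, minimum_ip_target: int = 0) -> int:
    """Simplified model (assumption): keep warm_ip_target free IPs,
    and never hold fewer than minimum_ip_target IPs in total."""
    return max(assigned + warm_ip_target, minimum_ip_target)


def immediately_assignable(total: int, assigned: int, burst: int) -> int:
    """Number of pods in a burst that can get an IP from the current pool
    without waiting for more addresses to be attached."""
    free = total - assigned
    return min(free, burst)


# Values from the log line quoted above: total 14, assigned 12.
total, assigned = 14, 12
print(desired_total_ips(assigned, warm_ip_target=2))                        # 14
print(immediately_assignable(total, assigned, burst=5))                     # 2
print(desired_total_ips(assigned, warm_ip_target=2, minimum_ip_target=20))  # 20
```

In this model, a burst of five pods finds only two free addresses, and the other three wait for the pool to grow. Raising MINIMUM_IP_TARGET above the expected pod count keeps the pool large enough that the burst does not hit the limit.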

Please let me know if you need additional help and we can get on a call.

jayanthvn avatar Aug 09 '22 17:08 jayanthvn

@jayanthvn would setting ENABLE_PREFIX_DELEGATION=True help us with this situation as well (assuming we are allocating a sufficient prefix)? I am trying to understand the tradeoffs between ENABLE_PREFIX_DELEGATION=False and ENABLE_PREFIX_DELEGATION=True.

DerekTBrown avatar Aug 11 '22 22:08 DerekTBrown

@DerekTBrown Yes, enabling prefix delegation will mitigate it to some extent. For example, if WARM_IP_TARGET=2 and a single prefix with 14 used IPs is attached to the ENI, 2 IPs are free, which satisfies the WARM_IP_TARGET of 2.

Now, if there is pod churn and the ENI has room to attach more prefixes, 2 pods will get IPs immediately, but the remaining pods will need a new prefix. That adds a slight delay, though the latency is still lower than attaching a new ENI.

However, the reported issue will still occur if the ENI is at full capacity, because adding a new prefix then requires attaching a new ENI.

But as you mentioned ("assuming we are allocating a sufficient prefix"), prefix delegation with WARM_PREFIX_TARGET set will be faster, and pod density per node will be higher.
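The prefix example above can be sketched as simple arithmetic. This assumes each IPv4 prefix is a /28 holding 16 addresses, which matches the "14 used, 2 free" figures. The function names are hypothetical and this is a rough model, not the plugin's implementation.

```python
import math

IPS_PER_PREFIX = 16  # assumption: one /28 IPv4 prefix, consistent with "14 used, 2 free"


def prefixes_needed(assigned: int, warm_ip_target: int) -> int:
    """Prefixes required to hold the assigned IPs plus the warm target."""
    return math.ceil((assigned + warm_ip_target) / IPS_PER_PREFIX)


def new_prefixes_for_burst(attached: int, assigned: int, burst: int, warm_ip_target: int) -> int:
    """Extra prefixes to attach so that a burst of pods is served and
    the warm target is satisfied again afterwards."""
    needed = prefixes_needed(assigned + burst, warm_ip_target)
    return max(0, needed - attached)


print(prefixes_needed(assigned=14, warm_ip_target=2))  # 1
print(new_prefixes_for_burst(attached=1, assigned=14, burst=5, warm_ip_target=2))  # 1
```

With one prefix attached and 14 IPs in use, a burst of five pods needs one more prefix. Whether that is a quick prefix attach or a slower ENI attach depends on whether the current ENI still has room.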

jayanthvn avatar Aug 15 '22 21:08 jayanthvn

@jayanthvn Related: we install the CNI via a Terraform aws_eks_addon. Is there a best-practice way to set environment variables (e.g. WARM_IP_TARGET) using these?

The approach I have been able to find involves a local command, which seems hacky:

https://github.com/terraform-aws-modules/terraform-aws-eks/issues/1385#issuecomment-938663671

DerekTBrown avatar Aug 16 '22 20:08 DerekTBrown

@jayanthvn this doesn't appear to fix the issue. We still see failure to assign IPs with tons of available IPs:

{"level":"debug","ts":"2022-08-23T02:54:21.424Z","caller":"networkutils/network.go:280","msg":"Trying to find primary interface that has mac : 02:b8:0d:41:f4:0d"}
{"level":"debug","ts":"2022-08-23T02:54:21.425Z","caller":"networkutils/network.go:280","msg":"Discovered interface: lo, mac: "}
{"level":"debug","ts":"2022-08-23T02:54:21.425Z","caller":"networkutils/network.go:280","msg":"Discovered interface: eth0, mac: 02:b8:0d:41:f4:0d"}
"ipamd.log" 39753L, 8220650B
{"level":"debug","ts":"2022-08-23T03:12:15.395Z","caller":"datastore/data_store.go:790","msg":"Get free IP from prefix failed no free IP available in the prefix - 10.100.52.34/ffffffff"}
{"level":"debug","ts":"2022-08-23T03:12:15.395Z","caller":"datastore/data_store.go:713","msg":"Unable to get IP address from CIDR: no free IP available in the prefix - 10.100.52.34/ffffffff"}
{"level":"debug","ts":"2022-08-23T03:12:15.395Z","caller":"datastore/data_store.go:790","msg":"Get free IP from prefix failed no free IP available in the prefix - 10.100.35.87/ffffffff"}
{"level":"debug","ts":"2022-08-23T03:12:15.395Z","caller":"datastore/data_store.go:713","msg":"Unable to get IP address from CIDR: no free IP available in the prefix - 10.100.35.87/ffffffff"}
{"level":"debug","ts":"2022-08-23T03:12:15.395Z","caller":"datastore/data_store.go:790","msg":"Get free IP from prefix failed no free IP available in the prefix - 10.100.27.66/ffffffff"}
{"level":"debug","ts":"2022-08-23T03:12:15.395Z","caller":"datastore/data_store.go:713","msg":"Unable to get IP address from CIDR: no free IP available in the prefix - 10.100.27.66/ffffffff"}
{"level":"debug","ts":"2022-08-23T03:12:15.395Z","caller":"datastore/data_store.go:713","msg":"AssignPodIPv4Address: ENI eni-013da4447235cebaa does not have available addresses"}
{"level":"error","ts":"2022-08-23T03:12:15.395Z","caller":"datastore/data_store.go:713","msg":"DataStore has no available IP/Prefix addresses"}
{"level":"info","ts":"2022-08-23T03:12:15.395Z","caller":"rpc/rpc.pb.go:713","msg":"Send AddNetworkReply: IPv4Addr , IPv6Addr: , DeviceNumber: -1, err: assignPodIPv4AddressUnsafe: no available IP/Prefix addresses"}
{"level":"info","ts":"2022-08-23T03:12:15.404Z","caller":"rpc/rpc.pb.go:731","msg":"Received DelNetwork for Sandbox d7b37b1e8e0b6086a521050431320cc258517df3459927292d81b5222dee85db"}
{"level":"debug","ts":"2022-08-23T03:12:15.404Z","caller":"rpc/rpc.pb.go:731","msg":"DelNetworkRequest: K8S_POD_NAME:\"github-actions-runner-lacework-dev-custom-image-rghq5-5xqgk\" K8S_POD_NAMESPACE:\"lacework-dev-actions-runner-system\" K8S_POD_INFRA_CONTAINER_ID:\"d7b37b1e8e0b6086a521050431320cc258517df3459927292d81b5222dee85db\" Reason:\"PodDeleted\" ContainerID:\"d7b37b1e8e0b6086a521050431320cc258517df3459927292d81b5222dee85db\" IfName:\"eth0\" NetworkName:\"aws-cni\""}
{"level":"debug","ts":"2022-08-23T03:12:15.404Z","caller":"ipamd/rpc_handler.go:226","msg":"UnassignPodIPAddress: IP address pool stats: total:16, assigned 11, sandbox aws-cni/d7b37b1e8e0b6086a521050431320cc258517df3459927292d81b5222dee85db/eth0"}
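For anyone digging through a large ipamd.log like this one, a small script can pull out the pool stats and the exhaustion errors so they can be compared by timestamp. This is a rough sketch: the regular expression and the matched error substring are assumptions based only on the message formats visible in this thread, and the function name is hypothetical.

```python
import json
import re

# Matches both "total: 14, assigned 12" and "total:16, assigned 11".
POOL_STATS = re.compile(r"total:\s*(\d+),\s*assigned\s*(\d+)")


def summarize(lines):
    """Return (pool_stats, exhaustion_errors) from ipamd JSON log lines."""
    stats, errors = [], []
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines, e.g. editor status output
        if not isinstance(entry, dict):
            continue
        msg = entry.get("msg", "")
        match = POOL_STATS.search(msg)
        if match:
            total, assigned = int(match.group(1)), int(match.group(2))
            stats.append((entry.get("ts"), total, assigned, total - assigned))
        if "no available IP/Prefix addresses" in msg:
            errors.append(entry.get("ts"))
    return stats, errors


# Abridged lines taken from the excerpt above.
sample = [
    '{"level":"error","ts":"2022-08-23T03:12:15.395Z","caller":"datastore/data_store.go:713","msg":"DataStore has no available IP/Prefix addresses"}',
    '"ipamd.log" 39753L, 8220650B',
    '{"level":"debug","ts":"2022-08-23T03:12:15.404Z","caller":"ipamd/rpc_handler.go:226","msg":"UnassignPodIPAddress: IP address pool stats: total:16, assigned 11"}',
]
stats, errors = summarize(sample)
print(stats)
print(errors)
```

On the abridged sample, this reports one error at 03:12:15.395Z and, nine milliseconds later, a pool with 16 total and 11 assigned addresses, which is the mismatch described above.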

DerekTBrown avatar Aug 23 '22 03:08 DerekTBrown

@DerekTBrown Are you available on K8s Slack? We can set up a call and discuss the configs.

jayanthvn avatar Aug 23 '22 05:08 jayanthvn

I think I have found the root cause. This little line probably should have stood out as odd:

{"level":"debug","ts":"2022-08-23T03:12:15.395Z","caller":"datastore/data_store.go:790","msg":"Get free IP from prefix failed no free IP available in the prefix - 10.100.27.66/ffffffff"}

Disabling IPv6 networking in the container seems to have fixed the issue.
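As an aid for reading that line: the `/ffffffff` suffix looks like a netmask printed in hexadecimal rather than as a prefix length. That is an assumption about the log format and is not confirmed anywhere in this thread. Under that reading, a small sketch (with hypothetical helper names) converts it to standard CIDR notation:

```python
import ipaddress


def hex_mask_to_prefix_len(hex_mask: str) -> int:
    """Convert a hex netmask such as 'ffffffff' to a prefix length."""
    value = int(hex_mask, 16)
    bits = len(hex_mask) * 4
    ones = bin(value).count("1")
    # A valid netmask is a run of 1-bits followed only by 0-bits.
    if value != ((1 << ones) - 1) << (bits - ones):
        raise ValueError(f"not a contiguous netmask: {hex_mask}")
    return ones


def normalize(cidr: str) -> ipaddress.IPv4Network:
    """Turn '10.100.27.66/ffffffff' into an IPv4Network in CIDR form."""
    ip, mask = cidr.split("/")
    return ipaddress.ip_network(f"{ip}/{hex_mask_to_prefix_len(mask)}", strict=False)


net = normalize("10.100.27.66/ffffffff")
print(net, net.num_addresses)  # 10.100.27.66/32 1
```

If the assumption holds, each entry in the log is a single /32 address, not a multi-address prefix.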

DerekTBrown avatar Aug 23 '22 18:08 DerekTBrown

@jayanthvn our EKS cluster is configured to be IPv4-only. Any ideas why this could be happening?

The container itself is just Ubuntu 22.

DerekTBrown avatar Aug 24 '22 00:08 DerekTBrown

I have pinged you on K8s Slack and will set up a time to discuss.

jayanthvn avatar Aug 24 '22 01:08 jayanthvn

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days

github-actions[bot] avatar Oct 24 '22 00:10 github-actions[bot]

Issue closed due to inactivity.

github-actions[bot] avatar Nov 08 '22 00:11 github-actions[bot]