amazon-vpc-cni-k8s
IP addresses leaking when there are too many IPs in the cooldown pool
What happened:
These are my ipamd warm pool settings:
WARM_IP_TARGET = 2
MINIMUM_IP_TARGET = 23
WARM_ENI_TARGET = 0
IP_COOLDOWN_PERIOD = 30
During a pressure test, we create a lot of Pods in a short time and delete them as soon as they become ready. This puts a lot of IP addresses into the cool-down state, and newly created Pods cannot be assigned an IP until the IP_COOLDOWN_PERIOD has elapsed.
Attach logs
Kubelet log:
A new pod can't be assigned an IP address:
ipamd log:
You can see more and more cooled-down IPs present, but the short count is never > 0:
When cooled-down IPs = 4 and assigned IPs = 19, we have no free IP in the ipamd pool, but because 23 (MINIMUM_IP_TARGET) - 19 (assigned IPs) > 2 (WARM_IP_TARGET), ipamd will not allocate more free IPs from the VPC:
After the IP_COOLDOWN_PERIOD, the new pod can be assigned an IP:
After reading the source code here:
- https://github.com/aws/amazon-vpc-cni-k8s/blob/master/pkg/ipamd/datastore/data_store.go#L787
- https://github.com/aws/amazon-vpc-cni-k8s/blob/master/pkg/ipamd/ipamd.go#L1831
I found that the vpc-cni plugin calculates it as follows,
and AvailableAddress() is implemented here as simply total attached IPs - assigned IPs:
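A simplified paraphrase of that pool-sizing logic, based on my reading of the data_store.go and ipamd.go links above (function names and structure here are illustrative only, not the real ipamd code):

```go
// Hypothetical paraphrase of ipamd's warm-pool sizing; see the linked
// data_store.go / ipamd.go for the real implementation.
package main

import "fmt"

// availableAddresses mirrors "total attached IPs - assigned IPs".
// Note it does NOT exclude IPs that are still in the cooldown window.
func availableAddresses(total, assigned int) int {
	return total - assigned
}

// ipShortfall is roughly how many more IPs ipamd decides it needs:
// it tops up to WARM_IP_TARGET free IPs and MINIMUM_IP_TARGET total IPs,
// using the "available" count above, so cooldown IPs count as if free.
func ipShortfall(total, assigned, warmIPTarget, minimumIPTarget int) int {
	short := warmIPTarget - availableAddresses(total, assigned)
	if m := minimumIPTarget - total; m > short {
		short = m
	}
	if short < 0 {
		short = 0
	}
	return short
}

func main() {
	// With the numbers from my logs: ipamd sees no shortfall at all.
	fmt.Println(ipShortfall(23, 19, 2, 23)) // prints 0
}
```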
This is fine most of the time. But if there are enough IPs in the cool-down state, say cooldown count + assigned count = total attached count, there is still no free IP that can be assigned to a Pod, yet because "total - assigned" > "WARM_IP_TARGET", ipamd won't try to allocate more IPs from the VPC. As a result, the newly created Pod cannot get an IP address even though there are plenty of free IPs in the VPC.
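To make the failure mode concrete, here is a small worked example using the numbers from my logs (total = 23, assigned = 19, cooldown = 4); "usable" is my own term for IPs that can actually be handed to a pod, not something ipamd tracks:

```go
// Worked example of the scenario above; "usable" is illustrative only.
package main

import "fmt"

func main() {
	total, assigned, cooldown := 23, 19, 4
	warmIPTarget := 2

	available := total - assigned  // what ipamd compares against WARM_IP_TARGET
	usable := available - cooldown // what a new pod can actually get right now

	needMore := available < warmIPTarget // ipamd only asks EC2 for more IPs if this is true

	fmt.Println(available, usable, needMore) // 4 0 false: pod assignment fails,
	// yet ipamd never requests more IPs until the cooldown expires
}
```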
What you expected to happen:
Pods should be assigned an IP when the VPC subnet has enough free IPs.
How to reproduce it (as minimally and precisely as possible):
- create a k8s cluster
- use just one node
- configure vpc-cni with WARM_IP_TARGET=1, MINIMUM_IP_TARGET=6 (smaller than the VPC free IP count)
- create pods one by one, up to MINIMUM_IP_TARGET - WARM_IP_TARGET (5) pods
- do these within 30s: delete one pod and create two new pods; the last pod will fail because it cannot be assigned an IP address
Anything else we need to know?:
Environment: AWS EKS
- Kubernetes version (use kubectl version): 1.29
- CNI Version: 1.17
do these within 30s: delete one pod and create two new pods; the last pod will fail because it cannot be assigned an IP address
The default cooldown period is 30 secs, after which the IPs in cooldown will be made available to the pods. Are you noticing that this isn't happening?
The reason for keeping the warm_ip_target is to ensure that VPC CNI doesn't make unnecessary EC2 API calls, which avoids running into API throttles.
If tuning the IP cooldown period is preferable, it can be controlled using this flag - https://github.com/aws/amazon-vpc-cni-k8s/?tab=readme-ov-file#ip_cooldown_period-v1150
I mean that during the cooldown period, a pod can't get an IP even if there are free IPs in the VPC subnet, which I think is not the expected behavior.
This happens because total IP count - assigned IP count is not less than warm_ip_target, while assigned IP count + cooldown IP count = total IP count.
This issue is stale because it has been open for 60 days with no activity. Remove the stale label or comment, or this will be closed in 14 days.
Issue closed due to inactivity.