amazon-vpc-cni-k8s
Count addresses in cooldown against used ones in datastoreTargetState()
datastoreTargetState() calculates the IP address shortfall without taking addresses in cooldown into account.
On nodes with a high pod churn rate this leads to failures during address allocation (note in the logs below that the same 2 addresses are counted both as "available" and as in "cooldown").
{"level":"debug","ts":"2021-10-20T18:32:16.219Z","caller":"ipamd/ipamd.go:1214","msg":"Current warm IP stats: target: 2, total: 18, assigned: 16, available: 2, cooldown: 2, short: 0, over 0"}
{"level":"info","ts":"2021-10-20T18:32:18.997Z","caller":"rpc/rpc.pb.go:713","msg":"Received AddNetwork for NS /var/run/netns/cni-8497daaf-cab2-7929-ff1c-53f97a295633, Sandbox 2324678184bfbb2d12122195a190c25744cb9cdc195ae0363a63c5f3c98d21f9, ifname eth0"}
{"level":"debug","ts":"2021-10-20T18:32:18.997Z","caller":"rpc/rpc.pb.go:713","msg":"AddNetworkRequest: K8S_POD_NAME:\"1db8f1952a05cac8-wpnnj\" K8S_POD_NAMESPACE:\"blackbox\" K8S_POD_INFRA_CONTAINER_ID:\"2324678184bfbb2d12122195a190c25744cb9cdc195ae0363a63c5f3c98d21f9\" ContainerID:\"2324678184bfbb2d121221
{"level":"debug","ts":"2021-10-20T18:32:18.997Z","caller":"datastore/data_store.go:680","msg":"AssignIPv4Address: IP address pool stats: total: 18, assigned 16"}
{"level":"debug","ts":"2021-10-20T18:32:18.997Z","caller":"datastore/data_store.go:757","msg":"Get free IP from prefix failed no free IP available in the prefix - 10.244.56.173/ffffffff"}
{"level":"debug","ts":"2021-10-20T18:32:18.997Z","caller":"datastore/data_store.go:680","msg":"Unable to get IP address from CIDR: no free IP available in the prefix - 10.244.56.173/ffffffff"}
...
{"level":"debug","ts":"2021-10-20T18:32:18.997Z","caller":"datastore/data_store.go:680","msg":"AssignPodIPv4Address: ENI eni-0f2c931402cffbd00 does not have available addresses"}
{"level":"error","ts":"2021-10-20T18:32:18.997Z","caller":"datastore/data_store.go:680","msg":"DataStore has no available IP/Prefix addresses"}
{"level":"info","ts":"2021-10-20T18:32:18.997Z","caller":"rpc/rpc.pb.go:713","msg":"Send AddNetworkReply: IPv4Addr , IPv6Addr: , DeviceNumber: -1, err: assignPodIPv4AddressUnsafe: no available IP/Prefix addresses"}
Would it make sense to count cooldown addresses as used, so the pool keeps actually available warm addresses? Obviously this will lead to a higher IP churn rate, since addresses will be allocated and then released as soon as the cooldown period expires, but that seems acceptable.
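For illustration, a minimal sketch of the proposed accounting (hypothetical names, not the actual ipamd code); with the stats from the log above (total: 18, assigned: 16, cooldown: 2, target: 2) it yields short = 2 instead of 0:

```go
// Field names mirror the "Current warm IP stats" log line above.
type ipStats struct {
	total    int // all IPs attached to the node
	assigned int // IPs currently assigned to pods
	cooldown int // IPs recently freed and not yet reusable
}

// shortfall returns how many IPs should be allocated to honor
// warmTarget (WARM_IP_TARGET). Subtracting cooldown from the available
// count stops recently-freed IPs from masking an empty pool.
func shortfall(s ipStats, warmTarget int) int {
	available := s.total - s.assigned - s.cooldown
	if short := warmTarget - available; short > 0 {
		return short
	}
	return 0
}
```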
Yes, as you pointed out, it would add more IP churn, but it might be good to explore this enhancement. There would be extra IPs, i.e., once the IPs are out of cooldown the reconciler will free them again, so it might lead to more EC2 calls in your account to allocate and free IP addresses.
A simple optimization for the EC2 request count would be to implement WARM_IP_COUNT_LOW/WARM_IP_COUNT_HIGH and trigger release/allocation only once we hit either limit (see the sketch below). I can work on a PR to count addresses in cooldown as taken and gate this logic behind an env flag that is disabled by default. What do you think?
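A minimal sketch of that watermark idea (WARM_IP_COUNT_LOW/WARM_IP_COUNT_HIGH are proposed settings here, not existing CNI flags):

```go
// reconcileAction decides whether the reconciler should touch EC2 at
// all: only when the free-IP count leaves the [low, high] band.
func reconcileAction(free, low, high int) string {
	switch {
	case free < low:
		return "allocate" // top the pool back up
	case free > high:
		return "release" // trim the excess
	default:
		return "noop" // inside the band: no EC2 calls
	}
}
```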
I feel we should limit the number of environment variables. I am wondering if we can still count cooldown as taken but delay the release of IP addresses in the reconciler. That would make the reconciler aggressive in terms of IP allocation and relaxed in terms of freeing (sketched below). What do you think about this? It would help with any pod churn.
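A minimal sketch of that delayed-release idea (illustrative types and names, not the actual datastore code):

```go
import "time"

// freeIP is an unassigned address plus the time it became unassigned.
type freeIP struct {
	addr     string
	unusedAt time.Time
}

// releasable returns only IPs that have sat idle longer than
// gracePeriod, so brief pod churn reuses warm IPs instead of
// triggering EC2 free/alloc round trips.
func releasable(pool []freeIP, gracePeriod time.Duration, now time.Time) []string {
	var out []string
	for _, ip := range pool {
		if now.Sub(ip.unusedAt) >= gracePeriod {
			out = append(out, ip.addr)
		}
	}
	return out
}
```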
https://github.com/aws/amazon-vpc-cni-k8s/pull/1703 turned out to be quite large and hard to read, so I'm splitting the change into multiple PRs. Published https://github.com/aws/amazon-vpc-cni-k8s/pull/1704 as a first step toward implementing delayed cleanup.
@jayanthvn I'm seeing a "First-time contributors need a maintainer to approve running workflows." message on https://github.com/aws/amazon-vpc-cni-k8s/pull/1704; would it be possible to approve the workflow?
@veshij - I have approved the workflow. Will assign the PR for review. Thank you!
@jayanthvn could you please assist with the PR review? There have been no updates for more than a week :-(
Sorry about the delay. I will follow up this week.
https://github.com/aws/amazon-vpc-cni-k8s/pull/1703 implements the delayed release logic. I'm not quite happy with the test coverage, though adding unit tests around ipamd might require some additional refactoring; I would appreciate any suggestions.
@veshij - We are also working towards improving the test coverage. Recently this was added - https://github.com/aws/amazon-vpc-cni-k8s/tree/master/test/integration-new. This PR - https://github.com/aws/amazon-vpc-cni-k8s/pull/1759 - will add a workflow to run nightly and on each PR push.
@jayanthvn could you please assist with reviewing the changes in the PR?
@veshij - yes will look into it. Thank you!
Currently seeing this in high-churn environments while testing 1.10.2-rc-1: scheduling doesn't happen, and I had to roll back to 1.9.x. Weighing whether random interface deletions (<1.10.2-rc-1) are better than high-churn pods failing to schedule (1.10.?).
@tsunamishaun - There have been no IP address strategy changes between 1.9.x and 1.10.x, and scheduling shouldn't be impacted. If pods aren't getting scheduled on a node, then maybe the max pods limit has been reached or the node is not ready?
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days
@jayanthvn could you please take one more look at the PR? Thank you!
@veshij - @achevuru had some comments, we will look into it.
@jayanthvn please let me know if you have any questions or if any changes to the PR are required.
Sorry for the delay, we will look into it early next week.
/cc @achevuru
Could you please take a look?
Hi. We are syncing up today on this PR and will have our review by EOD. Sorry for keeping you waiting.
@veshij - @achevuru and I had a discussion on this, and we have some concerns about this setting on very large clusters, since IP exhaustion is a common problem and a single node would now keep its IPs for a longer time. We are discussing internally and will get back in a day or two.
@veshij - We haven't forgotten about this. We got pulled into other issues, but we are planning to meet this week for the discussion, mainly around larger clusters: not considering cooldown IPs as available IPs means we would consume more IPs.
@veshij - Are you available on the Kubernetes Slack channel? @achevuru and I wanted to discuss the PR with you on a call.
Yep, I'm available (Oleg Guba).
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days
Any progress on this? On large machines, we see a lot of these:
AssignIPv4Address: IP address pool stats: total: 62, assigned 52
DataStore has no available IP/Prefix addresses
We're running with WARM_IP_TARGET=8 and MINIMUM_IP_TARGET=7, as we need to cover machines from m5.xlarge to x2idn.metal, running anywhere from 6 to ~120 pods.
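For context, a rough sketch of how those two settings combine per their documented semantics (not the exact ipamd code): here desiredTotal(52, 8, 7) = 60, so the node above should have had ~10 unassigned IPs out of 62, and the allocation failure is consistent with those free IPs sitting in cooldown:

```go
// MINIMUM_IP_TARGET acts as a floor on total IPs per node;
// WARM_IP_TARGET is the free headroom kept on top of the assigned count.
func desiredTotal(assigned, warmIPTarget, minimumIPTarget int) int {
	want := assigned + warmIPTarget
	if want < minimumIPTarget {
		want = minimumIPTarget
	}
	return want
}
```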
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days
Issue closed due to inactivity.