
Count addresses in cooldown against used ones in datastoreTargetState()

Open veshij opened this issue 4 years ago • 27 comments

datastoreTargetState() calculates the "short" value of IP addresses without taking addresses in cooldown into account. On nodes with a high pod churn rate this leads to failures during address allocation (note that the 2 addresses below are counted as both "available" and in "cooldown").

{"level":"debug","ts":"2021-10-20T18:32:16.219Z","caller":"ipamd/ipamd.go:1214","msg":"Current warm IP stats: target: 2, total: 18, assigned: 16, available: 2, cooldown: 2, short: 0, over 0"}
{"level":"info","ts":"2021-10-20T18:32:18.997Z","caller":"rpc/rpc.pb.go:713","msg":"Received AddNetwork for NS /var/run/netns/cni-8497daaf-cab2-7929-ff1c-53f97a295633, Sandbox 2324678184bfbb2d12122195a190c25744cb9cdc195ae0363a63c5f3c98d21f9, ifname eth0"}
{"level":"debug","ts":"2021-10-20T18:32:18.997Z","caller":"rpc/rpc.pb.go:713","msg":"AddNetworkRequest: K8S_POD_NAME:\"1db8f1952a05cac8-wpnnj\"  K8S_POD_NAMESPACE:\"blackbox\"  K8S_POD_INFRA_CONTAINER_ID:\"2324678184bfbb2d12122195a190c25744cb9cdc195ae0363a63c5f3c98d21f9\"  ContainerID:\"2324678184bfbb2d121221
{"level":"debug","ts":"2021-10-20T18:32:18.997Z","caller":"datastore/data_store.go:680","msg":"AssignIPv4Address: IP address pool stats: total: 18, assigned 16"}
{"level":"debug","ts":"2021-10-20T18:32:18.997Z","caller":"datastore/data_store.go:757","msg":"Get free IP from prefix failed no free IP available in the prefix - 10.244.56.173/ffffffff"}
{"level":"debug","ts":"2021-10-20T18:32:18.997Z","caller":"datastore/data_store.go:680","msg":"Unable to get IP address from CIDR: no free IP available in the prefix - 10.244.56.173/ffffffff"}
...
{"level":"debug","ts":"2021-10-20T18:32:18.997Z","caller":"datastore/data_store.go:680","msg":"AssignPodIPv4Address: ENI eni-0f2c931402cffbd00 does not have available addresses"}
{"level":"error","ts":"2021-10-20T18:32:18.997Z","caller":"datastore/data_store.go:680","msg":"DataStore has no available IP/Prefix addresses"}
{"level":"info","ts":"2021-10-20T18:32:18.997Z","caller":"rpc/rpc.pb.go:713","msg":"Send AddNetworkReply: IPv4Addr , IPv6Addr: , DeviceNumber: -1, err: assignPodIPv4AddressUnsafe: no available IP/Prefix addresses"}

Would it make sense to count cooldown addresses as used, so that the pool keeps the number of actually available warm addresses? Obviously this will lead to a higher IP churn rate, since addresses will be allocated and released as soon as the cooldown period expires, but that seems acceptable.
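For illustration, a minimal Go sketch of the proposed accounting (the names are hypothetical, not the actual ipamd code); with the values from the log above it would report short = 2 instead of 0:

package main

import "fmt"

type poolStats struct {
    warmIPTarget int
    total        int
    assigned     int
    cooldown     int
}

// shortCount returns how many warm IPs the pool is missing. Counting
// cooldown addresses as unavailable keeps the warm target honest on
// high-churn nodes.
func shortCount(s poolStats) int {
    available := s.total - s.assigned - s.cooldown // exclude cooldown IPs
    short := s.warmIPTarget - available
    if short < 0 {
        return 0
    }
    return short
}

func main() {
    // Values from the log line above: target 2, total 18, assigned 16, cooldown 2.
    fmt.Println(shortCount(poolStats{warmIPTarget: 2, total: 18, assigned: 16, cooldown: 2})) // 2
}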

veshij avatar Oct 20 '21 18:10 veshij

Yes, as you pointed out, it would add more IP churn, but it might be good to explore this enhancement. While there are extra IPs, i.e. once the IPs are out of cooldown, the reconciler will free them, so it might lead to more EC2 calls in your account to allocate and free IP addresses.

jayanthvn avatar Oct 20 '21 19:10 jayanthvn

A simple optimization for the EC2 request count would be to implement WARM_IP_COUNT_LOW/WARM_IP_COUNT_HIGH and trigger release/allocation only once we hit either of these limits, as in the sketch below. I can work on a PR to count addresses in cooldown as taken and gate this logic behind an env flag that is disabled by default; what do you think?
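For illustration only, a rough Go sketch of that hysteresis idea; WARM_IP_COUNT_LOW/WARM_IP_COUNT_HIGH are proposed names, not existing settings, and the function is hypothetical:

package main

import "fmt"

// reconcileAction decides whether to touch EC2 at all: only act when the
// free-IP count leaves the [low, high] band, otherwise do nothing.
func reconcileAction(freeIPs, low, high int) string {
    switch {
    case freeIPs < low:
        return "allocate" // top the pool back up
    case freeIPs > high:
        return "release" // trim the surplus
    default:
        return "noop" // inside the band: no EC2 calls
    }
}

func main() {
    fmt.Println(reconcileAction(1, 2, 6)) // allocate
    fmt.Println(reconcileAction(4, 2, 6)) // noop
    fmt.Println(reconcileAction(9, 2, 6)) // release
}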

veshij avatar Oct 20 '21 20:10 veshij

I feel we should limit the number of environment variables. I am wondering if we can still count cooldown addresses as taken but delay the release of IP addresses in the reconciler. That would make the reconciler aggressive in terms of IP allocation and relaxed in terms of freeing. What do you think about this? It would help with any pod churn.
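Roughly something like the following (a simplified Go sketch with hypothetical names, not the actual reconciler code): an IP is only freed after it has been surplus for longer than a delay, so short churn bursts reuse it instead of triggering EC2 unassign/assign calls.

package main

import (
    "fmt"
    "time"
)

type excessIP struct {
    addr        string
    excessSince time.Time // when the IP first became surplus
}

// shouldRelease keeps surplus IPs warm until the delay has elapsed.
func shouldRelease(ip excessIP, delay time.Duration, now time.Time) bool {
    return now.Sub(ip.excessSince) >= delay
}

func main() {
    ip := excessIP{addr: "10.244.56.173", excessSince: time.Now().Add(-30 * time.Second)}
    fmt.Println(shouldRelease(ip, time.Minute, time.Now())) // false: keep it for now
}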

jayanthvn avatar Oct 20 '21 21:10 jayanthvn

https://github.com/aws/amazon-vpc-cni-k8s/pull/1703 turned out to be quite large and hard to read, so I'm splitting the change into multiple PRs. I published https://github.com/aws/amazon-vpc-cni-k8s/pull/1704 as a first step toward implementing delayed cleanup.

veshij avatar Oct 21 '21 07:10 veshij

@jayanthvn I'm seeing the "First-time contributors need a maintainer to approve running workflows." message on https://github.com/aws/amazon-vpc-cni-k8s/pull/1704; would it be possible to approve the workflow?

veshij avatar Oct 25 '21 18:10 veshij

@veshij - I have approved the workflow. Will assign the PR for review. Thank you!

jayanthvn avatar Oct 25 '21 18:10 jayanthvn

@jayanthvn could you please assist with the PR review? There have been no updates for more than a week :-(

veshij avatar Nov 13 '21 20:11 veshij

Sorry about the delay. I will follow up this week.

jayanthvn avatar Nov 13 '21 20:11 jayanthvn

https://github.com/aws/amazon-vpc-cni-k8s/pull/1703 implements the delayed release logic. I'm not quite happy with the test coverage, but adding unit tests around ipamd might require some additional refactoring; I would appreciate any suggestions.

veshij avatar Nov 30 '21 00:11 veshij

@veshij - We are also working towards improving the test coverage. Recently this was added - https://github.com/aws/amazon-vpc-cni-k8s/tree/master/test/integration-new. This PR - https://github.com/aws/amazon-vpc-cni-k8s/pull/1759 - will add a workflow that runs nightly and on each PR push.

jayanthvn avatar Nov 30 '21 04:11 jayanthvn

@jayanthvn could you please assist with reviewing the changes in the PR?

veshij avatar Dec 06 '21 18:12 veshij

@veshij - yes, I will look into it. Thank you!

jayanthvn avatar Dec 06 '21 18:12 jayanthvn

Currently seeing this in high-churn environments while testing 1.10.2-rc-1: scheduling doesn't happen, and I had to roll back to 1.9.x. Weighing whether random interface deletions (<1.10.2-rc-1) are better than high-churn pods not being scheduled (1.10.?).

tsunamishaun avatar Dec 23 '21 21:12 tsunamishaun

@tsunamishaun - There have been no IP address strategy changes between 1.9.x and 1.10.x, and scheduling shouldn't be impacted. If the pods aren't getting scheduled on a node, then maybe the max-pods limit has been reached or the node is not ready?

jayanthvn avatar Jan 12 '22 02:01 jayanthvn

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days

github-actions[bot] avatar Apr 21 '22 00:04 github-actions[bot]

@jayanthvn could you please take one more look at the PR? Thank you!

veshij avatar Apr 22 '22 17:04 veshij

@veshij - @achevuru had some comments; we will look into them.

jayanthvn avatar Apr 22 '22 17:04 jayanthvn

@jayanthvn please let me know if you have any questions or if any changes to the PR are required.

veshij avatar Apr 27 '22 22:04 veshij

Sorry for the delay, we will look into it early next week.

/cc @achevuru

jayanthvn avatar Apr 27 '22 23:04 jayanthvn

Could you please take a look?

veshij avatar May 06 '22 05:05 veshij

Hi. We are syncing up on this PR today and will have our review by EOD. Sorry for keeping you waiting.

jayanthvn avatar May 06 '22 16:05 jayanthvn

@veshij - @achevuru and I had a discussion on this, and we have some concerns about enabling this on very large clusters: IP exhaustion is a common problem, and a single node would now keep IPs for a longer time. So we are discussing internally and will get back in a day or two.

jayanthvn avatar May 09 '22 23:05 jayanthvn

@veshij - We haven't forgotten about this. We got pulled into other issues, but we are planning to meet this week for the discussion, mainly around larger clusters, where not considering cooldown IPs as available would consume more IPs.

jayanthvn avatar May 17 '22 17:05 jayanthvn

@veshij - Are you available on the Kubernetes Slack? @achevuru and I wanted to discuss the PR with you on a call.

jayanthvn avatar May 18 '22 00:05 jayanthvn

Yep, I'm available (Oleg Guba).

veshij avatar May 18 '22 02:05 veshij

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days

github-actions[bot] avatar Jul 18 '22 00:07 github-actions[bot]

Any progress on this? On large machines, we see a lot of these:

AssignIPv4Address: IP address pool stats: total: 62, assigned 52
DataStore has no available IP/Prefix addresses

We're running with WARM_IP_TARGET=8 and MINIMUM_IP_TARGET=7, as we need to cover machines from m5.xlarge to x2idn.metal, running anywhere from 6 to ~120 pods.
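For context, a simplified reading of how the two settings interact (an approximation with hypothetical names, not the exact ipamd formula): MINIMUM_IP_TARGET acts as a floor, and WARM_IP_TARGET is added on top of whatever is assigned.

package main

import "fmt"

// desiredPoolSize keeps at least minimumIPTarget IPs on the node and,
// beyond that, warmIPTarget free IPs on top of whatever is assigned.
func desiredPoolSize(assigned, warmIPTarget, minimumIPTarget int) int {
    want := assigned + warmIPTarget
    if want < minimumIPTarget {
        want = minimumIPTarget
    }
    return want
}

func main() {
    // With WARM_IP_TARGET=8 and MINIMUM_IP_TARGET=7:
    fmt.Println(desiredPoolSize(0, 8, 7))   // small, idle node: 8
    fmt.Println(desiredPoolSize(120, 8, 7)) // busy metal-sized node: 128
}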

therc avatar Jul 20 '22 13:07 therc

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days

github-actions[bot] avatar Sep 22 '22 00:09 github-actions[bot]

Issue closed due to inactivity.

github-actions[bot] avatar Oct 06 '22 00:10 github-actions[bot]