CNI does not wait for ENI and IP to be available before reporting ready when setting MINIMUM_IP_TARGET and WARM_IP_TARGET
What happened:
After setting MINIMUM_IP_TARGET and WARM_IP_TARGET, pods scheduled on a new node are stuck on "failed to assign an IP address to container" for a few minutes before they can be created.
From my observation, after setting these overrides, aws-node does not wait for its datastore to be populated before reporting ready. Pods are then scheduled onto these nodes and go into a crashing state because no IP can be assigned.
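(For reference, the overrides were applied as environment variables on the aws-node DaemonSet; a minimal sketch of how to reproduce that configuration, using the values reported later in this thread, is:)
# Sketch only: apply the warm-pool overrides to the aws-node DaemonSet.
# Values match the config shared further down in this thread.
kubectl -n kube-system set env daemonset/aws-node \
  WARM_IP_TARGET=5 \
  MINIMUM_IP_TARGET=25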
Pod events:
Normal SuccessfulAttachVolume 3m16s attachdetach-controller AttachVolume.Attach succeeded for volume "XXXXXXXX"
Warning FailedCreatePodSandBox 3m12s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "22cbd7067d76c2ed509436f65384e1df59ec8949e7910d7bbbfb32de8a511f89": add cmd: failed to assign an IP address to container
Warning FailedCreatePodSandBox 3m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "81a9fde8b5ea0da28959e8c08e08dc2ffcbe700c7f3067afd273e9ac67b37588": add cmd: failed to assign an IP address to container
Warning FailedCreatePodSandBox 2m48s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "a94f47242b45d3610a9bf25ee17de09d8fb6f77dcf1754deac13395020b1082f": add cmd: failed to assign an IP address to container
Warning FailedCreatePodSandBox 2m37s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "a6f2f67496e2c75cdc89963afc5d4f0b47fe3d6988d0356b4cd0e0ffb641400c": add cmd: failed to assign an IP address to container
Warning FailedCreatePodSandBox 2m24s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "de2c20851923cc290e187ade06472c2f77595c5d55f7ee7522603fb55e9ee943": add cmd: failed to assign an IP address to container
Warning FailedCreatePodSandBox 2m10s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "4f3e15dbff1e9f6e2ad7055a61b75724d9ccd60850e05fe811966ca1cae65529": add cmd: failed to assign an IP address to container
Warning FailedCreatePodSandBox 115s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "a545894e2066ed715d5b3667c38bf9f6ecefd45c67e79da2e8016929e84c66a0": add cmd: failed to assign an IP address to container
Warning FailedCreatePodSandBox 100s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "9e76dd6f08280e3f246e1f366b8b0170abc345e814754797cea56e523e2f14c1": add cmd: failed to assign an IP address to container
Warning FailedCreatePodSandBox 88s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "5c6a3409855d54fd5f296e0793eafc84202a4e825a2cea86784b54bfb3220231": add cmd: failed to assign an IP address to container
Warning FailedCreatePodSandBox 75s kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "0be78bc1e43e44a0f14a503c9cfbb0fdd9570d5aab0a6db9244e129c3ef294a9": add cmd: failed to assign an IP address to container
Normal Pulling 62s kubelet Pulling image "XXXXXXXXXXX"
I have been monitoring the pod's ready state and can see that it is declared ready before ipamd has successfully added the ENI and its IPs to the datastore.
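(The snapshots below were captured with a simple polling loop; this is a rough equivalent, with the pod name taken from the output:)
# Rough equivalent of the polling loop behind the snapshots below
while true; do
  date
  kubectl -n kube-system get pod -o wide | grep aws-node
  sleep 1
done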
Pod status changes:
Thu Nov 4 11:43:31 AEDT 2021
aws-node-xnpcf 0/1 Running 0 69s 10.1.56.83 ip-10-1-56-83.ap-southeast-2.compute.internal <none> <none>
Thu Nov 4 11:43:32 AEDT 2021
aws-node-xnpcf 0/1 Running 0 70s 10.1.56.83 ip-10-1-56-83.ap-southeast-2.compute.internal <none> <none>
Thu Nov 4 11:43:33 AEDT 2021
aws-node-xnpcf 1/1 Running 0 72s 10.1.56.83 ip-10-1-56-83.ap-southeast-2.compute.internal <none> <none>
Thu Nov 4 11:43:35 AEDT 2021
aws-node-xnpcf 1/1 Running 0 73s 10.1.56.83 ip-10-1-56-83.ap-southeast-2.compute.internal <none> <none>
Readiness
date; k describe po aws-node-xnpcf | ag Ready:
Thu Nov 4 11:43:43 AEDT 2021
Ready: True
Ready: True
Node status changes:
ip-10-1-56-83.ap-southeast-2.compute.internal NotReady <none> 38s v1.20.11
Thu Nov 4 11:43:01 AEDT 2021
ip-10-1-56-83.ap-southeast-2.compute.internal NotReady <none> 39s v1.20.11
Thu Nov 4 11:43:02 AEDT 2021
ip-10-1-56-83.ap-southeast-2.compute.internal Ready <none> 41s v1.20.11
Thu Nov 4 11:43:03 AEDT 2021
ip-10-1-56-83.ap-southeast-2.compute.internal Ready <none> 42s v1.20.11
In the ipamd log, around the time the pod and node are declared ready, I can see pods trying to be scheduled and assigned an IP while there is nothing in the datastore:
{"level":"info","ts":"2021-11-04T00:43:35.680Z","caller":"rpc/rpc.pb.go:501","msg":"Received AddNetwork for NS /var/run/netns/cni-42c1a3c7-8c69-e278-754d-efb63b6b33ea, Sandbox b624815ee518e511c3c99bdc8e66dfd1c996fe32986fadf90d9bd5235594e901, ifname eth0"}
{"level":"debug","ts":"2021-11-04T00:43:35.680Z","caller":"rpc/rpc.pb.go:501","msg":"AddNetworkRequest: ClientVersion:\"v1.9.0\" K8S_POD_NAME:\"xxxxxxxxxxx\" K8S_POD_NAMESPACE:\"xxxxxxxx\" K8S_POD_INFRA_CONTAINER_ID:\"b624815ee518e511c3c99bdc8e66dfd1c996fe32986fadf90d9bd5235594e901\" ContainerID:\"b624815ee518e511c3c99bdc8e66dfd1c996fe32986fadf90d9bd5235594e901\" IfName:\"eth0\" NetworkName:\"aws-cni\" Netns:\"/var/run/netns/cni-42c1a3c7-8c69-e278-754d-efb63b6b33ea\" "}
{"level":"debug","ts":"2021-11-04T00:43:35.680Z","caller":"ipamd/rpc_handler.go:141","msg":"AssignIPv4Address: IP address pool stats: total: 0, assigned 0"}
{"level":"debug","ts":"2021-11-04T00:43:35.680Z","caller":"ipamd/rpc_handler.go:141","msg":"AssignPodIPv4Address: ENI eni-0f7e7018aaead6bc1 does not have available addresses"}
{"level":"error","ts":"2021-11-04T00:43:35.680Z","caller":"ipamd/rpc_handler.go:141","msg":"DataStore has no available IP/Prefix addresses"}
It is not until a while later that ipamd successfully populates the datastore:
{"level":"info","ts":"2021-11-04T00:45:49.425Z","caller":"ipamd/ipamd.go:917","msg":"Added ENI(eni-06129a67d2970e562)'s IP/Prefix 10.7.31.56/32 to datastore"}
{"level":"debug","ts":"2021-11-04T00:45:49.426Z","caller":"ipamd/ipamd.go:904","msg":"Datastore Pool stats: total(/32): 14, assigned(/32): 0, totalprefixes(/28): 0"}
{"level":"debug","ts":"2021-11-04T00:45:49.426Z","caller":"ipamd/ipamd.go:905","msg":"Datastore Pool stats: total(/32): 14, assigned(/32): 0, totalprefixes(/28): 0"}
{"level":"debug","ts":"2021-11-04T00:45:49.426Z","caller":"ipamd/ipamd.go:706","msg":"Successfully increased IP pool, total: 14, used: 0"}
{"level":"debug","ts":"2021-11-04T00:45:49.426Z","caller":"ipamd/ipamd.go:723","msg":"IP pool stats: total = 14, used = 0, c.maxIPsPerENI = 14"}
During this window between aws-node reporting ready and the ipamd datastore actually being populated with IPs, pods are scheduled onto the node and fail, which is undesirable.
What you expected to happen:
I expected behaviour similar to the default WARM_ENI_TARGET=1, where the pod is not considered ready until the datastore is populated with the attached ENI's IPs.
The following logs are from using the default options with no MINIMUM_IP_TARGET or WARM_IP_TARGET overrides. In this case aws-cni works as expected:
Pod status
Thu Nov 4 15:55:28 AEDT 2021
aws-node-7xwp2 0/1 Running 0 69s 10.1.56.123 ip-10-1-56-123.ap-southeast-2.compute.internal <none> <none>
Thu Nov 4 15:55:29 AEDT 2021
aws-node-7xwp2 1/1 Running 0 70s 10.1.56.123 ip-10-1-56-123.ap-southeast-2.compute.internal <none> <none>
Thu Nov 4 15:55:31 AEDT 2021
aws-node-7xwp2 1/1 Running 0 72s 10.1.56.123 ip-10-1-56-123.ap-southeast-2.compute.internal <none> <none>
Ready status
Thu Nov 4 15:55:26 AEDT 2021
Ready: True
Ready: False
Thu Nov 4 15:55:27 AEDT 2021
Ready: True
Ready: False
Thu Nov 4 15:55:29 AEDT 2021
Ready: True
Ready: True
I can see in the ipamd log that the datastore successfully populates itself with the ENI's IPs around this time:
{"level":"info","ts":"2021-11-04T04:55:02.623Z","caller":"ipamd/ipamd.go:917","msg":"Added ENI(eni-035e6725b4d13e122)'s IP/Prefix 10.7.30.221/32 to datastore"}
{"level":"debug","ts":"2021-11-04T04:55:02.623Z","caller":"ipamd/ipamd.go:904","msg":"Datastore Pool stats: total(/32): 14, assigned(/32): 0, total prefixes(/28): 0"}
{"level":"debug","ts":"2021-11-04T04:55:02.623Z","caller":"ipamd/ipamd.go:905","msg":"Datastore Pool stats: total(/32): 14, assigned(/32): 0, total prefixes(/28): 0"}
{"level":"debug","ts":"2021-11-04T04:55:02.623Z","caller":"ipamd/ipamd.go:706","msg":"Successfully increased IP pool, total: 14, used: 0"}
{"level":"debug","ts":"2021-11-04T04:55:02.623Z","caller":"ipamd/ipamd.go:723","msg":"IP pool stats: total = 14, used = 0, c.maxIPsPerENI = 14"}
Environment:
- Kubernetes version (use kubectl version): v1.20.11
- CNI Version: v1.9.0
- OS (e.g: cat /etc/os-release): Ubuntu 20.04.3 LTS (Focal Fossa)
- Kernel (e.g. uname -a): Linux xxxxxxxx 5.11.0-1019-aws #20~20.04.1-Ubuntu SMP Tue Sep 21 10:40:39 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
@dtnyn - What are your WARM_IP_TARGET and MINIMUM_IP_TARGET values? Do you have custom networking enabled?
With respect to WARM_IP_TARGET/MINIMUM_IP_TARGET and WARM_ENI_TARGET, there is no difference in how IPAMD starts up. We do wait for ipamd to start (https://github.com/aws/amazon-vpc-cni-k8s/blob/release-1.9/scripts/entrypoint.sh#L139) before copying the CNI config file (https://github.com/aws/amazon-vpc-cni-k8s/blob/release-1.9/scripts/entrypoint.sh#L145), and during ipamd init we do try to assign IPs to the primary ENI (https://github.com/aws/amazon-vpc-cni-k8s/blob/v1.9.0/pkg/ipamd/ipamd.go#L493). So in non-custom-networking mode the primary ENI should typically have IPs before the node is ready.
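A simplified, paraphrased sketch of that entrypoint sequence (not the exact script; paths are illustrative) shows what the gate actually checks:
# Paraphrased from scripts/entrypoint.sh; paths are illustrative.
# 1. Block until ipamd's gRPC health endpoint answers...
while ! ./grpc-health-probe -addr 127.0.0.1:50051 >/dev/null 2>&1; do
  sleep 1
done
# 2. ...then install the CNI config so kubelet can mark the node Ready.
# The gate verifies ipamd is serving, not that the datastore holds IPs.
cp 10-aws.conflist /host/etc/cni/net.d/10-aws.conflist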
Once you share the configs, we can try to verify this scenario locally.
These are the configs enabled for the CNI:
Liveness: exec [/app/grpc-health-probe -addr=:50051] delay=35s timeout=1s period=10s
Readiness: exec [/app/grpc-health-probe -addr=:50051] delay=35s timeout=1s period=10s
Environment:
AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG: true
AWS_VPC_K8S_CNI_EXTERNALSNAT: true
ENI_CONFIG_LABEL_DEF: awscni
AWS_VPC_K8S_CNI_VETHPREFIX: eni
AWS_VPC_K8S_CNI_LOGLEVEL: DEBUG
AWS_VPC_K8S_PLUGIN_LOG_LEVEL: DEBUG
WARM_IP_TARGET: 5
MINIMUM_IP_TARGET: 25
MY_NODE_NAME: (v1:spec.nodeName)
WATCH_NAMESPACE: xxxxxx (v1:metadata.namespace)
This is used on both m5.xlarge and m5a.xlarge; the behavioural difference versus the default WARM_ENI_TARGET can be observed on both.
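(Incidentally, the same gRPC probe the kubelet runs can be invoked by hand to confirm that readiness flips before the datastore has IPs; the pod name here is illustrative:)
# Manually run the probe the readiness check uses (pod name illustrative)
kubectl -n kube-system exec aws-node-xnpcf -- /app/grpc-health-probe -addr=:50051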
Thanks for sharing the details. Since you mentioned this is a new node, the instance will have only the primary ENI attached, and during aws-node initialization IPs won't be assigned to the primary ENI since custom networking is enabled. The secondary ENI will be allocated by the reconciler based on the WARM target requirements, since available IPs will be 0. Before the reconciler thread starts, the node can be marked as ready while the datastore has no IPs. But we will try both configs locally and get back to you.
Maybe a better description of the problem is the difference between init time and the time when the datastore is first populated with IPs. We're seeing an average wait of up to 3 minutes when the WARM_IP_TARGET override is enabled, compared to an average of 30-60 seconds with the default WARM_ENI_TARGET=1 on m5.xlarge.
During those 3 minutes of waiting, pods are stuck pending for an extended period since there are no IPs to allocate to them. I do sometimes see pods waiting with the default WARM_ENI_TARGET configuration too, but since the initialization time is so fast it's negligible.
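(One way to observe the failure window cluster-wide is to watch for the sandbox events directly:)
# Watch FailedCreatePodSandBox events across all namespaces
kubectl get events -A -w --field-selector reason=FailedCreatePodSandBox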
Examples:
WARM_IP_TARGET=5 MINIMUM_IP_TARGET=25
Node 1:
{"level":"info","ts":"2021-11-11T05:40:03.932Z","caller":"aws-k8s-agent/main.go:28","msg":"Starting L-IPAMD v1.9.0 ..."}
{"level":"info","ts":"2021-11-11T05:40:03.934Z","caller":"aws-k8s-agent/main.go:42","msg":"Testing communication with server"}
{"level":"debug","ts":"2021-11-11T05:43:04.337Z","caller":"ipamd/ipamd.go:706","msg":"Successfully increased IP pool, total: 14, used: 0"}
{"level":"debug","ts":"2021-11-11T05:43:04.337Z","caller":"ipamd/ipamd.go:723","msg":"IP pool stats: total = 14, used = 0, c.maxIPsPerENI = 14"}
{"level":"debug","ts":"2021-11-11T05:43:04.337Z","caller":"ipamd/ipamd.go:1039","msg":"Current warm IP stats: target: 5, total: 14, assigned: 0, available: 14, short: 11, over 0"}
Node 2:
{"level":"info","ts":"2021-11-11T05:40:02.882Z","caller":"aws-k8s-agent/main.go:28","msg":"Starting L-IPAMD v1.9.0 ..."}
{"level":"info","ts":"2021-11-11T05:40:02.883Z","caller":"aws-k8s-agent/main.go:42","msg":"Testing communication with server"}
{"level":"debug","ts":"2021-11-11T05:43:06.767Z","caller":"ipamd/ipamd.go:706","msg":"Successfully increased IP pool, total: 14, used: 0"}
{"level":"debug","ts":"2021-11-11T05:43:06.767Z","caller":"ipamd/ipamd.go:723","msg":"IP pool stats: total = 14, used = 0, c.maxIPsPerENI = 14"}
{"level":"debug","ts":"2021-11-11T05:43:06.767Z","caller":"ipamd/ipamd.go:1039","msg":"Current warm IP stats: target: 5, total: 14, assigned: 0, available: 14, short: 11, over 0"}
Node 3:
{"level":"info","ts":"2021-11-11T05:40:03.974Z","caller":"aws-k8s-agent/main.go:28","msg":"Starting L-IPAMD v1.9.0 ..."}
{"level":"info","ts":"2021-11-11T05:40:03.975Z","caller":"aws-k8s-agent/main.go:42","msg":"Testing communication with server"}
{"level":"debug","ts":"2021-11-11T05:43:05.236Z","caller":"ipamd/ipamd.go:706","msg":"Successfully increased IP pool, total: 14, used: 0"}
{"level":"debug","ts":"2021-11-11T05:43:05.236Z","caller":"ipamd/ipamd.go:723","msg":"IP pool stats: total = 14, used = 0, c.maxIPsPerENI = 14"}
{"level":"debug","ts":"2021-11-11T05:43:05.236Z","caller":"ipamd/ipamd.go:1039","msg":"Current warm IP stats: target: 5, total: 14, assigned: 0, available: 14, short: 11, over 0"}
WARM_ENI_TARGET=1 (Default)
Node 1:
{"level":"info","ts":"2021-11-11T06:12:10.002Z","caller":"aws-k8s-agent/main.go:28","msg":"Starting L-IPAMD v1.9.0 ..."}
{"level":"info","ts":"2021-11-11T06:12:10.003Z","caller":"aws-k8s-agent/main.go:42","msg":"Testing communication with server"}
{"level":"debug","ts":"2021-11-11T06:12:21.139Z","caller":"ipamd/ipamd.go:706","msg":"Successfully increased IP pool, total: 14, used: 0"}
{"level":"debug","ts":"2021-11-11T06:12:21.139Z","caller":"ipamd/ipamd.go:723","msg":"IP pool stats: total = 14, used = 0, c.maxIPsPerENI = 14"}
{"level":"info","ts":"2021-11-11T06:12:22.510Z","caller":"rpc/rpc.pb.go:501","msg":"Received AddNetwork for NS /var/run/netns/cni-4b0995b0-dbd8-f4f9-1e56-14919ad2318a, Sandbox 82ea78f03a231293bf67c0d28b7219378089f644c4d8e69ac052c21dc2f942a7, ifname eth0"}
{"level":"debug","ts":"2021-11-11T06:12:22.511Z","caller":"rpc/rpc.pb.go:501","msg":"AddNetworkRequest: ClientVersion:\"v1.9.0\" K8S_POD_NAME:\"xxxxxxx\" K8S_POD_NAMESPACE:\"xxxxxxx\" K8S_POD_INFRA_CONTAINER_ID:\"82ea78f03a231293bf67c0d28b7219378089f644c4d8e69ac052c21dc2f942a7\" ContainerID:\"82ea78f03a231293bf67c0d28b7219378089f644c4d8e69ac052c21dc2f942a7\" IfName:\"eth0\" NetworkName:\"aws-cni\" Netns:\"/var/run/netns/cni-4b0995b0-dbd8-f4f9-1e56-14919ad2318a\" "}
{"level":"debug","ts":"2021-11-11T06:12:22.511Z","caller":"ipamd/rpc_handler.go:141","msg":"AssignIPv4Address: IP address pool stats: total: 14, assigned 0"}
Node 2:
{"level":"info","ts":"2021-11-11T06:12:09.881Z","caller":"aws-k8s-agent/main.go:28","msg":"Starting L-IPAMD v1.9.0 ..."}
{"level":"info","ts":"2021-11-11T06:12:09.882Z","caller":"aws-k8s-agent/main.go:42","msg":"Testing communication with server"}
{"level":"debug","ts":"2021-11-11T06:12:20.717Z","caller":"ipamd/ipamd.go:706","msg":"Successfully increased IP pool, total: 14, used: 0"}
{"level":"debug","ts":"2021-11-11T06:12:20.717Z","caller":"ipamd/ipamd.go:723","msg":"IP pool stats: total = 14, used = 0, c.maxIPsPerENI = 14"}
{"level":"info","ts":"2021-11-11T06:12:22.210Z","caller":"rpc/rpc.pb.go:501","msg":"Received AddNetwork for NS /var/run/netns/cni-57a72a54-6f0e-7d8a-c423-bc826298d8fa, Sandbox 330d515564bd9e0382c8b786449771e67b27bb58fccb7030f5ed5b33ca55edb9, ifname eth0"}
{"level":"debug","ts":"2021-11-11T06:12:22.210Z","caller":"rpc/rpc.pb.go:501","msg":"AddNetworkRequest: ClientVersion:\"v1.9.0\" K8S_POD_NAME:\"xxxxxxx\" K8S_POD_NAMESPACE:\"xxxxxxx\" K8S_POD_INFRA_CONTAINER_ID:\"330d515564bd9e0382c8b786449771e67b27bb58fccb7030f5ed5b33ca55edb9\" ContainerID:\"330d515564bd9e0382c8b786449771e67b27bb58fccb7030f5ed5b33ca55edb9\" IfName:\"eth0\" NetworkName:\"aws-cni\" Netns:\"/var/run/netns/cni-57a72a54-6f0e-7d8a-c423-bc826298d8fa\" "}
{"level":"debug","ts":"2021-11-11T06:12:22.210Z","caller":"ipamd/rpc_handler.go:141","msg":"AssignIPv4Address: IP address pool stats: total: 14, assigned 0"}
Node 3:
{"level":"info","ts":"2021-11-11T06:12:10.108Z","caller":"aws-k8s-agent/main.go:28","msg":"Starting L-IPAMD v1.9.0 ..."}
{"level":"info","ts":"2021-11-11T06:12:10.109Z","caller":"aws-k8s-agent/main.go:42","msg":"Testing communication with server"}
{"level":"debug","ts":"2021-11-11T06:12:24.631Z","caller":"ipamd/ipamd.go:706","msg":"Successfully increased IP pool, total: 14, used: 0"}
{"level":"debug","ts":"2021-11-11T06:12:24.631Z","caller":"ipamd/ipamd.go:723","msg":"IP pool stats: total = 14, used = 0, c.maxIPsPerENI = 14"}
{"level":"info","ts":"2021-11-11T06:12:34.864Z","caller":"rpc/rpc.pb.go:501","msg":"Received AddNetwork for NS /var/run/netns/cni-285be1d5-8ad2-7a0f-29f3-0d1c91a6efb0, Sandbox 1ece564a1c9ea4e7c311950efda30949538847f30a98eabf90fa3651c3b6adfd, ifname eth0"}
{"level":"debug","ts":"2021-11-11T06:12:34.864Z","caller":"rpc/rpc.pb.go:501","msg":"AddNetworkRequest: ClientVersion:\"v1.9.0\" K8S_POD_NAME:\"xxxxxxx\" K8S_POD_NAMESPACE:\"xxxxxxx\" K8S_POD_INFRA_CONTAINER_ID:\"1ece564a1c9ea4e7c311950efda30949538847f30a98eabf90fa3651c3b6adfd\" ContainerID:\"1ece564a1c9ea4e7c311950efda30949538847f30a98eabf90fa3651c3b6adfd\" IfName:\"eth0\" NetworkName:\"aws-cni\" Netns:\"/var/run/netns/cni-285be1d5-8ad2-7a0f-29f3-0d1c91a6efb0\" "}
{"level":"debug","ts":"2021-11-11T06:12:34.864Z","caller":"ipamd/rpc_handler.go:141","msg":"AssignIPv4Address: IP address pool stats: total: 14, assigned 0"}
@dtnyn Have you given the new IP prefixes a try? Do they have a similar delay? I expect I might be experiencing similar delays, as we have the same type of config in our clusters.
I did a test on my local cluster to compare the time difference between init time and when the datastore has IPs, and didn't find any difference between the configs:
WARM_ENI_TARGET = 1
Init start ->
{"level":"info","ts":"2022-01-26T06:30:00.329Z","caller":"aws-k8s-agent/main.go:28","msg":"Starting L-IPAMD v1.9.3 ..."}
Reconciler increasing the pool and adding a secondary ENI ->
{"level":"debug","ts":"2022-01-26T06:30:05.052Z","caller":"ipamd/ipamd.go:628","msg":"IP pool is too low: available (0) < ENI target (1) * addrsPerENI (3)"}
{"level":"debug","ts":"2022-01-26T06:30:05.052Z","caller":"ipamd/ipamd.go:629","msg":"Starting to increase pool size"}
{"level":"debug","ts":"2022-01-26T06:30:05.052Z","caller":"ipamd/ipamd.go:899","msg":"Skip the primary ENI for need IP check"}
{"level":"info","ts":"2022-01-26T06:30:05.052Z","caller":"eniconfig/eniconfig.go:68","msg":"Get Node Info for: ip-192-168-30-100.us-west-2.compute.internal"}
{"level":"debug","ts":"2022-01-26T06:30:05.052Z","caller":"eniconfig/eniconfig.go:129","msg":"Using ENI_CONFIG_LABEL_DEF failure-domain.beta.kubernetes.io/zone"}
{"level":"info","ts":"2022-01-26T06:30:05.052Z","caller":"ipamd/ipamd.go:806","msg":"Found ENI Config Name: us-west-2a"}
{"level":"info","ts":"2022-01-26T06:30:05.153Z","caller":"ipamd/ipamd.go:780","msg":"ipamd: using custom network config: [], subnet-0eeb2ea845de1c8d3"}
{"level":"info","ts":"2022-01-26T06:30:05.153Z","caller":"awsutils/awsutils.go:689","msg":"Using a custom network config for the new ENI"}
{"level":"warn","ts":"2022-01-26T06:30:05.153Z","caller":"awsutils/awsutils.go:689","msg":"No custom networking security group found, will use the node's primary ENI's SG: [%!s(*string=0xc0006cd760) %!s(*string=0xc0006cd770)]"}
{"level":"info","ts":"2022-01-26T06:30:05.153Z","caller":"awsutils/awsutils.go:689","msg":"Creating ENI with security groups: [sg-04123967cfc7fea84 sg-0c72b30626f1cbaf3] in subnet: subnet-0eeb2ea845de1c8d3"}
{"level":"info","ts":"2022-01-26T06:30:05.608Z","caller":"awsutils/awsutils.go:689","msg":"Created a new ENI: eni-07c415ca2e05f9b9d"}
{"level":"debug","ts":"2022-01-26T06:30:05.709Z","caller":"awsutils/awsutils.go:732","msg":"Discovered device number is used: 0"}
{"level":"debug","ts":"2022-01-26T06:30:05.709Z","caller":"awsutils/awsutils.go:732","msg":"Found a free device number: 1"}
{"level":"info","ts":"2022-01-26T06:30:07.040Z","caller":"ipamd/ipamd.go:821","msg":"Successfully created and attached a new ENI eni-07c415ca2e05f9b9d to instance"}
{"level":"info","ts":"2022-01-26T06:30:07.040Z","caller":"ipamd/ipamd.go:830","msg":"Trying to allocate 3 IP addresses on ENI eni-07c415ca2e05f9b9d"}
{"level":"debug","ts":"2022-01-26T06:30:07.040Z","caller":"ipamd/ipamd.go:830","msg":"PD enabled - false"}
{"level":"info","ts":"2022-01-26T06:30:07.535Z","caller":"ipamd/ipamd.go:830","msg":"Allocated 3 private IP addresses"}
WARM_IP_TARGET = 1 and MINIMUM_IP_TARGET = 3
Init start ->
{"level":"info","ts":"2022-01-26T06:57:17.253Z","caller":"aws-k8s-agent/main.go:28","msg":"Starting L-IPAMD v1.9.3 ..."}
Reconciler increasing the pool and adding a secondary ENI ->
{"level":"debug","ts":"2022-01-26T06:57:21.208Z","caller":"ipamd/ipamd.go:629","msg":"Starting to increase pool size"}
{"level":"debug","ts":"2022-01-26T06:57:21.208Z","caller":"ipamd/ipamd.go:737","msg":"Current warm IP stats: target: 1, total: 0, assigned: 0, available: 0, short: 3, over 0"}
{"level":"debug","ts":"2022-01-26T06:57:21.208Z","caller":"ipamd/ipamd.go:862","msg":"Current warm IP stats: target: 1, total: 0, assigned: 0, available: 0, short: 3, over 0"}
{"level":"debug","ts":"2022-01-26T06:57:21.208Z","caller":"ipamd/ipamd.go:886","msg":"Current warm IP stats: target: 1, total: 0, assigned: 0, available: 0, short: 3, over 0"}
{"level":"debug","ts":"2022-01-26T06:57:21.208Z","caller":"ipamd/ipamd.go:899","msg":"Skip the primary ENI for need IP check"}
{"level":"info","ts":"2022-01-26T06:57:21.208Z","caller":"eniconfig/eniconfig.go:68","msg":"Get Node Info for: ip-192-168-33-140.us-west-2.compute.internal"}
{"level":"debug","ts":"2022-01-26T06:57:21.209Z","caller":"eniconfig/eniconfig.go:129","msg":"Using ENI_CONFIG_LABEL_DEF failure-domain.beta.kubernetes.io/zone"}
{"level":"info","ts":"2022-01-26T06:57:21.209Z","caller":"ipamd/ipamd.go:806","msg":"Found ENI Config Name: us-west-2b"}
{"level":"info","ts":"2022-01-26T06:57:21.309Z","caller":"ipamd/ipamd.go:780","msg":"ipamd: using custom network config: [], subnet-0747217e1d86e6510"}
{"level":"info","ts":"2022-01-26T06:57:21.309Z","caller":"awsutils/awsutils.go:689","msg":"Using a custom network config for the new ENI"}
{"level":"warn","ts":"2022-01-26T06:57:21.309Z","caller":"awsutils/awsutils.go:689","msg":"No custom networking security group found, will use the node's primary ENI's SG: [%!s(*string=0xc00024c5e0) %!s(*string=0xc00024c5f0)]"}
{"level":"info","ts":"2022-01-26T06:57:21.309Z","caller":"awsutils/awsutils.go:689","msg":"Creating ENI with security groups: [sg-04123967cfc7fea84 sg-0c72b30626f1cbaf3] in subnet: subnet-0747217e1d86e6510"}
{"level":"info","ts":"2022-01-26T06:57:21.723Z","caller":"awsutils/awsutils.go:689","msg":"Created a new ENI: eni-052e27ef0393de5ca"}
{"level":"debug","ts":"2022-01-26T06:57:21.824Z","caller":"awsutils/awsutils.go:732","msg":"Discovered device number is used: 0"}
{"level":"debug","ts":"2022-01-26T06:57:21.824Z","caller":"awsutils/awsutils.go:732","msg":"Found a free device number: 1"}
{"level":"info","ts":"2022-01-26T06:57:23.135Z","caller":"ipamd/ipamd.go:821","msg":"Successfully created and attached a new ENI eni-052e27ef0393de5ca to instance"}
{"level":"debug","ts":"2022-01-26T06:57:23.136Z","caller":"ipamd/ipamd.go:1899","msg":"Current warm IP stats: target: 1, total: 0, assigned: 0, available: 0, short: 3, over 0"}
{"level":"info","ts":"2022-01-26T06:57:23.136Z","caller":"ipamd/ipamd.go:830","msg":"Trying to allocate 3 IP addresses on ENI eni-052e27ef0393de5ca"}
{"level":"debug","ts":"2022-01-26T06:57:23.136Z","caller":"ipamd/ipamd.go:830","msg":"PD enabled - false"}
{"level":"info","ts":"2022-01-26T06:57:23.554Z","caller":"ipamd/ipamd.go:830","msg":"Allocated 3 private IP addresses"}
In both cases the reconciler tried to increase the pool within 4-5 seconds. There is no difference in the code path between the two configs; once the reconciler starts, the pool is increased based on the WARM targets.
But I feel there is one optimization that can be done here: since you are using custom networking and the primary ENI cannot have IPs/prefixes attached, we should go ahead and attach a secondary ENI based on the WARM targets during init. That should help with pod launch time; otherwise we have to wait for the reconciler to increase the pool while the node is already marked as ready.
@sidewinder12s We have tried prefix mode but did not do extended testing due to the minimum allocation size of /28, which doesn't fit our use case.
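(For anyone else reading: prefix mode is toggled with ENABLE_PREFIX_DELEGATION on the DaemonSet, and each allocation is a /28, i.e. 16 IPs at a time, which is the granularity constraint mentioned above:)
# Prefix delegation allocates /28 prefixes (16 IPs) per allocation
kubectl -n kube-system set env daemonset/aws-node ENABLE_PREFIX_DELEGATION=true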
@jayanthvn I believe your proposed optimization, attaching a new ENI with pre-allocated IPs, would work best for our current problem. We can observe that the ENI attachment and the start of the reconciler are both very fast; only the initial reconciler pass adding new IPs to the ENI is very slow. Any subsequent reconciler passes on the same ENI are fast.
@dtnyn - Yes, that's right. I have the PR and am testing it out locally.
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days
@jayanthvn we're experiencing this issue as well. How has your local testing been going?
I am waiting for the code review.
+1
Any updates on PR progress?
We had discussed a few changes for the PR but couldn't progress because of other commitments. I will take it up next week for changes/review and track it for the 1.11.5 or 1.12.0 release. Will provide the release timelines once the PR is merged.
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days
Not stale. We still regularly see this issue.