amazon-vpc-cni-k8s
Packet loss & retransmission in veth network under load testing
What happened: A tcpdump capture showed packet loss and retransmissions in our production environment. After narrowing the issue down to pod-to-pod communication (node-to-node traffic was unaffected), I set up a test environment running iperf3 between two pods and saw the same symptom. Steps to reproduce:
- Create an EKS cluster with 2 nodes in a single AZ, using the eksctl config below (a create command is sketched right after it):
```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: cni-performance-test
  region: ap-northeast-1
  version: "1.23"
iam:
  withOIDC: true
managedNodeGroups:
  - name: ng1
    instanceType: m5.xlarge
    desiredCapacity: 2
    availabilityZones: ["ap-northeast-1d"]
```
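The cluster can then be created from that config (a minimal sketch; the filename cluster.yaml is just an example, not from the original report):

```sh
# Create the test cluster from the eksctl config above
eksctl create cluster -f cluster.yaml
```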
- Create the iperf3 server and client pods from the manifests below (an apply command is sketched right after them):
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: iperf3-server
  labels:
    k8s-app: iperf3-server
spec:
  containers:
    - image: public.ecr.aws/whe/iperf3:latest
      name: iperf3
      args:
        - '-s'
      resources:
        requests:
          cpu: "1"
---
apiVersion: v1
kind: Pod
metadata:
  name: iperf3-client
  labels:
    k8s-app: iperf3-client
spec:
  containers:
    - image: public.ecr.aws/whe/iperf3:latest
      name: iperf3
      command:
        - /bin/sh
      args:
        - '-c'
        - while true; do sleep 3600; done
      resources:
        requests:
          cpu: "1"
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              k8s-app: iperf3-server
          topologyKey: kubernetes.io/hostname
```
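Both manifests can be applied in one step (a sketch; the filename iperf-pods.yaml is illustrative and assumes both specs are saved in that file):

```sh
# Create both pods; the anti-affinity rule forces the client onto a different node than the server
kubectl apply -f iperf-pods.yaml
kubectl get pods -o wide   # confirm the two pods landed on different nodes
```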
- Get the IP of the iperf3-server pod:
```sh
kubectl get pod iperf3-server -o wide
```
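Alternatively, the pod IP can be captured directly (a small sketch using kubectl's JSONPath output; the variable name is just an example):

```sh
# Store the server pod's IP for use in the iperf3 run below
SERVER_IP=$(kubectl get pod iperf3-server -o jsonpath='{.status.podIP}')
echo "$SERVER_IP"
```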
- Exec into the client pod and run iperf3 against the server:
```
$ kubectl exec -it iperf3-client -- iperf3 -c 192.168.176.101
Connecting to host 192.168.176.101, port 5201
[  5] local 192.168.165.71 port 33234 connected to 192.168.176.101 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   546 MBytes  4.58 Gbits/sec   49   1.11 MBytes
[  5]   1.00-2.00   sec   578 MBytes  4.84 Gbits/sec   61   1.12 MBytes
[  5]   2.00-3.00   sec   578 MBytes  4.84 Gbits/sec   38    856 KBytes
[  5]   3.00-4.00   sec   574 MBytes  4.81 Gbits/sec   25   1.10 MBytes
[  5]   4.00-5.00   sec   576 MBytes  4.83 Gbits/sec   31   1.08 MBytes
[  5]   5.00-6.00   sec   576 MBytes  4.83 Gbits/sec   30   1.10 MBytes
[  5]   6.00-7.00   sec   576 MBytes  4.83 Gbits/sec   31   1.10 MBytes
[  5]   7.00-8.00   sec   575 MBytes  4.82 Gbits/sec   37    848 KBytes
[  5]   8.00-9.00   sec   576 MBytes  4.83 Gbits/sec   23    839 KBytes
[  5]   9.00-10.00  sec   570 MBytes  4.78 Gbits/sec   30   1.10 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  5.59 GBytes  4.80 Gbits/sec  355             sender
[  5]   0.00-10.03  sec  5.59 GBytes  4.79 Gbits/sec                  receiver

iperf Done.
```
You may have noticed that Retr (packet retransmissions) is 355.
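As a cross-check on iperf3's Retr column (a sketch, not from the original report; it assumes the container image allows exec and that /proc is readable), the kernel's own counter can be read from inside the client pod's network namespace:

```sh
# RetransSegs in the Tcp row should grow by roughly the Retr total reported by iperf3 during a run
kubectl exec iperf3-client -- cat /proc/net/snmp | grep '^Tcp:'
```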
If I set `hostNetwork: true` for both the server and client pods and run the test again, the retransmission count is always 0.
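The only change to the manifests is this field (a minimal sketch of the change described above; everything else stays the same, and both pods have to be recreated since hostNetwork cannot be changed in place):

```yaml
# Added to the spec of both iperf3-server and iperf3-client
spec:
  hostNetwork: true
```

With both pods running on host networking, the same test looks like this: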
```
kubectl exec -it iperf3-client -- iperf3 -c 192.168.191.108
Connecting to host 192.168.191.108, port 5201
[  5] local 192.168.174.205 port 49464 connected to 192.168.191.108 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   555 MBytes  4.65 Gbits/sec    0   2.78 MBytes
[  5]   1.00-2.00   sec   575 MBytes  4.82 Gbits/sec    0   3.07 MBytes
[  5]   2.00-3.00   sec   569 MBytes  4.77 Gbits/sec    0   3.07 MBytes
[  5]   3.00-4.00   sec   574 MBytes  4.81 Gbits/sec    0   2.15 MBytes
[  5]   4.00-5.00   sec   576 MBytes  4.83 Gbits/sec    0   2.15 MBytes
[  5]   5.00-6.00   sec   576 MBytes  4.83 Gbits/sec    0   2.15 MBytes
[  5]   6.00-7.00   sec   569 MBytes  4.77 Gbits/sec    0   3.21 MBytes
[  5]   7.00-8.00   sec   578 MBytes  4.84 Gbits/sec    0   3.21 MBytes
[  5]   8.00-9.01   sec   574 MBytes  4.79 Gbits/sec    0   3.21 MBytes
[  5]   9.01-10.00  sec   568 MBytes  4.79 Gbits/sec    0   3.21 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  5.58 GBytes  4.79 Gbits/sec    0             sender
[  5]   0.00-10.04  sec  5.58 GBytes  4.77 Gbits/sec                  receiver

iperf Done.
```
So I suspect packets are being dropped somewhere along the veth path, which is what triggers the retransmissions.
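One way to look for such drops (a sketch, assuming shell access to the node hosting the pod; the veth name below is hypothetical and has to be looked up first):

```sh
# With this CNI the host side of each pod's veth pair typically shows up with an "eni" prefix; list them first
ip -o link show | grep eni

# Check RX/TX drop counters on the pod's veth (example name) and on the node's primary ENI
ip -s link show dev eni1a2b3c4d5e6   # hypothetical veth name; substitute the real one
ip -s link show dev eth0
```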
Questions
- Is this expected behavior?
- Any suggestions to improve the performance/quality of the CNI network when pods communicate over veth pairs?
Environment:
- Kubernetes version (use `kubectl version`): 1.23
- CNI Version: v1.11.3
- OS (e.g. `cat /etc/os-release`): Amazon Linux 2
- Kernel (e.g. `uname -a`): 5.4.209-116.367.amzn2.x86_64