Packet loss & retransmission in veth network under load testing

What happened: We observed packet loss and TCP retransmissions in a production environment via tcpdump captures. After narrowing the issue down to pod-to-pod communication (node-to-node communication was unaffected), I set up a test environment running iperf between two pods and saw a similar symptom.
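For reference, a sketch of one way retransmissions can be counted from a capture (the interface name, file name, and <server-pod-ip> placeholder are illustrative, not from the original report; tshark ships with Wireshark):

# Capture traffic to/from the server pod on the node (names are examples):
tcpdump -i eth0 -w capture.pcap host <server-pod-ip>
# Count the TCP retransmissions flagged by tshark's analysis engine:
tshark -r capture.pcap -Y tcp.analysis.retransmission | wc -l

The steps to reproduce are as follows: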

  1. Create an EKS cluster with 2 nodes in a single AZ, using this eksctl config:
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: cni-performance-test
  region: ap-northeast-1
  version: "1.23"
iam:
  withOIDC: true
managedNodeGroups:
  - name: ng1
    instanceType: m5.xlarge
    desiredCapacity: 2
    availabilityZones: ["ap-northeast-1d"]
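With the config saved locally (file name is an example), create the cluster:

eksctl create cluster -f cluster.yaml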
  2. Create the iperf3 server and client pods:
apiVersion: v1
kind: Pod
metadata:
  name: iperf3-server
  labels:
    k8s-app: iperf3-server
spec:
  containers:
  - image: public.ecr.aws/whe/iperf3:latest
    name: iperf3
    args:
    - '-s'
    resources:
      requests:
        cpu: "1"
---
apiVersion: v1
kind: Pod
metadata:
  name: iperf3-client
  labels:
    k8s-app: iperf3-client
spec:
  containers:
  - image: public.ecr.aws/whe/iperf3:latest
    name: iperf3
    command:
      - /bin/sh
    args:
      - '-c'
      - while true; do sleep 3600;done
    resources:
      requests:
        cpu: "1"
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              k8s-app: iperf3-server
          topologyKey: kubernetes.io/hostname
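Assuming both manifests above are saved in a single file (name is an example), apply them; the anti-affinity rule should schedule the client on the other node:

kubectl apply -f iperf3-pods.yaml
kubectl get pods -o wide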
  3. Get the IP of the iperf3-server pod:
kubectl get pod iperf3-server -o wide
  4. Run the iperf3 client against the server pod:
$ kubectl exec -it iperf3-client -- iperf3 -c 192.168.176.101
Connecting to host 192.168.176.101, port 5201
[  5] local 192.168.165.71 port 33234 connected to 192.168.176.101 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   546 MBytes  4.58 Gbits/sec   49   1.11 MBytes       
[  5]   1.00-2.00   sec   578 MBytes  4.84 Gbits/sec   61   1.12 MBytes       
[  5]   2.00-3.00   sec   578 MBytes  4.84 Gbits/sec   38    856 KBytes       
[  5]   3.00-4.00   sec   574 MBytes  4.81 Gbits/sec   25   1.10 MBytes       
[  5]   4.00-5.00   sec   576 MBytes  4.83 Gbits/sec   31   1.08 MBytes       
[  5]   5.00-6.00   sec   576 MBytes  4.83 Gbits/sec   30   1.10 MBytes       
[  5]   6.00-7.00   sec   576 MBytes  4.83 Gbits/sec   31   1.10 MBytes       
[  5]   7.00-8.00   sec   575 MBytes  4.82 Gbits/sec   37    848 KBytes       
[  5]   8.00-9.00   sec   576 MBytes  4.83 Gbits/sec   23    839 KBytes       
[  5]   9.00-10.00  sec   570 MBytes  4.78 Gbits/sec   30   1.10 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  5.59 GBytes  4.80 Gbits/sec  355             sender
[  5]   0.00-10.03  sec  5.59 GBytes  4.79 Gbits/sec                  receiver

iperf Done.

You may notice that Retr (TCP retransmissions) is 355. If I set hostNetwork: true for both the server and client pods and run the test again, retransmissions are always 0:

kubectl exec -it iperf3-client -- iperf3 -c 192.168.191.108
Connecting to host 192.168.191.108, port 5201
[  5] local 192.168.174.205 port 49464 connected to 192.168.191.108 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   555 MBytes  4.65 Gbits/sec    0   2.78 MBytes       
[  5]   1.00-2.00   sec   575 MBytes  4.82 Gbits/sec    0   3.07 MBytes       
[  5]   2.00-3.00   sec   569 MBytes  4.77 Gbits/sec    0   3.07 MBytes       
[  5]   3.00-4.00   sec   574 MBytes  4.81 Gbits/sec    0   2.15 MBytes       
[  5]   4.00-5.00   sec   576 MBytes  4.83 Gbits/sec    0   2.15 MBytes       
[  5]   5.00-6.00   sec   576 MBytes  4.83 Gbits/sec    0   2.15 MBytes       
[  5]   6.00-7.00   sec   569 MBytes  4.77 Gbits/sec    0   3.21 MBytes       
[  5]   7.00-8.00   sec   578 MBytes  4.84 Gbits/sec    0   3.21 MBytes       
[  5]   8.00-9.01   sec   574 MBytes  4.79 Gbits/sec    0   3.21 MBytes       
[  5]   9.01-10.00  sec   568 MBytes  4.79 Gbits/sec    0   3.21 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  5.58 GBytes  4.79 Gbits/sec    0             sender
[  5]   0.00-10.04  sec  5.58 GBytes  4.77 Gbits/sec                  receiver

iperf Done.
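For completeness, a sketch of one way to set up the hostNetwork runs (hostNetwork is immutable on a running pod, so both pods have to be recreated; file name as in the earlier sketch):

kubectl delete -f iperf3-pods.yaml
# Add "hostNetwork: true" under spec: in both pod manifests, then:
kubectl apply -f iperf3-pods.yaml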

So I suspect the packet retransmissions happen in the veth pair.
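For anyone reproducing this, a couple of hedged checks that may help confirm where the drops occur (interface names are illustrative; by default the VPC CNI gives host-side veth interfaces an "eni" prefix):

# On the worker node, watch per-interface RX/TX drop counters while iperf3 runs:
ip -s link show
# Or a single veth's TX drops via sysfs (interface name is an example):
cat /sys/class/net/eni1a2b3c4d5e6/statistics/tx_dropped
# Pacing the sender below line rate shows whether retransmissions scale with load:
kubectl exec -it iperf3-client -- iperf3 -c <server-pod-ip> -b 3G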

Questions

  1. Is this expected behavior?
  2. Are there any suggestions to improve the performance/quality of the CNI network over the veth pair?

Environment:

  • Kubernetes version (use kubectl version): 1.23
  • CNI Version: v1.11.3
  • OS (e.g: cat /etc/os-release): Amazon Linux 2
  • Kernel (e.g. uname -a): 5.4.209-116.367.amzn2.x86_64

walkley · Sep 29 '22 11:09