
Iperf3 results on vanilla linux vs TFW-patched linux

Open s0nx opened this issue 2 years ago • 9 comments

Motivation

There appears to be a performance degradation in regular networking stress tests, e.g. iperf3 on the Tempesta-patched Linux 5.10.35 kernel vs. vanilla 5.10.35:

  • vanilla 5.10.35 - client report:
% iperf3 -c 192.168.122.249 -P 4
Connecting to host 192.168.122.249, port 5201
[  5 ] local 192.168.122.1 port 35804 connected to 192.168.122.249 port 5201
[  7 ] local 192.168.122.1 port 35810 connected to 192.168.122.249 port 5201
[  9 ] local 192.168.122.1 port 35816 connected to 192.168.122.249 port 5201
[ 11 ] local 192.168.122.1 port 35832 connected to 192.168.122.249 port 5201
[ ID ] Interval           Transfer     Bitrate         Retr  Cwnd
[  5 ]   0.00-1.00   sec  2.38 GBytes  20.5 Gbits/sec  11456    612 KBytes       
[  7 ]   0.00-1.00   sec  2.17 GBytes  18.6 Gbits/sec  9619    799 KBytes       
[  9 ]   0.00-1.00   sec  1.94 GBytes  16.6 Gbits/sec  6780   1.91 MBytes       
[ 11 ]   0.00-1.00   sec  2.11 GBytes  18.1 Gbits/sec  6861    488 KBytes       
[SUM]   0.00-1.00   sec  8.59 GBytes  73.8 Gbits/sec  34716             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5 ]   1.00-2.00   sec  2.35 GBytes  20.2 Gbits/sec    0    612 KBytes       
[  7 ]   1.00-2.00   sec  2.35 GBytes  20.2 Gbits/sec    0    799 KBytes       
[  9 ]   1.00-2.00   sec  2.35 GBytes  20.2 Gbits/sec   46   1.34 MBytes       
[ 11 ]   1.00-2.00   sec  2.35 GBytes  20.2 Gbits/sec  185    584 KBytes       
[SUM]   1.00-2.00   sec  9.38 GBytes  80.6 Gbits/sec  231             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5 ]   2.00-3.00   sec  2.33 GBytes  20.0 Gbits/sec  12408    734 KBytes       
[  7 ]   2.00-3.00   sec  1.92 GBytes  16.5 Gbits/sec  6470    641 KBytes       
[  9 ]   2.00-3.00   sec  1.79 GBytes  15.4 Gbits/sec  6030    146 KBytes       
[ 11 ]   2.00-3.00   sec  2.27 GBytes  19.5 Gbits/sec  8615    626 KBytes       
[SUM]   2.00-3.00   sec  8.31 GBytes  71.4 Gbits/sec  33523             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5 ]   3.00-4.00   sec  2.29 GBytes  19.7 Gbits/sec  1031    553 KBytes       
[  7 ]   3.00-4.00   sec  2.29 GBytes  19.7 Gbits/sec  265    656 KBytes       
[  9 ]   3.00-4.00   sec  2.25 GBytes  19.3 Gbits/sec  1338    679 KBytes       
[ 11 ]   3.00-4.00   sec  2.29 GBytes  19.7 Gbits/sec  860    646 KBytes       
[SUM]   3.00-4.00   sec  9.12 GBytes  78.4 Gbits/sec  3494             
. . .  
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5 ]   8.00-9.00   sec  2.22 GBytes  19.0 Gbits/sec    0    803 KBytes       
[  7 ]   8.00-9.00   sec  2.22 GBytes  19.0 Gbits/sec    0    590 KBytes       
[  9 ]   8.00-9.00   sec  2.22 GBytes  19.0 Gbits/sec    2    634 KBytes       
[ 11 ]   8.00-9.00   sec  2.22 GBytes  19.0 Gbits/sec  449    679 KBytes       
[SUM]   8.00-9.00   sec  8.87 GBytes  76.2 Gbits/sec  451             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5 ]   9.00-10.00  sec  2.14 GBytes  18.4 Gbits/sec  285    817 KBytes       
[  7 ]   9.00-10.00  sec  2.14 GBytes  18.4 Gbits/sec   59    629 KBytes       
[  9 ]   9.00-10.00  sec  2.14 GBytes  18.4 Gbits/sec  284   1.07 MBytes       
[ 11 ]   9.00-10.00  sec  2.14 GBytes  18.4 Gbits/sec  209    677 KBytes       
[SUM]   9.00-10.00  sec  8.55 GBytes  73.4 Gbits/sec  837             
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID ] Interval           Transfer     Bitrate         Retr
[  5 ]   0.00-10.00  sec  22.7 GBytes  19.5 Gbits/sec  44193             sender
[  5 ]   0.00-10.00  sec  22.7 GBytes  19.5 Gbits/sec                  receiver
[  7 ]   0.00-10.00  sec  22.1 GBytes  19.0 Gbits/sec  38418             sender
[  7 ]   0.00-10.00  sec  22.1 GBytes  19.0 Gbits/sec                  receiver
[  9 ]   0.00-10.00  sec  20.9 GBytes  17.9 Gbits/sec  25489             sender
[  9 ]   0.00-10.00  sec  20.9 GBytes  17.9 Gbits/sec                  receiver
[ 11 ]   0.00-10.00  sec  21.6 GBytes  18.5 Gbits/sec  28928             sender
[ 11 ]   0.00-10.00  sec  21.6 GBytes  18.5 Gbits/sec                  receiver
[SUM]   0.00-10.00  sec  87.2 GBytes  74.9 Gbits/sec  137028             sender
[SUM]   0.00-10.00  sec  87.2 GBytes  74.9 Gbits/sec                  receiver

iperf Done.
  • Patched TFW linux 5.10.35 - client report:
brandx dev/src/tempesta (pv-1703-crash*) % iperf3 -c 192.168.122.249 -P 4
Connecting to host 192.168.122.249, port 5201
[  5 ] local 192.168.122.1 port 50438 connected to 192.168.122.249 port 5201
[  7 ] local 192.168.122.1 port 50446 connected to 192.168.122.249 port 5201
[  9 ] local 192.168.122.1 port 50454 connected to 192.168.122.249 port 5201
[ 11 ] local 192.168.122.1 port 50468 connected to 192.168.122.249 port 5201
[ ID ] Interval           Transfer     Bitrate         Retr  Cwnd
[  5 ]   0.00-1.00   sec   452 MBytes  3.79 Gbits/sec  3740    454 KBytes       
[  7 ]   0.00-1.00   sec   502 MBytes  4.21 Gbits/sec  5035    331 KBytes       
[  9 ]   0.00-1.00   sec   512 MBytes  4.29 Gbits/sec  7081    272 KBytes       
[ 11 ]   0.00-1.00   sec   501 MBytes  4.20 Gbits/sec  4235    506 KBytes       
[SUM]   0.00-1.00   sec  1.92 GBytes  16.5 Gbits/sec  20091             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5 ]   1.00-2.00   sec   601 MBytes  5.04 Gbits/sec  2303    604 KBytes       
[  7 ]   1.00-2.00   sec   546 MBytes  4.58 Gbits/sec  3237    348 KBytes       
[  9 ]   1.00-2.00   sec   382 MBytes  3.21 Gbits/sec  3559    177 KBytes       
[ 11 ]   1.00-2.00   sec   523 MBytes  4.39 Gbits/sec  4393    147 KBytes       
[SUM]   1.00-2.00   sec  2.00 GBytes  17.2 Gbits/sec  13492             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5 ]   2.00-3.00   sec   522 MBytes  4.38 Gbits/sec  3512    576 KBytes       
[  7 ]   2.00-3.00   sec   496 MBytes  4.16 Gbits/sec  6414    475 KBytes       
[  9 ]   2.00-3.00   sec   634 MBytes  5.32 Gbits/sec  6991    387 KBytes       
[ 11 ]   2.00-3.00   sec   471 MBytes  3.95 Gbits/sec  4595    158 KBytes       
[SUM]   2.00-3.00   sec  2.07 GBytes  17.8 Gbits/sec  21512             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5 ]   3.00-4.00   sec   438 MBytes  3.67 Gbits/sec  2083    291 KBytes       
[  7 ]   3.00-4.00   sec   506 MBytes  4.25 Gbits/sec  3299    115 KBytes       
[  9 ]   3.00-4.00   sec   494 MBytes  4.14 Gbits/sec  4640    291 KBytes       
[ 11 ]   3.00-4.00   sec   575 MBytes  4.82 Gbits/sec  3983    288 KBytes       
[SUM]   3.00-4.00   sec  1.97 GBytes  16.9 Gbits/sec  14005             
- - - - - - - - - - - - - - - - - - - - - - - - -
. . .   
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5 ]   8.00-9.00   sec   435 MBytes  3.65 Gbits/sec  2286    307 KBytes       
[  7 ]   8.00-9.00   sec   532 MBytes  4.47 Gbits/sec  5479    318 KBytes       
[  9 ]   8.00-9.00   sec   528 MBytes  4.42 Gbits/sec  6085    321 KBytes       
[ 11 ]   8.00-9.00   sec   511 MBytes  4.29 Gbits/sec  2556    291 KBytes       
[SUM]   8.00-9.00   sec  1.96 GBytes  16.8 Gbits/sec  16406             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5 ]   9.00-10.00  sec   482 MBytes  4.05 Gbits/sec  7488    393 KBytes       
[  7 ]   9.00-10.00  sec   556 MBytes  4.67 Gbits/sec  4663    351 KBytes       
[  9 ]   9.00-10.00  sec   459 MBytes  3.85 Gbits/sec  3264    235 KBytes       
[ 11 ]   9.00-10.00  sec   464 MBytes  3.89 Gbits/sec  4320    395 KBytes       
[SUM]   9.00-10.00  sec  1.92 GBytes  16.5 Gbits/sec  19735             
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID ] Interval           Transfer     Bitrate         Retr
[  5 ]   0.00-10.00  sec  5.06 GBytes  4.35 Gbits/sec  35307             sender
[  5 ]   0.00-10.00  sec  5.06 GBytes  4.34 Gbits/sec                  receiver
[  7 ]   0.00-10.00  sec  4.95 GBytes  4.25 Gbits/sec  46382             sender
[  7 ]   0.00-10.00  sec  4.95 GBytes  4.25 Gbits/sec                  receiver
[  9 ]   0.00-10.00  sec  4.90 GBytes  4.21 Gbits/sec  47087             sender
[  9 ]   0.00-10.00  sec  4.90 GBytes  4.21 Gbits/sec                  receiver
[ 11 ]   0.00-10.00  sec  4.92 GBytes  4.23 Gbits/sec  40558             sender
[ 11 ]   0.00-10.00  sec  4.92 GBytes  4.22 Gbits/sec                  receiver
[SUM]   0.00-10.00  sec  19.8 GBytes  17.0 Gbits/sec  169334             sender
[SUM]   0.00-10.00  sec  19.8 GBytes  17.0 Gbits/sec                  receiver

iperf Done.
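
To put a number on the gap between the two runs, the final sender-side [SUM] rows can be compared directly. The following is only a minimal sketch: the two rows are copied from the reports above, while the helper names (`parse_sender_summary`, `compare`) and the regular expression are illustrative assumptions, not part of iperf3 or Tempesta.

```python
import re

# Final sender-side [SUM] rows copied from the two client reports above.
VANILLA_ROW = "[SUM]   0.00-10.00  sec  87.2 GBytes  74.9 Gbits/sec  137028             sender"
PATCHED_ROW = "[SUM]   0.00-10.00  sec  19.8 GBytes  17.0 Gbits/sec  169334             sender"

# Only the end-of-run sender row has a Retr count followed by "sender",
# so the per-interval [SUM] rows do not match this (hypothetical) pattern.
SUM_ROW = re.compile(
    r"\[SUM\]\s+[\d.]+-[\d.]+\s+sec\s+"
    r"(?P<gbytes>[\d.]+)\s+GBytes\s+"
    r"(?P<gbits>[\d.]+)\s+Gbits/sec\s+"
    r"(?P<retr>\d+)\s+sender"
)


def parse_sender_summary(report):
    """Extract aggregate bitrate and retransmits from an iperf3 text report."""
    match = SUM_ROW.search(report)
    if match is None:
        raise ValueError("no sender [SUM] row found")
    return {
        "gbits_per_sec": float(match.group("gbits")),
        "retransmits": int(match.group("retr")),
    }


def compare(vanilla, patched):
    """Return how much slower the patched run is relative to the vanilla run."""
    return {
        "slowdown": vanilla["gbits_per_sec"] / patched["gbits_per_sec"],
        "drop_percent": 100.0 * (1.0 - patched["gbits_per_sec"] / vanilla["gbits_per_sec"]),
        "retransmit_ratio": patched["retransmits"] / vanilla["retransmits"],
    }


if __name__ == "__main__":
    result = compare(parse_sender_summary(VANILLA_ROW), parse_sender_summary(PATCHED_ROW))
    for key, value in result.items():
        print(f"{key}: {value:.2f}")
```

With the figures above this works out to a slowdown of roughly 4.4x (about a 77% drop in aggregate bitrate), with about 24% more retransmits on the patched kernel.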

Scope

iperf3 ran in server mode on the VM, while the client ran on the host. Tempesta modules were not loaded during the test. Libvirt XML of the VM:

<domain xmlns:qemu="http://libvirt.org/schemas/domain/qemu/1.0" type="kvm">
  <name>f35tfw</name>
  <uuid>5a92e324-248d-48ca-ae5a-7e1453d5c48e</uuid>
  <metadata>
    <libosinfo:libosinfo xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0">
      <libosinfo:os id="http://fedoraproject.org/fedora/33"/>
    </libosinfo:libosinfo>
  </metadata>
  <memory unit="KiB">6291456</memory>
  <currentMemory unit="KiB">6291456</currentMemory>
  <memoryBacking>
    <source type="memfd"/>
    <access mode="shared"/>
  </memoryBacking>
  <vcpu placement="static">4</vcpu>
  <os>
    <type arch="x86_64" machine="pc-q35-5.1">hvm</type>
    <loader readonly="yes" type="pflash">/usr/share/edk2-ovmf/x64/OVMF_CODE.fd</loader>
    <nvram>/var/lib/libvirt/qemu/nvram/f35tfw_VARS.fd</nvram>
    <boot dev="hd"/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <vmport state="off"/>
  </features>
  <cpu mode="host-passthrough" check="none" migratable="on"/>
  <clock offset="utc">
    <timer name="rtc" tickpolicy="catchup"/>
    <timer name="pit" tickpolicy="delay"/>
    <timer name="hpet" present="no"/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <pm>
    <suspend-to-mem enabled="no"/>
    <suspend-to-disk enabled="no"/>
  </pm>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type="file" device="disk">
      <driver name="qemu" type="qcow2"/>
      <source file="/var/lib/libvirt/images/f33-clone-1.qcow2"/>
      <target dev="vda" bus="virtio"/>
      <address type="pci" domain="0x0000" bus="0x04" slot="0x00" function="0x0"/>
    </disk>
    <controller type="usb" index="0" model="qemu-xhci" ports="15">
      <address type="pci" domain="0x0000" bus="0x02" slot="0x00" function="0x0"/>
    </controller>
    <controller type="pci" index="0" model="pcie-root"/>
    <controller type="pci" index="1" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="1" port="0x10"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x0" multifunction="on"/>
    </controller>
    <controller type="pci" index="2" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="2" port="0x11"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x1"/>
    </controller>
    <controller type="pci" index="3" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="3" port="0x12"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x2"/>
    </controller>
    <controller type="pci" index="4" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="4" port="0x13"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x3"/>
    </controller>
    <controller type="pci" index="5" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="5" port="0x14"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x4"/>
    </controller>
    <controller type="pci" index="6" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="6" port="0x15"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x5"/>
    </controller>
    <controller type="pci" index="7" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="7" port="0x16"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x6"/>
    </controller>
    <controller type="pci" index="8" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="8" port="0x17"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x7"/>
    </controller>
    <controller type="pci" index="9" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="9" port="0x18"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x03" function="0x0" multifunction="on"/>
    </controller>
    <controller type="pci" index="10" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="10" port="0x19"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x03" function="0x1"/>
    </controller>
    <controller type="virtio-serial" index="0">
      <address type="pci" domain="0x0000" bus="0x03" slot="0x00" function="0x0"/>
    </controller>
    <controller type="sata" index="0">
      <address type="pci" domain="0x0000" bus="0x00" slot="0x1f" function="0x2"/>
    </controller>
    <controller type="scsi" index="0" model="virtio-scsi">
      <address type="pci" domain="0x0000" bus="0x09" slot="0x00" function="0x0"/>
    </controller>
    <interface type="network">
      <mac address="52:54:00:92:69:2f"/>
      <source network="default"/>
      <model type="virtio"/>
      <driver name="vhost" queues="4" rx_queue_size="256" tx_queue_size="256"/>
      <address type="pci" domain="0x0000" bus="0x01" slot="0x00" function="0x0"/>
    </interface>
    <interface type="network">
      <mac address="52:54:00:3a:22:ee"/>
      <source network="iso"/>
      <model type="virtio"/>
      <driver name="vhost" queues="4" rx_queue_size="256" tx_queue_size="256"/>
      <address type="pci" domain="0x0000" bus="0x08" slot="0x00" function="0x0"/>
    </interface>
    <interface type="network">
      <mac address="52:54:00:87:6a:38"/>
      <source network="network"/>
      <model type="virtio"/>
      <address type="pci" domain="0x0000" bus="0x0a" slot="0x00" function="0x0"/>
    </interface>
    <serial type="pty">
      <target type="isa-serial" port="0">
        <model name="isa-serial"/>
      </target>
    </serial>
    <console type="pty">
      <target type="serial" port="0"/>
    </console>
    <channel type="unix">
      <target type="virtio" name="org.qemu.guest_agent.0"/>
      <address type="virtio-serial" controller="0" bus="0" port="1"/>
    </channel>
    <input type="tablet" bus="usb">
      <address type="usb" bus="0" port="1"/>
    </input>
    <input type="mouse" bus="ps2"/>
    <input type="keyboard" bus="ps2"/>
    <audio id="1" type="none"/>
    <watchdog model="itco" action="reset"/>
    <memballoon model="virtio">
      <address type="pci" domain="0x0000" bus="0x05" slot="0x00" function="0x0"/>
    </memballoon>
    <rng model="virtio">
      <backend model="random">/dev/urandom</backend>
      <address type="pci" domain="0x0000" bus="0x06" slot="0x00" function="0x0"/>
    </rng>
  </devices>
  <qemu:commandline>
    <qemu:arg value="-gdb"/>
    <qemu:arg value="tcp::1237"/>
  </qemu:commandline>
</domain>
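
The per-interface queue configuration is easy to miss in a domain XML of this size. As a rough sketch, it can be pulled out with Python's standard `xml.etree.ElementTree`. The trimmed `SAMPLE_XML` below keeps only the three `<interface>` elements from the XML above; the function name `interface_queues` is hypothetical, and treating a missing `queues` attribute as a single queue is an assumption about the default.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the domain XML above: only the three <interface> elements.
SAMPLE_XML = """
<domain type="kvm">
  <devices>
    <interface type="network">
      <mac address="52:54:00:92:69:2f"/>
      <source network="default"/>
      <model type="virtio"/>
      <driver name="vhost" queues="4" rx_queue_size="256" tx_queue_size="256"/>
    </interface>
    <interface type="network">
      <mac address="52:54:00:3a:22:ee"/>
      <source network="iso"/>
      <model type="virtio"/>
      <driver name="vhost" queues="4" rx_queue_size="256" tx_queue_size="256"/>
    </interface>
    <interface type="network">
      <mac address="52:54:00:87:6a:38"/>
      <source network="network"/>
      <model type="virtio"/>
    </interface>
  </devices>
</domain>
"""


def interface_queues(domain_xml):
    """Return a list of (source network, configured queue count) per interface."""
    root = ET.fromstring(domain_xml)
    result = []
    for iface in root.findall("./devices/interface"):
        source = iface.find("source")
        driver = iface.find("driver")
        network = source.get("network", "") if source is not None else ""
        # Assumption: no <driver queues="..."> means a single-queue device.
        queues = int(driver.get("queues", "1")) if driver is not None else 1
        result.append((network, queues))
    return result


if __name__ == "__main__":
    for network, queues in interface_queues(SAMPLE_XML):
        print(f"{network}: {queues} queue(s)")
```

For the sample this reports 4 queues for the `default` and `iso` networks and 1 for `network`, which matches the `ip tuntap` output where only two of the three tap devices are `multi_queue`.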

The VM was allocated 4 vCPUs and uses virtio-net adapters. The two adapters configured with queues="4" have 4 RX/TX queues each (the third is single-queue):

On host:
sh-5.1$ sudo ip tuntap
vnet82: tap multi_queue vnet_hdr
vnet83: tap multi_queue vnet_hdr
vnet84: tap vnet_hdr

On VM:
f35tfw :: net_perf_tfw/vanilla » l /sys/class/net/enp1s0/queues 
total 0
drwxr-xr-x. 10 root root 0 Apr 10 09:10 .
drwxr-xr-x.  5 root root 0 Apr 10 09:10 ..
drwxr-xr-x.  3 root root 0 Apr 10 09:10 rx-0
drwxr-xr-x.  3 root root 0 Apr 10 09:10 rx-1
drwxr-xr-x.  3 root root 0 Apr 10 09:10 rx-2
drwxr-xr-x.  3 root root 0 Apr 10 09:10 rx-3
drwxr-xr-x.  3 root root 0 Apr 10 09:10 tx-0
drwxr-xr-x.  3 root root 0 Apr 10 09:10 tx-1
drwxr-xr-x.  3 root root 0 Apr 10 09:10 tx-2
drwxr-xr-x.  3 root root 0 Apr 10 09:10 tx-3
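
The same check can be scripted so it is repeatable across kernel builds. This is a small sketch that counts the `rx-*` and `tx-*` entries the way the listing above shows them; the helper names are illustrative, and the interface name `enp1s0` is taken from the listing.

```python
import os


def count_queues(entry_names):
    """Count RX and TX queue directories among sysfs entry names."""
    rx = sum(1 for name in entry_names if name.startswith("rx-"))
    tx = sum(1 for name in entry_names if name.startswith("tx-"))
    return rx, tx


def count_queues_for_interface(iface, sysfs_root="/sys/class/net"):
    """List <sysfs_root>/<iface>/queues and count its rx-*/tx-* entries."""
    queues_dir = os.path.join(sysfs_root, iface, "queues")
    return count_queues(os.listdir(queues_dir))


if __name__ == "__main__":
    iface = "enp1s0"  # interface name from the listing above
    if os.path.isdir(os.path.join("/sys/class/net", iface, "queues")):
        rx, tx = count_queues_for_interface(iface)
        print(f"{iface}: {rx} RX queues, {tx} TX queues")
    else:
        print(f"{iface}: no queues directory found on this machine")
```

Running it under both kernels would confirm that the queue count itself does not differ between the vanilla and patched boots.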

Some perf reports:

  1. vanilla 5.10.35
-   54.89%  swapper          [kernel.kallsyms]           [k] native_safe_halt
     native_safe_halt
     default_idle
     default_idle_call
     do_idle
   - cpu_startup_entry
        43.72% secondary_startup_64_no_verify
      - 11.17% start_kernel
           secondary_startup_64_no_verify
-   20.66%  iperf3           [kernel.kallsyms]           [k] copy_user_enhanced_fast_string
   - copy_user_enhanced_fast_string
      - 20.62% copyout
           _copy_to_iter
         - __skb_datagram_iter
            - 20.61% skb_copy_datagram_iter
                 tcp_recvmsg
                 inet6_recvmsg
                 sock_read_iter
                 new_sync_read
                 vfs_read
                 ksys_read
                 do_syscall_64
                 entry_SYSCALL_64_after_hwframe
               - read
                    1.87% 0xe623ba286750fff5
                    1.67% 0xe8f7684f9b9fcfc8
                    1.62% 0x7f827d21ee4cda68
                    1.50% 0xea839f122facb847
                    1.46% 0x1599e901ecfa2c98
                    1.41% 0x6654403d5aeb6572
                    1.29% 0xb8b4e06a6e4be286
                    1.10% 0xb698c3ef344f82b4
                    0.74% 0xc68a6bdedc16ddc7
                    0.73% 0x24e92d686f459547
                    0.71% 0x8994ab98dc5c9e30
                    0.67% 0x75068f8e6272fe73
-    1.60%  iperf3           [kernel.kallsyms]           [k] __free_pages_ok
   - __free_pages_ok
      - 1.60% skb_release_data
         - 1.59% __kfree_skb
            - 1.59% tcp_recvmsg
                 inet6_recvmsg
                 sock_read_iter
                 new_sync_read
                 vfs_read
                 ksys_read
                 do_syscall_64
                 entry_SYSCALL_64_after_hwframe
                 read
+    1.08%  iperf3           [kernel.kallsyms]           [k] iowrite16
+    0.89%  iperf3           [kernel.kallsyms]           [k] _raw_spin_unlock_irqrestore
+    0.86%  iperf3           [kernel.kallsyms]           [k] __check_object_size
+    0.80%  iperf3           [kernel.kallsyms]           [k] __softirqentry_text_start
+    0.76%  swapper          [kernel.kallsyms]           [k] __softirqentry_text_start
+    0.66%  iperf3           [kernel.kallsyms]           [k] pvclock_clocksource_read
+    0.54%  iperf3           [kernel.kallsyms]           [k] copy_user_generic_unrolled
     0.47%  swapper          [kernel.kallsyms]           [k] receive_buf
     0.46%  swapper          [kernel.kallsyms]           [k] finish_task_switch
     0.41%  iperf3           [kernel.kallsyms]           [k] __pv_queued_spin_lock_slowpath
     0.38%  iperf3           [kernel.kallsyms]           [k] skb_release_data
     0.38%  iperf3           [kernel.kallsyms]           [k] __slab_free
     0.35%  iperf3           [kernel.kallsyms]           [k] __skb_datagram_iter
     0.32%  iperf3           [kernel.kallsyms]           [k] tcp_recvmsg
     0.28%  iperf3           [kernel.kallsyms]           [k] sock_poll
     0.28%  iperf3           [kernel.kallsyms]           [k] tcp_poll
     0.27%  iperf3           [kernel.kallsyms]           [k] finish_task_switch
     0.26%  swapper          [kernel.kallsyms]           [k] _raw_spin_unlock_irqrestore
     0.26%  iperf3           [kernel.kallsyms]           [k] _copy_to_iter
     0.26%  iperf3           [kernel.kallsyms]           [k] syscall_enter_from_user_mode
     0.25%  swapper          [kernel.kallsyms]           [k] try_fill_recv
     0.25%  iperf3           [kernel.kallsyms]           [k] sock_rfree
     0.24%  iperf3           [kernel.kallsyms]           [k] __raw_callee_save___pv_queued_spin_unlock
     0.23%  swapper          [kernel.kallsyms]           [k] get_page_from_freelist
     0.22%  iperf3           libc.so.6                   [.] read
     0.22%  iperf3           [kernel.kallsyms]           [k] __virt_addr_valid
     0.22%  iperf3           [kernel.kallsyms]           [k] kfree
     0.20%  iperf3           [kernel.kallsyms]           [k] __tcp_transmit_skb
     0.20%  iperf3           [kernel.kallsyms]           [k] __fget_light
     0.19%  swapper          [kernel.kallsyms]           [k] iowrite16
     0.18%  iperf3           [kernel.kallsyms]           [k] _raw_spin_lock
     0.17%  iperf3           [kernel.kallsyms]           [k] do_select
     0.16%  iperf3           [kernel.kallsyms]           [k] _raw_spin_lock_bh
     0.15%  iperf3           [kernel.kallsyms]           [k] virtqueue_kick_prepare
     0.14%  iperf3           [vdso]                      [.] __vdso_clock_gettime
     0.13%  iperf3           [kernel.kallsyms]           [k] ip_send_check
     0.13%  swapper          [kernel.kallsyms]           [k] virtqueue_add_split
     0.12%  iperf3           [kernel.kallsyms]           [k] kmem_cache_free
     0.12%  iperf3           [kernel.kallsyms]           [k] slab_free_freelist_hook
     0.12%  iperf3           [kernel.kallsyms]           [k] virtqueue_get_buf_ctx_split
     0.11%  iperf3           [kernel.kallsyms]           [k] copyout
     0.11%  iperf3           [kernel.kallsyms]           [k] selinux_socket_recvmsg
     0.11%  iperf3           libiperf.so.0.0.0           [.] Nread
     0.10%  iperf3           libiperf.so.0.0.0           [.] iperf_tcp_recv
     0.10%  iperf3           [kernel.kallsyms]           [k] avc_lookup
     0.10%  iperf3           [kernel.kallsyms]           [k] ktime_get
     0.10%  iperf3           [kernel.kallsyms]           [k] receive_buf
     0.10%  iperf3           libc.so.6                   [.] __select
     0.09%  iperf3           [kernel.kallsyms]           [k] avc_has_perm
     0.09%  iperf3           [kernel.kallsyms]           [k] __slab_alloc
     0.08%  iperf3           [kernel.kallsyms]           [k] _raw_spin_lock_irqsave
     0.08%  swapper          [kernel.kallsyms]           [k] tick_nohz_idle_exit
     0.08%  iperf3           [kernel.kallsyms]           [k] ip_finish_output2
     0.08%  swapper          [kernel.kallsyms]           [k] default_idle_call
     0.08%  swapper          [kernel.kallsyms]           [k] detach_buf_split
     0.08%  swapper          [kernel.kallsyms]           [k] __pv_queued_spin_lock_slowpath
     0.07%  iperf3           [kernel.kallsyms]           [k] virtqueue_get_buf
     0.07%  iperf3           [kernel.kallsyms]           [k] __pollwait
     0.07%  swapper          [kernel.kallsyms]           [k] virtqueue_get_buf_ctx_split
     0.07%  iperf3           [kernel.kallsyms]           [k] __kmalloc_node_track_caller
     0.07%  iperf3           [kernel.kallsyms]           [k] should_fail
     0.06%  iperf3           [kernel.kallsyms]           [k] virtqueue_add_split
     0.06%  iperf3           libiperf.so.0.0.0           [.] iperf_run_server
     0.06%  swapper          [kernel.kallsyms]           [k] __slab_alloc
     0.06%  swapper          [kernel.kallsyms]           [k] kmem_cache_free
     0.06%  swapper          [kernel.kallsyms]           [k] skb_page_frag_refill
     0.06%  iperf3           [kernel.kallsyms]           [k] try_fill_recv
     0.06%  swapper          [kernel.kallsyms]           [k] skb_clone
     0.06%  swapper          [kernel.kallsyms]           [k] do_idle
     0.06%  iperf3           [kernel.kallsyms]           [k] __sock_wfree
     0.06%  iperf3           [kernel.kallsyms]           [k] start_xmit
     0.06%  iperf3           [kernel.kallsyms]           [k] tcp_rcv_established
     0.06%  iperf3           [kernel.kallsyms]           [k] virtqueue_enable_cb_delayed
     0.05%  iperf3           [kernel.kallsyms]           [k] tcp_rcv_space_adjust
     0.05%  iperf3           libiperf.so.0.0.0           [.] iperf_recv
     0.05%  iperf3           [kernel.kallsyms]           [k] inode_security
     0.05%  iperf3           [kernel.kallsyms]           [k] sock_read_iter
     0.05%  iperf3           [kernel.kallsyms]           [k] vfs_read
     0.05%  iperf3           [kernel.kallsyms]           [k] __ksize
     0.05%  sh               [kernel.kallsyms]           [k] do_user_addr_fault
     0.05%  iperf3           [kernel.kallsyms]           [k] __dev_queue_xmit
     0.05%  iperf3           [kernel.kallsyms]           [k] fsnotify
     0.05%  iperf3           [kernel.kallsyms]           [k] selinux_file_permission
     0.05%  iperf3           [kernel.kallsyms]           [k] tcp_cleanup_rbuf
     0.05%  swapper          [kernel.kallsyms]           [k] __netif_receive_skb_core.constprop.0
     0.05%  iperf3           [kernel.kallsyms]           [k] new_sync_read
     0.05%  swapper          [kernel.kallsyms]           [k] __slab_free
     0.04%  iperf3           [kernel.kallsyms]           [k] __ip_queue_xmit
     0.04%  iperf3           libiperf.so.0.0.0           [.] tmr_timeout
     0.04%  iperf3           libc.so.6                   [.] clock_gettime@@GLIBC_2.17
     0.04%  swapper          [kernel.kallsyms]           [k] _raw_spin_lock
     0.04%  tmux: server     [kernel.kallsyms]           [k] do_user_addr_fault
     0.04%  iperf3           [kernel.kallsyms]           [k] __check_heap_object
     0.04%  iperf3           [kernel.kallsyms]           [k] selinux_ip_postroute
     0.04%  swapper          [kernel.kallsyms]           [k] __inet_lookup_established
     0.04%  iperf3           [kernel.kallsyms]           [k] kmem_cache_alloc_node
     0.04%  iperf3           [kernel.kallsyms]           [k] __alloc_skb
     0.04%  iperf3           [kernel.kallsyms]           [k] __x86_indirect_thunk_rax
     0.04%  iperf3           [kernel.kallsyms]           [k] do_syscall_64
     0.04%  iperf3           [kernel.kallsyms]           [k] fput_many
     0.04%  iperf3           [kernel.kallsyms]           [k] ktime_get_ts64
     0.04%  sh               [kernel.kallsyms]           [k] __softirqentry_text_start
     0.04%  swapper          [kernel.kallsyms]           [k] tcp_v4_rcv
     0.03%  iperf3           [kernel.kallsyms]           [k] detach_buf_split
     0.03%  iperf3           [kernel.kallsyms]           [k] get_page_from_freelist
     0.03%  iperf3           [kernel.kallsyms]           [k] put_cpu_partial                                                                                  ▒
     0.03%  iperf3           [kernel.kallsyms]           [k] sg_init_table                                                                                    ▒
     0.03%  swapper          [kernel.kallsyms]           [k] page_to_skb▒
     0.03%  swapper          [kernel.kallsyms]           [k] pvclock_clocksource_read                                                                         ▒
     0.03%  swapper          [kernel.kallsyms]           [k] tick_nohz_idle_enter                                                                             ▒
  1. tfw-5.10.35
Samples: 75K of event 'cpu-clock:pppH', Event count (approx.): 18907500000                                                                                     
  Overhead  Command          Shared Object               Symbol          
-   65.14%  swapper          [kernel.kallsyms]           [k] native_safe_halt                                                                                 ▒
     native_safe_halt                                                   ▒
     default_idle                                                       ▒
     default_idle_call                                                  ▒
     do_idle                                                            ◆
   - cpu_startup_entry                                                  ▒
        46.52% secondary_startup_64_no_verify                           ▒
      - 18.62% start_kernel                                             ▒
           secondary_startup_64_no_verify                               ▒
-    6.56%  iperf3           [kernel.kallsyms]           [k] copy_user_enhanced_fast_string                                                                   ▒
   - copy_user_enhanced_fast_string                                     ▒
      - 6.51% copyout                                                   ▒
           _copy_to_iter                                                ▒
         - __skb_datagram_iter                                          ▒
            - 4.40% __skb_datagram_iter                                 ▒
                 skb_copy_datagram_iter                                 ▒
                 tcp_recvmsg                                            ▒
                 inet6_recvmsg                                          ▒
                 sock_read_iter                                         ▒
                 new_sync_read                                          ▒
                 vfs_read                                               ▒
                 ksys_read                                              ▒
                 do_syscall_64                                          ▒
                 entry_SYSCALL_64_after_hwframe                         ▒
                 read                                                   ▒
            - 2.11% skb_copy_datagram_iter                              ▒
                 tcp_recvmsg                                            ▒
                 inet6_recvmsg                                          ▒
                 sock_read_iter                                         ▒
                 new_sync_read                                          ▒
                 vfs_read                                               ▒
                 ksys_read                                              ▒
                 do_syscall_64                                          ▒
                 entry_SYSCALL_64_after_hwframe                         ▒
                 read                                                   ▒
-    3.63%  iperf3           [kernel.kallsyms]           [k] copy_user_generic_unrolled                                                                       ▒
   - copy_user_generic_unrolled                                         ▒
      - 3.57% copyout                                                   ▒
           _copy_to_iter                                                ▒
         - __skb_datagram_iter                                          ▒
            - 2.28% __skb_datagram_iter                                 ▒
                 skb_copy_datagram_iter                                 ▒
                 tcp_recvmsg                                            ▒
                 inet6_recvmsg                                          ▒
                 sock_read_iter                                         ▒
                 new_sync_read                                          ▒
                 vfs_read                                               ▒
                 ksys_read                                              ▒
                 do_syscall_64                                          ▒
                 entry_SYSCALL_64_after_hwframe                         ▒
                 read                                                   ▒
            - 1.29% skb_copy_datagram_iter                              ▒
                 tcp_recvmsg                                            ▒
                 inet6_recvmsg                                          ▒
                 sock_read_iter                                         ▒
                 new_sync_read                                          ▒
                 vfs_read                                               ▒
                 ksys_read                                              ▒
                 do_syscall_64                                          ▒
                 entry_SYSCALL_64_after_hwframe                         ▒
                 read                                                   ▒
+    1.75%  iperf3           [kernel.kallsyms]           [k] __check_object_size                                                                              ▒
+    1.27%  iperf3           [kernel.kallsyms]           [k] _raw_spin_unlock_irqrestore                                                                      ▒
+    1.24%  iperf3           [kernel.kallsyms]           [k] __skb_datagram_iter                                                                              ▒
+    1.00%  iperf3           [kernel.kallsyms]           [k] __softirqentry_text_start                                                                        ▒
+    0.96%  iperf3           [kernel.kallsyms]           [k] free_unref_page                                                                                  ▒
+    0.88%  iperf3           [kernel.kallsyms]           [k] skb_release_data                                                                                 ▒
+    0.73%  iperf3           [kernel.kallsyms]           [k] __virt_addr_valid                                                                                ▒
+    0.66%  iperf3           [kernel.kallsyms]           [k] _copy_to_iter                                                                                    ▒
+    0.62%  swapper          [kernel.kallsyms]           [k] __softirqentry_text_start                                                                        ▒
+    0.52%  swapper          [kernel.kallsyms]           [k] finish_task_switch                                                                               ▒
     0.46%  iperf3           [kernel.kallsyms]           [k] __free_pages_ok                                                                                  ▒
     0.40%  iperf3           [kernel.kallsyms]           [k] copyout    ▒
     0.39%  iperf3           [kernel.kallsyms]           [k] finish_task_switch                                                                               ▒
     0.34%  swapper          [kernel.kallsyms]           [k] __alloc_skb▒
     0.30%  swapper          [kernel.kallsyms]           [k] _raw_spin_unlock_irqrestore                                                                      ▒
     0.29%  iperf3           [ip_tables]                 [k] ipt_do_table                                                                                     ▒
     0.26%  swapper          [kernel.kallsyms]           [k] pg_skb_alloc                                                                                     ▒
     0.24%  iperf3           [kernel.kallsyms]           [k] iowrite16  ▒
     0.23%  iperf3           [kernel.kallsyms]           [k] tcp_poll   ▒
     0.23%  swapper          [kernel.kallsyms]           [k] skb_gro_receive                                                                                  ▒
     0.23%  swapper          [kernel.kallsyms]           [k] get_page_from_freelist                                                                           ▒
     0.21%  swapper          [virtio_net]                [k] page_to_skb▒
     0.20%  iperf3           [kernel.kallsyms]           [k] sock_poll  ▒
     0.20%  iperf3           [kernel.kallsyms]           [k] do_select  ▒
     0.19%  iperf3           [kernel.kallsyms]           [k] __alloc_skb▒
     0.19%  iperf3           [kernel.kallsyms]           [k] pvclock_clocksource_read                                                                         ▒
     0.18%  swapper          [virtio_net]                [k] receive_buf▒
     0.18%  iperf3           [kernel.kallsyms]           [k] syscall_enter_from_user_mode                                                                     ▒
     0.17%  swapper          [kernel.kallsyms]           [k] tick_nohz_idle_exit                                                                              ▒
     0.17%  swapper          [kernel.kallsyms]           [k] dev_gro_receive                                                                                  ▒
     0.15%  iperf3           libc.so.6                   [.] read       ▒
     0.15%  swapper          [virtio_net]                [k] try_fill_recv                                                                                    ▒
     0.14%  iperf3           [nf_conntrack]              [k] __nf_conntrack_find_get                                                                          ▒
     0.14%  iperf3           [kernel.kallsyms]           [k] __pv_queued_spin_lock_slowpath                                                                   ▒
     0.14%  iperf3           [kernel.kallsyms]           [k] tcp_recvmsg▒
     0.13%  swapper          [kernel.kallsyms]           [k] detach_buf_split                                                                                 ▒
     0.13%  iperf3           [kernel.kallsyms]           [k] __fget_light                                                                                     ▒
     0.12%  iperf3           [kernel.kallsyms]           [k] should_fail▒
     0.12%  iperf3           [kernel.kallsyms]           [k] __raw_callee_save___pv_queued_spin_unlock                                                        ▒
     0.12%  iperf3           [vdso]                      [.] __vdso_clock_gettime                                                                             ▒
     0.12%  iperf3           [kernel.kallsyms]           [k] _raw_spin_lock_irqsave                                                                           ▒
     0.11%  iperf3           [kernel.kallsyms]           [k] pg_skb_alloc                                                                                     ▒
     0.11%  iperf3           libc.so.6                   [.] __select   ▒
     0.10%  swapper          [kernel.kallsyms]           [k] default_idle_call                                                                                ▒
     0.10%  swapper          [kernel.kallsyms]           [k] iowrite16  ▒
     0.10%  swapper          [kernel.kallsyms]           [k] memcpy_erms▒
     0.10%  iperf3           [kernel.kallsyms]           [k] _raw_spin_lock_bh                                                                                ▒
     0.10%  swapper          [kernel.kallsyms]           [k] inet_gro_receive                                                                                 ▒
     0.09%  iperf3           [kernel.kallsyms]           [k] __tcp_transmit_skb                                                                               ▒
     0.09%  iperf3           [virtio_net]                [k] page_to_skb▒
     0.09%  swapper          [kernel.kallsyms]           [k] tcp_gro_receive                                                                                  ▒
     0.09%  swapper          [kernel.kallsyms]           [k] do_idle    ▒
     0.08%  iperf3           [kernel.kallsyms]           [k] _raw_spin_lock                                                                                   ▒
     0.08%  iperf3           [kernel.kallsyms]           [k] dev_gro_receive                                                                                  ▒
     0.08%  iperf3           [kernel.kallsyms]           [k] simple_copy_to_iter                                                                              ▒
     0.07%  iperf3           [kernel.kallsyms]           [k] get_page_from_freelist                                                                           ▒
     0.07%  swapper          [kernel.kallsyms]           [k] napi_skb_free_stolen_head                                                                        ▒
     0.07%  iperf3           [kernel.kallsyms]           [k] ip_send_check                                                                                    ▒
     0.07%  iperf3           [kernel.kallsyms]           [k] sock_rfree ▒
     0.07%  iperf3           [virtio_net]                [k] receive_buf▒
     0.07%  iperf3           [kernel.kallsyms]           [k] virtqueue_add_split                                                                              ▒
     0.06%  swapper          [kernel.kallsyms]           [k] virtqueue_add_split                                                                              ▒
     0.06%  iperf3           libiperf.so.0.0.0           [.] iperf_recv ▒
     0.06%  iperf3           [kernel.kallsyms]           [k] kfree_skbmem                                                                                     ▒
     0.06%  iperf3           [kernel.kallsyms]           [k] kmem_cache_free                                                                                  ▒
     0.06%  iperf3           [kernel.kallsyms]           [k] run_rebalance_domains                                                                            ▒
     0.06%  iperf3           [kernel.kallsyms]           [k] virtqueue_kick_prepare                                                                           ▒
     0.06%  iperf3           [kernel.kallsyms]           [k] skb_gro_receive                                                                                  ▒
     0.06%  iperf3           [kernel.kallsyms]           [k] __put_page▒
     0.06%  iperf3           [kernel.kallsyms]           [k] ip_finish_output2▒
     0.06%  swapper          [kernel.kallsyms]           [k] virtqueue_get_buf_ctx_split▒
     0.05%  iperf3           [kernel.kallsyms]           [k] page_frag_free▒
     0.05%  iperf3           [kernel.kallsyms]           [k] virtqueue_get_buf_ctx_split▒
     0.05%  iperf3           [nf_conntrack]              [k] nf_conntrack_tcp_packet▒
     0.05%  swapper          [ip_tables]                 [k] ipt_do_table▒
     0.05%  iperf3           [kernel.kallsyms]           [k] detach_buf_split▒
     0.05%  sh               [kernel.kallsyms]           [k] __softirqentry_text_start▒
     0.05%  iperf3           [kernel.kallsyms]           [k] _cond_resched▒
     0.05%  iperf3           [kernel.kallsyms]           [k] check_stack_object▒
     0.05%  iperf3           [virtio_net]                [k] try_fill_recv▒
     0.05%  iperf3           [nf_conntrack]              [k] nf_conntrack_in▒
     0.05%  sh               [kernel.kallsyms]           [k] do_user_addr_fault▒
     0.05%  iperf3           [kernel.kallsyms]           [k] ktime_get_ts64▒
     0.05%  iperf3           [kernel.kallsyms]           [k] __pollwait▒
     0.05%  iperf3           [kernel.kallsyms]           [k] fput_many▒
     0.05%  iperf3           [kernel.kallsyms]           [k] ktime_get▒
     0.05%  iperf3           libiperf.so.0.0.0           [.] iperf_run_server▒
     0.04%  iperf3           [kernel.kallsyms]           [k] __slab_free▒
     0.04%  iperf3           [kernel.kallsyms]           [k] tcp_rcv_established▒
     0.04%  swapper          [kernel.kallsyms]           [k] update_sd_lb_stats.constprop.0▒
     0.04%  iperf3           [kernel.kallsyms]           [k] tcp_gro_receive▒
     0.04%  swapper          [kernel.kallsyms]           [k] tick_nohz_idle_enter▒
     0.04%  iperf3           [kernel.kallsyms]           [k] __page_cache_release▒
     0.04%  iperf3           [kernel.kallsyms]           [k] core_sys_select▒
     0.04%  iperf3           libiperf.so.0.0.0           [.] iperf_tcp_recv▒

Testing

On VM: iperf3 -s
On host: iperf3 -c <VM ip addr> -P 4

s0nx avatar Apr 10 '23 14:04 s0nx

Probably not related to the iperf issue, but https://github.com/tempesta-tech/tempesta/pull/1845#discussion_r1170339773 reveals that we create too many 128-byte fragments for HTTP/2 frame headers, which require just 9 bytes. We should probably introduce a dedicated 9-byte cache allocator for this.
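
For scale: the HTTP/2 frame header is a fixed 9-octet structure (24-bit length, 8-bit type, 8-bit flags, one reserved bit plus a 31-bit stream identifier). Below is a minimal sketch, not Tempesta code, that packs such a header and shows how little of a 128-byte fragment it occupies. The function name `pack_frame_header` and the constant `FRAGMENT_SIZE` are illustrative assumptions.

```python
import struct

FRAME_HEADER_LEN = 9   # fixed HTTP/2 frame header size in octets
FRAGMENT_SIZE = 128    # fragment size mentioned in the comment (assumed constant name)


def pack_frame_header(length: int, frame_type: int, flags: int, stream_id: int) -> bytes:
    """Pack an HTTP/2 frame header: 24-bit length, type, flags, R bit + 31-bit stream id."""
    if not 0 <= length < 2**24:
        raise ValueError("length must fit in 24 bits")
    if not 0 <= stream_id < 2**31:
        raise ValueError("stream id must fit in 31 bits")
    # ">I" packs 4 big-endian bytes; dropping the top byte leaves the 24-bit length.
    length_bytes = struct.pack(">I", length)[1:]
    # Type (1 byte), flags (1 byte), reserved bit (0) + stream id (4 bytes).
    return length_bytes + struct.pack(">BBI", frame_type, flags, stream_id)


header = pack_frame_header(length=16384, frame_type=0x0, flags=0x1, stream_id=1)
utilization = len(header) / FRAGMENT_SIZE
print(len(header), f"{utilization:.1%}")
```

With these numbers the header uses about 7% of a 128-byte fragment, so the rest of each fragment is unused.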

Another performance point: we use ss_skb_alloc(), which doesn't use the sock->sk_tx_skb_cache cache the way sk_stream_alloc_skb() does. This probably doesn't affect the iperf results, but it should still be fixed.

krizhanovsky avatar Apr 21 '23 19:04 krizhanovsky

I can't reproduce it on our server. For example, for vanilla Linux I have:

Connecting to host 94.242.233.28, port 5201
[  5] local 94.242.233.20 port 37698 connected to 94.242.233.28 port 5201
[  7] local 94.242.233.20 port 37700 connected to 94.242.233.28 port 5201
[  9] local 94.242.233.20 port 37702 connected to 94.242.233.28 port 5201
[ 11] local 94.242.233.20 port 37704 connected to 94.242.233.28 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   442 MBytes  3.71 Gbits/sec    5    839 KBytes       
[  7]   0.00-1.00   sec   442 MBytes  3.70 Gbits/sec    0    515 KBytes       
[  9]   0.00-1.00   sec   439 MBytes  3.68 Gbits/sec    0    423 KBytes       
[ 11]   0.00-1.00   sec   438 MBytes  3.68 Gbits/sec    0    366 KBytes       
[SUM]   0.00-1.00   sec  1.72 GBytes  14.8 Gbits/sec    5             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   1.00-2.00   sec   446 MBytes  3.74 Gbits/sec    0    839 KBytes       
[  7]   1.00-2.00   sec   446 MBytes  3.74 Gbits/sec    0    515 KBytes       
[  9]   1.00-2.00   sec   447 MBytes  3.75 Gbits/sec    0    423 KBytes       
[ 11]   1.00-2.00   sec   447 MBytes  3.75 Gbits/sec    0    457 KBytes       
[SUM]   1.00-2.00   sec  1.74 GBytes  15.0 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   2.00-3.00   sec   444 MBytes  3.72 Gbits/sec    0    839 KBytes       
[  7]   2.00-3.00   sec   444 MBytes  3.73 Gbits/sec    0    515 KBytes       
[  9]   2.00-3.00   sec   443 MBytes  3.72 Gbits/sec    0    423 KBytes       
[ 11]   2.00-3.00   sec   443 MBytes  3.72 Gbits/sec    0    457 KBytes       
[SUM]   2.00-3.00   sec  1.73 GBytes  14.9 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   3.00-4.00   sec   445 MBytes  3.73 Gbits/sec    0    839 KBytes       
[  7]   3.00-4.00   sec   444 MBytes  3.72 Gbits/sec    0    515 KBytes       
[  9]   3.00-4.00   sec   445 MBytes  3.73 Gbits/sec    0    423 KBytes       
[ 11]   3.00-4.00   sec   445 MBytes  3.73 Gbits/sec    0    457 KBytes       
[SUM]   3.00-4.00   sec  1.74 GBytes  14.9 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   4.00-5.00   sec   445 MBytes  3.73 Gbits/sec    0    839 KBytes       
[  7]   4.00-5.00   sec   445 MBytes  3.73 Gbits/sec    0    515 KBytes       
[  9]   4.00-5.00   sec   444 MBytes  3.73 Gbits/sec    0    423 KBytes       
[ 11]   4.00-5.00   sec   444 MBytes  3.73 Gbits/sec    0    457 KBytes       
[SUM]   4.00-5.00   sec  1.74 GBytes  14.9 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   5.00-6.00   sec   444 MBytes  3.72 Gbits/sec    0    839 KBytes       
[  7]   5.00-6.00   sec   444 MBytes  3.73 Gbits/sec    0    515 KBytes       
[  9]   5.00-6.00   sec   444 MBytes  3.73 Gbits/sec    0    423 KBytes       
[ 11]   5.00-6.00   sec   445 MBytes  3.73 Gbits/sec    0    457 KBytes       
[SUM]   5.00-6.00   sec  1.74 GBytes  14.9 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   6.00-7.00   sec   444 MBytes  3.72 Gbits/sec    0    839 KBytes       
[  7]   6.00-7.00   sec   444 MBytes  3.73 Gbits/sec    0    515 KBytes       
[  9]   6.00-7.00   sec   444 MBytes  3.73 Gbits/sec    0    423 KBytes       
[ 11]   6.00-7.00   sec   444 MBytes  3.72 Gbits/sec    0    457 KBytes       
[SUM]   6.00-7.00   sec  1.73 GBytes  14.9 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   7.00-8.00   sec   444 MBytes  3.72 Gbits/sec    0    839 KBytes       
[  7]   7.00-8.00   sec   444 MBytes  3.73 Gbits/sec    0    515 KBytes       
[  9]   7.00-8.00   sec   444 MBytes  3.72 Gbits/sec    0    423 KBytes       
[ 11]   7.00-8.00   sec   444 MBytes  3.72 Gbits/sec    0    457 KBytes       
[SUM]   7.00-8.00   sec  1.73 GBytes  14.9 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   8.00-9.00   sec   445 MBytes  3.73 Gbits/sec    0    839 KBytes       
[  7]   8.00-9.00   sec   444 MBytes  3.73 Gbits/sec    0    515 KBytes       
[  9]   8.00-9.00   sec   445 MBytes  3.73 Gbits/sec    0    423 KBytes       
[ 11]   8.00-9.00   sec   444 MBytes  3.73 Gbits/sec    0    457 KBytes       
[SUM]   8.00-9.00   sec  1.74 GBytes  14.9 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   9.00-10.00  sec   444 MBytes  3.72 Gbits/sec    0    839 KBytes       
[  7]   9.00-10.00  sec   444 MBytes  3.73 Gbits/sec    0    515 KBytes       
[  9]   9.00-10.00  sec   444 MBytes  3.72 Gbits/sec    0    423 KBytes       
[ 11]   9.00-10.00  sec   444 MBytes  3.72 Gbits/sec    0    457 KBytes       
[SUM]   9.00-10.00  sec  1.73 GBytes  14.9 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  4.34 GBytes  3.73 Gbits/sec    5             sender
[  5]   0.00-10.00  sec  4.33 GBytes  3.72 Gbits/sec                  receiver
[  7]   0.00-10.00  sec  4.34 GBytes  3.73 Gbits/sec    0             sender
[  7]   0.00-10.00  sec  4.33 GBytes  3.72 Gbits/sec                  receiver
[  9]   0.00-10.00  sec  4.33 GBytes  3.72 Gbits/sec    0             sender
[  9]   0.00-10.00  sec  4.33 GBytes  3.72 Gbits/sec                  receiver
[ 11]   0.00-10.00  sec  4.33 GBytes  3.72 Gbits/sec    0             sender
[ 11]   0.00-10.00  sec  4.33 GBytes  3.72 Gbits/sec                  receiver
[SUM]   0.00-10.00  sec  17.3 GBytes  14.9 Gbits/sec    5             sender
[SUM]   0.00-10.00  sec  17.3 GBytes  14.9 Gbits/sec                  receiver

and for the Tempesta kernel I have:

Connecting to host 94.242.233.28, port 5201
[  5] local 94.242.233.20 port 35410 connected to 94.242.233.28 port 5201
[  7] local 94.242.233.20 port 35412 connected to 94.242.233.28 port 5201
[  9] local 94.242.233.20 port 35414 connected to 94.242.233.28 port 5201
[ 11] local 94.242.233.20 port 35416 connected to 94.242.233.28 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   429 MBytes  3.60 Gbits/sec    0    604 KBytes       
[  7]   0.00-1.00   sec   428 MBytes  3.59 Gbits/sec    0    974 KBytes       
[  9]   0.00-1.00   sec   424 MBytes  3.56 Gbits/sec    3   1.33 MBytes       
[ 11]   0.00-1.00   sec   428 MBytes  3.59 Gbits/sec    0   1003 KBytes       
[SUM]   0.00-1.00   sec  1.67 GBytes  14.3 Gbits/sec    3             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   1.00-2.00   sec   421 MBytes  3.54 Gbits/sec    0    604 KBytes       
[  7]   1.00-2.00   sec   421 MBytes  3.53 Gbits/sec    0    974 KBytes       
[  9]   1.00-2.00   sec   421 MBytes  3.53 Gbits/sec    0   1.33 MBytes       
[ 11]   1.00-2.00   sec   421 MBytes  3.53 Gbits/sec    0   1003 KBytes       
[SUM]   1.00-2.00   sec  1.65 GBytes  14.1 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   2.00-3.00   sec   422 MBytes  3.54 Gbits/sec    0    604 KBytes       
[  7]   2.00-3.00   sec   422 MBytes  3.54 Gbits/sec    0    974 KBytes       
[  9]   2.00-3.00   sec   422 MBytes  3.54 Gbits/sec    0   1.33 MBytes       
[ 11]   2.00-3.00   sec   421 MBytes  3.53 Gbits/sec    0   1003 KBytes       
[SUM]   2.00-3.00   sec  1.65 GBytes  14.2 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   3.00-4.00   sec   422 MBytes  3.54 Gbits/sec    0    604 KBytes       
[  7]   3.00-4.00   sec   422 MBytes  3.54 Gbits/sec    0    974 KBytes       
[  9]   3.00-4.00   sec   422 MBytes  3.54 Gbits/sec    0   1.33 MBytes       
[ 11]   3.00-4.00   sec   422 MBytes  3.54 Gbits/sec    0   1003 KBytes       
[SUM]   3.00-4.00   sec  1.65 GBytes  14.2 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   4.00-5.00   sec   413 MBytes  3.46 Gbits/sec    5    604 KBytes       
[  7]   4.00-5.00   sec   411 MBytes  3.45 Gbits/sec    0    974 KBytes       
[  9]   4.00-5.00   sec   411 MBytes  3.45 Gbits/sec    0   1.33 MBytes       
[ 11]   4.00-5.00   sec   412 MBytes  3.46 Gbits/sec    0   1003 KBytes       
[SUM]   4.00-5.00   sec  1.61 GBytes  13.8 Gbits/sec    5             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   5.00-6.00   sec   412 MBytes  3.45 Gbits/sec    0    604 KBytes       
[  7]   5.00-6.00   sec   412 MBytes  3.46 Gbits/sec    0    974 KBytes       
[  9]   5.00-6.00   sec   411 MBytes  3.45 Gbits/sec    0   1.33 MBytes       
[ 11]   5.00-6.00   sec   411 MBytes  3.45 Gbits/sec    0   1003 KBytes       
[SUM]   5.00-6.00   sec  1.61 GBytes  13.8 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   6.00-7.00   sec   413 MBytes  3.47 Gbits/sec    0    604 KBytes       
[  7]   6.00-7.00   sec   414 MBytes  3.47 Gbits/sec    0    974 KBytes       
[  9]   6.00-7.00   sec   414 MBytes  3.47 Gbits/sec    0   1.33 MBytes       
[ 11]   6.00-7.00   sec   414 MBytes  3.47 Gbits/sec    0   1003 KBytes       
[SUM]   6.00-7.00   sec  1.62 GBytes  13.9 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   7.00-8.00   sec   414 MBytes  3.47 Gbits/sec    0    604 KBytes       
[  7]   7.00-8.00   sec   412 MBytes  3.46 Gbits/sec    0    974 KBytes       
[  9]   7.00-8.00   sec   414 MBytes  3.47 Gbits/sec    0   1.33 MBytes       
[ 11]   7.00-8.00   sec   414 MBytes  3.47 Gbits/sec    0   1003 KBytes       
[SUM]   7.00-8.00   sec  1.62 GBytes  13.9 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   8.00-9.00   sec   413 MBytes  3.46 Gbits/sec    0    604 KBytes       
[  7]   8.00-9.00   sec   414 MBytes  3.47 Gbits/sec    0    974 KBytes       
[  9]   8.00-9.00   sec   414 MBytes  3.47 Gbits/sec    0   1.33 MBytes       
[ 11]   8.00-9.00   sec   414 MBytes  3.47 Gbits/sec    0   1003 KBytes       
[SUM]   8.00-9.00   sec  1.62 GBytes  13.9 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   9.00-10.00  sec   414 MBytes  3.47 Gbits/sec    0    604 KBytes       
[  7]   9.00-10.00  sec   414 MBytes  3.47 Gbits/sec    0    974 KBytes       
[  9]   9.00-10.00  sec   414 MBytes  3.47 Gbits/sec    0   1.33 MBytes       
[ 11]   9.00-10.00  sec   414 MBytes  3.47 Gbits/sec    0   1003 KBytes       
[SUM]   9.00-10.00  sec  1.62 GBytes  13.9 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  4.07 GBytes  3.50 Gbits/sec    5             sender
[  5]   0.00-10.00  sec  4.07 GBytes  3.49 Gbits/sec                  receiver
[  7]   0.00-10.00  sec  4.07 GBytes  3.50 Gbits/sec    0             sender
[  7]   0.00-10.00  sec  4.07 GBytes  3.49 Gbits/sec                  receiver
[  9]   0.00-10.00  sec  4.07 GBytes  3.50 Gbits/sec    3             sender
[  9]   0.00-10.00  sec  4.06 GBytes  3.49 Gbits/sec                  receiver
[ 11]   0.00-10.00  sec  4.07 GBytes  3.50 Gbits/sec    0             sender
[ 11]   0.00-10.00  sec  4.07 GBytes  3.49 Gbits/sec                  receiver
[SUM]   0.00-10.00  sec  16.3 GBytes  14.0 Gbits/sec    8             sender
[SUM]   0.00-10.00  sec  16.3 GBytes  14.0 Gbits/sec                  receiver

Full results are stored on our server in the /home/evgenii_mekhanik/IPERF folder.
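
To put the two totals side by side, here is a small sketch that extracts the bitrate from the final sender `[SUM]` lines of the two runs above and computes the relative difference. The helper name `sum_bitrate_gbits` is illustrative, and the regular expression assumes the bitrate is reported in Gbits/sec, as it is in these runs.

```python
import re

# Final sender [SUM] lines copied from the two runs above.
VANILLA = "[SUM]   0.00-10.00  sec  17.3 GBytes  14.9 Gbits/sec    5             sender"
TEMPESTA = "[SUM]   0.00-10.00  sec  16.3 GBytes  14.0 Gbits/sec    8             sender"


def sum_bitrate_gbits(line: str) -> float:
    """Extract the bitrate in Gbits/sec from an iperf3 [SUM] line."""
    match = re.search(r"([\d.]+)\s+Gbits/sec", line)
    if match is None:
        raise ValueError(f"no Gbits/sec bitrate in: {line!r}")
    return float(match.group(1))


vanilla = sum_bitrate_gbits(VANILLA)
tempesta = sum_bitrate_gbits(TEMPESTA)
# Relative drop of the Tempesta kernel total against the vanilla total.
relative_diff = (vanilla - tempesta) / vanilla
print(f"vanilla={vanilla} tempesta={tempesta} diff={relative_diff:.1%}")
```

For these two runs the totals differ by roughly 6% (14.9 vs. 14.0 Gbits/sec). This is a single run per kernel, so it says nothing about run-to-run variance.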

EvgeniiMekhanik avatar Mar 20 '24 15:03 EvgeniiMekhanik

Mellanox Tempesta kernel:

[  5]   0.00-1.00   sec   903 MBytes  7.57 Gbits/sec    0    906 KBytes       
[  7]   0.00-1.00   sec   901 MBytes  7.55 Gbits/sec    0    906 KBytes       
[  9]   0.00-1.00   sec   901 MBytes  7.55 Gbits/sec    0    723 KBytes       
[ 11]   0.00-1.00   sec   905 MBytes  7.59 Gbits/sec    0    810 KBytes       
[SUM]   0.00-1.00   sec  3.53 GBytes  30.3 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   1.00-2.00   sec  1.19 GBytes  10.2 Gbits/sec    0    964 KBytes       
[  7]   1.00-2.00   sec  1.19 GBytes  10.2 Gbits/sec    0    906 KBytes       
[  9]   1.00-2.00   sec  1.19 GBytes  10.2 Gbits/sec    0    773 KBytes       
[ 11]   1.00-2.00   sec  1.19 GBytes  10.2 Gbits/sec    0    895 KBytes       
[SUM]   1.00-2.00   sec  4.77 GBytes  41.0 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   2.00-3.00   sec  1.17 GBytes  10.1 Gbits/sec    0    964 KBytes       
[  7]   2.00-3.00   sec  1.17 GBytes  10.1 Gbits/sec    0    906 KBytes       
[  9]   2.00-3.00   sec  1.17 GBytes  10.1 Gbits/sec    0    773 KBytes       
[ 11]   2.00-3.00   sec  1.17 GBytes  10.1 Gbits/sec    0    895 KBytes       
[SUM]   2.00-3.00   sec  4.69 GBytes  40.3 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   3.00-4.00   sec  1.18 GBytes  10.1 Gbits/sec    0    964 KBytes       
[  7]   3.00-4.00   sec  1.18 GBytes  10.1 Gbits/sec    0    906 KBytes       
[  9]   3.00-4.00   sec  1.18 GBytes  10.1 Gbits/sec    0    773 KBytes       
[ 11]   3.00-4.00   sec  1.18 GBytes  10.1 Gbits/sec    0    895 KBytes       
[SUM]   3.00-4.00   sec  4.71 GBytes  40.5 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   4.00-5.00   sec  1.17 GBytes  10.1 Gbits/sec    0   1.00 MBytes       
[  7]   4.00-5.00   sec  1.17 GBytes  10.0 Gbits/sec    0    906 KBytes       
[  9]   4.00-5.00   sec  1.17 GBytes  10.1 Gbits/sec    0    834 KBytes       
[ 11]   4.00-5.00   sec  1.17 GBytes  10.1 Gbits/sec    0    939 KBytes       
[SUM]   4.00-5.00   sec  4.68 GBytes  40.2 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   5.00-6.00   sec  1.32 GBytes  11.3 Gbits/sec    0   1.00 MBytes       
[  7]   5.00-6.00   sec  1.32 GBytes  11.3 Gbits/sec    0    906 KBytes       
[  9]   5.00-6.00   sec  1.32 GBytes  11.3 Gbits/sec    0    834 KBytes       
[ 11]   5.00-6.00   sec  1.32 GBytes  11.3 Gbits/sec    0    939 KBytes       
[SUM]   5.00-6.00   sec  5.27 GBytes  45.2 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   6.00-7.00   sec  1.29 GBytes  11.1 Gbits/sec    0   1.00 MBytes       
[  7]   6.00-7.00   sec  1.29 GBytes  11.1 Gbits/sec    0    906 KBytes       
[  9]   6.00-7.00   sec  1.29 GBytes  11.1 Gbits/sec    0    880 KBytes       
[ 11]   6.00-7.00   sec  1.29 GBytes  11.1 Gbits/sec    0    939 KBytes       
[SUM]   6.00-7.00   sec  5.17 GBytes  44.4 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   7.00-8.00   sec  1.28 GBytes  11.0 Gbits/sec    0   1.00 MBytes       
[  7]   7.00-8.00   sec  1.28 GBytes  11.0 Gbits/sec    0    906 KBytes       
[  9]   7.00-8.00   sec  1.28 GBytes  11.0 Gbits/sec    0    880 KBytes       
[ 11]   7.00-8.00   sec  1.28 GBytes  11.0 Gbits/sec    0    939 KBytes       
[SUM]   7.00-8.00   sec  5.12 GBytes  44.0 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   8.00-9.00   sec  1.26 GBytes  10.8 Gbits/sec    0   1.00 MBytes       
[  7]   8.00-9.00   sec  1.26 GBytes  10.8 Gbits/sec    0    906 KBytes       
[  9]   8.00-9.00   sec  1.26 GBytes  10.8 Gbits/sec    0    880 KBytes       
[ 11]   8.00-9.00   sec  1.26 GBytes  10.8 Gbits/sec    0    939 KBytes       
[SUM]   8.00-9.00   sec  5.04 GBytes  43.4 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   9.00-10.00  sec  1.26 GBytes  10.8 Gbits/sec    0   1.00 MBytes       
[  7]   9.00-10.00  sec  1.26 GBytes  10.8 Gbits/sec    0    906 KBytes       
[  9]   9.00-10.00  sec  1.26 GBytes  10.8 Gbits/sec    0    880 KBytes       
[ 11]   9.00-10.00  sec  1.26 GBytes  10.8 Gbits/sec    0    939 KBytes       
[SUM]   9.00-10.00  sec  5.03 GBytes  43.2 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  12.0 GBytes  10.3 Gbits/sec    0             sender
[  5]   0.00-10.04  sec  12.0 GBytes  10.3 Gbits/sec                  receiver
[  7]   0.00-10.00  sec  12.0 GBytes  10.3 Gbits/sec    0             sender
[  7]   0.00-10.04  sec  12.0 GBytes  10.3 Gbits/sec                  receiver
[  9]   0.00-10.00  sec  12.0 GBytes  10.3 Gbits/sec    0             sender
[  9]   0.00-10.04  sec  12.0 GBytes  10.3 Gbits/sec                  receiver
[ 11]   0.00-10.00  sec  12.0 GBytes  10.3 Gbits/sec    0             sender
[ 11]   0.00-10.04  sec  12.0 GBytes  10.3 Gbits/sec                  receiver
[SUM]   0.00-10.00  sec  48.0 GBytes  41.2 Gbits/sec    0             sender
[SUM]   0.00-10.04  sec  48.0 GBytes  41.1 Gbits/sec                  receiver

EvgeniiMekhanik avatar Mar 28 '24 12:03 EvgeniiMekhanik

Mellanox plain kernel:

[  5] local 192.168.253.106 port 38554 connected to 192.168.253.105 port 5201
[  7] local 192.168.253.106 port 38570 connected to 192.168.253.105 port 5201
[  9] local 192.168.253.106 port 38582 connected to 192.168.253.105 port 5201
[ 11] local 192.168.253.106 port 38596 connected to 192.168.253.105 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   882 MBytes  7.40 Gbits/sec    0    906 KBytes       
[  7]   0.00-1.00   sec   881 MBytes  7.39 Gbits/sec    0    714 KBytes       
[  9]   0.00-1.00   sec   881 MBytes  7.39 Gbits/sec    0    670 KBytes       
[ 11]   0.00-1.00   sec   881 MBytes  7.39 Gbits/sec    0    604 KBytes       
[SUM]   0.00-1.00   sec  3.44 GBytes  29.6 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   1.00-2.00   sec  1.07 GBytes  9.20 Gbits/sec    0    906 KBytes       
[  7]   1.00-2.00   sec  1.07 GBytes  9.21 Gbits/sec    0    714 KBytes       
[  9]   1.00-2.00   sec  1.07 GBytes  9.20 Gbits/sec    0    670 KBytes       
[ 11]   1.00-2.00   sec  1.07 GBytes  9.20 Gbits/sec    0    604 KBytes       
[SUM]   1.00-2.00   sec  4.28 GBytes  36.8 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   2.00-3.00   sec  1.30 GBytes  11.2 Gbits/sec    0    906 KBytes       
[  7]   2.00-3.00   sec  1.30 GBytes  11.2 Gbits/sec    0    714 KBytes       
[  9]   2.00-3.00   sec  1.30 GBytes  11.2 Gbits/sec    0    670 KBytes       
[ 11]   2.00-3.00   sec  1.30 GBytes  11.2 Gbits/sec    0    645 KBytes       
[SUM]   2.00-3.00   sec  5.21 GBytes  44.8 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   3.00-4.00   sec  1.31 GBytes  11.2 Gbits/sec    0    906 KBytes       
[  7]   3.00-4.00   sec  1.31 GBytes  11.2 Gbits/sec    0    714 KBytes       
[  9]   3.00-4.00   sec  1.31 GBytes  11.2 Gbits/sec    0    670 KBytes       
[ 11]   3.00-4.00   sec  1.31 GBytes  11.2 Gbits/sec    0    645 KBytes       
[SUM]   3.00-4.00   sec  5.23 GBytes  44.9 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   4.00-5.00   sec  1.32 GBytes  11.3 Gbits/sec    0    906 KBytes       
[  7]   4.00-5.00   sec  1.32 GBytes  11.3 Gbits/sec    0    809 KBytes       
[  9]   4.00-5.00   sec  1.32 GBytes  11.3 Gbits/sec    0    670 KBytes       
[ 11]   4.00-5.00   sec  1.32 GBytes  11.3 Gbits/sec    0    771 KBytes       
[SUM]   4.00-5.00   sec  5.27 GBytes  45.3 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   5.00-6.00   sec  1.26 GBytes  10.8 Gbits/sec    0    906 KBytes       
[  7]   5.00-6.00   sec  1.26 GBytes  10.8 Gbits/sec    0    809 KBytes       
[  9]   5.00-6.00   sec  1.26 GBytes  10.8 Gbits/sec    0    670 KBytes       
[ 11]   5.00-6.00   sec  1.26 GBytes  10.8 Gbits/sec    0    771 KBytes       
[SUM]   5.00-6.00   sec  5.02 GBytes  43.2 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   6.00-7.00   sec  1.26 GBytes  10.8 Gbits/sec    0    906 KBytes       
[  7]   6.00-7.00   sec  1.26 GBytes  10.8 Gbits/sec    0    809 KBytes       
[  9]   6.00-7.00   sec  1.26 GBytes  10.8 Gbits/sec    0    670 KBytes       
[ 11]   6.00-7.00   sec  1.26 GBytes  10.8 Gbits/sec    0    771 KBytes       
[SUM]   6.00-7.00   sec  5.05 GBytes  43.4 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   7.00-8.00   sec  1.27 GBytes  10.9 Gbits/sec    0    906 KBytes       
[  7]   7.00-8.00   sec  1.27 GBytes  10.9 Gbits/sec    0    809 KBytes       
[  9]   7.00-8.00   sec  1.27 GBytes  10.9 Gbits/sec    0    670 KBytes       
[ 11]   7.00-8.00   sec  1.27 GBytes  10.9 Gbits/sec    0    771 KBytes       
[SUM]   7.00-8.00   sec  5.08 GBytes  43.6 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   8.00-9.00   sec  1.26 GBytes  10.8 Gbits/sec    0    906 KBytes       
[  7]   8.00-9.00   sec  1.26 GBytes  10.8 Gbits/sec    0    809 KBytes       
[  9]   8.00-9.00   sec  1.26 GBytes  10.8 Gbits/sec    0    670 KBytes       
[ 11]   8.00-9.00   sec  1.26 GBytes  10.8 Gbits/sec    0    771 KBytes       
[SUM]   8.00-9.00   sec  5.03 GBytes  43.2 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   9.00-10.00  sec  1.26 GBytes  10.8 Gbits/sec    0    906 KBytes       
[  7]   9.00-10.00  sec  1.26 GBytes  10.8 Gbits/sec    0    809 KBytes       
[  9]   9.00-10.00  sec  1.26 GBytes  10.8 Gbits/sec    0    670 KBytes       
[ 11]   9.00-10.00  sec  1.26 GBytes  10.8 Gbits/sec    0    771 KBytes       
[SUM]   9.00-10.00  sec  5.04 GBytes  43.3 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  12.2 GBytes  10.4 Gbits/sec    0             sender
[  5]   0.00-10.03  sec  12.2 GBytes  10.4 Gbits/sec                  receiver
[  7]   0.00-10.00  sec  12.2 GBytes  10.4 Gbits/sec    0             sender
[  7]   0.00-10.03  sec  12.2 GBytes  10.4 Gbits/sec                  receiver
[  9]   0.00-10.00  sec  12.2 GBytes  10.4 Gbits/sec    0             sender
[  9]   0.00-10.03  sec  12.2 GBytes  10.4 Gbits/sec                  receiver
[ 11]   0.00-10.00  sec  12.2 GBytes  10.4 Gbits/sec    0             sender
[ 11]   0.00-10.03  sec  12.2 GBytes  10.4 Gbits/sec                  receiver
[SUM]   0.00-10.00  sec  48.7 GBytes  41.8 Gbits/sec    0             sender
[SUM]   0.00-10.03  sec  48.6 GBytes  41.6 Gbits/sec                  receiver

EvgeniiMekhanik avatar Mar 28 '24 12:03 EvgeniiMekhanik

I tried to reproduce on my VM. My results:

Client report on 5.10.35 Patched

Connecting to host 192.168.122.127, port 5201
[  5] local 192.168.122.1 port 60634 connected to 192.168.122.127 port 5201
[  7] local 192.168.122.1 port 60636 connected to 192.168.122.127 port 5201
[  9] local 192.168.122.1 port 60648 connected to 192.168.122.127 port 5201
[ 11] local 192.168.122.1 port 60652 connected to 192.168.122.127 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   444 MBytes  3.72 Gbits/sec    6    556 KBytes       
[  7]   0.00-1.00   sec   444 MBytes  3.72 Gbits/sec    0    847 KBytes       
[  9]   0.00-1.00   sec   443 MBytes  3.72 Gbits/sec  219    380 KBytes       
[ 11]   0.00-1.00   sec   442 MBytes  3.70 Gbits/sec  290    351 KBytes       
[SUM]   0.00-1.00   sec  1.73 GBytes  14.9 Gbits/sec  515             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   1.00-2.00   sec   446 MBytes  3.74 Gbits/sec    0    556 KBytes       
[  7]   1.00-2.00   sec   446 MBytes  3.75 Gbits/sec    0    847 KBytes       
[  9]   1.00-2.00   sec   445 MBytes  3.74 Gbits/sec    0    423 KBytes       
[ 11]   1.00-2.00   sec   445 MBytes  3.74 Gbits/sec    0    410 KBytes       
[SUM]   1.00-2.00   sec  1.74 GBytes  15.0 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   2.00-3.00   sec   446 MBytes  3.74 Gbits/sec    0    556 KBytes       
[  7]   2.00-3.00   sec   447 MBytes  3.75 Gbits/sec  196    618 KBytes       
[  9]   2.00-3.00   sec   447 MBytes  3.75 Gbits/sec   39    382 KBytes       
[ 11]   2.00-3.00   sec   448 MBytes  3.75 Gbits/sec  127    379 KBytes       
[SUM]   2.00-3.00   sec  1.75 GBytes  15.0 Gbits/sec  362             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   3.00-4.00   sec   444 MBytes  3.72 Gbits/sec    0    556 KBytes       
[  7]   3.00-4.00   sec   443 MBytes  3.72 Gbits/sec    0    618 KBytes       
[  9]   3.00-4.00   sec   444 MBytes  3.72 Gbits/sec    0    424 KBytes       
[ 11]   3.00-4.00   sec   444 MBytes  3.72 Gbits/sec    0    411 KBytes       
[SUM]   3.00-4.00   sec  1.73 GBytes  14.9 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   4.00-5.00   sec   446 MBytes  3.75 Gbits/sec    6    433 KBytes       
[  7]   4.00-5.00   sec   447 MBytes  3.75 Gbits/sec    0    618 KBytes       
[  9]   4.00-5.00   sec   447 MBytes  3.75 Gbits/sec    0    447 KBytes       
[ 11]   4.00-5.00   sec   446 MBytes  3.74 Gbits/sec    0    438 KBytes       
[SUM]   4.00-5.00   sec  1.74 GBytes  15.0 Gbits/sec    6             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   5.00-6.00   sec   446 MBytes  3.74 Gbits/sec    0    450 KBytes       
[  7]   5.00-6.00   sec   446 MBytes  3.74 Gbits/sec    0    618 KBytes       
[  9]   5.00-6.00   sec   446 MBytes  3.74 Gbits/sec    0    462 KBytes       
[ 11]   5.00-6.00   sec   445 MBytes  3.73 Gbits/sec   58    392 KBytes       
[SUM]   5.00-6.00   sec  1.74 GBytes  15.0 Gbits/sec   58             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   6.00-7.00   sec   446 MBytes  3.75 Gbits/sec    0    468 KBytes       
[  7]   6.00-7.00   sec   446 MBytes  3.74 Gbits/sec   39    438 KBytes       
[  9]   6.00-7.00   sec   446 MBytes  3.75 Gbits/sec    0    478 KBytes       
[ 11]   6.00-7.00   sec   446 MBytes  3.75 Gbits/sec    0    431 KBytes       
[SUM]   6.00-7.00   sec  1.74 GBytes  15.0 Gbits/sec   39             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   7.00-8.00   sec   445 MBytes  3.73 Gbits/sec    0    477 KBytes       
[  7]   7.00-8.00   sec   446 MBytes  3.74 Gbits/sec   14    385 KBytes       
[  9]   7.00-8.00   sec   447 MBytes  3.74 Gbits/sec    0    485 KBytes       
[ 11]   7.00-8.00   sec   446 MBytes  3.74 Gbits/sec    0    445 KBytes       
[SUM]   7.00-8.00   sec  1.74 GBytes  15.0 Gbits/sec   14             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   8.00-9.00   sec   426 MBytes  3.57 Gbits/sec  1430    216 KBytes       
[  7]   8.00-9.00   sec   418 MBytes  3.50 Gbits/sec  1454    100 KBytes       
[  9]   8.00-9.00   sec   401 MBytes  3.37 Gbits/sec  1899    279 KBytes       
[ 11]   8.00-9.00   sec   421 MBytes  3.53 Gbits/sec  2821    230 KBytes       
[SUM]   8.00-9.00   sec  1.63 GBytes  14.0 Gbits/sec  7604             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   9.00-10.00  sec   407 MBytes  3.41 Gbits/sec  2420    375 KBytes       
[  7]   9.00-10.00  sec   316 MBytes  2.65 Gbits/sec  4437    236 KBytes       
[  9]   9.00-10.00  sec   520 MBytes  4.36 Gbits/sec  3439    577 KBytes       
[ 11]   9.00-10.00  sec   402 MBytes  3.37 Gbits/sec  2785    272 KBytes       
[SUM]   9.00-10.00  sec  1.61 GBytes  13.8 Gbits/sec  13081             
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  4.29 GBytes  3.69 Gbits/sec  3862             sender
[  5]   0.00-10.03  sec  4.29 GBytes  3.67 Gbits/sec                  receiver
[  7]   0.00-10.00  sec  4.20 GBytes  3.61 Gbits/sec  6140             sender
[  7]   0.00-10.03  sec  4.19 GBytes  3.59 Gbits/sec                  receiver
[  9]   0.00-10.00  sec  4.38 GBytes  3.76 Gbits/sec  5596             sender
[  9]   0.00-10.03  sec  4.38 GBytes  3.75 Gbits/sec                  receiver
[ 11]   0.00-10.00  sec  4.28 GBytes  3.68 Gbits/sec  6081             sender
[ 11]   0.00-10.03  sec  4.28 GBytes  3.66 Gbits/sec                  receiver
[SUM]   0.00-10.00  sec  17.2 GBytes  14.7 Gbits/sec  21679             sender
[SUM]   0.00-10.03  sec  17.1 GBytes  14.7 Gbits/sec                  receiver

Client report on 5.10.35 Vanilla

Connecting to host 192.168.122.127, port 5201
[  5] local 192.168.122.1 port 48186 connected to 192.168.122.127 port 5201
[  7] local 192.168.122.1 port 48202 connected to 192.168.122.127 port 5201
[  9] local 192.168.122.1 port 48208 connected to 192.168.122.127 port 5201
[ 11] local 192.168.122.1 port 48214 connected to 192.168.122.127 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   529 MBytes  4.43 Gbits/sec  4942    300 KBytes       
[  7]   0.00-1.00   sec   371 MBytes  3.11 Gbits/sec  4575    242 KBytes       
[  9]   0.00-1.00   sec   460 MBytes  3.85 Gbits/sec  6545    561 KBytes       
[ 11]   0.00-1.00   sec   392 MBytes  3.29 Gbits/sec  3489    354 KBytes       
[SUM]   0.00-1.00   sec  1.71 GBytes  14.7 Gbits/sec  19551             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   1.00-2.00   sec   508 MBytes  4.26 Gbits/sec  866    421 KBytes       
[  7]   1.00-2.00   sec   363 MBytes  3.05 Gbits/sec  615    242 KBytes       
[  9]   1.00-2.00   sec   312 MBytes  2.62 Gbits/sec  243    338 KBytes       
[ 11]   1.00-2.00   sec   462 MBytes  3.87 Gbits/sec  604    338 KBytes       
[SUM]   1.00-2.00   sec  1.61 GBytes  13.8 Gbits/sec  2328             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   2.00-3.00   sec   490 MBytes  4.11 Gbits/sec  4066    205 KBytes       
[  7]   2.00-3.00   sec   414 MBytes  3.47 Gbits/sec  1845    324 KBytes       
[  9]   2.00-3.00   sec   451 MBytes  3.78 Gbits/sec  880    505 KBytes       
[ 11]   2.00-3.00   sec   356 MBytes  2.98 Gbits/sec  2506    410 KBytes       
[SUM]   2.00-3.00   sec  1.67 GBytes  14.4 Gbits/sec  9297             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   3.00-4.00   sec   381 MBytes  3.19 Gbits/sec  461    366 KBytes       
[  7]   3.00-4.00   sec   380 MBytes  3.19 Gbits/sec  357    329 KBytes       
[  9]   3.00-4.00   sec   401 MBytes  3.36 Gbits/sec  528    344 KBytes       
[ 11]   3.00-4.00   sec   376 MBytes  3.15 Gbits/sec  117    403 KBytes       
[SUM]   3.00-4.00   sec  1.50 GBytes  12.9 Gbits/sec  1463             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   4.00-5.00   sec   459 MBytes  3.85 Gbits/sec  2369    416 KBytes       
[  7]   4.00-5.00   sec   384 MBytes  3.22 Gbits/sec  1963    399 KBytes       
[  9]   4.00-5.00   sec   407 MBytes  3.41 Gbits/sec  2996    484 KBytes       
[ 11]   4.00-5.00   sec   484 MBytes  4.06 Gbits/sec  2815    551 KBytes       
[SUM]   4.00-5.00   sec  1.69 GBytes  14.6 Gbits/sec  10143             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   5.00-6.00   sec   402 MBytes  3.37 Gbits/sec    5    452 KBytes       
[  7]   5.00-6.00   sec   391 MBytes  3.28 Gbits/sec   10    424 KBytes       
[  9]   5.00-6.00   sec   329 MBytes  2.76 Gbits/sec   70    376 KBytes       
[ 11]   5.00-6.00   sec   409 MBytes  3.43 Gbits/sec   35    460 KBytes       
[SUM]   5.00-6.00   sec  1.49 GBytes  12.8 Gbits/sec  120             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   6.00-7.00   sec   348 MBytes  2.92 Gbits/sec  328    464 KBytes       
[  7]   6.00-7.00   sec   346 MBytes  2.90 Gbits/sec  581    318 KBytes       
[  9]   6.00-7.00   sec   345 MBytes  2.89 Gbits/sec  437    372 KBytes       
[ 11]   6.00-7.00   sec   344 MBytes  2.88 Gbits/sec  231    365 KBytes       
[SUM]   6.00-7.00   sec  1.35 GBytes  11.6 Gbits/sec  1577             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   7.00-8.00   sec   337 MBytes  2.83 Gbits/sec  3503    199 KBytes       
[  7]   7.00-8.00   sec   451 MBytes  3.79 Gbits/sec  2932    390 KBytes       
[  9]   7.00-8.00   sec   372 MBytes  3.12 Gbits/sec  2976    304 KBytes       
[ 11]   7.00-8.00   sec   492 MBytes  4.12 Gbits/sec  3257    580 KBytes       
[SUM]   7.00-8.00   sec  1.61 GBytes  13.9 Gbits/sec  12668             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   8.00-9.00   sec   404 MBytes  3.39 Gbits/sec  795    356 KBytes       
[  7]   8.00-9.00   sec   360 MBytes  3.02 Gbits/sec  964    218 KBytes       
[  9]   8.00-9.00   sec   380 MBytes  3.19 Gbits/sec  1074    301 KBytes       
[ 11]   8.00-9.00   sec   406 MBytes  3.41 Gbits/sec  649    250 KBytes       
[SUM]   8.00-9.00   sec  1.51 GBytes  13.0 Gbits/sec  3482             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   9.00-10.00  sec   483 MBytes  4.05 Gbits/sec  5265    395 KBytes       
[  7]   9.00-10.00  sec   451 MBytes  3.78 Gbits/sec  1533    481 KBytes       
[  9]   9.00-10.00  sec   378 MBytes  3.17 Gbits/sec  1962    512 KBytes       
[ 11]   9.00-10.00  sec   427 MBytes  3.59 Gbits/sec  4263    219 KBytes       
[SUM]   9.00-10.00  sec  1.70 GBytes  14.6 Gbits/sec  13023             
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  4.24 GBytes  3.64 Gbits/sec  22600             sender
[  5]   0.00-10.05  sec  4.24 GBytes  3.62 Gbits/sec                  receiver
[  7]   0.00-10.00  sec  3.82 GBytes  3.28 Gbits/sec  15375             sender
[  7]   0.00-10.05  sec  3.82 GBytes  3.26 Gbits/sec                  receiver
[  9]   0.00-10.00  sec  3.75 GBytes  3.22 Gbits/sec  17711             sender
[  9]   0.00-10.05  sec  3.75 GBytes  3.20 Gbits/sec                  receiver
[ 11]   0.00-10.00  sec  4.05 GBytes  3.48 Gbits/sec  17966             sender
[ 11]   0.00-10.05  sec  4.05 GBytes  3.46 Gbits/sec                  receiver
[SUM]   0.00-10.00  sec  15.9 GBytes  13.6 Gbits/sec  73652             sender
[SUM]   0.00-10.05  sec  15.9 GBytes  13.5 Gbits/sec                  receiver
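
Comparing two runs by eye is error-prone, so here is a minimal sketch of a parser for the final `[SUM]` lines of an iperf3 client report. The function name `parse_summary` and the regular expression are illustrative assumptions, not part of iperf3; the sample strings are copied from the two reports above.

```python
import re

# Matches final iperf3 "[SUM]" lines, e.g.:
# [SUM]   0.00-10.00  sec  17.2 GBytes  14.7 Gbits/sec  21679             sender
# Per-interval [SUM] lines have no sender/receiver suffix and are skipped.
SUMMARY_RE = re.compile(
    r"^\[SUM\]\s+[\d.]+-[\d.]+\s+sec\s+"
    r"[\d.]+\s+[KMGT]?Bytes\s+"
    r"(?P<rate>[\d.]+)\s+(?P<unit>[KMGT]?)bits/sec\s+"
    r"(?:(?P<retr>\d+)\s+)?(?P<role>sender|receiver)\s*$"
)

# Scale factors to convert the reported unit to Gbits/sec.
TO_GBITS = {"": 1e-9, "K": 1e-6, "M": 1e-3, "G": 1.0, "T": 1e3}


def parse_summary(report: str) -> dict:
    """Return {role: (gbits_per_sec, retransmits_or_None)} for the final [SUM] lines."""
    result = {}
    for line in report.splitlines():
        m = SUMMARY_RE.match(line.strip())
        if m:
            rate = float(m["rate"]) * TO_GBITS[m["unit"]]
            retr = int(m["retr"]) if m["retr"] else None
            result[m["role"]] = (rate, retr)
    return result


PATCHED = """\
[SUM]   0.00-10.00  sec  17.2 GBytes  14.7 Gbits/sec  21679             sender
[SUM]   0.00-10.03  sec  17.1 GBytes  14.7 Gbits/sec                  receiver
"""

VANILLA = """\
[SUM]   0.00-10.00  sec  15.9 GBytes  13.6 Gbits/sec  73652             sender
[SUM]   0.00-10.05  sec  15.9 GBytes  13.5 Gbits/sec                  receiver
"""

for name, report in (("patched", PATCHED), ("vanilla", VANILLA)):
    rate, retr = parse_summary(report)["sender"]
    print(f"{name}: {rate} Gbits/sec, {retr} retransmits")
```

In this VM run the patched kernel is not slower than vanilla, and the retransmit counts on both sides suggest the virtual network is the bottleneck rather than the kernel.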

const-t avatar Mar 29 '24 16:03 const-t

@EvgeniiMekhanik I think the measurement https://github.com/tempesta-tech/tempesta/issues/1863#issuecomment-2009870600 is irrelevant since it was run over a public, low-performance network. The ConnectX benchmarks are the right way to test on the hardware testbed. In that environment we have a 200Gbps connection (a dual-port NIC with 100Gbps per port), so could you please redo the tests with more iperf3 threads:

  • leave core 0 for system tasks and use the rest of the cores for the benchmarks
  • the servers have different numbers of cores, so use the smaller core count in your benchmarks
  • use taskset for CPU affinity (iperf3 seems to be able to bind to only one CPU with its affinity option)
  • please make sure that the servers do not use hyperthreading (if they do, please disable it in the BIOS)
  • set the correct CPU governor and disable C-states (see https://github.com/tempesta-tech/tempesta/wiki/Performance#tips-for-linux-performance-settings)
  • we have a Mellanox CX623106A ConnectX-6 Dx EN 100 Gigabit Ethernet card 0F6FXM (dual-port 100G), so please make sure that the NVIDIA EN drivers are used
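
The first three points can be sketched as a small command generator. This is only an illustration under stated assumptions: the core counts (16 and 24) and the helper name `pinned_iperf3_commands` are hypothetical, the IP address is a placeholder taken from the earlier runs, and one iperf3 server process per port is assumed because a single iperf3 server handles one test at a time. The script only prints command lines; it does not run them.

```python
def pinned_iperf3_commands(server_ip, server_cores, client_cores, base_port=5201):
    """Build taskset-pinned iperf3 server/client command pairs.

    Core 0 is left for the system, and only as many cores are used as
    the smaller of the two machines provides.
    """
    usable = min(server_cores, client_cores)
    pairs = []
    for core in range(1, usable):  # skip core 0
        port = base_port + core - 1
        server = f"taskset -c {core} iperf3 -s -p {port}"
        client = f"taskset -c {core} iperf3 -c {server_ip} -p {port}"
        pairs.append((server, client))
    return pairs


# Hypothetical core counts; replace with the real values of the two servers.
for server_cmd, client_cmd in pinned_iperf3_commands("192.168.253.106", 16, 24):
    print(server_cmd, "|", client_cmd)
```

Each pair uses its own port, so the pinned processes do not compete for one listening socket.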

It is worth checking whether we have any issues beyond 100Gbps and under a more concurrent test.

krizhanovsky avatar Mar 30 '24 21:03 krizhanovsky

It seems that the original benchmark was done on a custom kernel build, pv-1703-crash (from the #1703 fix), so maybe there are some issues with that build. It's also a pity that no kernel config is attached. A 4x performance difference is very suspicious...

krizhanovsky avatar Mar 30 '24 21:03 krizhanovsky

I can see good performance results on our kernel without using taskset. Steps to reproduce on the server and the generator:

  1. Run /home/evgenii_mekhanik/tun100Gb.sh
  2. Run /home/evgenii_mekhanik/mlnx-tools/sbin/tun100Gb.sh
  3. Disable hyperthreading in BIOS
  4. Run /home/evgenii_mekhanik/iperf-3.13-mt-beta3/src/iperf3

./src/iperf3 -c 192.168.253.106 -P2 -w 2M -Z
Connecting to host 192.168.253.106, port 5201
[ 5] local 192.168.253.107 port 39428 connected to 192.168.253.106 port 5201
[ 7] local 192.168.253.107 port 39430 connected to 192.168.253.106 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 5.42 GBytes 46.5 Gbits/sec 0 2.00 MBytes
[ 7] 0.00-1.00 sec 5.42 GBytes 46.5 Gbits/sec 0 2.01 MBytes
[SUM] 0.00-1.00 sec 10.8 GBytes 93.1 Gbits/sec 0

[ 5] 1.00-2.00 sec 5.48 GBytes 47.1 Gbits/sec 0 2.00 MBytes
[ 7] 1.00-2.00 sec 5.48 GBytes 47.1 Gbits/sec 0 2.01 MBytes
[SUM] 1.00-2.00 sec 11.0 GBytes 94.1 Gbits/sec 0

[ 5] 2.00-3.00 sec 5.48 GBytes 47.1 Gbits/sec 0 2.00 MBytes
[ 7] 2.00-3.00 sec 5.48 GBytes 47.1 Gbits/sec 0 2.01 MBytes
[SUM] 2.00-3.00 sec 11.0 GBytes 94.1 Gbits/sec 0

[ 5] 3.00-4.00 sec 5.48 GBytes 47.1 Gbits/sec 0 2.00 MBytes
[ 7] 3.00-4.00 sec 5.48 GBytes 47.1 Gbits/sec 0 2.01 MBytes
[SUM] 3.00-4.00 sec 11.0 GBytes 94.2 Gbits/sec 0

[ 5] 4.00-5.00 sec 5.48 GBytes 47.1 Gbits/sec 0 2.00 MBytes
[ 7] 4.00-5.00 sec 5.48 GBytes 47.1 Gbits/sec 0 2.01 MBytes
[SUM] 4.00-5.00 sec 11.0 GBytes 94.1 Gbits/sec 0

[ 5] 5.00-6.00 sec 5.46 GBytes 46.9 Gbits/sec 0 2.00 MBytes
[ 7] 5.00-6.00 sec 5.49 GBytes 47.1 Gbits/sec 0 2.01 MBytes
[SUM] 5.00-6.00 sec 11.0 GBytes 94.1 Gbits/sec 0

[ 5] 6.00-7.00 sec 5.48 GBytes 47.1 Gbits/sec 0 2.00 MBytes
[ 7] 6.00-7.00 sec 5.48 GBytes 47.1 Gbits/sec 0 2.01 MBytes
[SUM] 6.00-7.00 sec 11.0 GBytes 94.1 Gbits/sec 0

[ 5] 7.00-8.00 sec 5.48 GBytes 47.1 Gbits/sec 0 2.00 MBytes
[ 7] 7.00-8.00 sec 5.48 GBytes 47.1 Gbits/sec 0 2.01 MBytes
[SUM] 7.00-8.00 sec 11.0 GBytes 94.2 Gbits/sec 0

[ 5] 8.00-9.00 sec 5.48 GBytes 47.1 Gbits/sec 0 2.00 MBytes
[ 7] 8.00-9.00 sec 5.46 GBytes 46.9 Gbits/sec 0 2.01 MBytes
[SUM] 8.00-9.00 sec 10.9 GBytes 94.0 Gbits/sec 0

[ 5] 9.00-10.00 sec 5.49 GBytes 47.1 Gbits/sec 0 2.00 MBytes
[ 7] 9.00-10.00 sec 5.49 GBytes 47.1 Gbits/sec 0 2.01 MBytes
[SUM] 9.00-10.00 sec 11.0 GBytes 94.1 Gbits/sec 0

[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 54.7 GBytes 47.0 Gbits/sec 0 sender
[ 5] 0.00-10.00 sec 54.7 GBytes 47.0 Gbits/sec receiver
[ 7] 0.00-10.00 sec 54.7 GBytes 47.0 Gbits/sec 0 sender
[ 7] 0.00-10.00 sec 54.7 GBytes 47.0 Gbits/sec receiver
[SUM] 0.00-10.00 sec 109 GBytes 94.0 Gbits/sec 0 sender
[SUM] 0.00-10.00 sec 109 GBytes 94.0 Gbits/sec

EvgeniiMekhanik avatar Apr 03 '24 15:04 EvgeniiMekhanik

I see about a 15% performance degradation (20.1Gbps vs 23.9Gbps for the vanilla kernel) on my VM, run as:

taskset --cpu-list 4,6,8,10 qemu-system-x86_64 -s -machine dump-guest-core=on -enable-kvm -m 8192 -cpu host -smp cpus=4 -no-hpet -name tfw-test -drive file=ubuntu.qcow2,if=virtio -netdev tap,ifname=tap0,id=n1,vhost=on,queues=4,script=no -device virtio-net,netdev=n1,mq=on,vectors=4,rx_queue_size=1024,tx_queue_size=1024 -boot order=cd,menu=on -serial file:serial-ubuntu.txt &

on performance cores of i9-12900HK.
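
For reference, the percentage follows directly from the two bitrates quoted above. A minimal sketch of the arithmetic (the function name is an illustrative assumption):

```python
def relative_degradation(baseline_gbps: float, measured_gbps: float) -> float:
    """Percentage by which the measured bitrate falls below the baseline."""
    return (baseline_gbps - measured_gbps) / baseline_gbps * 100.0


# Vanilla kernel: 23.9 Gbps, Tempesta kernel: 20.1 Gbps (values from this comment).
print(f"{relative_degradation(23.9, 20.1):.1f}%")  # prints 15.9%
```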

I ran the benchmarks as iperf3 -c 192.168.100.4 -P 4 -A 2 -Z -V --bidir from the host system (core 2 is also a performance core) and just iperf3 -s inside the VM. The network interrupts are distributed more or less equally inside the VM:

# grep virtio1-req /proc/interrupts
 25:       3968          0          0          0   PCI-MSI 65537-edge      virtio1-req.0
 26:          0       2803          0          0   PCI-MSI 65538-edge      virtio1-req.1
 27:          0          0       3618          0   PCI-MSI 65539-edge      virtio1-req.2
 28:          0          0          0       4160   PCI-MSI 65540-edge      virtio1-req.3
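
The "more or less equally" claim can be checked numerically. The sketch below parses `/proc/interrupts`-style text and prints each queue's share of the total; the helper names are illustrative assumptions, and the sample is the output shown above.

```python
SAMPLE = """\
 25:       3968          0          0          0   PCI-MSI 65537-edge      virtio1-req.0
 26:          0       2803          0          0   PCI-MSI 65538-edge      virtio1-req.1
 27:          0          0       3618          0   PCI-MSI 65539-edge      virtio1-req.2
 28:          0          0          0       4160   PCI-MSI 65540-edge      virtio1-req.3
"""


def irq_totals(text: str, name_filter: str) -> dict:
    """Sum the per-CPU counters of every IRQ line whose name contains name_filter."""
    totals = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2 or not fields[0].endswith(":"):
            continue
        if name_filter not in fields[-1]:
            continue
        count = 0
        for token in fields[1:]:
            if not token.isdigit():  # per-CPU counters end at the first non-numeric field
                break
            count += int(token)
        totals[fields[-1]] = count
    return totals


def shares(totals: dict) -> dict:
    """Convert absolute counts to fractions of the grand total."""
    grand = sum(totals.values())
    return {name: (count / grand if grand else 0.0) for name, count in totals.items()}


for name, share in shares(irq_totals(SAMPLE, "virtio1-req")).items():
    print(f"{name}: {share:.1%}")
```

On a live system the same functions can be fed the contents of `/proc/interrupts` instead of `SAMPLE`.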

For some reason I can't set the TCP window size in iperf3:

iperf3 -c 192.168.100.4 -P 4 -A 2 -Z -V --bidir -w 1024K
iperf 3.9
Linux tempest 6.5.0-26-generic #26~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Mar 12 10:22:43 UTC 2 x86_64
Control connection MSS 1448
Time: Wed, 03 Apr 2024 19:18:22 GMT
Connecting to host 192.168.100.4, port 5201
      Cookie: 4ceirfrewnbgqciozysf3tzubioiqja435n3
      TCP MSS: 1448 (default)
iperf3: error - socket buffer size not set correctly
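
One possible explanation, which is an assumption and not confirmed in this thread: `-w` asks for a socket buffer via `SO_SNDBUF`/`SO_RCVBUF`, Linux caps the request at `net.core.wmem_max`/`net.core.rmem_max`, and iperf3 reports this error when the buffer it reads back is smaller than the one requested. The sketch below only shows how to compare the requested and granted send buffer sizes on the local machine; the helper name is illustrative.

```python
import socket


def granted_sndbuf(requested: int) -> int:
    """Request a send buffer size and return what the kernel actually granted."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, requested)
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)


requested = 1024 * 1024  # same size as "-w 1024K" in the failing command
granted = granted_sndbuf(requested)
print(f"requested {requested} bytes, kernel granted {granted} bytes")
if granted < requested:
    # Assumption: this is the condition under which iperf3 gives up.
    print("granted buffer is smaller than requested; check net.core.wmem_max and net.core.rmem_max")
```

If the granted size is smaller than the requested one, raising the two sysctls on both hosts would be the first thing to try.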

__alloc_skb() with pg_skb_alloc() gives 5.16% for the Tempesta kernel versus 5.45% in total for __alloc_skb(), kmem_cache_alloc_node() and __kmalloc_node_track_caller() on the vanilla kernel, so it seems we're good with the skb memory allocation.

We have a higher percentage for rmqueue, seemingly due to more aggressive work with pages in our skb allocator. We also have an extra 1.95% for free_unref_page().

It could also make sense to explore perf profiles of memory accesses and/or cache misses - we probably also have a higher memory footprint due to the skb allocator. But even with the current data I'd say that we can still improve our page allocation mechanism.

Also #391 mentions per-cpu skb caches - I saw this technique in many research papers.

It's worth mentioning that the Tempesta kernel on its own does introduce additional, inevitable overheads such as:

  1. FPU context store/restore on each softirq shot - the operation is pretty slow, but we do it to accelerate Tempesta FW string and crypto routines.

  2. larger MAX_HEADER and struct sk_buff, giving a larger memory footprint

  3. along with huge pages reservation at boot time, so the network stack can be under higher memory pressure

  4. and various small additional conditions like those we have in __inet_hash_connect(), tcp_v4_syn_recv_sock(), tcp_skb_unclone() vs skb_unclone(), calls to the empty tcp_tfw_sk_prepare_xmit() and tcp_tfw_sk_write_xmit() callbacks, etc.

These overheads help us get higher performance on the application layer, but we can and should do some work on better skb memory allocation and caching.

Below are some logs from my analysis. I still have at least these open questions about the measurements:

  • Why does the Tempesta kernel show high skb_release_data() time?
  • Why does the vanilla kernel show high time for __skb_datagram_iter(), while Tempesta doesn't?

Vanilla kernel benchmark:

$ iperf3 -c 192.168.100.4 -P 4 -A 2 -Z -V --bidir
iperf 3.9
Linux tempest 6.5.0-26-generic #26~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Mar 12 10:22:43 UTC 2 x86_64
Control connection MSS 1448
Time: Wed, 03 Apr 2024 18:31:25 GMT
Connecting to host 192.168.100.4, port 5201
      Cookie: dkiltk3lknnsbsgokz2vklswwih2ulxztoia
      TCP MSS: 1448 (default)
[  5] local 192.168.100.1 port 44332 connected to 192.168.100.4 port 5201
[  7] local 192.168.100.1 port 44334 connected to 192.168.100.4 port 5201
[  9] local 192.168.100.1 port 44338 connected to 192.168.100.4 port 5201
[ 11] local 192.168.100.1 port 44354 connected to 192.168.100.4 port 5201
[ 13] local 192.168.100.1 port 44366 connected to 192.168.100.4 port 5201
[ 15] local 192.168.100.1 port 44380 connected to 192.168.100.4 port 5201
[ 17] local 192.168.100.1 port 44382 connected to 192.168.100.4 port 5201
[ 19] local 192.168.100.1 port 44390 connected to 192.168.100.4 port 5201
Starting Test: protocol: TCP, 4 streams, 131072 byte blocks, omitting 0 seconds, 10 second test, tos 0
[ ID][Role] Interval           Transfer     Bitrate         Retr  Cwnd
[  5][TX-C]   0.00-1.00   sec  1.03 GBytes  8.85 Gbits/sec  4193    530 KBytes       
[  7][TX-C]   0.00-1.00   sec   418 MBytes  3.51 Gbits/sec  3944    600 KBytes       
[  9][TX-C]   0.00-1.00   sec   894 MBytes  7.50 Gbits/sec  6144    863 KBytes       
[ 11][TX-C]   0.00-1.00   sec   235 MBytes  1.97 Gbits/sec  3375    969 KBytes       
[SUM][TX-C]   0.00-1.00   sec  2.54 GBytes  21.8 Gbits/sec  17656             
[ 13][RX-C]   0.00-1.00   sec   774 MBytes  6.50 Gbits/sec                  
[ 15][RX-C]   0.00-1.00   sec   768 MBytes  6.44 Gbits/sec                  
[ 17][RX-C]   0.00-1.00   sec   770 MBytes  6.46 Gbits/sec                  
[ 19][RX-C]   0.00-1.00   sec   855 MBytes  7.17 Gbits/sec                  
[SUM][RX-C]   0.00-1.00   sec  3.09 GBytes  26.6 Gbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5][TX-C]   1.00-2.00   sec   398 MBytes  3.33 Gbits/sec  1395    550 KBytes       
[  7][TX-C]   1.00-2.00   sec   306 MBytes  2.57 Gbits/sec  2407    264 KBytes       
[  9][TX-C]   1.00-2.00   sec   614 MBytes  5.15 Gbits/sec  2410    549 KBytes       
[ 11][TX-C]   1.00-2.00   sec   102 MBytes   860 Mbits/sec  533    281 KBytes       
[SUM][TX-C]   1.00-2.00   sec  1.39 GBytes  11.9 Gbits/sec  6745             
[ 13][RX-C]   1.00-2.00   sec   713 MBytes  5.98 Gbits/sec                  
[ 15][RX-C]   1.00-2.00   sec   713 MBytes  5.98 Gbits/sec                  
[ 17][RX-C]   1.00-2.00   sec   713 MBytes  5.98 Gbits/sec                  
[ 19][RX-C]   1.00-2.00   sec   713 MBytes  5.98 Gbits/sec                  
[SUM][RX-C]   1.00-2.00   sec  2.78 GBytes  23.9 Gbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5][TX-C]   2.00-3.00   sec   479 MBytes  4.02 Gbits/sec  2414    516 KBytes       
[  7][TX-C]   2.00-3.00   sec   244 MBytes  2.04 Gbits/sec  2187    376 KBytes       
[  9][TX-C]   2.00-3.00   sec   435 MBytes  3.65 Gbits/sec  2390    386 KBytes       
[ 11][TX-C]   2.00-3.00   sec   152 MBytes  1.28 Gbits/sec  1020    291 KBytes       
[SUM][TX-C]   2.00-3.00   sec  1.28 GBytes  11.0 Gbits/sec  8011             
[ 13][RX-C]   2.00-3.00   sec   703 MBytes  5.89 Gbits/sec                  
[ 15][RX-C]   2.00-3.00   sec   703 MBytes  5.90 Gbits/sec                  
[ 17][RX-C]   2.00-3.00   sec   703 MBytes  5.90 Gbits/sec                  
[ 19][RX-C]   2.00-3.00   sec   704 MBytes  5.90 Gbits/sec                  
[SUM][RX-C]   2.00-3.00   sec  2.75 GBytes  23.6 Gbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5][TX-C]   3.00-4.00   sec   770 MBytes  6.46 Gbits/sec  2351    462 KBytes       
[  7][TX-C]   3.00-4.00   sec   650 MBytes  5.45 Gbits/sec  2949    509 KBytes       
[  9][TX-C]   3.00-4.00   sec   532 MBytes  4.47 Gbits/sec  2741    373 KBytes       
[ 11][TX-C]   3.00-4.00   sec   549 MBytes  4.60 Gbits/sec  3112    458 KBytes       
[SUM][TX-C]   3.00-4.00   sec  2.44 GBytes  21.0 Gbits/sec  11153             
[ 13][RX-C]   3.00-4.00   sec   718 MBytes  6.02 Gbits/sec                  
[ 15][RX-C]   3.00-4.00   sec   719 MBytes  6.03 Gbits/sec                  
[ 17][RX-C]   3.00-4.00   sec   717 MBytes  6.02 Gbits/sec                  
[ 19][RX-C]   3.00-4.00   sec   720 MBytes  6.04 Gbits/sec                  
[SUM][RX-C]   3.00-4.00   sec  2.81 GBytes  24.1 Gbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5][TX-C]   4.00-5.00   sec   839 MBytes  7.04 Gbits/sec  1767    540 KBytes       
[  7][TX-C]   4.00-5.00   sec   651 MBytes  5.46 Gbits/sec  1718    279 KBytes       
[  9][TX-C]   4.00-5.00   sec   738 MBytes  6.19 Gbits/sec  2280    553 KBytes       
[ 11][TX-C]   4.00-5.00   sec   362 MBytes  3.04 Gbits/sec  1471    454 KBytes       
[SUM][TX-C]   4.00-5.00   sec  2.53 GBytes  21.7 Gbits/sec  7236             
[ 13][RX-C]   4.00-5.00   sec   766 MBytes  6.42 Gbits/sec                  
[ 15][RX-C]   4.00-5.00   sec   765 MBytes  6.42 Gbits/sec                  
[ 17][RX-C]   4.00-5.00   sec   765 MBytes  6.42 Gbits/sec                  
[ 19][RX-C]   4.00-5.00   sec   765 MBytes  6.42 Gbits/sec                  
[SUM][RX-C]   4.00-5.00   sec  2.99 GBytes  25.7 Gbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5][TX-C]   5.00-6.00   sec   781 MBytes  6.55 Gbits/sec  2154    431 KBytes       
[  7][TX-C]   5.00-6.00   sec   649 MBytes  5.44 Gbits/sec  2512    592 KBytes       
[  9][TX-C]   5.00-6.00   sec   601 MBytes  5.04 Gbits/sec  1428    567 KBytes       
[ 11][TX-C]   5.00-6.00   sec   570 MBytes  4.78 Gbits/sec  3340    509 KBytes       
[SUM][TX-C]   5.00-6.00   sec  2.54 GBytes  21.8 Gbits/sec  9434             
[ 13][RX-C]   5.00-6.00   sec   802 MBytes  6.73 Gbits/sec                  
[ 15][RX-C]   5.00-6.00   sec   766 MBytes  6.42 Gbits/sec                  
[ 17][RX-C]   5.00-6.00   sec   767 MBytes  6.43 Gbits/sec                  
[ 19][RX-C]   5.00-6.00   sec   767 MBytes  6.43 Gbits/sec                  
[SUM][RX-C]   5.00-6.00   sec  3.03 GBytes  26.0 Gbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5][TX-C]   6.00-7.00   sec   918 MBytes  7.70 Gbits/sec  2003    840 KBytes       
[  7][TX-C]   6.00-7.00   sec   650 MBytes  5.45 Gbits/sec  2328    376 KBytes       
[  9][TX-C]   6.00-7.00   sec   672 MBytes  5.64 Gbits/sec  2730    677 KBytes       
[ 11][TX-C]   6.00-7.00   sec   380 MBytes  3.19 Gbits/sec  1137   1.41 KBytes       
[SUM][TX-C]   6.00-7.00   sec  2.56 GBytes  22.0 Gbits/sec  8198             
[ 13][RX-C]   6.00-7.00   sec   753 MBytes  6.32 Gbits/sec                  
[ 15][RX-C]   6.00-7.00   sec   752 MBytes  6.31 Gbits/sec                  
[ 17][RX-C]   6.00-7.00   sec   753 MBytes  6.31 Gbits/sec                  
[ 19][RX-C]   6.00-7.00   sec   752 MBytes  6.31 Gbits/sec                  
[SUM][RX-C]   6.00-7.00   sec  2.94 GBytes  25.3 Gbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5][TX-C]   7.00-8.00   sec   452 MBytes  3.80 Gbits/sec  1445    437 KBytes       
[  7][TX-C]   7.00-8.00   sec   315 MBytes  2.64 Gbits/sec  441    520 KBytes       
[  9][TX-C]   7.00-8.00   sec   392 MBytes  3.29 Gbits/sec  1184    454 KBytes       
[ 11][TX-C]   7.00-8.00   sec   285 MBytes  2.39 Gbits/sec  908    298 KBytes       
[SUM][TX-C]   7.00-8.00   sec  1.41 GBytes  12.1 Gbits/sec  3978             
[ 13][RX-C]   7.00-8.00   sec   642 MBytes  5.38 Gbits/sec                  
[ 15][RX-C]   7.00-8.00   sec   639 MBytes  5.36 Gbits/sec                  
[ 17][RX-C]   7.00-8.00   sec   640 MBytes  5.37 Gbits/sec                  
[ 19][RX-C]   7.00-8.00   sec   638 MBytes  5.35 Gbits/sec                  
[SUM][RX-C]   7.00-8.00   sec  2.50 GBytes  21.5 Gbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5][TX-C]   8.00-9.00   sec   414 MBytes  3.47 Gbits/sec  838    625 KBytes       
[  7][TX-C]   8.00-9.00   sec   276 MBytes  2.32 Gbits/sec  617    264 KBytes       
[  9][TX-C]   8.00-9.00   sec   542 MBytes  4.55 Gbits/sec  3259    475 KBytes       
[ 11][TX-C]   8.00-9.00   sec   201 MBytes  1.69 Gbits/sec  412    403 KBytes       
[SUM][TX-C]   8.00-9.00   sec  1.40 GBytes  12.0 Gbits/sec  5126             
[ 13][RX-C]   8.00-9.00   sec   632 MBytes  5.30 Gbits/sec                  
[ 15][RX-C]   8.00-9.00   sec   630 MBytes  5.29 Gbits/sec                  
[ 17][RX-C]   8.00-9.00   sec   631 MBytes  5.29 Gbits/sec                  
[ 19][RX-C]   8.00-9.00   sec   630 MBytes  5.28 Gbits/sec                  
[SUM][RX-C]   8.00-9.00   sec  2.46 GBytes  21.2 Gbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5][TX-C]   9.00-10.00  sec   489 MBytes  4.10 Gbits/sec  869    441 KBytes       
[  7][TX-C]   9.00-10.00  sec   311 MBytes  2.61 Gbits/sec  810    492 KBytes       
[  9][TX-C]   9.00-10.00  sec   310 MBytes  2.60 Gbits/sec  2133    403 KBytes       
[ 11][TX-C]   9.00-10.00  sec   361 MBytes  3.03 Gbits/sec  826    520 KBytes       
[SUM][TX-C]   9.00-10.00  sec  1.44 GBytes  12.3 Gbits/sec  4638             
[ 13][RX-C]   9.00-10.00  sec   628 MBytes  5.27 Gbits/sec                  
[ 15][RX-C]   9.00-10.00  sec   625 MBytes  5.24 Gbits/sec                  
[ 17][RX-C]   9.00-10.00  sec   627 MBytes  5.26 Gbits/sec                  
[ 19][RX-C]   9.00-10.00  sec   625 MBytes  5.24 Gbits/sec                  
[SUM][RX-C]   9.00-10.00  sec  2.45 GBytes  21.0 Gbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID][Role] Interval           Transfer     Bitrate         Retr
[  5][TX-C]   0.00-10.00  sec  6.44 GBytes  5.53 Gbits/sec  19429             sender
[  5][TX-C]   0.00-9.99   sec  6.44 GBytes  5.53 Gbits/sec                  receiver
[  7][TX-C]   0.00-10.00  sec  4.37 GBytes  3.75 Gbits/sec  19913             sender
[  7][TX-C]   0.00-9.99   sec  4.36 GBytes  3.75 Gbits/sec                  receiver
[  9][TX-C]   0.00-10.00  sec  5.60 GBytes  4.81 Gbits/sec  26699             sender
[  9][TX-C]   0.00-9.99   sec  5.59 GBytes  4.81 Gbits/sec                  receiver
[ 11][TX-C]   0.00-10.00  sec  3.12 GBytes  2.68 Gbits/sec  16134             sender
[ 11][TX-C]   0.00-9.99   sec  3.12 GBytes  2.68 Gbits/sec                  receiver
[SUM][TX-C]   0.00-10.00  sec  19.5 GBytes  16.8 Gbits/sec  82175             sender
[SUM][TX-C]   0.00-9.99   sec  19.5 GBytes  16.8 Gbits/sec                  receiver
CPU Utilization: local/sender 44.7% (1.4%u/43.3%s), remote/receiver 26.0% (0.3%u/25.7%s)
CPU Utilization: local/receiver 44.7% (1.4%u/43.3%s), remote/sender 26.0% (0.3%u/25.7%s)
snd_tcp_congestion cubic
rcv_tcp_congestion cubic
[ 13][RX-C]   0.00-10.00  sec  6.97 GBytes  5.98 Gbits/sec    0             sender
[ 13][RX-C]   0.00-9.99   sec  6.96 GBytes  5.99 Gbits/sec                  receiver
[ 15][RX-C]   0.00-10.00  sec  6.92 GBytes  5.94 Gbits/sec    0             sender
[ 15][RX-C]   0.00-9.99   sec  6.92 GBytes  5.94 Gbits/sec                  receiver
[ 17][RX-C]   0.00-10.00  sec  6.92 GBytes  5.95 Gbits/sec    0             sender
[ 17][RX-C]   0.00-9.99   sec  6.92 GBytes  5.95 Gbits/sec                  receiver
[ 19][RX-C]   0.00-10.00  sec  7.00 GBytes  6.02 Gbits/sec    0             sender
[ 19][RX-C]   0.00-9.99   sec  7.00 GBytes  6.02 Gbits/sec                  receiver
[SUM][RX-C]   0.00-10.00  sec  27.8 GBytes  23.9 Gbits/sec    0             sender
[SUM][RX-C]   0.00-9.99   sec  27.8 GBytes  23.9 Gbits/sec                  receiver
snd_tcp_congestion cubic
rcv_tcp_congestion cubic

iperf Done.

Tempesta kernel benchmark:

# About 30% CPU usage on the client side, so the client is not CPU-bound.
# Inside the 4-CPU VM the system is about 85% idle, with iperf3 consuming about 45% CPU,
# but on the host the VM consumes more than 200% CPU.

$ iperf3 -c 192.168.100.4 -P 4 -A 2 -Z -V --bidir
iperf 3.9
Linux tempest 6.5.0-26-generic #26~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Mar 12 10:22:43 UTC 2 x86_64
Control connection MSS 1448
Time: Wed, 03 Apr 2024 18:26:21 GMT
Connecting to host 192.168.100.4, port 5201
      Cookie: r4l6hsi42fcbsfq3amqp7vtv423qwkt6kgi4
      TCP MSS: 1448 (default)
[  5] local 192.168.100.1 port 51714 connected to 192.168.100.4 port 5201
[  7] local 192.168.100.1 port 51720 connected to 192.168.100.4 port 5201
[  9] local 192.168.100.1 port 51728 connected to 192.168.100.4 port 5201
[ 11] local 192.168.100.1 port 51742 connected to 192.168.100.4 port 5201
[ 13] local 192.168.100.1 port 51750 connected to 192.168.100.4 port 5201
[ 15] local 192.168.100.1 port 51764 connected to 192.168.100.4 port 5201
[ 17] local 192.168.100.1 port 51768 connected to 192.168.100.4 port 5201
[ 19] local 192.168.100.1 port 51784 connected to 192.168.100.4 port 5201
Starting Test: protocol: TCP, 4 streams, 131072 byte blocks, omitting 0 seconds, 10 second test, tos 0
[ ID][Role] Interval           Transfer     Bitrate         Retr  Cwnd
[  5][TX-C]   0.00-1.00   sec   483 MBytes  4.05 Gbits/sec  2177    421 KBytes       
[  7][TX-C]   0.00-1.00   sec   427 MBytes  3.58 Gbits/sec  2268    641 KBytes       
[  9][TX-C]   0.00-1.00   sec   709 MBytes  5.95 Gbits/sec  3407    741 KBytes       
[ 11][TX-C]   0.00-1.00   sec   473 MBytes  3.97 Gbits/sec  2182    454 KBytes       
[SUM][TX-C]   0.00-1.00   sec  2.04 GBytes  17.5 Gbits/sec  10034             
[ 13][RX-C]   0.00-1.00   sec   832 MBytes  6.98 Gbits/sec                  
[ 15][RX-C]   0.00-1.00   sec   833 MBytes  6.98 Gbits/sec                  
[ 17][RX-C]   0.00-1.00   sec   842 MBytes  7.06 Gbits/sec                  
[ 19][RX-C]   0.00-1.00   sec   808 MBytes  6.78 Gbits/sec                  
[SUM][RX-C]   0.00-1.00   sec  3.24 GBytes  27.8 Gbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5][TX-C]   1.00-2.00   sec   285 MBytes  2.39 Gbits/sec    0    783 KBytes       
[  7][TX-C]   1.00-2.00   sec   369 MBytes  3.09 Gbits/sec   11    717 KBytes       
[  9][TX-C]   1.00-2.00   sec   426 MBytes  3.58 Gbits/sec    0   1.07 MBytes       
[ 11][TX-C]   1.00-2.00   sec   298 MBytes  2.50 Gbits/sec    0    806 KBytes       
[SUM][TX-C]   1.00-2.00   sec  1.35 GBytes  11.6 Gbits/sec   11             
[ 13][RX-C]   1.00-2.00   sec   591 MBytes  4.96 Gbits/sec                  
[ 15][RX-C]   1.00-2.00   sec   591 MBytes  4.96 Gbits/sec                  
[ 17][RX-C]   1.00-2.00   sec   591 MBytes  4.95 Gbits/sec                  
[ 19][RX-C]   1.00-2.00   sec   589 MBytes  4.94 Gbits/sec                  
[SUM][RX-C]   1.00-2.00   sec  2.31 GBytes  19.8 Gbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5][TX-C]   2.00-3.00   sec   325 MBytes  2.73 Gbits/sec   45    885 KBytes       
[  7][TX-C]   2.00-3.00   sec   392 MBytes  3.29 Gbits/sec    0   1.03 MBytes       
[  9][TX-C]   2.00-3.00   sec   381 MBytes  3.20 Gbits/sec  141    786 KBytes       
[ 11][TX-C]   2.00-3.00   sec   296 MBytes  2.49 Gbits/sec  144    646 KBytes       
[SUM][TX-C]   2.00-3.00   sec  1.36 GBytes  11.7 Gbits/sec  330             
[ 13][RX-C]   2.00-3.00   sec   578 MBytes  4.85 Gbits/sec                  
[ 15][RX-C]   2.00-3.00   sec   578 MBytes  4.85 Gbits/sec                  
[ 17][RX-C]   2.00-3.00   sec   577 MBytes  4.84 Gbits/sec                  
[ 19][RX-C]   2.00-3.00   sec   578 MBytes  4.85 Gbits/sec                  
[SUM][RX-C]   2.00-3.00   sec  2.26 GBytes  19.4 Gbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5][TX-C]   3.00-4.00   sec   364 MBytes  3.05 Gbits/sec   79    962 KBytes       
[  7][TX-C]   3.00-4.00   sec   412 MBytes  3.46 Gbits/sec  107    758 KBytes       
[  9][TX-C]   3.00-4.00   sec   335 MBytes  2.81 Gbits/sec   47    897 KBytes       
[ 11][TX-C]   3.00-4.00   sec   280 MBytes  2.35 Gbits/sec   13    786 KBytes       
[SUM][TX-C]   3.00-4.00   sec  1.36 GBytes  11.7 Gbits/sec  246             
[ 13][RX-C]   3.00-4.00   sec   587 MBytes  4.93 Gbits/sec                  
[ 15][RX-C]   3.00-4.00   sec   588 MBytes  4.93 Gbits/sec                  
[ 17][RX-C]   3.00-4.00   sec   587 MBytes  4.93 Gbits/sec                  
[ 19][RX-C]   3.00-4.00   sec   586 MBytes  4.92 Gbits/sec                  
[SUM][RX-C]   3.00-4.00   sec  2.29 GBytes  19.7 Gbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5][TX-C]   4.00-5.00   sec   449 MBytes  3.76 Gbits/sec   13    940 KBytes       
[  7][TX-C]   4.00-5.00   sec   276 MBytes  2.32 Gbits/sec  139    629 KBytes       
[  9][TX-C]   4.00-5.00   sec   352 MBytes  2.96 Gbits/sec   91    967 KBytes       
[ 11][TX-C]   4.00-5.00   sec   308 MBytes  2.58 Gbits/sec   40    877 KBytes       
[SUM][TX-C]   4.00-5.00   sec  1.35 GBytes  11.6 Gbits/sec  283             
[ 13][RX-C]   4.00-5.00   sec   569 MBytes  4.77 Gbits/sec                  
[ 15][RX-C]   4.00-5.00   sec   570 MBytes  4.78 Gbits/sec                  
[ 17][RX-C]   4.00-5.00   sec   570 MBytes  4.78 Gbits/sec                  
[ 19][RX-C]   4.00-5.00   sec   572 MBytes  4.79 Gbits/sec                  
[SUM][RX-C]   4.00-5.00   sec  2.23 GBytes  19.1 Gbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5][TX-C]   5.00-6.00   sec   361 MBytes  3.03 Gbits/sec   57    987 KBytes       
[  7][TX-C]   5.00-6.00   sec   268 MBytes  2.24 Gbits/sec   29    771 KBytes       
[  9][TX-C]   5.00-6.00   sec   369 MBytes  3.09 Gbits/sec   47   1004 KBytes       
[ 11][TX-C]   5.00-6.00   sec   320 MBytes  2.68 Gbits/sec   68    665 KBytes       
[SUM][TX-C]   5.00-6.00   sec  1.29 GBytes  11.1 Gbits/sec  201             
[ 13][RX-C]   5.00-6.00   sec   557 MBytes  4.67 Gbits/sec                  
[ 15][RX-C]   5.00-6.00   sec   558 MBytes  4.68 Gbits/sec                  
[ 17][RX-C]   5.00-6.00   sec   557 MBytes  4.68 Gbits/sec                  
[ 19][RX-C]   5.00-6.00   sec   556 MBytes  4.67 Gbits/sec                  
[SUM][RX-C]   5.00-6.00   sec  2.18 GBytes  18.7 Gbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5][TX-C]   6.00-7.00   sec   399 MBytes  3.34 Gbits/sec   29   1.01 MBytes       
[  7][TX-C]   6.00-7.00   sec   315 MBytes  2.64 Gbits/sec   10    619 KBytes       
[  9][TX-C]   6.00-7.00   sec   395 MBytes  3.31 Gbits/sec  167   1.02 MBytes       
[ 11][TX-C]   6.00-7.00   sec   270 MBytes  2.26 Gbits/sec   53    564 KBytes       
[SUM][TX-C]   6.00-7.00   sec  1.35 GBytes  11.6 Gbits/sec  259             
[ 13][RX-C]   6.00-7.00   sec   580 MBytes  4.87 Gbits/sec                  
[ 15][RX-C]   6.00-7.00   sec   580 MBytes  4.86 Gbits/sec                  
[ 17][RX-C]   6.00-7.00   sec   580 MBytes  4.87 Gbits/sec                  
[ 19][RX-C]   6.00-7.00   sec   578 MBytes  4.85 Gbits/sec                  
[SUM][RX-C]   6.00-7.00   sec  2.26 GBytes  19.5 Gbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5][TX-C]   7.00-8.00   sec   391 MBytes  3.28 Gbits/sec   47   1.03 MBytes       
[  7][TX-C]   7.00-8.00   sec   334 MBytes  2.80 Gbits/sec   45    676 KBytes       
[  9][TX-C]   7.00-8.00   sec   384 MBytes  3.22 Gbits/sec   71   1.03 MBytes       
[ 11][TX-C]   7.00-8.00   sec   279 MBytes  2.34 Gbits/sec   89    461 KBytes       
[SUM][TX-C]   7.00-8.00   sec  1.35 GBytes  11.6 Gbits/sec  252             
[ 13][RX-C]   7.00-8.00   sec   570 MBytes  4.78 Gbits/sec                  
[ 15][RX-C]   7.00-8.00   sec   571 MBytes  4.79 Gbits/sec                  
[ 17][RX-C]   7.00-8.00   sec   571 MBytes  4.79 Gbits/sec                  
[ 19][RX-C]   7.00-8.00   sec   569 MBytes  4.78 Gbits/sec                  
[SUM][RX-C]   7.00-8.00   sec  2.23 GBytes  19.1 Gbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5][TX-C]   8.00-9.00   sec   518 MBytes  4.34 Gbits/sec    0   1.35 MBytes       
[  7][TX-C]   8.00-9.00   sec   340 MBytes  2.85 Gbits/sec   35    718 KBytes       
[  9][TX-C]   8.00-9.00   sec   315 MBytes  2.64 Gbits/sec  153    561 KBytes       
[ 11][TX-C]   8.00-9.00   sec   221 MBytes  1.86 Gbits/sec   40    580 KBytes       
[SUM][TX-C]   8.00-9.00   sec  1.36 GBytes  11.7 Gbits/sec  228             
[ 13][RX-C]   8.00-9.00   sec   571 MBytes  4.79 Gbits/sec                  
[ 15][RX-C]   8.00-9.00   sec   572 MBytes  4.80 Gbits/sec                  
[ 17][RX-C]   8.00-9.00   sec   571 MBytes  4.79 Gbits/sec                  
[ 19][RX-C]   8.00-9.00   sec   571 MBytes  4.79 Gbits/sec                  
[SUM][RX-C]   8.00-9.00   sec  2.23 GBytes  19.2 Gbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5][TX-C]   9.00-10.00  sec   476 MBytes  4.00 Gbits/sec   68    940 KBytes       
[  7][TX-C]   9.00-10.00  sec   361 MBytes  3.03 Gbits/sec    1    773 KBytes       
[  9][TX-C]   9.00-10.00  sec   298 MBytes  2.50 Gbits/sec   53    662 KBytes       
[ 11][TX-C]   9.00-10.00  sec   259 MBytes  2.17 Gbits/sec   45    742 KBytes       
[SUM][TX-C]   9.00-10.00  sec  1.36 GBytes  11.7 Gbits/sec  167             
[ 13][RX-C]   9.00-10.00  sec   577 MBytes  4.84 Gbits/sec                  
[ 15][RX-C]   9.00-10.00  sec   577 MBytes  4.84 Gbits/sec                  
[ 17][RX-C]   9.00-10.00  sec   576 MBytes  4.84 Gbits/sec                  
[ 19][RX-C]   9.00-10.00  sec   577 MBytes  4.84 Gbits/sec                  
[SUM][RX-C]   9.00-10.00  sec  2.25 GBytes  19.4 Gbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID][Role] Interval           Transfer     Bitrate         Retr
[  5][TX-C]   0.00-10.00  sec  3.96 GBytes  3.40 Gbits/sec  2515             sender
[  5][TX-C]   0.00-10.03  sec  3.95 GBytes  3.39 Gbits/sec                  receiver
[  7][TX-C]   0.00-10.00  sec  3.41 GBytes  2.93 Gbits/sec  2645             sender
[  7][TX-C]   0.00-10.03  sec  3.41 GBytes  2.92 Gbits/sec                  receiver
[  9][TX-C]   0.00-10.00  sec  3.87 GBytes  3.33 Gbits/sec  4177             sender
[  9][TX-C]   0.00-10.03  sec  3.87 GBytes  3.31 Gbits/sec                  receiver
[ 11][TX-C]   0.00-10.00  sec  2.93 GBytes  2.52 Gbits/sec  2674             sender
[ 11][TX-C]   0.00-10.03  sec  2.93 GBytes  2.51 Gbits/sec                  receiver
[SUM][TX-C]   0.00-10.00  sec  14.2 GBytes  12.2 Gbits/sec  12011             sender
[SUM][TX-C]   0.00-10.03  sec  14.2 GBytes  12.1 Gbits/sec                  receiver
CPU Utilization: local/sender 32.4% (1.4%u/31.0%s), remote/receiver 26.1% (0.6%u/25.5%s)
CPU Utilization: local/receiver 32.4% (1.4%u/31.0%s), remote/sender 26.1% (0.6%u/25.5%s)
snd_tcp_congestion cubic
rcv_tcp_congestion cubic
[ 13][RX-C]   0.00-10.00  sec  5.87 GBytes  5.05 Gbits/sec    0             sender
[ 13][RX-C]   0.00-10.03  sec  5.87 GBytes  5.03 Gbits/sec                  receiver
[ 15][RX-C]   0.00-10.00  sec  5.88 GBytes  5.05 Gbits/sec    0             sender
[ 15][RX-C]   0.00-10.03  sec  5.88 GBytes  5.03 Gbits/sec                  receiver
[ 17][RX-C]   0.00-10.00  sec  5.88 GBytes  5.05 Gbits/sec    0             sender
[ 17][RX-C]   0.00-10.03  sec  5.88 GBytes  5.04 Gbits/sec                  receiver
[ 19][RX-C]   0.00-10.00  sec  5.85 GBytes  5.02 Gbits/sec    0             sender
[ 19][RX-C]   0.00-10.03  sec  5.84 GBytes  5.01 Gbits/sec                  receiver
[SUM][RX-C]   0.00-10.00  sec  23.5 GBytes  20.2 Gbits/sec    0             sender
[SUM][RX-C]   0.00-10.03  sec  23.5 GBytes  20.1 Gbits/sec                  receiver
snd_tcp_congestion cubic
rcv_tcp_congestion cubic

iperf Done.
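
To put the two summaries side by side, here is a minimal sketch that computes the relative throughput drop from the sender-side `[SUM]` lines. It assumes the run preceding "Tempesta kernel benchmark" is the vanilla kernel; the dictionary names and the `relative_drop` helper are illustrative only, and the figures are copied from the summaries above.

```python
# Sender-side [SUM] bitrates in Gbits/sec, copied from the two summaries above.
# Assumption: the run before "Tempesta kernel benchmark" is the vanilla kernel.
vanilla = {"TX-C": 16.8, "RX-C": 23.9}
tempesta = {"TX-C": 12.2, "RX-C": 20.2}


def relative_drop(baseline, patched):
    """Return the throughput loss of `patched` relative to `baseline`, in percent."""
    return (baseline - patched) / baseline * 100.0


# One entry per direction as reported by the client (TX-C = client sends).
drops = {role: relative_drop(vanilla[role], tempesta[role]) for role in vanilla}

for role, pct in drops.items():
    print(f"{role}: {vanilla[role]} -> {tempesta[role]} Gbits/sec ({pct:.1f}% lower)")
```

With these numbers the client-to-server direction loses roughly 27% and the server-to-client direction roughly 15%. Both runs are single 10-second samples, so the percentages are indicative rather than statistically solid.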

Vanilla kernel perf top:

    27.16%  [kernel]       [k] copy_user_enhanced_fast_string
     5.61%  [kernel]       [k] __skb_datagram_iter              <--- why is this smaller on Tempesta (2.36%)?
     5.42%  [kernel]       [k] copy_user_generic_unrolled
     5.06%  [kernel]       [k] __slab_free
     2.52%  [kernel]       [k] __check_object_size.part.0
     2.47%  [kernel]       [k] tcp_sendmsg_locked
     2.34%  [kernel]       [k] __kmalloc_node_track_caller
     2.22%  [virtio_net]   [k] page_to_skb
     2.02%  [kernel]       [k] _copy_to_iter
     1.88%  [kernel]       [k] skb_release_data
     1.81%  [kernel]       [k] kmem_cache_alloc_node
     1.48%  [kernel]       [k] _raw_spin_unlock_irqrestore
     1.44%  [kernel]       [k] __free_pages_ok
     1.42%  [virtio_net]   [k] try_fill_recv
     1.38%  [kernel]       [k] __do_softirq
     1.38%  [virtio_ring]  [k] detach_buf_split
     1.30%  [kernel]       [k] __alloc_skb
     1.23%  [kernel]       [k] put_cpu_partial
     1.20%  [kernel]       [k] __virt_addr_valid
     1.06%  [kernel]       [k] rmqueue
     1.04%  [virtio_ring]  [k] virtqueue_get_buf_ctx_split
     0.85%  [virtio_net]   [k] receive_mergeable
     0.76%  [virtio_ring]  [k] virtqueue_add_split
     0.75%  [kernel]       [k] kfree
     0.75%  [kernel]       [k] gro_list_prepare
     0.72%  [kernel]       [k] dev_gro_receive
     0.70%  [kernel]       [k] finish_task_switch
     0.70%  [kernel]       [k] read_tsc

Tempesta kernel perf top:

    26.30%  [kernel]       [k] copy_user_enhanced_fast_string
     6.84%  [kernel]       [k] copy_user_generic_unrolled
     3.45%  [kernel]       [k] __check_object_size.part.0
     3.05%  [kernel]       [k] __alloc_skb
     2.75%  [kernel]       [k] rmqueue
     2.67%  [kernel]       [k] tcp_sendmsg_locked
     2.50%  [kernel]       [k] skb_release_data                   <--?? (1.88% in vanilla)
     2.36%  [kernel]       [k] __skb_datagram_iter
     2.28%  [kernel]       [k] _raw_spin_unlock_irqrestore
     2.12%  [virtio_net]   [k] page_to_skb
     2.11%  [kernel]       [k] pg_skb_alloc
     1.95%  [kernel]       [k] free_unref_page                   <--- (vanilla doesn't have this)
     1.93%  [kernel]       [k] skb_gro_receive
     1.86%  [kernel]       [k] _copy_to_iter
     1.55%  [kernel]       [k] __free_pages_ok
     1.54%  [virtio_net]   [k] try_fill_recv
     1.46%  [kernel]       [k] __virt_addr_valid
     1.46%  [virtio_ring]  [k] detach_buf_split
     1.15%  [kernel]       [k] __do_softirq
     0.91%  [kernel]       [k] finish_task_switch
     0.87%  [virtio_ring]  [k] virtqueue_get_buf_ctx_split
     0.85%  [kernel]       [k] memcpy_erms
     0.84%  [virtio_net]   [k] receive_mergeable
     0.81%  [kernel]       [k] read_tsc
     0.79%  [kernel]       [k] dev_gro_receive
     0.76%  [kernel]       [k] tcp_gro_receive
     0.74%  [kernel]       [k] tcp_poll
     0.72%  [kernel]       [k] napi_skb_free_stolen_head
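
The manual annotations in the two listings can be reproduced mechanically. The sketch below parses a handful of lines copied from the perf top outputs above (not the full listings) and ranks symbols by how much their share changed. The `parse` and `diff` helpers are hypothetical names for illustration. A symbol missing from one listing is treated as 0.0, which only means it fell below that listing's visible cutoff, not that the function was never called.

```python
import re

# A few lines copied from the two perf top listings (subset, for illustration).
VANILLA = """
27.16%  [kernel]  [k] copy_user_enhanced_fast_string
 5.61%  [kernel]  [k] __skb_datagram_iter
 5.06%  [kernel]  [k] __slab_free
 1.88%  [kernel]  [k] skb_release_data
 1.30%  [kernel]  [k] __alloc_skb
 1.06%  [kernel]  [k] rmqueue
"""
TEMPESTA = """
26.30%  [kernel]  [k] copy_user_enhanced_fast_string
 3.05%  [kernel]  [k] __alloc_skb
 2.75%  [kernel]  [k] rmqueue
 2.50%  [kernel]  [k] skb_release_data
 2.36%  [kernel]  [k] __skb_datagram_iter
 2.11%  [kernel]  [k] pg_skb_alloc
 1.95%  [kernel]  [k] free_unref_page
"""

# Matches "<percent>%  [<module>]  [k] <symbol>".
LINE = re.compile(r"^\s*([\d.]+)%\s+\[(\S+)\]\s+\[k\]\s+(\S+)")


def parse(text):
    """Map symbol name -> overhead percent for each perf top line in `text`."""
    out = {}
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            out[m.group(3)] = float(m.group(1))
    return out


def diff(base, other):
    """Return (symbol, base %, other %, delta) rows, largest absolute delta first."""
    rows = []
    for sym in set(base) | set(other):
        b, o = base.get(sym, 0.0), other.get(sym, 0.0)
        rows.append((sym, b, o, round(o - b, 2)))
    return sorted(rows, key=lambda r: (-abs(r[3]), r[0]))


rows = diff(parse(VANILLA), parse(TEMPESTA))
for sym, b, o, d in rows:
    print(f"{sym:35s} {b:6.2f} {o:6.2f} {d:+6.2f}")
```

On this subset the largest shifts are `__slab_free` and `__skb_datagram_iter` dropping on Tempesta, while `pg_skb_alloc`, `free_unref_page`, `__alloc_skb` and `rmqueue` grow. perf top percentages are relative shares of samples, so a shift shows where time moved, not by itself how much total work changed.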

krizhanovsky Apr 03 '24 20:04