Bad scaling due to flooding and overhead of copying packets in limactl
Throughput decreases and cpu usage increases significantly when adding more vms connected to the same socket_vmnet daemon.
Tested using:
- host: running `iperf3 -c ...`
- server vm: running `iperf3 -s`
- 1-4 additional idle vms
| vms | bitrate (Gbits/sec) | host cpu (user+sys, %) |
|---|---|---|
| 1 | 3.52 | 51.23 |
| 2 | 2.42 | 58.17 |
| 3 | 1.22 | 81.28 |
| 4 | 0.81 | 93.07 |
Expected behavior
- Performance and cpu usage should remain the same when adding more idle vms
- Packets sent to one vm should not be forwarded to other vms
- Packets should be copied directly to vz datagram socket in socket_vmnet, bypassing limactl
Why it happens
When we have multiple vms connected to socket_vmnet:
- every packet sent from the vmnet interface is forwarded to every vm, instead of only the vm with the matching mac address (see the dispatch sketch after this list).
- every packet sent from any vm is forwarded to all other vms and to the vmnet interface, instead of only the destination vm or only the vmnet interface.
- when a packet is forwarded to a vm, it is copied to the vz datagram socket via a socket pair in limactl
- packets forwarded from limactl to vz are copied and processed in the guest, where they are dropped (since they are not addressed to the guest).
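For reference, forwarding only to the vm that owns the destination mac could look roughly like the sketch below. This is not code from socket_vmnet, only an illustration; `struct conn`, `send_to_conn`, and `dispatch_frame` are made-up names, and the mac of each vm is assumed to have been learned from the source addresses of frames it sends.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative connection entry: one per connected vm (not socket_vmnet code). */
struct conn {
    int fd;            /* unix socket to the vm */
    uint8_t mac[6];    /* mac learned from the vm's source addresses */
    struct conn *next;
};

/* Stand-in for the actual send path (stream framing, error handling omitted). */
void send_to_conn(struct conn *c, const uint8_t *frame, size_t len);

static bool is_multicast_or_broadcast(const uint8_t *mac) {
    return (mac[0] & 1) != 0;   /* group bit set: must still go to every vm */
}

/* Unicast frames are copied once, to the owning vm; only broadcast/multicast
 * frames (or unknown destinations) are flooded. */
static void dispatch_frame(struct conn *conns, const uint8_t *frame, size_t len) {
    const uint8_t *dst = frame;          /* destination mac = first 6 bytes */
    if (len >= 6 && !is_multicast_or_broadcast(dst)) {
        for (struct conn *c = conns; c != NULL; c = c->next) {
            if (memcmp(c->mac, dst, 6) == 0) {
                send_to_conn(c, frame, len);   /* single copy */
                return;
            }
        }
    }
    for (struct conn *c = conns; c != NULL; c = c->next)
        send_to_conn(c, frame, len);           /* fallback: flood */
}
```

With a dispatch like this, an idle vm never sees traffic addressed to other vms, which is the expected behavior listed above.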
Flow when receiving a packet from vmnet with 4 vms

```
host iperf3 ->
  host kernel ->
    vmnet ->
      socket_vmnet ->
        host kernel ->
          limactl ->
            host kernel ->
              vz ->
                guest kernel ->
                  guest iperf3
        host kernel ->
          limactl ->
            host kernel ->
              vz ->
                guest kernel (drop)
        host kernel ->
          limactl ->
            host kernel ->
              vz ->
                guest kernel (drop)
        host kernel ->
          limactl ->
            host kernel ->
              vz ->
                guest kernel (drop)
```
Flow when receiving a packet from a vm

```
guest iperf3 ->
  guest kernel ->
    vz ->
      host kernel ->
        limactl ->
          host kernel ->
            socket_vmnet ->
              vmnet ->
                host kernel ->
                  host iperf3
              host kernel ->
                limactl ->
                  host kernel ->
                    vz ->
                      guest kernel (drop)
              host kernel ->
                limactl ->
                  host kernel ->
                    vz ->
                      guest kernel (drop)
              host kernel ->
                limactl ->
                  host kernel ->
                    vz ->
                      guest kernel (drop)
```
CPU usage for all vm processes
Looking at the cpu usage of the socket_vmnet, vm service, and limactl processes, we see extreme cpu usage spent processing partly or completely unrelated packets:
| command | %cpu | related |
|---|---|---|
| com.apple.Virtua | 136.9 | yes |
| limactl | 121.4 | yes |
| iperf3-darwin | 13.7 | yes |
| socket_vmnet | 106.6 | partly |
| kernel_task | 39.1 | partly |
| com.apple.Virtua | 83.5 | no |
| com.apple.Virtua | 81.0 | no |
| com.apple.Virtua | 77.4 | no |
| limactl | 67.1 | no |
| limactl | 65.6 | no |
| limactl | 62.9 | no |
Total cpu usage:
| work | %cpu |
|---|---|
| related | 272.0 |
| partly | 145.7 |
| unrelated | 437.5 |
Tested on M1 Pro (8 performance cores, 2 efficiency cores)
Full results
1 vm
```
% caffeinate -d iperf3-darwin -c 192.168.105.58 -l 1m -t 10
Connecting to host 192.168.105.58, port 5201
[ 5] local 192.168.105.1 port 60990 connected to 192.168.105.58 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd RTT
[ 5] 0.00-1.00 sec 460 MBytes 3.86 Gbits/sec 0 8.00 MBytes 9ms
[ 5] 1.00-2.00 sec 421 MBytes 3.53 Gbits/sec 0 8.00 MBytes 9ms
[ 5] 2.00-3.00 sec 435 MBytes 3.65 Gbits/sec 0 8.00 MBytes 10ms
[ 5] 3.00-4.00 sec 411 MBytes 3.45 Gbits/sec 0 8.00 MBytes 14ms
[ 5] 4.00-5.00 sec 317 MBytes 2.66 Gbits/sec 0 8.00 MBytes 9ms
[ 5] 5.00-6.00 sec 430 MBytes 3.61 Gbits/sec 0 8.00 MBytes 9ms
[ 5] 6.00-7.00 sec 423 MBytes 3.55 Gbits/sec 0 8.00 MBytes 9ms
[ 5] 7.00-8.00 sec 433 MBytes 3.63 Gbits/sec 0 8.00 MBytes 10ms
[ 5] 8.00-9.00 sec 437 MBytes 3.67 Gbits/sec 0 8.00 MBytes 9ms
[ 5] 9.00-10.00 sec 430 MBytes 3.61 Gbits/sec 0 8.00 MBytes 9ms
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 4.10 GBytes 3.52 Gbits/sec 0 sender
[ 5] 0.00-10.00 sec 4.10 GBytes 3.52 Gbits/sec receiver
```

cpu usage

```
CPU usage: 20.3% user, 31.19% sys, 48.77% idle
PID COMMAND %CPU #TH
49183 com.apple.Virtua 166.3 19/3
49173 limactl 100.0 16/2
48954 socket_vmnet 64.4 5/1
0 kernel_task 57.8 561/10
54694 iperf3-darwin 18.6 1/1
```
2 vms
```
% caffeinate -d iperf3-darwin -c 192.168.105.58 -l 1m -t 10
Connecting to host 192.168.105.58, port 5201
[ 5] local 192.168.105.1 port 60997 connected to 192.168.105.58 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd RTT
[ 5] 0.00-1.00 sec 269 MBytes 2.26 Gbits/sec 0 8.00 MBytes 13ms
[ 5] 1.00-2.00 sec 299 MBytes 2.51 Gbits/sec 0 8.00 MBytes 14ms
[ 5] 2.00-3.00 sec 263 MBytes 2.21 Gbits/sec 0 8.00 MBytes 15ms
[ 5] 3.00-4.00 sec 296 MBytes 2.48 Gbits/sec 0 8.00 MBytes 13ms
[ 5] 4.00-5.00 sec 298 MBytes 2.50 Gbits/sec 0 8.00 MBytes 12ms
[ 5] 5.00-6.00 sec 284 MBytes 2.38 Gbits/sec 0 8.00 MBytes 13ms
[ 5] 6.00-7.00 sec 299 MBytes 2.51 Gbits/sec 0 8.00 MBytes 14ms
[ 5] 7.00-8.00 sec 298 MBytes 2.50 Gbits/sec 0 8.00 MBytes 14ms
[ 5] 8.00-9.00 sec 285 MBytes 2.39 Gbits/sec 0 8.00 MBytes 13ms
[ 5] 9.00-10.00 sec 298 MBytes 2.50 Gbits/sec 0 8.00 MBytes 12ms
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 2.82 GBytes 2.42 Gbits/sec 0 sender
[ 5] 0.00-10.01 sec 2.82 GBytes 2.42 Gbits/sec receiver
```

cpu usage

```
CPU usage: 20.84% user, 37.32% sys, 41.83% idle
PID COMMAND %CPU #TH
49183 com.apple.Virtua 132.9 18/2
49173 limactl 92.2 16/3
48954 socket_vmnet 77.0 6/1
49905 com.apple.Virtua 74.2 18/1
49900 limactl 57.3 16/1
0 kernel_task 41.4 561/12
54259 iperf3-darwin 22.1 1/1
```
3 vms
```
% caffeinate -d iperf3-darwin -c 192.168.105.58 -l 1m -t 10
Connecting to host 192.168.105.58, port 5201
[ 5] local 192.168.105.1 port 61004 connected to 192.168.105.58 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd RTT
[ 5] 0.00-1.00 sec 161 MBytes 1.35 Gbits/sec 0 2.91 MBytes 21ms
[ 5] 1.00-2.00 sec 138 MBytes 1.16 Gbits/sec 0 3.05 MBytes 17ms
[ 5] 2.00-3.00 sec 143 MBytes 1.20 Gbits/sec 0 3.15 MBytes 44ms
[ 5] 3.00-4.00 sec 139 MBytes 1.17 Gbits/sec 0 3.24 MBytes 19ms
[ 5] 4.00-5.00 sec 138 MBytes 1.16 Gbits/sec 0 3.30 MBytes 25ms
[ 5] 5.00-6.00 sec 144 MBytes 1.21 Gbits/sec 0 3.34 MBytes 22ms
[ 5] 6.00-7.00 sec 154 MBytes 1.29 Gbits/sec 0 3.37 MBytes 23ms
[ 5] 7.00-8.00 sec 145 MBytes 1.21 Gbits/sec 0 3.38 MBytes 15ms
[ 5] 8.00-9.00 sec 142 MBytes 1.19 Gbits/sec 0 3.39 MBytes 17ms
[ 5] 9.00-10.00 sec 154 MBytes 1.29 Gbits/sec 0 3.39 MBytes 23ms
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 1.42 GBytes 1.22 Gbits/sec 0 sender
[ 5] 0.00-10.01 sec 1.42 GBytes 1.22 Gbits/sec receiver
```

cpu usage

```
CPU usage: 24.13% user, 57.13% sys, 18.72% idle
PID COMMAND %CPU #TH
49183 com.apple.Virtua 145.8 18/2
49173 limactl 120.5 15/2
48954 socket_vmnet 99.8 7/2
49905 com.apple.Virtua 82.9 18/1
50380 com.apple.Virtua 82.1 18/1
50375 limactl 63.4 16/1
49900 limactl 61.7 16/1
0 kernel_task 43.4 561/11
53677 iperf3-darwin 15.2 1/1
```
4 vms
```
% caffeinate -d iperf3-darwin -c 192.168.105.58 -l 1m -t 10
Connecting to host 192.168.105.58, port 5201
[ 5] local 192.168.105.1 port 61014 connected to 192.168.105.58 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd RTT
[ 5] 0.00-1.00 sec 99.8 MBytes 837 Mbits/sec 0 2.90 MBytes 26ms
[ 5] 1.00-2.00 sec 98.3 MBytes 824 Mbits/sec 0 2.53 MBytes 25ms
[ 5] 2.00-3.00 sec 98.2 MBytes 823 Mbits/sec 0 3.03 MBytes 69ms
[ 5] 3.00-4.00 sec 99.7 MBytes 836 Mbits/sec 0 3.04 MBytes 30ms
[ 5] 4.00-5.00 sec 103 MBytes 860 Mbits/sec 0 3.03 MBytes 22ms
[ 5] 5.00-6.00 sec 91.2 MBytes 765 Mbits/sec 0 3.03 MBytes 27ms
[ 5] 6.00-7.00 sec 100 MBytes 842 Mbits/sec 0 3.03 MBytes 61ms
[ 5] 7.00-8.00 sec 102 MBytes 858 Mbits/sec 0 3.04 MBytes 33ms
[ 5] 8.00-9.00 sec 98.2 MBytes 823 Mbits/sec 0 3.04 MBytes 31ms
[ 5] 9.00-10.00 sec 103 MBytes 862 Mbits/sec 0 3.04 MBytes 28ms
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 993 MBytes 833 Mbits/sec 0 sender
[ 5] 0.00-10.02 sec 991 MBytes 830 Mbits/sec receiver
```

cpu usage

```
CPU usage: 25.28% user, 67.77% sys, 6.93% idle
PID COMMAND %CPU #TH
49183 com.apple.Virtua 136.9 18/2
49173 limactl 121.4 15/2
48954 socket_vmnet 106.6 8/1
50380 com.apple.Virtua 83.5 18/2
50731 com.apple.Virtua 81.0 18/1
49905 com.apple.Virtua 77.4 18/2
50375 limactl 67.1 16/1
50726 limactl 65.6 16/1
49900 limactl 62.9 16/1
0 kernel_task 39.1 561/10
53126 iperf3-darwin 13.7 1
```
Yes, this is a long-standing TODO https://github.com/lima-vm/socket_vmnet/blob/0b6aed916e194309bfc3f1245003a5fdc3438848/main.c#L531-L562
@nirs can you point to the code where the copy in limactl occurs? I don't understand why there are so many copies.
The pipeline
Lima:

```
kernel <-vmnet-> socket_vmnet <-unixstream-> lima <-unixgram-> vz service <-virtio-> guest
```

QEMU:

```
kernel <-vmnet-> socket_vmnet <-unixstream-> qemu <-virtio-> guest
```
Receiving a packet from a vm
This happens in the thread forwarding packets from the client socket fd: https://github.com/lima-vm/socket_vmnet/blob/f486d475d4842bbddfe8f66ba09f7d1cb10cfbed/main.c#L467
For each packet we read: https://github.com/lima-vm/socket_vmnet/blob/f486d475d4842bbddfe8f66ba09f7d1cb10cfbed/main.c#L492
We send the packet to the vmnet interface (copy 1): https://github.com/lima-vm/socket_vmnet/blob/f486d475d4842bbddfe8f66ba09f7d1cb10cfbed/main.c#L518
and all other sockets (N-1 copies): https://github.com/lima-vm/socket_vmnet/blob/f486d475d4842bbddfe8f66ba09f7d1cb10cfbed/main.c#L548
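Put together, the forwarding described above amounts to something like this simplified sketch. It is not the actual main.c code: the length framing on the stream sockets, the vmnet packet descriptors, locking, and error handling are omitted, and `write_to_vmnet` is only a stand-in for the real vmnet write path.

```c
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

struct conn {
    int socket_fd;       /* unix stream socket to one vm */
    struct conn *next;
};

/* Stand-in for handing the frame to the vmnet interface. */
void write_to_vmnet(const uint8_t *frame, size_t len);

/* A frame read from one vm is written to the vmnet interface (copy 1) and to
 * every other connected vm (N-1 copies), regardless of its destination mac. */
static void forward_from_vm(struct conn *conns, struct conn *sender,
                            const uint8_t *frame, size_t len) {
    write_to_vmnet(frame, len);                    /* copy 1 */
    for (struct conn *c = conns; c != NULL; c = c->next) {
        if (c == sender)
            continue;                              /* skip the originating vm */
        (void)write(c->socket_fd, frame, len);     /* N-1 additional copies */
    }
}
```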
Receiving a packet from vmnet
This happens in the vmnet handler block, called when packets are ready on the vmnet interface: https://github.com/lima-vm/socket_vmnet/blob/f486d475d4842bbddfe8f66ba09f7d1cb10cfbed/main.c#L283
We read multiple packets (up to 32 packets per call): https://github.com/lima-vm/socket_vmnet/blob/f486d475d4842bbddfe8f66ba09f7d1cb10cfbed/main.c#L152
For each packet we iterate over all connections and write the packet to each one (N copies): https://github.com/lima-vm/socket_vmnet/blob/f486d475d4842bbddfe8f66ba09f7d1cb10cfbed/main.c#L191
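In this direction the pattern is the same; a minimal sketch of the effect (again not the real handler, ignoring the vmnet packet descriptors and stream framing):

```c
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Every frame in a batch read from the vmnet interface is written to every
 * connected vm (N copies per frame), including vms it is not addressed to.
 * conn_fds holds one stream socket fd per connected vm. */
static void flood_from_vmnet(const int *conn_fds, size_t nconns,
                             const uint8_t *const *frames, const size_t *lens,
                             size_t pktcnt) {
    for (size_t i = 0; i < pktcnt; i++)
        for (size_t j = 0; j < nconns; j++)
            (void)write(conn_fds[j], frames[i], lens[i]);
}
```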
Additional copies in lima
Each packet read from VZ is copied to the socket_vmnet socket via a socketpair: https://github.com/lima-vm/lima/blob/1f0113c2b0ecd5b21a5c84f60cb83a09ffab0dee/pkg/vz/network_darwin.go#L68
Each packet read from socket_vmnet is copied to VZ via a socketpair: https://github.com/lima-vm/lima/blob/1f0113c2b0ecd5b21a5c84f60cb83a09ffab0dee/pkg/vz/network_darwin.go#L75
This is done for every VM using lima:shared, lima:bridged, or socket - regardless of the actual packet destination.
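For completeness, the "bypass limactl" expectation from the issue description boils down to socket_vmnet writing each frame straight into the datagram socket that vz reads, so a frame is copied once instead of bouncing through limactl and the host kernel twice. A very rough sketch, assuming vz can be handed one end of a SOCK_DGRAM socketpair (as limactl does today) and that one datagram carries one ethernet frame; nothing below is existing socket_vmnet or lima code:

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical setup: fds[1] would be handed to the vz network device,
 * fds[0] would be kept by socket_vmnet. */
static int make_vz_channel(int fds[2]) {
    return socketpair(AF_UNIX, SOCK_DGRAM, 0, fds);
}

/* socket_vmnet could then deliver a frame to the guest with a single copy:
 * one datagram per frame, no limactl proxy in the path. */
static ssize_t deliver_frame(int vz_fd, const uint8_t *frame, size_t len) {
    return send(vz_fd, frame, len, 0);
}
```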