[performance] Performance degradation in vitastor above 32k
Hi @vitalif, I tested the performance of vitastor without RDMA.
Below 32k (4k, 8k, 16k), performance is higher than Ceph and latency is lower than Ceph. At 32k and above (32k, 64k, 128k, 256k, 512k, 1m), performance is lower than Ceph and latency is higher than Ceph. Both random and sequential I/O were tested, with the same results.
Does 32k have a special meaning? Can I adjust some parameters to improve performance at 32k and above?
I don't think so... There is a special meaning for 128k - it's the block (object) size. I.e. writes smaller than 128k are first written to the journal (so they result in additional write amplification) and writes of 128k and larger are written using redirect-write. 32k-1M performance lower than Ceph is strange, can you provide any details?
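To see that boundary directly, here is a minimal sketch comparing a sub-block and a block-sized random write on the same /dev/vdb test device used later in this thread; the 128k figure is just the default block (object) size mentioned above.

```bash
#!/bin/bash
# Hypothetical quick check of the journal vs redirect-write paths, assuming the
# default 128k block (object) size and the /dev/vdb test device from this thread.
# 64k writes go through the journal (extra write amplification), 128k writes
# should take the redirect-write path.
for bs in 64k 128k; do
    fio -ioengine=libaio -direct=1 -rw=randwrite -bs=${bs} -iodepth=4 \
        -runtime=60 -filename=/dev/vdb -name=journal_vs_redirect_${bs}
done
```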
I have 3 physical machines, each with 1 NVMe, and the 3 machines form an etcd cluster. The vitastor pool uses 3 replicas (16 PGs). I use libvirt to create an Ubuntu 18.04 virtual machine through qemu and create two inodes in vitastor (each 100GB), one as the system disk and the other as the fio performance test disk. fio is run inside the virtual machine against the test disk.
OSD parameters
root@vita-3:~# systemctl cat vitastor-osd3.service
# /etc/systemd/system/vitastor-osd3.service
[Unit]
Description=Vitastor object storage daemon osd.3
After=network-online.target local-fs.target time-sync.target
Wants=network-online.target local-fs.target time-sync.target
PartOf=vitastor.target
[Service]
LimitNOFILE=1048576
LimitNPROC=1048576
LimitMEMLOCK=infinity
ExecStart=/usr/bin/vitastor-osd \
--etcd_address 192.168.1.217:2379/v3 \
--bind_address 192.168.1.217 \
--osd_num 3 \
--disable_data_fsync 1 \
--flusher_count 8 \
--immediate_commit all \
--disk_alignment 4096 --journal_block_size 4096 --meta_block_size 4096 \
--journal_no_same_sector_overwrites true \
--journal_sector_buffer_count 1024 \
--data_device /dev/disk/by-partuuid/2e2c938b-5f88-4354-8db6-7bac132b5b67 --journal_offset 0 --meta_offset 16777216 --data_offset 798183424
WorkingDirectory=/
ExecStartPre=+chown vitastor:vitastor /dev/disk/by-partuuid/2e2c938b-5f88-4354-8db6-7bac132b5b67
User=vitastor
PrivateTmp=false
TasksMax=infinity
Restart=always
StartLimitInterval=0
RestartSec=10
[Install]
WantedBy=vitastor.target
fio parameters
root@yujiang:~# cat run_fio_multhread_iscsi.sh
#!/bin/bash
set -x
#SHELL_FOLDER=$(dirname "$0")
#source ${SHELL_FOLDER}/common_variable_iscsi
IOENGINE="libaio"
FILE_NAME="/dev/vdb"
RW=(randwrite randread)
BS=(4k 8k 16k 32k 64k 128k 256k 512k 1m)
NUMJOBS=16
DIRECT=1
SIZE="100g"
IODEPTH=4
RUNTIME=300
for rw_name in ${RW[@]}
do
for bs_size in ${BS[@]}
do
fio -ioengine=${IOENGINE} -numjobs=${NUMJOBS} -direct=${DIRECT} -size=${SIZE} -iodepth=${IODEPTH} -runtime=${RUNTIME} -rw=${rw_name} -ba=${bs_size} -bs=${bs_size} -filename=${FILE_NAME} -name="iscsi_${IOENGINE}_${IODEPTH}_${rw_name}_${bs_size}" -group_reporting > iscsi_${IOENGINE}_${IODEPTH}_${rw_name}_${bs_size}.log
sleep 60
done
done
ceph
Virtual Disk Size | I/O Block Size | I/O Mode | IOPS | BW | Latency |
---|---|---|---|---|---|
100GB | 4K | randwrite | 19.7k | 80.7MB/s | 3243.62 (usec) |
100GB | 4K | randread | 40.7k | 167MB/s | 1566.41 (usec) |
100GB | 8K | randwrite | 17.6k | 145MB/s | 3623.02 (usec) |
100GB | 8K | randread | 39.8k | 326MB/s | 1602.89 (usec) |
100GB | 16K | randwrite | 16.2k | 266MB/s | 3940.96 (usec) |
100GB | 16K | randread | 33.4k | 548MB/s | 1910.04 (usec) |
100GB | 32K | randwrite | 14.4k | 471MB/s | 4451.7 (usec) |
100GB | 32K | randread | 28.1k | 919MB/s | 2276.41 (usec) |
100GB | 64K | randwrite | 12.6k | 825MB/s | 5076.83 (usec) |
100GB | 64K | randread | 16.0k | 1049MB/s | 3993.57 (usec) |
100GB | 128K | randwrite | 6919 | 907MB/s | 9243.13 (usec) |
100GB | 128K | randread | 8002 | 1049MB/s | 7991.51 (usec) |
100GB | 256K | randwrite | 3521 | 923MB/s | 18.17 (msec) |
100GB | 256K | randread | 4001 | 1049MB/s | 15987.12 (usec) |
100GB | 512K | randwrite | 1741 | 913MB/s | 36.73 (msec) |
100GB | 512K | randread | 2000 | 1049MB/s | 31979.61 (usec) |
100GB | 1M | randwrite | 871 | 914MB/s | 73.43 (msec) |
100GB | 1M | randread | 1000 | 1049MB/s | 63964.36 (usec) |
vitastor
Virtual Disk Size | I/O Block Size | I/O Mode | IOPS | BW | Latency |
---|---|---|---|---|---|
100GB | 4K | randwrite | 30.2k | 124MB/s | 2113.36 (usec) |
100GB | 4K | randread | 46.2k | 189MB/s | 1379.11 (usec) |
100GB | 8K | randwrite | 27.1k | 222MB/s | 2357.34 (usec) |
100GB | 8K | randread | 44.4k | 364MB/s | 1435.87 (usec) |
100GB | 16K | randwrite | 20.6k | 337MB/s | 3103.98 (usec) |
100GB | 16K | randread | 35.9k | 589MB/s | 1776.66 (usec) |
100GB | 32K | randwrite | 10.9k | 358MB/s | 5851.9 (usec) |
100GB | 32K | randread | 24.3k | 797MB/s | 2621.98 (usec) |
100GB | 64K | randwrite | 6020 | 395MB/s | 10618.76 (usec) |
100GB | 64K | randread | 14.3k | 937MB/s | 4466.6 (usec) |
100GB | 128K | randwrite | 4971 | 652MB/s | 12861.98 (usec) |
100GB | 128K | randread | 7984 | 1047MB/s | 8005.8 (usec) |
100GB | 256K | randwrite | 2477 | 649MB/s | 25819.94 (usec) |
100GB | 256K | randread | 3556 | 932MB/s | 17988.72 (usec) |
100GB | 512K | randwrite | 1223 | 641MB/s | 52.3 (msec) |
100GB | 512K | randread | 1733 | 909MB/s | 36920.56 (usec) |
100GB | 1M | randwrite | 622 | 653MB/s | 102.72 (msec) |
100GB | 1M | randread | 927 | 973MB/s | 68.97 (msec) |
First of all I think you should remove the --flusher_count option. In new versions it doesn't require manual tuning because it's auto-tuned between 1 and max_flusher_count, and --flusher_count actually sets max_flusher_count. So can you remove the option and retest?
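For reference, applying that suggestion just means dropping the flag from the OSD unit quoted above and restarting the OSD; a sketch using the unit name from this thread:

```bash
# Remove the --flusher_count line from the unit file quoted above and restart.
sed -i '/--flusher_count/d' /etc/systemd/system/vitastor-osd3.service
systemctl daemon-reload
systemctl restart vitastor-osd3
```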
Thank you. These results were actually obtained without the flusher_count parameter; I haven't tested with it added yet.
The following are the test results of T1Q1
Virtual Disk Size | I/O Block Size | I/O Mode | IOPS | BW | Latency |
---|---|---|---|---|---|
100GB | 4K | randwrite | 3012 | 12.3MB/s | 327.57 (usec) |
100GB | 4K | randread | 2997 | 12.3MB/s | 329.13 (usec) |
100GB | 8K | randwrite | 2189 | 17.9MB/s | 451.76 (usec) |
100GB | 8K | randread | 2135 | 17.5MB/s | 463.5 (usec) |
100GB | 16K | randwrite | 1859 | 30.5MB/s | 532.83 (usec) |
100GB | 16K | randread | 1681 | 27.6MB/s | 589.69 (usec) |
100GB | 32K | randwrite | 1499 | 49.1MB/s | 661.65 (usec) |
100GB | 32K | randread | 1491 | 48.9MB/s | 665.35 (usec) |
100GB | 64K | randwrite | 1195 | 78.3MB/s | 831.32 (usec) |
100GB | 64K | randread | 1398 | 91.6MB/s | 710.29 (usec) |
100GB | 128K | randwrite | 1204 | 158MB/s | 824.82 (usec) |
100GB | 128K | randread | 1144 | 150MB/s | 868.99 (usec) |
100GB | 256K | randwrite | 1017 | 267MB/s | 977.98 (usec) |
100GB | 256K | randread | 1070 | 281MB/s | 928.77 (usec) |
100GB | 512K | randwrite | 747 | 392MB/s | 1333.52 (usec) |
100GB | 512K | randread | 973 | 510MB/s | 1022.11 (usec) |
100GB | 1M | randwrite | 484 | 509MB/s | 2056.96 (usec) |
100GB | 1M | randread | 774 | 812MB/s | 1285.82 (usec) |
In the README your latency reaches 0.14ms; how can I achieve the same result?
T1Q1 write: 7087 iops (0.14ms latency)
T1Q1 read: 6838 iops (0.145ms latency)
Is the gap between my 0.32ms and your 0.14ms related to RDMA?
Best latency is achieved when you disable powersaving: cpupower idle-set -D 0 && cpupower frequency-set --governor performance. You can also pin each vitastor OSD to 1 core and disable powersaving only for those cores.
If that doesn't help, then your network may be slow.
Regarding writes, I still think your performance is too low. What if you run 1 fio job with rw=write bs=4M iodepth=4?
0.14 ms was achieved without RDMA.
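A sketch of the suggested tuning plus the 4M sequential check; the core number, drop-in path and unit name here are only examples for this thread's setup, not official guidance.

```bash
# Disable powersaving globally, or only on the core the OSD will be pinned to.
cpupower frequency-set --governor performance
cpupower -c 2 idle-set -D 0          # disable idle states on core 2 only

# Pin the OSD to that core with a systemd drop-in (example unit name).
mkdir -p /etc/systemd/system/vitastor-osd3.service.d
printf '[Service]\nCPUAffinity=2\n' > /etc/systemd/system/vitastor-osd3.service.d/cpu.conf
systemctl daemon-reload && systemctl restart vitastor-osd3

# Sequential large-block write check suggested above.
fio -ioengine=libaio -direct=1 -numjobs=1 -rw=write -bs=4M -iodepth=4 \
    -runtime=60 -filename=/dev/vdb -name=seq_write_4m
```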
After turning on CPU performance mode, the improvement is not obvious.
Virtual Disk Size | I/O Block Size | I/O Mode | IOPS | BW | Latency |
---|---|---|---|---|---|
100GB | 4K | randwrite | 3317 | 13.6MB/s | 297.46 (usec) |
100GB | 4K | randread | 3143 | 12.9MB/s | 313.77 (usec) |
100GB | 8K | randwrite | 2207 | 18.1MB/s | 447.97 (usec) |
100GB | 8K | randread | 2268 | 18.6MB/s | 435.96 (usec) |
100GB | 16K | randwrite | 1894 | 31.0MB/s | 522.65 (usec) |
100GB | 16K | randread | 1755 | 28.8MB/s | 564.65 (usec) |
100GB | 32K | randwrite | 1558 | 51.1MB/s | 636.63 (usec) |
100GB | 32K | randread | 1548 | 50.7MB/s | 640.94 (usec) |
100GB | 64K | randwrite | 1233 | 80.9MB/s | 805.39 (usec) |
100GB | 64K | randread | 1422 | 93.3MB/s | 697.73 (usec) |
100GB | 128K | randwrite | 1226 | 161MB/s | 809.83 (usec) |
100GB | 128K | randread | 1170 | 153MB/s | 848.97 (usec) |
100GB | 256K | randwrite | 1019 | 267MB/s | 975.74 (usec) |
100GB | 256K | randread | 1158 | 304MB/s | 858.06 (usec) |
100GB | 512K | randwrite | 768 | 403MB/s | 1295.93 (usec) |
100GB | 512K | randread | 981 | 515MB/s | 1013.64 (usec) |
100GB | 1M | randwrite | 507 | 532MB/s | 1966.59 (usec) |
100GB | 1M | randread | 768 | 806MB/s | 1296.3 (usec) |
- Could the difference in our CPU models explain why my performance is lower than your result? My CPU is
2x Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz (10 cores @ 2.80GHz)
Your CPU is
2x Xeon Gold 6242 (16 cores @ 2.8 GHz)
- Should each NVMe be split into 2 OSDs?
Yes, 2 OSDs per NVMe should help bandwidth with large block sizes. The Vitastor OSD is single-threaded, so you're probably just hitting the single-thread bandwidth limit due to memory copying. RDMA would also help because it eliminates 1 of the 2 extra copies. DPDK could also help, but I haven't found usable TCP (or similar) implementations for it, so it's not possible to implement yet.
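A rough sketch of what splitting one NVMe into two OSDs could look like, reusing the systemd unit layout from earlier in this thread; the device name, OSD numbers and the 50/50 split are placeholders (and repartitioning destroys existing OSD data), not the project's official procedure.

```bash
# Repartition the NVMe into two halves (destroys existing data on it!).
parted -s /dev/nvme0n1 mklabel gpt
parted -s /dev/nvme0n1 mkpart osd3-data 0% 50%
parted -s /dev/nvme0n1 mkpart osd4-data 50% 100%

# Then create vitastor-osd4.service as a copy of the vitastor-osd3.service unit
# shown above, changing --osd_num to 4 and --data_device to the new partition's
# /dev/disk/by-partuuid/... path, and start both OSDs:
systemctl daemon-reload
systemctl enable --now vitastor-osd3 vitastor-osd4
```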
Regarding latency, 3300 T1Q1 iops is still rather low, I'd expect at least 5000-5500 iops on average hardware. CPU model doesn't affect latency that much.
Given that turning CPU powersaving off has no noticeable effect, I think it just doesn't work: something probably overrides that setting and it doesn't get applied. For example, RHEL's tuned may override it.
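One way to check whether tuned is the thing overriding the powersaving settings (tuned-adm is tuned's standard CLI; available profile names vary by distribution):

```bash
# See whether tuned is running and which profile it applies.
systemctl status tuned --no-pager
tuned-adm active
# A latency-oriented profile, if available, keeps frequencies/C-states pinned:
tuned-adm profile network-latency
```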
You can also tune interrupts & NUMA, enable/disable irqbalance and so on. In one of my tests, simply enabling irqbalance raised bandwidth 3x :-)
You can also check your network with sockperf: run sockperf sr --tcp on one node and sockperf pp -i SERVER_IP --tcp -m 4096 on the other. Note that it reports half of the roundtrip (i.e. one-way latency).
In fact, IOPS increased about 3x with CPU performance mode on compared to off. I had already turned it on before; both sets of results above were measured with performance mode enabled.
What should I do about interrupts, NUMA and enabling/disabling irqbalance? I am not very familiar with this area.
sockperf result
sockperf: == version #3.6-no.git ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)
[ 0] IP = 192.168.1.216 PORT = 11111 # TCP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=1.000 sec; Warm up time=400 msec; SentMessages=9605; ReceivedMessages=9604
sockperf: ========= Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=0.550 sec; SentMessages=4265; ReceivedMessages=4265
sockperf: ====> avg-latency=64.476 (std-dev=1.374)
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Latency is 64.476 usec
sockperf: Total 4265 observations; each percentile contains 42.65 observations
sockperf: ---> <MAX> observation = 76.196
sockperf: ---> percentile 99.999 = 76.196
sockperf: ---> percentile 99.990 = 76.196
sockperf: ---> percentile 99.900 = 74.508
sockperf: ---> percentile 99.000 = 68.758
sockperf: ---> percentile 90.000 = 65.041
sockperf: ---> percentile 75.000 = 64.607
sockperf: ---> percentile 50.000 = 64.467
sockperf: ---> percentile 25.000 = 64.329
sockperf: ---> <MIN> observation = 53.082
You have 65 us one-way latency, so your roundtrip time is 130 us. The theoretical latency of vitastor for replicated writes is 2 RTT + 1 disk write = 260 us + disk. Disk is, I think, around 10-40 us. So around 270-300 us in total = 3300-3700 iops T1Q1... which is similar to what you get. :-) What network do you use? An average 10G network should be about twice as fast. :-)
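Spelling out that estimate (the 65 us one-way latency comes from the sockperf run above; the disk latency is just an assumed value in the 10-40 us range):

```bash
# Replicated write = 2 round trips (4 one-way hops) + 1 disk write.
one_way_us=65
disk_us=30
total_us=$(( 4 * one_way_us + disk_us ))
echo "expected T1Q1 latency: ${total_us} us, iops: $(( 1000000 / total_us ))"
# -> expected T1Q1 latency: 290 us, iops: 3448
```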
Regarding interrupts etc, here is a good article https://blog.cloudflare.com/how-to-achieve-low-latency/
The irqbalance service (installed via the package manager) distributes network interrupts over all available cores, which may be worse than manually pinning them to different cores of 1 CPU if the server has multiple CPUs. I.e. interrupts should be handled on the CPU to which the corresponding PCIe lanes are connected; otherwise the CPU will "proxy" memory requests through the other CPU and slow things down.
On the other hand, if you don't have irqbalance at all, then interrupts may not be distributed over multiple CPU cores at all, which will also lower performance...
You can also try to disable offloads and interrupt coalescing (in case they're enabled), like ethtool -K enp3s0f0 gro off gso off tso off lro off sg off; ethtool -C enp3s0f0 rx-usecs 0.
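If you go the manual route instead of irqbalance, here is a rough sketch of pinning the NIC interrupts; the interface name is the one from this thread, and the core list is a placeholder that should match the CPU the NIC's PCIe lanes attach to.

```bash
# Stop irqbalance so it doesn't move the interrupts back, then pin every
# enp3s0f0 queue interrupt to cores 0-9 (the local CPU in this example).
systemctl stop irqbalance
for irq in $(grep enp3s0f0 /proc/interrupts | cut -d: -f1); do
    echo 0-9 > /proc/irq/${irq}/smp_affinity_list
done
```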
Great, I used tuned and installed irqbalance, which reduced the network latency between vitastor hosts to 22 usec.
sockperf: == version #3.6-no.git ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)
[ 0] IP = 192.168.1.217 PORT = 11111 # TCP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=1.000 sec; Warm up time=400 msec; SentMessages=22594; ReceivedMessages=22593
sockperf: ========= Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=0.550 sec; SentMessages=12442; ReceivedMessages=12442
sockperf: ====> avg-latency=22.069 (std-dev=0.871)
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Latency is 22.069 usec
sockperf: Total 12442 observations; each percentile contains 124.42 observations
sockperf: ---> <MAX> observation = 74.262
sockperf: ---> percentile 99.999 = 74.262
sockperf: ---> percentile 99.990 = 63.500
sockperf: ---> percentile 99.900 = 27.764
sockperf: ---> percentile 99.000 = 24.001
sockperf: ---> percentile 90.000 = 22.471
sockperf: ---> percentile 75.000 = 22.075
sockperf: ---> percentile 50.000 = 21.939
sockperf: ---> percentile 25.000 = 21.828
sockperf: ---> <MIN> observation = 21.307
After adjusting the network card interrupts with
ethtool -K enp3s0f0 gro off gso off tso off lro off sg off; ethtool -C enp3s0f0 rx-usecs 0
4K randwrite IOPS increased to 3638 (latency 270.60 usec).
Latency from inside VM doesn't matter because vitastor client connects to OSDs from the host
Thanks, I get it. Then I am going to split an SSD into 2 OSDs to see if the latency is reduced.
Q1 latency won't be reduced by the split, but throughput at large iodepth should improve, yeah :-) By the way, what's the Q1T1 latency now, after applying the tuned fixes?
With tuned-adm profile network-latency, the latency is reduced, but the throughput is also reduced.
Virtual Disk Size | I/O Block Size | I/O Mode | IOPS | BW | Latency |
---|---|---|---|---|---|
100GB | 4K | randwrite | 3638 | 14.9MB/s | 270.6 (usec) |
100GB | 4K | randread | 4095 | 16.8MB/s | 240.27 (usec) |
100GB | 8K | randwrite | 3271 | 26.8MB/s | 301.54 (usec) |
100GB | 8K | randread | 3414 | 27.0MB/s | 288.84 (usec) |
100GB | 16K | randwrite | 2732 | 44.8MB/s | 361.83 (usec) |
100GB | 16K | randread | 2596 | 42.5MB/s | 380.87 (usec) |
100GB | 32K | randwrite | 2027 | 66.4MB/s | 488.94 (usec) |
100GB | 32K | randread | 2243 | 73.5MB/s | 441.61 (usec) |
100GB | 64K | randwrite | 1401 | 91.8MB/s | 709.41 (usec) |
100GB | 64K | randread | 1812 | 119MB/s | 547.71 (usec) |
100GB | 128K | randwrite | 1134 | 149MB/s | 877.56 (usec) |
100GB | 128K | randread | 1384 | 181MB/s | 718.4 (usec) |
100GB | 256K | randwrite | 864 | 227MB/s | 1152.79 (usec) |
100GB | 256K | randread | 1143 | 300MB/s | 870.33 (usec) |
100GB | 512K | randwrite | 581 | 305MB/s | 1716.65 (usec) |
100GB | 512K | randread | 905 | 475MB/s | 1100.02 (usec) |
100GB | 1M | randwrite | 355 | 373MB/s | 2806.67 (usec) |
100GB | 1M | randread | 617 | 647MB/s | 1615.37 (usec) |
3300 -> 3600 iops isn't much of an improvement :-) Something in the network is probably still slow...
You can check what the OSDs say in their logs about average latency. 'avg latency for op xxx' is the average execution latency of the operation, excluding the roundtrip time from and to the client, but including the latency of sub-operations triggered by the primary OSD on secondaries. For "secondary" ops like "write_stable" it's just disk latency, because they don't generate subops. 'avg latency for subop xxx' is the average execution latency of the sub-operation, including the roundtrip to the secondary OSD.
So in your case the 'write_stable op latency' is around 60 us (disk), and the 'write_stable subop latency' is 160-200 us, so RTT is still 100-140 us. Why is it so slow? :-) What network do you use?
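For example, those per-operation numbers can be pulled out of the journal like this (unit name as configured earlier in this thread):

```bash
# Show the OSD's rolling latency statistics for the last test run.
journalctl -u vitastor-osd3 --since "10 minutes ago" | grep 'avg latency'
```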
My configuration is
switch: S6720-SI Series Multi GE Switches
CPU: 2x Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz (10 cores @ 2.80GHz)
Network card: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
It's very strange that my network latency is so high. I don't have latency specs for the network card or the switch, and I'm not sure how much latency the system itself adds.
Does the OSD ping log mean that there is no I/O?
Oct 19 06:38:00 vita-3 vitastor-osd[37215]: [OSD 3] avg latency for subop 15 (ping): 115 us
Oct 19 06:38:06 vita-3 vitastor-osd[37215]: [OSD 3] avg latency for subop 15 (ping): 117 us
Oct 19 06:38:09 vita-3 vitastor-osd[37215]: [OSD 3] avg latency for subop 15 (ping): 110 us
Oct 19 06:38:15 vita-3 vitastor-osd[37215]: [OSD 3] avg latency for subop 15 (ping): 124 us
Oct 19 06:38:21 vita-3 vitastor-osd[37215]: [OSD 3] avg latency for subop 15 (ping): 113 us
Idle pings may be slower than 'real' ones even when CPU powersave is turned off. I get a 60 us idle ping time between 2 OSDs on localhost even with powersave disabled, but it drops to ~35 us when testing T1Q1 randread. But in your case 120 us seems similar to the "normal" ping, i.e. similar to just the network RTT in your case... How high is your ping latency when testing T1Q1 randread?
For T1Q1 4K randread, fio ran for 5 minutes starting at 21:45:04. The ping latency did not increase, but when fio ended, the OSD ping latency on vita-2 increased.
I have uploaded log to: https://github.com/lnsyyj/vitastor-log/tree/main/vitastor-osd-log/20211020
During 64k randwrite, the following appeared in the OSD log. Is this a journal collection?
Oct 20 03:12:55 vita-3 vitastor-osd[1679]: Ran out of journal space (used_start=0086f000, next_free=0084d000, dirty_start=0083c000)
During 64k randwrite, the following appeared in the OSD log. Is this a journal collection?
As long as it doesn't block or noticeably slow down writes, it's OK. It means the OSD writes journal entries faster than it flushes old entries. You can try to raise the journal size, but that may not remove these messages, because it's the speed that matters, not just the free space. Anyway, some amount of these messages is OK :-)
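For context, the journal size implied by the OSD layout quoted earlier (journal at offset 0, metadata at 16777216) works out to roughly 16 MiB, assuming the metadata area starts right after the journal:

```bash
journal_offset=0
meta_offset=16777216
echo "journal size: $(( (meta_offset - journal_offset) / 1024 / 1024 )) MiB"
# -> journal size: 16 MiB
```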
I have uploaded log to: https://github.com/lnsyyj/vitastor-log/tree/main/vitastor-osd-log/20211020
Yes, it seems the ping time is still about 90-100 us even when the OSDs are active... so it's still the same; the network problem should be found and fixed to improve latency :-)
I also tested vitastor with a single NVMe and yes, it seems it requires 2-3 OSDs per NVMe to fully utilize linear write speed, just because of memory copying.
I'm thinking about how to reduce network latency, but maybe my hardware has reached its limit.
But at the same time you say sockperf shows ~22us? Can you recheck sockperf between OSD nodes and between OSD and client?
I have 3 servers in a vitastor cluster, and a virtual machine runs on one of them:
- vita-1: 10.10.0.215
- vita-2: 10.10.0.216 (the node running the virtual machine)
- vita-3: 10.10.0.217
- Ubuntu 18.04 virtual machine: 192.168.122.213
The following are the results of sockperf running between physical nodes:
10.10.0.215 -> 10.10.0.215
root@vita-1:~# sockperf pp -i 10.10.0.215 --tcp -m 4096
sockperf: == version #3.6-no.git ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)
[ 0] IP = 10.10.0.215 PORT = 11111 # TCP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=1.000 sec; Warm up time=400 msec; SentMessages=66350; ReceivedMessages=66349
sockperf: ========= Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=0.546 sec; SentMessages=36325; ReceivedMessages=36325
sockperf: ====> avg-latency=7.484 (std-dev=0.157)
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Latency is 7.484 usec
sockperf: Total 36325 observations; each percentile contains 363.25 observations
sockperf: ---> <MAX> observation = 15.054
sockperf: ---> percentile 99.999 = 15.054
sockperf: ---> percentile 99.990 = 11.008
sockperf: ---> percentile 99.900 = 9.398
sockperf: ---> percentile 99.000 = 7.791
sockperf: ---> percentile 90.000 = 7.603
sockperf: ---> percentile 75.000 = 7.538
sockperf: ---> percentile 50.000 = 7.466
sockperf: ---> percentile 25.000 = 7.407
sockperf: ---> <MIN> observation = 6.336
10.10.0.215 -> 10.10.0.216
root@vita-1:~# sockperf pp -i 10.10.0.216 --tcp -m 4096
sockperf: == version #3.6-no.git ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)
[ 0] IP = 10.10.0.216 PORT = 11111 # TCP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=1.000 sec; Warm up time=400 msec; SentMessages=23199; ReceivedMessages=23198
sockperf: ========= Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=0.550 sec; SentMessages=12788; ReceivedMessages=12788
sockperf: ====> avg-latency=21.477 (std-dev=0.384)
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Latency is 21.477 usec
sockperf: Total 12788 observations; each percentile contains 127.88 observations
sockperf: ---> <MAX> observation = 29.251
sockperf: ---> percentile 99.999 = 29.251
sockperf: ---> percentile 99.990 = 28.595
sockperf: ---> percentile 99.900 = 27.234
sockperf: ---> percentile 99.000 = 22.760
sockperf: ---> percentile 90.000 = 21.633
sockperf: ---> percentile 75.000 = 21.530
sockperf: ---> percentile 50.000 = 21.434
sockperf: ---> percentile 25.000 = 21.349
sockperf: ---> <MIN> observation = 20.915
10.10.0.215 -> 10.10.0.217
root@vita-1:~# sockperf pp -i 10.10.0.217 --tcp -m 4096
sockperf: == version #3.6-no.git ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)
[ 0] IP = 10.10.0.217 PORT = 11111 # TCP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=1.000 sec; Warm up time=400 msec; SentMessages=23040; ReceivedMessages=23039
sockperf: ========= Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=0.550 sec; SentMessages=12695; ReceivedMessages=12695
sockperf: ====> avg-latency=21.640 (std-dev=1.140)
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Latency is 21.640 usec
sockperf: Total 12695 observations; each percentile contains 126.95 observations
sockperf: ---> <MAX> observation = 84.146
sockperf: ---> percentile 99.999 = 84.146
sockperf: ---> percentile 99.990 = 77.598
sockperf: ---> percentile 99.900 = 28.659
sockperf: ---> percentile 99.000 = 24.601
sockperf: ---> percentile 90.000 = 21.860
sockperf: ---> percentile 75.000 = 21.626
sockperf: ---> percentile 50.000 = 21.495
sockperf: ---> percentile 25.000 = 21.386
sockperf: ---> <MIN> observation = 20.845
10.10.0.216 -> 10.10.0.215
root@vita-2:~# sockperf pp -i 10.10.0.215 --tcp -m 4096
sockperf: == version #3.6-no.git ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)
[ 0] IP = 10.10.0.215 PORT = 11111 # TCP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=1.000 sec; Warm up time=400 msec; SentMessages=23230; ReceivedMessages=23229
sockperf: ========= Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=0.550 sec; SentMessages=12783; ReceivedMessages=12783
sockperf: ====> avg-latency=21.489 (std-dev=0.607)
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Latency is 21.489 usec
sockperf: Total 12783 observations; each percentile contains 127.83 observations
sockperf: ---> <MAX> observation = 30.769
sockperf: ---> percentile 99.999 = 30.769
sockperf: ---> percentile 99.990 = 29.548
sockperf: ---> percentile 99.900 = 27.635
sockperf: ---> percentile 99.000 = 24.835
sockperf: ---> percentile 90.000 = 21.720
sockperf: ---> percentile 75.000 = 21.528
sockperf: ---> percentile 50.000 = 21.382
sockperf: ---> percentile 25.000 = 21.271
sockperf: ---> <MIN> observation = 20.840
10.10.0.216 -> 10.10.0.216
root@vita-2:~# sockperf pp -i 10.10.0.216 --tcp -m 4096
sockperf: == version #3.6-no.git ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)
[ 0] IP = 10.10.0.216 PORT = 11111 # TCP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=1.000 sec; Warm up time=400 msec; SentMessages=48581; ReceivedMessages=48580
sockperf: ========= Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=0.550 sec; SentMessages=26678; ReceivedMessages=26678
sockperf: ====> avg-latency=10.271 (std-dev=0.200)
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Latency is 10.271 usec
sockperf: Total 26678 observations; each percentile contains 266.78 observations
sockperf: ---> <MAX> observation = 17.884
sockperf: ---> percentile 99.999 = 17.884
sockperf: ---> percentile 99.990 = 14.199
sockperf: ---> percentile 99.900 = 12.691
sockperf: ---> percentile 99.000 = 10.743
sockperf: ---> percentile 90.000 = 10.389
sockperf: ---> percentile 75.000 = 10.320
sockperf: ---> percentile 50.000 = 10.251
sockperf: ---> percentile 25.000 = 10.188
sockperf: ---> <MIN> observation = 9.932
10.10.0.216 -> 10.10.0.217
root@vita-2:~# sockperf pp -i 10.10.0.217 --tcp -m 4096
sockperf: == version #3.6-no.git ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)
[ 0] IP = 10.10.0.217 PORT = 11111 # TCP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=1.000 sec; Warm up time=400 msec; SentMessages=21431; ReceivedMessages=21430
sockperf: ========= Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=0.550 sec; SentMessages=11795; ReceivedMessages=11795
sockperf: ====> avg-latency=23.300 (std-dev=1.512)
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Latency is 23.300 usec
sockperf: Total 11795 observations; each percentile contains 117.95 observations
sockperf: ---> <MAX> observation = 126.662
sockperf: ---> percentile 99.999 = 126.662
sockperf: ---> percentile 99.990 = 97.480
sockperf: ---> percentile 99.900 = 29.805
sockperf: ---> percentile 99.000 = 26.100
sockperf: ---> percentile 90.000 = 23.848
sockperf: ---> percentile 75.000 = 23.254
sockperf: ---> percentile 50.000 = 23.089
sockperf: ---> percentile 25.000 = 22.974
sockperf: ---> <MIN> observation = 22.592
10.10.0.217 -> 10.10.0.215
root@vita-3:~# sockperf pp -i 10.10.0.215 --tcp -m 4096
sockperf: == version #3.6-no.git ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)
[ 0] IP = 10.10.0.215 PORT = 11111 # TCP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=1.000 sec; Warm up time=400 msec; SentMessages=23042; ReceivedMessages=23041
sockperf: ========= Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=0.550 sec; SentMessages=12689; ReceivedMessages=12689
sockperf: ====> avg-latency=21.618 (std-dev=0.493)
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Latency is 21.618 usec
sockperf: Total 12689 observations; each percentile contains 126.89 observations
sockperf: ---> <MAX> observation = 29.375
sockperf: ---> percentile 99.999 = 29.375
sockperf: ---> percentile 99.990 = 29.023
sockperf: ---> percentile 99.900 = 27.652
sockperf: ---> percentile 99.000 = 23.687
sockperf: ---> percentile 90.000 = 21.809
sockperf: ---> percentile 75.000 = 21.645
sockperf: ---> percentile 50.000 = 21.529
sockperf: ---> percentile 25.000 = 21.430
sockperf: ---> <MIN> observation = 20.988
10.10.0.217 -> 10.10.0.216
root@vita-3:~# sockperf pp -i 10.10.0.216 --tcp -m 4096
sockperf: == version #3.6-no.git ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)
[ 0] IP = 10.10.0.216 PORT = 11111 # TCP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=1.000 sec; Warm up time=400 msec; SentMessages=21532; ReceivedMessages=21531
sockperf: ========= Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=0.550 sec; SentMessages=11849; ReceivedMessages=11849
sockperf: ====> avg-latency=23.176 (std-dev=0.623)
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Latency is 23.176 usec
sockperf: Total 11849 observations; each percentile contains 118.49 observations
sockperf: ---> <MAX> observation = 34.893
sockperf: ---> percentile 99.999 = 34.893
sockperf: ---> percentile 99.990 = 31.490
sockperf: ---> percentile 99.900 = 29.786
sockperf: ---> percentile 99.000 = 25.420
sockperf: ---> percentile 90.000 = 23.761
sockperf: ---> percentile 75.000 = 23.166
sockperf: ---> percentile 50.000 = 23.002
sockperf: ---> percentile 25.000 = 22.890
sockperf: ---> <MIN> observation = 22.465
10.10.0.217 -> 10.10.0.217
root@vita-3:~# sockperf pp -i 10.10.0.217 --tcp -m 4096
sockperf: == version #3.6-no.git ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)
[ 0] IP = 10.10.0.217 PORT = 11111 # TCP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=1.000 sec; Warm up time=400 msec; SentMessages=48284; ReceivedMessages=48283
sockperf: ========= Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=0.550 sec; SentMessages=26512; ReceivedMessages=26512
sockperf: ====> avg-latency=10.335 (std-dev=2.412)
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Latency is 10.335 usec
sockperf: Total 26512 observations; each percentile contains 265.12 observations
sockperf: ---> <MAX> observation = 361.655
sockperf: ---> percentile 99.999 = 361.655
sockperf: ---> percentile 99.990 = 25.666
sockperf: ---> percentile 99.900 = 17.767
sockperf: ---> percentile 99.000 = 13.521
sockperf: ---> percentile 90.000 = 10.381
sockperf: ---> percentile 75.000 = 10.298
sockperf: ---> percentile 50.000 = 10.225
sockperf: ---> percentile 25.000 = 10.162
sockperf: ---> <MIN> observation = 9.910
The following are the results of sockperf from inside the virtual machine to the physical nodes:
192.168.122.213 -> 10.10.0.215
root@ubuntu:~/sockperf# sockperf pp -i 10.10.0.215 --tcp -m 4096
sockperf: == version #3.6-0.git737d1e3e5576 ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)
[ 0] IP = 10.10.0.215 PORT = 11111 # TCP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=1.000 sec; Warm up time=400 msec; SentMessages=8929; ReceivedMessages=8928
sockperf: ========= Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=0.550 sec; SentMessages=4865; ReceivedMessages=4865
sockperf: ====> avg-latency=56.519 (std-dev=11.084)
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Latency is 56.519 usec
sockperf: Total 4865 observations; each percentile contains 48.65 observations
sockperf: ---> <MAX> observation = 209.568
sockperf: ---> percentile 99.999 = 209.568
sockperf: ---> percentile 99.990 = 209.568
sockperf: ---> percentile 99.900 = 128.706
sockperf: ---> percentile 99.000 = 80.863
sockperf: ---> percentile 90.000 = 71.991
sockperf: ---> percentile 75.000 = 63.230
sockperf: ---> percentile 50.000 = 55.109
sockperf: ---> percentile 25.000 = 47.207
sockperf: ---> <MIN> observation = 42.518
192.168.122.213 -> 10.10.0.216
root@ubuntu:~/sockperf# sockperf pp -i 10.10.0.216 --tcp -m 4096
sockperf: == version #3.6-0.git737d1e3e5576 ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)
[ 0] IP = 10.10.0.216 PORT = 11111 # TCP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=1.000 sec; Warm up time=400 msec; SentMessages=20648; ReceivedMessages=20647
sockperf: ========= Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=0.550 sec; SentMessages=11407; ReceivedMessages=11407
sockperf: ====> avg-latency=24.080 (std-dev=0.909)
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Latency is 24.080 usec
sockperf: Total 11407 observations; each percentile contains 114.07 observations
sockperf: ---> <MAX> observation = 36.168
sockperf: ---> percentile 99.999 = 36.168
sockperf: ---> percentile 99.990 = 35.577
sockperf: ---> percentile 99.900 = 30.959
sockperf: ---> percentile 99.000 = 28.231
sockperf: ---> percentile 90.000 = 24.860
sockperf: ---> percentile 75.000 = 24.321
sockperf: ---> percentile 50.000 = 23.944
sockperf: ---> percentile 25.000 = 23.768
sockperf: ---> <MIN> observation = 21.944
192.168.122.213 -> 10.10.0.217
root@ubuntu:~/sockperf# sockperf pp -i 10.10.0.217 --tcp -m 4096
sockperf: == version #3.6-0.git737d1e3e5576 ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)
[ 0] IP = 10.10.0.217 PORT = 11111 # TCP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=1.000 sec; Warm up time=400 msec; SentMessages=10316; ReceivedMessages=10315
sockperf: ========= Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=0.550 sec; SentMessages=5950; ReceivedMessages=5950
sockperf: ====> avg-latency=46.208 (std-dev=1.320)
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Latency is 46.208 usec
sockperf: Total 5950 observations; each percentile contains 59.50 observations
sockperf: ---> <MAX> observation = 62.353
sockperf: ---> percentile 99.999 = 62.353
sockperf: ---> percentile 99.990 = 61.527
sockperf: ---> percentile 99.900 = 57.462
sockperf: ---> percentile 99.000 = 52.314
sockperf: ---> percentile 90.000 = 46.679
sockperf: ---> percentile 75.000 = 46.213
sockperf: ---> percentile 50.000 = 45.972
sockperf: ---> percentile 25.000 = 45.672
sockperf: ---> <MIN> observation = 44.708