
TCP bandwidth with iperf/qperf/sockperf

Open zorrohahaha opened this issue 6 years ago • 9 comments

I am trying to run libvma on the aarch64 platform and it works. When using iperf/qperf/sockperf for some evaluation, I find the latency is better than through the kernel socket stack. The UDP bandwidth via qperf is also better. But the bandwidth in TCP mode is much worse than through the kernel: only several hundred Mbps to 1 Gbps on a 25 Gb NIC. After setting the parameters "VMA_TCP_NODELAY=1 VMA_TX_BUFS=4000000", the bandwidth gets a little better, up to about 10 Gbps, but it is still far from line rate.

zorrohahaha avatar Dec 06 '18 03:12 zorrohahaha
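
For reference, a VMA run like the one above is launched by preloading libvma.so and passing the tuning parameters as environment variables. Below is a minimal sketch of that kind of invocation; the library path, server address and iperf flags are placeholders, not values taken from this report:

    # server side (either kernel stack or VMA, depending on the scenario)
    iperf -s

    # client side: preload VMA and apply the tuning mentioned above
    LD_PRELOAD=/usr/lib64/libvma.so \
    VMA_TCP_NODELAY=1 \
    VMA_TX_BUFS=4000000 \
    iperf -c 192.168.1.10 -t 30 -P 4   # placeholder address; -P adds parallel streams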

Using the recommended sockperf, the TCP bandwidth is still quite low without VMA_SPEC being set before preloading the dynamic library. I used the Linux perf tool to check the thread status. It seems that the TX buffer is exhausted and the sender waits for the ACK from the other side; then a window of a few dozen KB is released and some data can be sent again. The "rx wait" takes a lot of CPU time. (Correct me if my understanding is wrong.)

When UDP is used, the bandwidth is much better since there is no ACK required.

zorrohahaha avatar Dec 06 '18 12:12 zorrohahaha
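
A sketch of the kind of inspection described above: set a VMA_SPEC profile before the preload takes effect and then sample the benchmark process with perf. The "latency" profile value, address, port and sockperf flags are assumptions; check the VMA README for the documented VMA_SPEC values:

    # server side (sockperf server over TCP)
    LD_PRELOAD=libvma.so sockperf server --tcp -i 192.168.1.10 -p 11111

    # client side, with a VMA_SPEC profile exported before the preloaded library starts
    LD_PRELOAD=libvma.so VMA_SPEC=latency \
        sockperf throughput --tcp -i 192.168.1.10 -p 11111 -m 1460 -t 30

    # sample where the client's CPU time goes (e.g. the "rx wait" path)
    perf top -p $(pidof sockperf)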

I searched the Internet and found a slide deck named "Data Transfer Node Performance GÉANT & AENEAS". TCP bandwidth with iperf2 is also discussed there, and the bandwidth measured through libvma appears to be lower than through the kernel stack.

zorrohahaha avatar Dec 07 '18 06:12 zorrohahaha

Hi @zorrohahaha,

VMA is optimized for TCP packets below the MTU/MSS size. We are aware of the disadvantage in TCP bandwidth for packets above the MTU/MSS size. Resolving this gap is on our roadmap. Please consider using the VMA_RX_POLL_ON_TX_TCP=1 parameter, which improves the ACK processing time.

Thanks, Liran

liranoz12 avatar Jan 09 '19 14:01 liranoz12
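
A sketch of adding the suggested parameter to a run like the one reported above; the library path, address and iperf flags are placeholders:

    LD_PRELOAD=libvma.so \
    VMA_RX_POLL_ON_TX_TCP=1 \
    VMA_TCP_NODELAY=1 \
    VMA_TX_BUFS=4000000 \
    iperf -c 192.168.1.10 -t 30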

Hi @zorrohahaha, please try https://github.com/Mellanox/libvma/releases/tag/8.9.2 and compile VMA with --enable-tso.

igor-ivanov avatar Sep 05 '19 12:09 igor-ivanov
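
For reference, a minimal build sketch with the suggested option. The tag name comes from the release URL above; the autogen/configure/make sequence follows the usual libvma build flow, and the parallel make flag is an assumption:

    git clone -b 8.9.2 https://github.com/Mellanox/libvma.git
    cd libvma
    ./autogen.sh
    ./configure --enable-tso
    make -j
    sudo make install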

Hi, we also see this issue: the bandwidth of TCP bypass with VMA is much worse than through the kernel.

littleyanglovegithub avatar Jul 31 '20 01:07 littleyanglovegithub

@littleyanglovegithub Did you try VMA 8.9.2 and up with the --enable-tso configuration option? Did you run the server under the OS and the client under VMA? What is the message size?

igor-ivanov avatar Aug 03 '20 18:08 igor-ivanov
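
A sketch of the asymmetric setup the questions refer to: the server runs on the plain kernel stack, the client runs under VMA, and the message size is stated explicitly. Addresses, port and sockperf flags are assumptions:

    # server under the OS (no preload)
    sockperf server --tcp -i 192.168.1.10 -p 11111

    # client under VMA, message size set via -m (bytes)
    LD_PRELOAD=libvma.so \
        sockperf throughput --tcp -i 192.168.1.10 -p 11111 -m 8192 -t 30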

I strongly suggest you state the limitation "VMA is optimized for TCP packets below the MTU/MSS size. We are aware of the disadvantage in TCP bandwidth for packets above the MTU/MSS size." on the official website.

daversun avatar Nov 05 '20 02:11 daversun

"Please consider using the VMA_RX_POLL_ON_TX_TCP=1 parameter, which improves the ACK processing time."

Hi, is there any update on VMA throughput performance with packet sizes larger than the MTU?

zhentaoyang1982 avatar Jun 29 '22 07:06 zhentaoyang1982

Using VMA can improve throughput compared with Linux for payload sizes below 16 KB. This is not always visible in simple benchmark scenarios such as iperf/sockperf. NGINX configured with multiple workers under a properly configured VMA (built with --enable-tso and tuned with the related VMA environment options) demonstrates better throughput than Linux. As an example, the analysis at https://www.microsoft.com/en-us/research/uploads/prod/2019/08/p90-li.pdf, page 100, can be studied.

igor-ivanov avatar Jul 01 '22 08:07 igor-ivanov
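
A sketch of the kind of NGINX-under-VMA run described above; the worker count, config path and chosen VMA environment options are assumptions, not an official recipe:

    # nginx.conf is assumed to set multiple workers, e.g. "worker_processes 8;"
    LD_PRELOAD=libvma.so \
    VMA_RX_POLL_ON_TX_TCP=1 \
    nginx -c /etc/nginx/nginx.conf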