libvma performance issues with haproxy
I have observed bandwidth and latency improvements in benchmark applications such as sockperf and iperf when using libvma with small TCP message sizes. I am therefore trying to evaluate the performance improvement libvma brings to haproxy, to assess its use in layer 7 load balancing. My setup has two machines running nginx servers, with haproxy configured in http mode and round-robin load balancing. I am using wrk as a load generator from another machine on the network to benchmark the haproxy setup. Without libvma, wrk reports the following for the given test:
wrk --latency http://10.48.114.100:8025 -t 10 -c 12 -d 30
Running 30s test @ http://10.48.114.100:8025
  10 threads and 12 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   339.48us  284.01us  12.93ms   97.42%
    Req/Sec     3.09k   217.10     4.18k    71.79%
  Latency Distribution
     50%  303.00us
     75%  357.00us
     90%  436.00us
     99%  789.00us
  925118 requests in 30.10s, 740.22MB read
Requests/sec:  30735.44
Transfer/sec:     24.59MB
Running with libvma:

LD_PRELOAD=libvma.so haproxy -- /etc/haproxy/haproxy.cfg
wrk --latency http://10.48.114.100:8025 -t 10 -c 12 -d 30
Running 30s test @ http://10.48.114.100:8025
  10 threads and 12 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    20.81ms  116.18ms   1.03s    96.55%
    Req/Sec     2.71k   742.46     7.98k    85.83%
  Latency Distribution
     50%  215.00us
     75%  593.00us
     90%    1.02ms
     99%  767.08ms
  771434 requests in 30.10s, 617.25MB read
Requests/sec:  25629.47
Transfer/sec:     20.51MB
Also running with VMA_SPEC=latency:

VMA_SPEC=latency LD_PRELOAD=libvma.so haproxy -- /etc/haproxy/haproxy.cfg
wrk --latency http://10.48.114.100:8025 -t 10 -c 12 -d 30
Running 30s test @ http://10.48.114.100:8025
  10 threads and 12 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.26ms    3.64ms  75.16ms   84.54%
    Req/Sec     1.07k   380.69     3.46k    77.74%
  Latency Distribution
     50%  329.00us
     75%    3.30ms
     90%    7.67ms
     99%   12.13ms
  291558 requests in 30.03s, 233.29MB read
  Socket errors: connect 0, read 0, write 0, timeout 1
Requests/sec:   9708.40
Transfer/sec:      7.77MB
Both the average latency and the total throughput are worse with libvma. I have tried following the tuning guide, binding the process to the same NUMA node as the NIC and pinning it to specific cores, but the results are still worse. This behaviour is strange, as vma_stats shows all packets as offloaded.
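For reference, the binding followed this general pattern (a sketch only; the interface name, NUMA node, and core list below are placeholders for my setup, not a recommendation):

```shell
# Find the NUMA node the NIC is attached to (interface name is an example)
cat /sys/class/net/ens1f0/device/numa_node

# Run haproxy under libvma, bound to the NIC's NUMA node (here assumed node 0)
# and pinned to a fixed set of cores on that node
LD_PRELOAD=libvma.so numactl --cpunodebind=0 --membind=0 \
    taskset -c 2-5 haproxy -- /etc/haproxy/haproxy.cfg

# Check offload counters for the running process
vma_stats -p "$(pidof haproxy)"
```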
Are there any tips for tuning libvma parameters to increase performance for this particular workload?
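For example, beyond VMA_SPEC, I understand knobs like these exist (a sketch only; the values below are illustrative guesses, not tested recommendations):

```shell
# Hypothetical tuning attempt: unbounded RX/select polling and a pinned
# VMA internal thread. All values here are placeholders for experimentation.
VMA_RX_POLL=-1 \
VMA_SELECT_POLL=-1 \
VMA_INTERNAL_THREAD_AFFINITY=1 \
LD_PRELOAD=libvma.so haproxy -- /etc/haproxy/haproxy.cfg
```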
The haproxy config file, for reference:
global
    user root
    group root
    daemon
    # Default SSL material locations
    ca-base /etc/ssl/certs
    crt-base /etc/ssl/private
    # Default ciphers to use on SSL-enabled listening sockets.
    # For more information, see ciphers(1SSL). This list is from:
    # https://hynek.me/articles/hardening-your-web-servers-ssl-ciphers/
    # An alternative list with additional directives can be obtained from
    # https://mozilla.github.io/server-side-tls/ssl-config-generator/?server=haproxy
    ssl-default-bind-ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:RSA+AESGCM:RSA+AES:!aNULL:!MD5:!DSS
    ssl-default-bind-options no-sslv3
    nosplice
    # TUNING
    #tune.h2.initial-window-size 1048576

defaults
    timeout connect 50000
    timeout client 500000
    timeout server 500000
    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 408 /etc/haproxy/errors/408.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http
    errorfile 504 /etc/haproxy/errors/504.http
    http-reuse safe

# My Configuration
frontend fe
    mode http
    bind *:8025
    default_backend be

backend be
    mode http
    balance roundrobin
    #option http-keep-alive
    server s0 10.48.34.122:80
    server s2 10.48.34.125:80
Config:
VMA_VERSION: 8.9.5-0
OFED Version: MLNX_OFED_LINUX-4.7-3.2.9.0
System: 4.9.0-9-amd64
Architecture: x86_64
NIC: ConnectX-5 EN network interface card
Hello @LeaflessMelospiza,
Networking benchmarks cannot always reproduce real-world application behaviour.
haproxy has its own specifics, and it has not been studied well enough for us to recommend an optimal VMA configuration.
You can try compiling VMA with the --enable-tso configure option.
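A sketch of such a build from source (the clone URL and install prefix are assumptions about a typical setup; TSO support also depends on the OFED version and NIC firmware in use):

```shell
# Build libvma from source with TSO enabled (paths are illustrative)
git clone https://github.com/Mellanox/libvma.git
cd libvma
./autogen.sh
./configure --enable-tso --prefix=/usr/local
make -j"$(nproc)"
sudo make install
```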