TLS performance regression with a 4096-bit RSA key when migrating from HAProxy 2.2 to 2.8
Detailed Description of the Problem
A significant TLS performance degradation is observed when migrating from OpenShift 4.12 to 4.17.
- HAProxy versions involved:
  - OpenShift 4.12: HAProxy 2.2.24
  - OpenShift 4.17: HAProxy 2.8.10
- TLS key: rsa:4096.
- Performance testing tool: vegeta.
- HAProxy access logs show: SSL handshake failure.
- TLS termination occurs at the edge (frontend). No re-encryption between HAProxy and backend servers.
Vegeta runs:
$ echo "GET https://hello-world.com/" | vegeta attack -max-workers 5000 -rate 0 -keepalive=true --duration=3m -insecure | tee results.bin | vegeta report | head
Requests [total, rate, throughput] 303611, 1686.65, 970.91
Duration [total, attack, wait] 3m3s, 3m0s, 3.383s
Latencies [min, mean, 50, 90, 95, 99, max] 1.472ms, 2.898s, 4.13s, 5.226s, 5.403s, 5.843s, 21.245s
Bytes In [total, mean] 9615078, 31.67
Bytes Out [total, mean] 0, 0.00
Success [ratio] 58.65%
Status Codes [code:count] 0:125554 200:178057
Error Set:
Get "https://hello-world.com/": EOF
$ echo "GET https://hello-world.com/" | vegeta attack -max-workers 5000 -rate 0 -keepalive=true --duration=3m -insecure | tee results.bin | vegeta report | head
Requests [total, rate, throughput] 209729, 1164.72, 908.89
Duration [total, attack, wait] 3m3s, 3m0s, 2.976s
Latencies [min, mean, 50, 90, 95, 99, max] 1.157ms, 4.27s, 5.086s, 6.367s, 6.787s, 7.464s, 23.773s
Bytes In [total, mean] 8983872, 42.84
Bytes Out [total, mean] 0, 0.00
Success [ratio] 79.33%
Status Codes [code:count] 0:43361 200:166368
Error Set:
Get "https://hello-world.com/": EOF
Get "https://hello-world.com/": dial tcp 0.0.0.0:0->10.0.50.224:443: connect: connection refused
$ echo "GET https://hello-world.com/" | vegeta attack -max-workers 5000 -rate 0 -keepalive=true --duration=3m -insecure | tee results.bin | vegeta report | head
Requests [total, rate, throughput] 328537, 1825.15, 941.73
Duration [total, attack, wait] 3m3s, 3m0s, 3.429s
Latencies [min, mean, 50, 90, 95, 99, max] 1.23ms, 2.584s, 2.658s, 5.051s, 5.259s, 5.647s, 29.6s
Bytes In [total, mean] 9328284, 28.39
Bytes Out [total, mean] 0, 0.00
Success [ratio] 52.58%
Status Codes [code:count] 0:155791 200:172746
Error Set:
Get "https://hello-world.com/": EOF
Get "https://hello-world.com/": dial tcp 0.0.0.0:0->10.0.50.224:443: connect: connection refused
HAProxy 2.8.10 access logs:
haproxy[31]: 10.0.11.73:59679 [17/Sep/2025:07:46:55.416] fe_sni~ be_edge_http:test:myedge/pod:web-server-6fd7fc6444-w9ln5:service-unsecure:http:10.128.2.17:8080 0/0/37/25/62 200 253 - - --NI 25842/24728/51/51/0 0/0 "GET / HTTP/1.1"
haproxy[31]: 10.0.11.73:36407 [17/Sep/2025:07:46:55.416] fe_sni~ be_edge_http:test:myedge/pod:web-server-6fd7fc6444-w9ln5:service-unsecure:http:10.128.2.17:8080 0/0/37/25/62 200 253 - - --NI 25841/24727/50/50/0 0/0 "GET / HTTP/1.1"
haproxy[31]: 10.0.11.73:60273 [17/Sep/2025:07:46:55.416] fe_sni~ be_edge_http:test:myedge/pod:web-server-6fd7fc6444-w9ln5:service-unsecure:http:10.128.2.17:8080 0/0/37/25/62 200 253 - - --NI 25841/24727/49/49/0 0/0 "GET / HTTP/1.1"
haproxy[31]: 10.0.11.73:43247 [17/Sep/2025:07:46:25.507] fe_sni/1: SSL handshake failure
haproxy[31]: 10.0.11.73:57489 [17/Sep/2025:07:46:25.499] fe_sni/1: SSL handshake failure
haproxy[31]: 10.0.11.73:59035 [17/Sep/2025:07:46:54.457] public_ssl be_sni/fe_sni 1/32/1047 0 sD 25853/1120/1063/1063/0 0/0
haproxy[31]: 10.0.11.73:40352 [17/Sep/2025:07:46:54.490] public_ssl be_sni/fe_sni 1/0/1014 0 sD 25852/1119/1062/1062/0 0/0
haproxy[31]: 10.0.11.73:38208 [17/Sep/2025:07:46:54.490] public_ssl be_sni/fe_sni 1/0/1014 0 sD 25851/1118/1061/1061/0 0/0
The HAProxy process shows excessive CPU usage (~350% on a 4-core node):
show tasks shows almost all queued tasks assigned to the ssl_sock_io_cb function:
Running tasks: 24406 (4 threads)
function places % lat_tot lat_avg
ssl_sock_io_cb 24285 99.5 - -
sc_conn_io_cb 70 0.2 - -
h1_io_cb 39 0.1 - -
process_stream 8 0.0 - -
h1_timeout_task 4 0.0 - -
Running tasks: 24369 (4 threads)
function places % lat_tot lat_avg
ssl_sock_io_cb 23941 98.2 - -
sc_conn_io_cb 224 0.9 - -
process_stream 88 0.3 - -
h1_io_cb 83 0.3 - -
h1_timeout_task 16 0.0 - -
session_expire_embryonic 8 0.0 - -
accept_queue_process 4 0.0 - -
other 3 0.0 - -
xprt_handshake_io_cb 2 0.0 - -
Running tasks: 24185 (4 threads)
function places % lat_tot lat_avg
ssl_sock_io_cb 23884 98.7 - -
sc_conn_io_cb 165 0.6 - -
h1_io_cb 69 0.2 - -
process_stream 62 0.2 - -
h1_timeout_task 4 0.0 - -
xprt_handshake_io_cb 1 0.0 - -
We believe the following issues showed similar problems:
- https://github.com/haproxy/haproxy/issues/1914
- https://github.com/haproxy/haproxy/issues/1988
As with https://github.com/haproxy/haproxy/issues/1914, we managed to narrow the problem down to commit 4c48edba4f45bb78f41af7d79d3c176710fe6a90, which first appeared in the 2.4-dev12 release. Reverting the commit results in:
- 100% success rate (no SSL handshake failures)
- Improved request rate (~11K vs ~2K)
- Reduced CPU usage
Vegeta runs with 4c48edba4f45bb78f41af7d79d3c176710fe6a90 reverted:
$ echo "GET https://hello-world.com/" | vegeta attack -max-workers 5000 -rate 0 -keepalive=true --duration=3m -insecure | tee results.bin | vegeta report | head
Requests [total, rate, throughput] 2054995, 11414.31, 11405.44
Duration [total, attack, wait] 3m0s, 3m0s, 140.122ms
Latencies [min, mean, 50, 90, 95, 99, max] 1.898ms, 304.647ms, 284.05ms, 380.766ms, 411.503ms, 473.042ms, 12.395s
Bytes In [total, mean] 110969730, 54.00
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:2054995
Error Set:
$ echo "GET https://hello-world.com/" | vegeta attack -max-workers 5000 -rate 0 -keepalive=true --duration=3m -insecure | tee results.bin | vegeta report | head
Requests [total, rate, throughput] 2047009, 11372.09, 11368.99
Duration [total, attack, wait] 3m0s, 3m0s, 49.098ms
Latencies [min, mean, 50, 90, 95, 99, max] 674.292µs, 287.529ms, 262.527ms, 366.743ms, 398.454ms, 475.019ms, 14.126s
Bytes In [total, mean] 110538486, 54.00
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:2047009
Error Set:
CPU usage:
show tasks (the number of tasks is noticeably lower though):
Running tasks: 5005 (4 threads)
function places % lat_tot lat_avg
ssl_sock_io_cb 4819 96.2 - -
h1_io_cb 186 3.7 - -
Running tasks: 0 (4 threads)
function places % lat_tot lat_avg
Running tasks: 1 (4 threads)
function places % lat_tot lat_avg
h1_io_cb 1 100.0 - -
Running tasks: 1 (4 threads)
function places % lat_tot lat_avg
h1_io_cb 1 100.0 - -
Running tasks: 2 (4 threads)
function places % lat_tot lat_avg
ssl_sock_io_cb 2 100.0 - -
Running tasks: 3 (4 threads)
function places % lat_tot lat_avg
sc_conn_io_cb 2 66.6 - -
process_stream 1 33.3 - -
Running tasks: 3 (4 threads)
function places % lat_tot lat_avg
sc_conn_io_cb 2 66.6 - -
process_stream 1 33.3 - -
Running tasks: 2 (4 threads)
function places % lat_tot lat_avg
h1_io_cb 1 50.0 - -
ssl_sock_io_cb 1 50.0 - -
Expected Behavior
No SSL handshake failures, and request and success rates comparable to HAProxy 2.2.z, when a 4K RSA key is used after migrating to 2.4 or later.
Steps to Reproduce the Behavior
- Use HAProxy 2.8.10 or any version starting from 2.4.0.
- Use a 4K RSA key for the server certificate (see the example command after this list).
- Terminate TLS on the frontend; don't use TLS between HAProxy and the server.
- Number of threads: 4.
- Use vegeta to test the performance (5000 workers, maximum hit rate).
- Check the success rate from vegeta and the access logs in HAProxy.
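For reference, a minimal self-signed 4096-bit RSA test certificate (hello-world.com being just the test hostname used above) can be generated along these lines; HAProxy expects the certificate and key concatenated into a single PEM file:
$ openssl req -x509 -newkey rsa:4096 -nodes -keyout tls.key -out tls.crt -days 365 -subj "/CN=hello-world.com"
$ cat tls.crt tls.key > default.pem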
Do you have any idea what may have caused this?
Reverting 4c48edba4f45bb78f41af7d79d3c176710fe6a90 on HAProxy 2.8.10 helps to get TLS performance similar to 2.2.
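For illustration, the revert amounts to something like the following in a checkout of the 2.8.10 sources (build flags shortened here; the full set is visible in haproxy -vv below), possibly with minor conflict resolution since the commit dates from 2.4-dev12:
$ git revert 4c48edba4f45bb78f41af7d79d3c176710fe6a90
$ make -j4 TARGET=linux-glibc USE_OPENSSL=1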
Do you have an idea how to solve the issue?
No response
What is your configuration?
global
no strict-limits
maxconn 50000
nbthread 4
daemon
log /var/lib/rsyslog/rsyslog.sock len 1024 local1 info
log-send-hostname
ca-base /etc/ssl
crt-base /etc/ssl
stats socket /var/lib/haproxy/run/haproxy.sock mode 600 level admin expose-fd listeners
stats timeout 2m
tune.maxrewrite 8192
tune.bufsize 32768
ssl-default-bind-options ssl-min-ver TLSv1.2
tune.ssl.default-dh-param 2048
ssl-default-bind-ciphers REDACTED
ssl-default-bind-ciphersuites REDACTED
defaults
maxconn 50000
option httplog
log global
errorfile 503 /var/lib/haproxy/conf/error-page-503.http
errorfile 404 /var/lib/haproxy/conf/error-page-404.http
timeout connect 5s
timeout client 30s
timeout client-fin 1s
timeout server 30s
timeout server-fin 1s
timeout http-request 10s
timeout http-keep-alive 300s
timeout tunnel 1h
frontend public_ssl
option tcplog
bind :443 accept-proxy
tcp-request inspect-delay 5s
tcp-request content accept if { req_ssl_hello_type 1 }
acl sni req.ssl_sni -m found
acl sni_passthrough req.ssl_sni,lower,map_reg(/var/lib/haproxy/conf/os_sni_passthrough.map) -m found
use_backend %[req.ssl_sni,lower,map_reg(/var/lib/haproxy/conf/os_tcp_be.map)] if sni sni_passthrough
use_backend be_sni if sni
default_backend be_no_sni
backend be_sni
server fe_sni unix@/var/lib/haproxy/run/haproxy-sni.sock weight 1 send-proxy
frontend fe_sni
bind unix@/var/lib/haproxy/run/haproxy-sni.sock ssl crt /var/lib/haproxy/router/certs/default.pem crt-list /var/lib/haproxy/conf/cert_config.map accept-proxy no-alpn
mode http
option idle-close-on-response
http-request deny if { hdr_len(content-length) 0 }
http-request del-header Proxy
http-request set-header Host %[req.hdr(Host),lower]
use_backend %[base,map_reg(/var/lib/haproxy/conf/os_edge_reencrypt_be.map)]
default_backend openshift_default
backend be_edge_http:test:myedge
mode http
option redispatch
option forwardfor
balance random
timeout check 5000ms
http-request add-header X-Forwarded-Host %[req.hdr(host)]
http-request add-header X-Forwarded-Port %[dst_port]
http-request add-header X-Forwarded-Proto http if !{ ssl_fc }
http-request add-header X-Forwarded-Proto https if { ssl_fc }
http-request add-header X-Forwarded-Proto-Version h2 if { ssl_fc_alpn -i h2 }
http-request add-header Forwarded for=%[src];host=%[req.hdr(host)];proto=%[req.hdr(X-Forwarded-Proto)]
cookie 851b234ae52ef7709dfb5e5fadb97e8c insert indirect nocache httponly secure attr SameSite=None
server pod:web-server-6fd7fc6444-w9ln5:service-unsecure:http:10.128.2.17:8080 10.128.2.17:8080 cookie d8f71950c0e59754be3f646cea5d4fce weight 1
Output of haproxy -vv
$ haproxy -vv
HAProxy version 2.8.10-f28885f 2024/06/14 - https://haproxy.org/
Status: long-term supported branch - will stop receiving fixes around Q2 2028.
Known bugs: http://www.haproxy.org/bugs/bugs-2.8.10.html
Running on: Linux 5.14.0-427.86.1.el9_4.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Aug 27 07:03:43 EDT 2025 x86_64
Build options :
TARGET = linux-glibc
CPU = generic
CC = cc
CFLAGS = -O2 -g -Wall -Wextra -Wundef -Wdeclaration-after-statement -Wfatal-errors -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference -fwrapv -Wno-address-of-packed-member -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers -Wno-cast-function-type -Wno-string-plus-int -Wno-atomic-alignment
OPTIONS = USE_LINUX_TPROXY=1 USE_CRYPT_H=1 USE_GETADDRINFO=1 USE_OPENSSL=1 USE_ZLIB=1 USE_PCRE=1
DEBUG = -DDEBUG_STRICT -DDEBUG_MEMORY_POOLS
Feature list : -51DEGREES +ACCEPT4 +BACKTRACE -CLOSEFROM +CPU_AFFINITY +CRYPT_H -DEVICEATLAS +DL -ENGINE +EPOLL -EVPORTS +GETADDRINFO -KQUEUE -LIBATOMIC +LIBCRYPT +LINUX_CAP +LINUX_SPLICE +LINUX_TPROXY -LUA -MATH -MEMORY_PROFILING +NETFILTER +NS -OBSOLETE_LINKER +OPENSSL -OPENSSL_WOLFSSL -OT +PCRE -PCRE2 -PCRE2_JIT -PCRE_JIT +POLL +PRCTL -PROCCTL -PROMEX -PTHREAD_EMULATION -QUIC -QUIC_OPENSSL_COMPAT +RT +SHM_OPEN -SLZ +SSL -STATIC_PCRE -STATIC_PCRE2 -SYSTEMD +TFO +THREAD +THREAD_DUMP +TPROXY -WURFL +ZLIB
Default settings :
bufsize = 16384, maxrewrite = 1024, maxpollevents = 200
Built with multi-threading support (MAX_TGROUPS=16, MAX_THREADS=256, default=4).
Built with OpenSSL version : OpenSSL 3.0.7 1 Nov 2022
Running on OpenSSL version : OpenSSL 3.0.7 1 Nov 2022
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
OpenSSL providers loaded : default
Built with network namespace support.
Built with zlib version : 1.2.11
Running on zlib version : 1.2.11
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with PCRE version : 8.44 2020-02-12
Running on PCRE version : 8.44 2020-02-12
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Encrypted password support via crypt(3): yes
Built with gcc compiler version 11.4.1 20231218 (Red Hat 11.4.1-3)
Available polling systems :
epoll : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use epoll.
Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
h2 : mode=HTTP side=FE|BE mux=H2 flags=HTX|HOL_RISK|NO_UPG
fcgi : mode=HTTP side=BE mux=FCGI flags=HTX|HOL_RISK|NO_UPG
h1 : mode=HTTP side=FE|BE mux=H1 flags=HTX|NO_UPG
<default> : mode=HTTP side=FE|BE mux=H1 flags=HTX
none : mode=TCP side=FE|BE mux=PASS flags=NO_UPG
<default> : mode=TCP side=FE|BE mux=PASS flags=
Available services : none
Available filters :
[BWLIM] bwlim-in
[BWLIM] bwlim-out
[CACHE] cache
[COMP] compression
[FCGI] fcgi-app
[SPOE] spoe
[TRACE] trace
Last Outputs and Backtraces
Additional Information
vegeta tests with CPU monitoring were done against 2.3.21 and 2.4.0, both compiled with OpenSSL 1.1. 2.4.0 shows CPU trends similar to 2.8.10 and gets better TLS performance when 4c48edba4f45bb78f41af7d79d3c176710fe6a90 is reverted. 2.3.21 does not show TLS performance problems with a 4K RSA key certificate. That's how we narrowed the problem down to 2.4-dev12.
any chance to test 3.2.latest?
any chance to test 3.2.latest?
Testing 3.2.latest might not be straightforward. HAProxy 3 is something we haven’t tried yet, so our build configuration may require significant changes. Additionally, even if the issue is fixed in HAProxy 3.2, it could take a while for OpenShift to adopt this version. Since the regression is occurring now, we’d like to focus on a solution that can address it sooner.
Thank you for your complete report, we're going to investigate this.
any chance to test 3.2.latest?
We updated our build configuration and tested with HAProxy 3.2.4. While we may need additional testing, at first glance the CPU usage and success rate appear similar to what we observed with 2.8.10.
Vegeta runs:
$ echo "GET https://hello-world.com/" | vegeta attack -max-workers 5000 -rate 0 -keepalive=true --duration=3m -insecure | tee results.bin | vegeta report | head
Requests [total, rate, throughput] 349745, 1942.22, 997.56
Duration [total, attack, wait] 3m4s, 3m0s, 3.731s
Latencies [min, mean, 50, 90, 95, 99, max] 561.579µs, 2.475s, 3.097s, 4.851s, 5.17s, 6.472s, 29.146s
Bytes In [total, mean] 9901278, 28.31
Bytes Out [total, mean] 0, 0.00
Success [ratio] 52.43%
Status Codes [code:count] 0:166388 200:183357
Error Set:
Get "https://hello-world.com/": EOF
Get "https://hello-world.com/": dial tcp 0.0.0.0:0->10.0.52.233:443: connect: connection reset by peer
$ echo "GET https://hello-world.com/" | vegeta attack -max-workers 5000 -rate 0 -keepalive=true --duration=3m -insecure | tee results.bin | vegeta report | head
Requests [total, rate, throughput] 383030, 2128.03, 1064.25
Duration [total, attack, wait] 3m4s, 3m0s, 3.519s
Latencies [min, mean, 50, 90, 95, 99, max] 572.632µs, 2.18s, 2.427s, 4.254s, 4.515s, 5.08s, 28.945s
Bytes In [total, mean] 10546362, 27.53
Bytes Out [total, mean] 0, 0.00
Success [ratio] 50.99%
Status Codes [code:count] 0:187727 200:195303
Error Set:
Get "https://hello-world.com/": EOF
Access logs contained a new error though (EPIPE returned by OS):
haproxy[25]: 10.0.8.143:41463 [18/Sep/2025:20:29:55.228] fe_sni~ be_edge_http:test:myedge/pod:web-server-6499c75b57-wcw2h:service-unsecure:http:10.129.2.18:8080 0/0/38/18/56 200 253 - - --NI 24872/23625/19/19/0 0/0 "GET / HTTP/1.1"
haproxy[25]: 10.0.8.143:35871 [18/Sep/2025:20:29:55.228] fe_sni~ be_edge_http:test:myedge/pod:web-server-6499c75b57-wcw2h:service-unsecure:http:10.129.2.18:8080 0/0/38/18/56 200 253 - - --NI 24872/23625/18/18/0 0/0 "GET / HTTP/1.1"
haproxy[25]: 10.0.8.143:46318 [18/Sep/2025:20:29:54.289] public_ssl be_sni/fe_sni 1/0/1012 0 sD 24880/1249/1235/1235/0 0/0
haproxy[25]: 10.0.8.143:57061 [18/Sep/2025:20:29:25.301] fe_sni/1: EPIPE returned by OS
haproxy[25]: 10.0.8.143:46151 [18/Sep/2025:20:29:25.318] fe_sni/1: EPIPE returned by OS
. . .
haproxy[25]: 10.0.8.143:43075 [18/Sep/2025:20:29:25.301] fe_sni/1: Timeout during SSL handshake
haproxy[25]: 10.0.8.143:49631 [18/Sep/2025:20:29:25.301] fe_sni/1: Timeout during SSL handshake
haproxy[25]: 10.0.8.143:43911 [18/Sep/2025:20:29:25.301] fe_sni/1: Timeout during SSL handshake
This may be something to be re-investigated on our side, since we used the same machines and configuration as in the 2.8.10 tests.
Thanks for your test. The patch you've found only affects scheduling priorities, not CPU usage. So there might be two issues at once possibly (maybe one uncovers the other one). I'm suspecting that there might be some late asymmetric crypto processing past the end of the handshake and that this is causing the huge latencies, explaining why reverting the patch and forcing to serialize the tasks could improve the situation. However it should not change anything CPU-wise. Maybe for an unknown reason we're seeing extraneous wakeups where it's not possible to work, or worse, where crypto operations are done over and over, I don't know.
During your tests on 3.2, it could be helpful if you could issue this on the CLI:
- set profiling tasks on   # to enable task profiling
- wait about 10 seconds
- show profiling tasks
I'm interested in seeing the number of calls to the IO callbacks. I suspect some of them might be insanely looping. And at least we'll see the average CPU consumption per call, which should give us some hints as well.
Also, two series of show info; show activity taken 10 seconds apart during the test would be very helpful to spot problems. For example we'll be able to see if we're saturated doing new key signs (e.g. maybe the errors are upsetting vegeta, which retries a new handshake and only counts successful ones, resulting in a lot of computations on one side and a lower number on the other).
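For reference, assuming the admin stats socket from the configuration above, these commands can be sent with socat, e.g.:
$ echo "set profiling tasks on" | socat stdio /var/lib/haproxy/run/haproxy.sock
$ sleep 10
$ echo "show profiling tasks" | socat stdio /var/lib/haproxy/run/haproxy.sock
$ echo "show info;show activity" | socat stdio /var/lib/haproxy/run/haproxy.sock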
Are the tests only in TLS 1.3 ? I'm wondering if we don't have differences with TLS1.2 on post-handshakes.
Also you mentioned OpenSSL 1.1 but I'm seeing OpenSSL 3.0.7 1 Nov 2022 in your haproxy -vv which might be a good reason for your problems.
Also you mentioned OpenSSL 1.1 but I'm seeing OpenSSL 3.0.7 1 Nov 2022 in your haproxy -vv which might be a good reason for your problems.
HAProxy 2.8.10 was running on OpenSSL 3. But the 2.4.0 (or 2.4-dev12) tests we did were running on OpenSSL 1.1:
Additional Information: vegeta tests with CPU monitoring were done against 2.3.21 and 2.4.0, both compiled with OpenSSL 1.1. 2.4.0 shows CPU trends similar to 2.8.10 and gets better TLS performance when 4c48edba4f45bb78f41af7d79d3c176710fe6a90 is reverted. 2.3.21 does not show TLS performance problems with a 4K RSA key certificate. That's how we narrowed the problem down to 2.4-dev12.
We considered the OpenSSL difference at the very beginning, and we don’t have evidence that OpenSSL 3 is the root cause. However, if there is a good reason to believe it may be related to OpenSSL, we can try testing 2.8.10 with OpenSSL 1.
We wrote a long article about OpenSSL 3.0 problems: https://www.haproxy.com/blog/state-of-ssl-stacks. If your previous 2.2 deployment used OpenSSL 1.1.1 and you migrated to HAProxy 2.8 with OpenSSL 3.0, there's absolutely no chance that you will have the same level of performance.
However if you made your 2.4 tests with OpenSSL 1.1.1, there's probably something else. But you should stick with 1.1.1 for your tests to avoid problems related to that.
During your tests on 3.2, it could be helpful if you could issue this on the CLI:
set profiling tasks on # to enable task profiling
wait about 10 seconds
show profiling tasks
I'm interested in seeing the number of calls to the IO callbacks. I suspect some of them might be insanely looping. And at least we'll see the average CPU consumption per call, which should give us some hints as well.
We did several runs of echo "GET https://hello-world.com/" | vegeta attack -max-workers 5000 -rate 0 -keepalive=true --duration=3m -insecure | tee results.bin | vegeta report | head on HAProxy 3.2.4 with OpenSSL1:
Requests [total, rate, throughput] 292721, 1626.18, 946.99
Success [ratio] 59.40%
Requests [total, rate, throughput] 299801, 1663.30, 1020.26
Success [ratio] 62.54%
Requests [total, rate, throughput] 248612, 1381.25, 1191.59
Success [ratio] 87.64%
Requests [total, rate, throughput] 249379, 1385.37, 1176.66
Success [ratio] 86.30%
$ haproxy -vv
HAProxy version 3.2.4-98813a1 2025/08/13 - https://haproxy.org/
Status: long-term supported branch - will stop receiving fixes around Q2 2030.
Known bugs: http://www.haproxy.org/bugs/bugs-3.2.4.html
Running on: Linux 5.14.0-427.74.1.el9_4.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Jun 13 12:07:50 EDT 2025 x86_64
Build options :
TARGET = linux-glibc
CC = cc
CFLAGS = -O2 -g -fwrapv
OPTIONS = USE_LINUX_TPROXY=1 USE_CRYPT_H=1 USE_GETADDRINFO=1 USE_OPENSSL=1 USE_ZLIB=1 USE_PCRE=1
DEBUG =
Feature list : -51DEGREES +ACCEPT4 +BACKTRACE -CLOSEFROM +CPU_AFFINITY +CRYPT_H -DEVICEATLAS +DL -ENGINE +EPOLL -EVPORTS +GETADDRINFO -KQUEUE -LIBATOMIC +LIBCRYPT +LINUX_CAP +LINUX_SPLICE +LINUX_TPROXY -LUA -MATH -MEMORY_PROFILING +NETFILTER +NS -OBSOLETE_LINKER +OPENSSL -OPENSSL_AWSLC -OPENSSL_WOLFSSL -OT +PCRE -PCRE2 -PCRE2_JIT -PCRE_JIT +POLL +PRCTL -PROCCTL -PROMEX -PTHREAD_EMULATION -QUIC -QUIC_OPENSSL_COMPAT +RT -SLZ +SSL -STATIC_PCRE -STATIC_PCRE2 +TFO +THREAD +THREAD_DUMP +TPROXY -WURFL +ZLIB +ACME
Default settings :
bufsize = 16384, maxrewrite = 1024, maxpollevents = 200
Built with multi-threading support (MAX_TGROUPS=32, MAX_THREADS=1024, default=4).
Built with SSL library version : OpenSSL 1.1.1k FIPS 25 Mar 2021
Running on SSL library version : OpenSSL 1.1.1k FIPS 25 Mar 2021
SSL library supports TLS extensions : yes
SSL library supports SNI : yes
SSL library FIPS mode : no
SSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
Built with network namespace support.
Built with zlib version : 1.2.11
Running on zlib version : 1.2.11
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with PCRE version : 8.42 2018-03-20
Running on PCRE version : 8.42 2018-03-20
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Encrypted password support via crypt(3): yes
Built with gcc compiler version 8.5.0 20210514 (Red Hat 8.5.0-10)
Available polling systems :
epoll : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use epoll.
Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
h2 : mode=HTTP side=FE|BE mux=H2 flags=HTX|HOL_RISK|NO_UPG
<default> : mode=HTTP side=FE|BE mux=H1 flags=HTX
h1 : mode=HTTP side=FE|BE mux=H1 flags=HTX|NO_UPG
fcgi : mode=HTTP side=BE mux=FCGI flags=HTX|HOL_RISK|NO_UPG
<default> : mode=SPOP side=BE mux=SPOP flags=HOL_RISK|NO_UPG
spop : mode=SPOP side=BE mux=SPOP flags=HOL_RISK|NO_UPG
<default> : mode=TCP side=FE|BE mux=PASS flags=
none : mode=TCP side=FE|BE mux=PASS flags=NO_UPG
Available services : none
Available filters :
[BWLIM] bwlim-in
[BWLIM] bwlim-out
[CACHE] cache
[COMP] compression
[FCGI] fcgi-app
[SPOE] spoe
[TRACE] trace
Results of show profiling tasks:
Tasks activity over 1306.546 sec till 0.000 sec ago:
function calls cpu_tot cpu_avg lat_tot lat_avg
sc_conn_io_cb+0 10421808 16.62s 1.594us 16.31d 135.2ms <- sc_app_chk_rcv_conn@src/stconn.c:844 tasklet_wakeup
sc_conn_io_cb+0 6207583 1.555m 15.03us 10.13d 141.0ms <- sock_conn_iocb@src/sock.c:1098 tasklet_wakeup
process_stream+0 4929446 1.063m 12.94us 9.177h 6.702ms <- sc_notify@src/stconn.c:1206 task_wakeup
process_stream+0 3435716 2.332m 40.72us 3.616h 3.789ms <- stream_new@src/stream.c:583 task_wakeup
ssl_sock_io_cb 2752977 7.076s 2.570us 4.303d 135.1ms <- sock_conn_iocb@src/sock.c:1098 tasklet_wakeup
h1_io_cb 2751430 57.92s 21.05us 3.931d 123.4ms <- ssl_sock_io_cb@src/ssl_sock.c:5812 tasklet_wakeup
h1_io_cb 2733164 11.69s 4.278us 1.875m 41.17us <- sock_conn_iocb@src/sock.c:1098 tasklet_wakeup
xprt_handshake_io_cb 2109333 12.30s 5.833us 12.02m 341.9us <- xprt_handshake_start@src/xprt_handshake.c:140 tasklet_wakeup
sc_conn_io_cb+0 2018169 7.513s 3.722us 11.88m 353.2us <- h1_wake_stream_for_recv@src/mux_h1.c:3623 tasklet_wakeup
h1_timeout_task 789116 338.6ms 429.0ns 1.272h 5.805ms <- h1_release@src/mux_h1.c:1382 task_wakeup
h1_io_cb 770425 1.684s 2.186us 1.123m 87.43us <- sock_conn_iocb@src/sock.c:1077 tasklet_wakeup
sc_conn_io_cb+0 770425 16.22s 21.05us 9.315m 725.4us <- h1_wake_stream_for_send@src/mux_h1.c:3633 tasklet_wakeup
sc_conn_io_cb+0 702309 3.146s 4.479us 7.220m 616.8us <- xprt_handshake_io_cb@src/xprt_handshake.c:92 tasklet_wakeup
ssl_sock_io_cb 437121 1.070h 8.811ms 17.29d 3.418s <- ssl_sock_start@src/ssl_sock.c:5008 tasklet_wakeup
process_stream+0 435748 2.102s 4.824us 8.702m 1.198ms <- run_tasks_from_lists@src/task.c:669 task_queue
accept_queue_process+0 266364 6.662s 25.01us 21.71m 4.891ms <- listener_accept@src/listener.c:1504 tasklet_wakeup
session_expire_embryonic+0 252787 8.420s 33.31us 4.854m 1.152ms <- session_accept_fd@src/session.c:344 task_queue
accept_queue_process+0 231692 4.568s 19.72us 25.87m 6.699ms <- accept_queue_process@src/listener.c:182 tasklet_wakeup
h1_io_cb 27884 67.83ms 2.432us 1.996m 4.295ms <- h1_takeover@src/mux_h1.c:5593 tasklet_wakeup
ssl_sock_io_cb 18820 16.46ms 874.0ns 7.891m 25.16ms <- conn_subscribe@src/connection.c:959 tasklet_wakeup
xprt_handshake_io_cb 13738 113.4ms 8.251us 1.423m 6.213ms <- sock_conn_iocb@src/sock.c:1098 tasklet_wakeup
process_stream+0 7246 16.36ms 2.258us 23.47s 3.239ms <- sc_app_chk_snd_conn@src/stconn.c:915 task_wakeup
process_chk 5203 443.9ms 85.31us 4.081s 784.3us <- wake_srv_chk@src/check.c:1091 task_wakeup
other 2242 1.310ms 584.0ns 2.237m 59.88ms <- session_accept_fd@src/session.c:344 task_queue
process_chk 2082 265.8ms 127.7us 1.926s 924.8us <- run_tasks_from_lists@src/task.c:669 task_queue
h1_io_cb 1229 1.144ms 930.0ns 7.987ms 6.499us <- h1_shutw_conn@src/mux_h1.c:4581 tasklet_wakeup
h1_io_cb 954 11.08ms 11.61us 310.7ms 325.7us <- conn_subscribe@src/connection.c:959 tasklet_wakeup
srv_chk_io_cb 516 1.421ms 2.753us 128.1ms 248.3us <- ssl_sock_io_cb@src/ssl_sock.c:5812 tasklet_wakeup
h1_timeout_task 387 33.74ms 87.18us 34.15ms 88.25us <- h1_init@src/mux_h1.c:1315 task_queue
ssl_sock_io_cb 282 7.771ms 27.56us 15.55ms 55.13us <- sock_conn_iocb@src/sock.c:1077 tasklet_wakeup
task_process_applet 137 142.5ms 1.040ms 70.21ms 512.5us <- sc_applet_create@src/stconn.c:573 appctx_wakeup
task_process_applet 98 221.5us 2.260us 80.54ms 821.8us <- sc_app_shut_applet@src/stconn.c:976 appctx_wakeup
srv_cleanup_idle_conns 61 189.1us 3.099us 17.04ms 279.4us <- run_tasks_from_lists@src/task.c:669 task_queue
srv_cleanup_toremove_conns 26 4.544ms 174.8us 63.95ms 2.459ms <- srv_cleanup_idle_conns@src/server.c:7351 task_wakeup
srv_cleanup_idle_conns 14 9.675us 691.0ns 4.785ms 341.8us <- srv_add_to_idle_list@src/server.c:7262 task_schedule
srv_cleanup_toremove_conns 2 52.55us 26.27us 20.36ms 10.18ms <- connect_server@src/backend.c:1890 task_wakeup
task_process_applet 1 636.1us 636.1us 109.8us 109.8us <- sc_app_chk_snd_applet@src/stconn.c:1043 appctx_wakeup
Note: there were other runs of vegeta with ~100, ~300 and ~1000 rps before the four ~5000-worker runs I showed at the top.
Also, two series of show info; show activity 10 seconds apart during the test would be very helpful to spot problems.
On the same HAProxy 3.2.4 with OpenSSL1.
First run:
Name: HAProxy
Version: 3.2.4-98813a1
Release_date: 2025/08/13
Nbthread: 4
Nbproc: 1
Process_num: 1
Pid: 15
Uptime: 0d 0h44m00s
Uptime_sec: 2640
Memmax_MB: 0
PoolAlloc_MB: 30
PoolUsed_MB: 30
PoolFailed: 0
Ulimit-n: 100055
Maxsock: 100055
Maxconn: 50000
Hard_maxconn: 50000
CurrConns: 22962
CumConns: 1708051
CumReq: 3751932
MaxSslConns: 0
CurrSslConns: 21506
CumSslConns: 852340
Maxpipes: 0
PipesUsed: 0
PipesFree: 0
ConnRate: 1721
ConnRateLimit: 0
MaxConnRate: 5752
SessRate: 1717
SessRateLimit: 0
MaxSessRate: 5752
SslRate: 863
SslRateLimit: 0
MaxSslRate: 2876
SslFrontendKeyRate: 0
SslFrontendMaxKeyRate: 3034
SslFrontendSessionReuse_pct: 100
SslBackendKeyRate: 0
SslBackendMaxKeyRate: 10
SslCacheLookups: 0
SslCacheMisses: 0
CompressBpsIn: 0
CompressBpsOut: 0
CompressBpsRateLim: 0
ZlibMemUsage: 0
MaxZlibMemUsage: 0
Tasks: 23025
Run_queue: 20956
Idle_pct: 0
node: router-test-75ffc8746-gq4rn
Stopping: 0
Jobs: 22961
Unstoppable Jobs: 0
Listeners: 5
ActivePeers: 0
ConnectedPeers: 0
DroppedLogs: 1211949
BusyPolling: 0
FailedResolutions: 0
TotalBytesOut: 4141035696
TotalSplicedBytesOut: 0
BytesOutRate: 1450304
DebugCommandsIssued: 0
CumRecvLogs: 0
Build info: 3.2.4-98813a1
Memmax_bytes: 0
PoolAlloc_bytes: 32075104
PoolUsed_bytes: 32075104
Start_time_sec: 1758286022
Tainted: 0
TotalWarnings: 86
MaxconnReached: 0
BootTime_ms: 10
Niced_tasks: 0
CurrStreams: 1443
CumStreams: 3751544
BlockedTrafficWarnings: 0
thread_id: 1 (1..4)
date_now: 1758288662.938673
uptime_now: 2640.368699
ctxsw: 46214508 [ 11700473 11610510 11548189 11355336 ]
tasksw: 10901458 [ 2754468 2738272 2725568 2683150 ]
empty_rq: 1561 [ 556 338 296 371 ]
long_rq: 268098 [ 67948 67541 66643 65966 ]
curr_rq: 20956 [ 5231 5250 5238 5237 ]
loops: 2733438 [ 685989 682538 687609 677302 ]
wake_tasks: 268192 [ 67975 67563 66663 65991 ]
wake_signal: 0 [ 0 0 0 0 ]
poll_io: 2711853 [ 680902 676583 682377 671991 ]
poll_exp: 0 [ 0 0 0 0 ]
poll_drop_fd: 10541 [ 2968 2414 2408 2751 ]
poll_skip_fd: 0 [ 0 0 0 0 ]
conn_dead: 0 [ 0 0 0 0 ]
stream_calls: 9769366 [ 2474046 2454321 2446205 2394794 ]
pool_fail: 0 [ 0 0 0 0 ]
buf_wait: 0 [ 0 0 0 0 ]
cpust_ms_tot: 784627 [ 183893 197567 200842 202325 ]
cpust_ms_1s: 476 [ 112 159 101 104 ]
cpust_ms_15s: 8639 [ 1925 2441 1932 2341 ]
avg_cpu_pct: 100 [ 100 100 100 100 ]
avg_loop_us: 18774 [ 18381 19134 18342 19239 ]
accepted: 1699883 [ 426671 424509 426546 422157 ]
accq_pushed: 1699883 [ 424969 423786 427854 423274 ]
accq_full: 0 [ 0 0 0 0 ]
accq_ring: 8 [ 0 5 3 0 ]
fd_takeover: 33334 [ 8185 8558 8096 8495 ]
check_adopted: 40 [ 12 4 14 10 ]
check_started: 8176 [ 2166 1544 2351 2115 ]
check_active: 0 [ 0 0 0 0 ]
check_running: 0 [ 0 0 0 0 ]
~10 seconds later:
Name: HAProxy
Version: 3.2.4-98813a1
Release_date: 2025/08/13
Nbthread: 4
Nbproc: 1
Process_num: 1
Pid: 15
Uptime: 0d 0h44m10s
Uptime_sec: 2650
Memmax_MB: 0
PoolAlloc_MB: 35
PoolUsed_MB: 35
PoolFailed: 0
Ulimit-n: 100055
Maxsock: 100055
Maxconn: 50000
Hard_maxconn: 50000
CurrConns: 26812
CumConns: 1724469
CumReq: 3768869
MaxSslConns: 0
CurrSslConns: 25472
CumSslConns: 860550
Maxpipes: 0
PipesUsed: 0
PipesFree: 0
ConnRate: 1343
ConnRateLimit: 0
MaxConnRate: 5752
SessRate: 1308
SessRateLimit: 0
MaxSessRate: 5752
SslRate: 664
SslRateLimit: 0
MaxSslRate: 2876
SslFrontendKeyRate: 0
SslFrontendMaxKeyRate: 3034
SslFrontendSessionReuse_pct: 100
SslBackendKeyRate: 0
SslBackendMaxKeyRate: 10
SslCacheLookups: 0
SslCacheMisses: 0
CompressBpsIn: 0
CompressBpsOut: 0
CompressBpsRateLim: 0
ZlibMemUsage: 0
MaxZlibMemUsage: 0
Tasks: 26882
Run_queue: 25396
Idle_pct: 0
node: router-test-75ffc8746-gq4rn
Stopping: 0
Jobs: 26769
Unstoppable Jobs: 0
Listeners: 5
ActivePeers: 0
ConnectedPeers: 0
DroppedLogs: 1216974
BusyPolling: 0
FailedResolutions: 0
TotalBytesOut: 4154929041
TotalSplicedBytesOut: 0
BytesOutRate: 1126336
DebugCommandsIssued: 0
CumRecvLogs: 0
Build info: 3.2.4-98813a1
Memmax_bytes: 0
PoolAlloc_bytes: 36929904
PoolUsed_bytes: 36929904
Start_time_sec: 1758286022
Tainted: 0
TotalWarnings: 86
MaxconnReached: 0
BootTime_ms: 10
Niced_tasks: 0
CurrStreams: 1294
CumStreams: 3768481
BlockedTrafficWarnings: 0
thread_id: 1 (1..4)
date_now: 1758288673.192806
uptime_now: 2650.622832
ctxsw: 46448259 [ 11759828 11665761 11606544 11416126 ]
tasksw: 10968993 [ 2771530 2754083 2742590 2700790 ]
empty_rq: 1561 [ 556 338 296 371 ]
long_rq: 270154 [ 68481 68032 67154 66487 ]
curr_rq: 25395 [ 6232 6467 6440 6256 ]
loops: 2735494 [ 686522 683029 688120 677823 ]
wake_tasks: 270248 [ 68508 68054 67174 66512 ]
wake_signal: 0 [ 0 0 0 0 ]
poll_io: 2713909 [ 681435 677074 682888 672512 ]
poll_exp: 0 [ 0 0 0 0 ]
poll_drop_fd: 10543 [ 2969 2414 2408 2752 ]
poll_skip_fd: 0 [ 0 0 0 0 ]
conn_dead: 0 [ 0 0 0 0 ]
stream_calls: 9828206 [ 2488924 2468054 2461102 2410126 ]
pool_fail: 0 [ 0 0 0 0 ]
buf_wait: 0 [ 0 0 0 0 ]
cpust_ms_tot: 791536 [ 185463 199366 202732 203975 ]
cpust_ms_1s: 1013 [ 228 259 402 124 ]
cpust_ms_15s: 9639 [ 2103 2648 2444 2444 ]
avg_cpu_pct: 100 [ 100 100 100 100 ]
avg_loop_us: 19750 [ 18932 20648 19765 19654 ]
accepted: 1716316 [ 430931 428437 430631 426317 ]
accq_pushed: 1716316 [ 429115 427772 432007 427422 ]
accq_full: 0 [ 0 0 0 0 ]
accq_ring: 46 [ 0 46 0 0 ]
fd_takeover: 33392 [ 8195 8581 8109 8507 ]
check_adopted: 40 [ 12 4 14 10 ]
check_started: 8203 [ 2169 1547 2361 2126 ]
check_active: 0 [ 0 0 0 0 ]
check_running: 0 [ 0 0 0 0 ]
Very useful, thank you! So what we can see already:
- for 1.2M connections you're seeing around 10M wakeups of the connection iocb just from the receive callback, that's 8 wakeups per connection. It seems high to me but nothing terrifying.
- two wakeups of the SSL iocb on avg per connection seems reasonable
- the average latency of most tasks is huge (hundreds of ms), and it definitely explains why reverting the patch makes things go smoother: in that case only one SSL context will be handled per polling loop, so the rest is much more fluid. This also proves that the SSL wakeups are quite expensive, though not all of them given that the cpu_avg remains in the microseconds range.
- the average loop time is huge (almost 20 ms), it should stay in the microseconds to tens of microseconds range, at worst, 100+ us.
- the runqueue size is abnormal: 25k tasks, that's roughly 5 tasks per connection, and everything runs to completion with such a size
- the connection rate is quite low (664/s) and these are only resumed, not even rekeyed.
It behaves as if something wouldn't stop after a test, e.g. connections would fail and stay in the failed state forever, eating CPU. I suspect that after a run is complete, haproxy's CPU stays high. It really sounds like an incorrectly caught error. Unfortunately when these happen, they can be extremely sensitive to the timings and environment, so it may take a while before we manage to reproduce it. But we can also count on luck.
In any case, I can confirm that at such low rates, it shouldn't eat CPU like that at all!
BTW I remember in the past, we noticed that having large dhparams could seriously affect performance. I'm not seeing anything standing out here so it's hard to say, but if you have a 4096 dhparam in your certs file, it might be useful to try again with a lower one to see if it is in any way related to this or not.
Also the output of a perf top during a test could help as well.
Thank you for the swift reply!
I suspect that after a run is complete, haproxy's CPU stays high
We cannot confirm that. CPU usage drops back down once the test run completes. Correction: HAProxy does keep high CPU for about 30 seconds after the run completes. I apologize - I didn’t pay attention to the CPU immediately after the test run. Now that you pointed this out, I have double-checked and confirmed it.
but if you have a 4096 dhparam in your certs file, it might be useful to try again with a lower one to see if it is in any way related to this or not
The cert/key PEM files we used for the tests don’t contain a DH PARAMETERS block. We rely only on tune.ssl.default-dh-param, which is set to 2048 in the global section. This setting isn’t exposed to end users, so it should never exceed 2048.
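A simple way to double-check this, using the default certificate path from the configuration above, is to look for the PEM block directly:
$ grep -c "BEGIN DH PARAMETERS" /var/lib/haproxy/router/certs/default.pem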
Also the output of a perf top during a test could help as well.
We will try to gather this data.
Also the output of a perf top during a test could help as well.
HAProxy 3.2.4 with OpenSSL1 was tested using -max-workers 5000 -rate 0 --duration=1m:
38.08% haproxy libcrypto.so.1.1.1k [.] __bn_sqrx8x_reduction
21.29% haproxy libcrypto.so.1.1.1k [.] bn_sqrx8x_internal
19.89% haproxy libcrypto.so.1.1.1k [.] mulx4x_internal
1.75% haproxy [kernel.kallsyms] [k] masked_flow_lookup
1.56% haproxy libcrypto.so.1.1.1k [.] __bn_postx4x_internal
1.01% haproxy [kernel.kallsyms] [k] __do_softirq
0.60% haproxy libcrypto.so.1.1.1k [.] bn_mulx4x_mont
0.42% haproxy libcrypto.so.1.1.1k [.] fe_mul
0.38% haproxy [kernel.kallsyms] [k] __nf_conntrack_find_get
0.33% haproxy [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
0.30% haproxy libcrypto.so.1.1.1k [.] x25519_fe64_sqr
0.25% haproxy libcrypto.so.1.1.1k [.] cmov
0.18% haproxy [kernel.kallsyms] [k] _raw_spin_lock
0.18% haproxy libcrypto.so.1.1.1k [.] x25519_fe64_mul
0.16% haproxy libc-2.28.so [.] _int_malloc
0.16% haproxy libc-2.28.so [.] malloc
0.15% haproxy libcrypto.so.1.1.1k [.] OPENSSL_cleanse
0.15% haproxy [kernel.kallsyms] [k] nft_do_chain
0.15% haproxy [kernel.kallsyms] [k] ___slab_alloc
0.14% haproxy libcrypto.so.1.1.1k [.] fe_sq
0.14% haproxy [kernel.kallsyms] [k] finish_task_switch.isra.0
0.14% haproxy libcrypto.so.1.1.1k [.] bn_powerx5
0.13% haproxy [kernel.kallsyms] [k] kmem_cache_free
0.13% haproxy [kernel.kallsyms] [k] syscall_enter_from_user_mode
0.12% haproxy libcrypto.so.1.1.1k [.] bn_div_fixed_top
0.12% haproxy [kernel.kallsyms] [k] exit_to_user_mode_loop
0.11% haproxy libc-2.28.so [.] _int_free
0.11% haproxy libcrypto.so.1.1.1k [.] bn_mul_add_words
HAProxy 3.2.4 with OpenSSL1 was tested using -max-workers 5000 -rate 0 --duration=3m:
37.59% haproxy libcrypto.so.1.1.1k [.] __bn_sqrx8x_reduction
21.01% haproxy libcrypto.so.1.1.1k [.] bn_sqrx8x_internal
19.55% haproxy libcrypto.so.1.1.1k [.] mulx4x_internal
1.67% haproxy [kernel.kallsyms] [k] masked_flow_lookup
1.56% haproxy libcrypto.so.1.1.1k [.] __bn_postx4x_internal
1.01% haproxy [kernel.kallsyms] [k] __do_softirq
0.60% haproxy libcrypto.so.1.1.1k [.] bn_mulx4x_mont
0.44% haproxy [kernel.kallsyms] [k] __nf_conntrack_find_get
0.42% haproxy libcrypto.so.1.1.1k [.] fe_mul
0.35% haproxy [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
0.30% haproxy libcrypto.so.1.1.1k [.] x25519_fe64_sqr
0.23% haproxy libcrypto.so.1.1.1k [.] cmov
0.23% haproxy libc-2.28.so [.] _int_malloc
0.21% haproxy [kernel.kallsyms] [k] _raw_spin_lock
0.18% haproxy libcrypto.so.1.1.1k [.] x25519_fe64_mul
0.17% haproxy [kernel.kallsyms] [k] nft_do_chain
0.17% haproxy libc-2.28.so [.] malloc
0.16% haproxy [kernel.kallsyms] [k] ___slab_alloc
0.16% haproxy libcrypto.so.1.1.1k [.] OPENSSL_cleanse
0.15% haproxy [kernel.kallsyms] [k] kmem_cache_free
0.14% haproxy [kernel.kallsyms] [k] syscall_enter_from_user_mode
0.14% haproxy [kernel.kallsyms] [k] finish_task_switch.isra.0
0.14% haproxy libcrypto.so.1.1.1k [.] fe_sq
0.14% haproxy libcrypto.so.1.1.1k [.] bn_powerx5
0.13% haproxy [kernel.kallsyms] [k] __fget_light
0.13% haproxy libc-2.28.so [.] _int_free
0.12% haproxy haproxy [.] process_stream
0.12% haproxy [kernel.kallsyms] [k] exit_to_user_mode_loop
I hope I managed to resolve the libcrypto symbols well. It was not quite straightforward from inside OpenShift.
It looks like most of the CPU time is spent on TLS handshakes. The bn (big number) and mulx4x-prefixed functions should be part of the modular arithmetic performed for RSA during the handshake.
How is OpenSSL configured and built? I recall an accidental installation from FreeBSD ports with "assembler optimization" disabled (not on purpose, just the wrong checkbox), and due to that the big-number math took ~25% more CPU than it should.
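One quick way to sanity-check the libcrypto build (and whether its assembly-optimized big-number routines are in effect) is to benchmark raw RSA signing with the bundled speed tool and compare against a known-good machine, e.g.:
$ openssl speed rsa2048 rsa4096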
HAProxy 3.2.4 with OpenSSL1 but with the problematic commit (4c48edba4f45bb78f41af7d79d3c176710fe6a90) reverted gives 100% success rate with the same parameters (5000/1m, 5000/3m):
$ echo "GET https://hello-world.com/" | vegeta attack -max-workers 5000 -rate 0 -keepalive=true --duration=1m -insecure | tee results.bin | vegeta report | head
Requests [total, rate, throughput] 579218, 9651.16, 9645.89
Duration [total, attack, wait] 1m0s, 1m0s, 32.307ms
Latencies [min, mean, 50, 90, 95, 99, max] 1.777ms, 299.099ms, 193.156ms, 323.798ms, 377.102ms, 695.592ms, 11.714s
Bytes In [total, mean] 31278052, 54.00
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:579213 408:5
Error Set:
408 Request Time-out
$ echo "GET https://hello-world.com/" | vegeta attack -max-workers 5000 -rate 0 -keepalive=true --duration=3m -insecure | tee results.bin | vegeta report | head
Requests [total, rate, throughput] 2011351, 11173.16, 11162.89
Duration [total, attack, wait] 3m0s, 3m0s, 165.03ms
Latencies [min, mean, 50, 90, 95, 99, max] 1.769ms, 235.119ms, 199.623ms, 328.015ms, 367.911ms, 461.493ms, 12.239s
Bytes In [total, mean] 108613290, 54.00
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:2011345 408:6
Error Set:
408 Request Time-out
$ echo "GET https://hello-world.com/" | vegeta attack -max-workers 5000 -rate 0 -keepalive=true --duration=3m -insecure | tee results.bin | vegeta report | head
Requests [total, rate, throughput] 2000227, 11108.64, 11098.97
Duration [total, attack, wait] 3m0s, 3m0s, 153.49ms
Latencies [min, mean, 50, 90, 95, 99, max] 1.762ms, 238.002ms, 198.027ms, 328.518ms, 368.636ms, 461.825ms, 14.085s
Bytes In [total, mean] 108014274, 54.00
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:2000191 408:36
Error Set:
408 Request Time-out
And shows the following perf top output:
5.66% haproxy [kernel.kallsyms] [k] masked_flow_lookup
5.53% haproxy libcrypto.so.1.1.1k [.] __bn_sqrx8x_reduction
4.02% haproxy [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
3.10% haproxy libcrypto.so.1.1.1k [.] bn_sqrx8x_internal
2.88% haproxy libcrypto.so.1.1.1k [.] mulx4x_internal
2.78% haproxy [kernel.kallsyms] [k] __do_softirq
1.37% haproxy [kernel.kallsyms] [k] finish_task_switch.isra.0
1.36% haproxy [kernel.kallsyms] [k] syscall_enter_from_user_mode
1.33% haproxy [kernel.kallsyms] [k] __nf_conntrack_find_get
1.09% haproxy haproxy [.] sess_build_logline_orig
1.08% haproxy [kernel.kallsyms] [k] nft_do_chain
0.88% haproxy haproxy [.] h1_headers_to_hdr_list
0.86% haproxy [kernel.kallsyms] [k] __fget_light
0.84% haproxy haproxy [.] sock_conn_iocb
0.83% haproxy [kernel.kallsyms] [k] _raw_spin_lock
0.76% haproxy haproxy [.] h1_snd_buf
0.76% haproxy haproxy [.] process_stream
0.71% haproxy haproxy [.] sc_conn_recv
0.58% haproxy haproxy [.] run_tasks_from_lists
0.56% haproxy haproxy [.] __pool_alloc
0.56% haproxy haproxy [.] __http_find_header
0.54% haproxy [kernel.kallsyms] [k] do_epoll_ctl
0.54% haproxy libc-2.28.so [.] vfprintf
0.49% haproxy [kernel.kallsyms] [k] copy_user_enhanced_fast_string
0.48% haproxy [kernel.kallsyms] [k] avc_lookup
0.45% haproxy haproxy [.] h1_process_demux
0.43% haproxy haproxy [.] _do_poll
0.42% haproxy haproxy [.] sc_notify
OK thanks for the measurements. Now it's getting pretty clear to me what is happening: for a reason we still don't know, the SSL lib is not completely done with asymmetric crypto when we remove the flag, so it occupies all the CPU, causing extreme latencies that make requests time out. In this case the load generator has no other option but to disconnect and reconnect, further increasing the asymmetric crypto cost and making the situation even worse.
I'll see what I can propose to postpone the removal of that flag. I think we could do it once we receive the first byte of data, which should definitely confirm that the crypto is complete. I'll check how to do this.
OK thanks for the measurements. Now it's getting pretty clear to me what is happening: for a reason we still don't know, the SSL lib is not completely done with asymmetric crypto when we remove the flag so it occupies all the CPU
That could be something like the computation of a new TLS ticket which is sent after the handshake.
Are the tests only in TLS 1.3 ? I'm wondering if we don't have differences with TLS1.2 on post-handshakes.
That's why I was asking about that; we may have made the tests in TLS 1.2 when doing the mentioned patch, if this is related to it.
Are the tests only in TLS 1.3 ? I'm wondering if we don't have differences with TLS1.2 on post-handshakes.
Most likely, yes. vegeta does not provide a flag to set the max TLS version, so golang's crypto/tls Config.MinVersion and MaxVersion defaults are used, which are TLS 1.2 and TLS 1.3 respectively. Our HAProxy selects a TLS 1.3 ciphersuite, which should result in TLS 1.3. I didn't take a pcap of this; if you would like me to confirm it, I can do that.
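If useful, the negotiated version could also be confirmed without a pcap, either by temporarily logging %sslv (or the ssl_fc_protocol sample fetch) on the fe_sni frontend, or with a one-off client probe against the test hostname, e.g.:
$ openssl s_client -connect hello-world.com:443 -servername hello-world.com </dev/null 2>/dev/null | grep -E "^New|Protocol"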
I'll see what I can propose to postpone the removal of that flag. I think we could do it once we receive the first byte of data, which should definitely confirm that the crypto is complete. I'll check how to do this.
Let us know if any testing is needed. We would be happy to give a try to any patch which tries to fix this issue.
If you could try the following patch, that would be helpful. It only drops the HEAVY flag once bytes are received over the socket, indicating that we don't need expensive operations anymore.
@alebedev87 I'm trying to reproduce. I'm puzzled because you have "accept-proxy" on your bind line, so vegeta cannot attack it this way. Did you disable it or are you passing through another component ? Also, the "redacted" ciphers do not help to reproduce the issue, could you please at least indicate which one suffices to reproduce the problem ? You could for example log the ciphers in haproxy and place the equivalent ones in the conf.
I'm facing difficulties with vegeta, it closes the source ports in a dirty way and leaves tons of TIME_WAIT sockets, which quickly end up eating all the source ports on the machine until we cannot connect anymore. I guess I'll have to try it on another machine :-(
I'm trying to reproduce. I'm puzzled because you have "accept-proxy" on your bind line, so vegeta cannot attack it this way. Did you disable it or are you passing through another component ?
We haven’t disabled accept-proxy. The tests are running with the standard OpenShift HAProxy configuration. The traffic is routed through an AWS load balancer, which is the default setup to publish the OpenShift router.
You could for example log the ciphers in haproxy and place the equivalent ones in the conf.
ssl-default-bind-ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384
ssl-default-bind-ciphersuites TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256
If you could try the following patch, that would be helpful. It only drops the HEAVY flag once bytes are received over the socket, indicating that we don't need expensive operations anymore. postpone-heavy.diff.txt
We applied this patch on HAProxy 3.2.4/OpenSSL1:
- CPU behavior looks similar to the version without the patch.
- Same about perf top.
- vegeta results got better, the success rate never went below 90% (UPD: latencies are high and request rates are low).
$ echo "GET https://hello-world.com/" | vegeta attack -max-workers 5000 -rate 0 -keepalive=true --duration=1m -insecure | tee results.bin | vegeta report | head
Requests [total, rate, throughput] 39830, 663.77, 518.14
Duration [total, attack, wait] 1m11s, 1m0s, 10.835s
Latencies [min, mean, 50, 90, 95, 99, max] 4.572ms, 8.002s, 8.325s, 9.38s, 19.873s, 24.449s, 30.009s
Bytes In [total, mean] 2311410, 58.03
Bytes Out [total, mean] 0, 0.00
Success [ratio] 92.15%
Status Codes [code:count] 0:131 200:36705 408:2994
Error Set:
408 Request Time-out
Get "https://hello-world.com/": dial tcp 0.0.0.0:0->10.0.75.141:443: connect: connection refused
$ echo "GET https://hello-world.com/" | vegeta attack -max-workers 5000 -rate 0 -keepalive=true --duration=3m -insecure | tee results.bin | vegeta report | head
Requests [total, rate, throughput] 99907, 555.04, 508.92
Duration [total, attack, wait] 3m9s, 3m0s, 9.443s
Latencies [min, mean, 50, 90, 95, 99, max] 1.676s, 9.236s, 9.803s, 10.374s, 10.5s, 22.223s, 30.003s
Bytes In [total, mean] 5574364, 55.80
Bytes Out [total, mean] 0, 0.00
Success [ratio] 96.50%
Status Codes [code:count] 0:149 200:96411 408:3347
Error Set:
408 Request Time-out
Get "https://hello-world.com/": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
$ echo "GET https://hello-world.com/" | vegeta attack -max-workers 5000 -rate 0 -keepalive=true --duration=3m -insecure | tee results.bin | vegeta report | head
Requests [total, rate, throughput] 97160, 539.76, 498.42
Duration [total, attack, wait] 3m7s, 3m0s, 6.587s
Latencies [min, mean, 50, 90, 95, 99, max] 23.63ms, 9.431s, 8.454s, 11.49s, 25.328s, 30.725s, 31.365s
Bytes In [total, mean] 5041908, 51.89
Bytes Out [total, mean] 0, 0.00
Success [ratio] 95.72%
Status Codes [code:count] 0:3978 200:93002 408:180
Error Set:
Get "https://hello-world.com/": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Get "https://hello-world.com/": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
$ echo "GET https://hello-world.com/" | vegeta attack -max-workers 5000 -rate 0 -keepalive=true --duration=3m -insecure | tee results.bin | vegeta report | head
Requests [total, rate, throughput] 114057, 633.59, 588.50
Duration [total, attack, wait] 3m8s, 3m0s, 7.637s
Latencies [min, mean, 50, 90, 95, 99, max] 52.896ms, 8.068s, 8.092s, 9.816s, 12.267s, 30.605s, 31.62s
Bytes In [total, mean] 5968660, 52.33
Bytes Out [total, mean] 0, 0.00
Success [ratio] 96.82%
Status Codes [code:count] 0:3575 200:110435 408:47
Error Set:
Get "https://hello-world.com/": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Get "https://hello-world.com/": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
$ echo "GET https://hello-world.com/" | vegeta attack -max-workers 5000 -rate 0 -keepalive=true --duration=3m -insecure | tee results.bin | vegeta report | head
Requests [total, rate, throughput] 87248, 484.69, 410.80
Duration [total, attack, wait] 3m22s, 3m0s, 22.244s
Latencies [min, mean, 50, 90, 95, 99, max] 2.034s, 10.692s, 10.696s, 12.115s, 13.412s, 30s, 30.009s
Bytes In [total, mean] 4810210, 55.13
Bytes Out [total, mean] 0, 0.00
Success [ratio] 95.23%
Status Codes [code:count] 0:1221 200:83085 408:2942
Error Set:
408 Request Time-out
Get "https://hello-world.com/": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
$ echo "GET https://hello-world.com/" | vegeta attack -max-workers 5000 -rate 0 -keepalive=true --duration=3m -insecure | tee results.bin | vegeta report | head
Requests [total, rate, throughput] 159508, 886.14, 841.66
Duration [total, attack, wait] 3m5s, 3m0s, 5.42s
Latencies [min, mean, 50, 90, 95, 99, max] 1.539ms, 5.71s, 5.761s, 6.504s, 6.681s, 20.496s, 30.005s
Bytes In [total, mean] 8753662, 54.88
Bytes Out [total, mean] 0, 0.00
Success [ratio] 97.84%
Status Codes [code:count] 0:479 200:156063 408:2966
Error Set:
408 Request Time-out
Get "https://hello-world.com/": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
perf top:
40.15% haproxy libcrypto.so.1.1.1k [.] __bn_sqrx8x_reduction
22.43% haproxy libcrypto.so.1.1.1k [.] bn_sqrx8x_internal
20.89% haproxy libcrypto.so.1.1.1k [.] mulx4x_internal
1.66% haproxy libcrypto.so.1.1.1k [.] __bn_postx4x_internal
0.82% haproxy [kernel.kallsyms] [k] masked_flow_lookup
0.78% haproxy [kernel.kallsyms] [k] __do_softirq
0.64% haproxy libcrypto.so.1.1.1k [.] bn_mulx4x_mont
0.44% haproxy libcrypto.so.1.1.1k [.] fe_mul
0.32% haproxy libcrypto.so.1.1.1k [.] x25519_fe64_sqr
0.28% haproxy libc-2.28.so [.] _int_malloc
0.25% haproxy libcrypto.so.1.1.1k [.] cmov
0.23% haproxy [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
0.19% haproxy libcrypto.so.1.1.1k [.] x25519_fe64_mul
0.18% haproxy [kernel.kallsyms] [k] __nf_conntrack_find_get
0.16% haproxy libcrypto.so.1.1.1k [.] OPENSSL_cleanse
0.15% haproxy libcrypto.so.1.1.1k [.] fe_sq
0.15% haproxy libc-2.28.so [.] malloc
0.14% haproxy libcrypto.so.1.1.1k [.] bn_powerx5
0.13% haproxy [kernel.kallsyms] [k] _raw_spin_lock
0.13% haproxy libcrypto.so.1.1.1k [.] bn_div_fixed_top
0.13% haproxy [kernel.kallsyms] [k] finish_task_switch.isra.0
0.12% haproxy libcrypto.so.1.1.1k [.] bn_mul_add_words
0.12% haproxy libc-2.28.so [.] _int_free
0.11% haproxy [kernel.kallsyms] [k] nft_do_chain
0.11% haproxy [kernel.kallsyms] [k] exit_to_user_mode_loop
0.10% haproxy [kernel.kallsyms] [k] syscall_enter_from_user_mode
We applied this patch on HAProxy 3.4.2/OpenSSL1:
I'm pretty sure this is a typo and it's HAProxy 3.2.4, right?
Interesting that the latency statistics reported by vegeta are much lower when the original patch is reverted, yet the new test patch shows latency similar to that seen with the original patch. Moreover, the newly suggested patch provides better success rates. That is surprising to me.
Thanks @alebedev87, that really helps. Makes sense for accept-proxy, I thought you were injecting directly into haproxy itself. We'll retry with your settings.
@rohara I'm not drawing any conclusion on this. The fact is that as long as too many handshakes are in the queue, latencies and timeouts will occur, and timeouts will cause reconnections, hence new handshakes, hence higher CPU usage, etc. It's just spiraling down towards the bad situation that the very first patch introducing TASK_HEAVY was meant to address.
@wlallemand suggested something interesting regarding server-initiated tickets that would be emitted after the request is received, meaning that haproxy here would still have to perform some crypto operations at that moment, which could explain this observation. If you can easily run a quick test with "no-tls-tickets" to disable them, it could help confirm/reject this hypothesis. That doesn't mean it suddenly gives us an easy solution, but it helps narrow it down.
The main difficulty I'm seeing is knowing whether or not the TLS layer still needs to perform slow crypto. As you can see, we can't afford to release the flag too early, but we cannot afford to keep it forever either; that really ruins TLS performance.
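For reference, a minimal way to run that test with the configuration shown above would be to add no-tls-tickets to the default bind options in the global section (it can equally be set directly on the fe_sni bind line):
ssl-default-bind-options ssl-min-ver TLSv1.2 no-tls-tickets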
Also I'm wondering if we're really witnessing crypto on send here. Indeed, unless we've saturated the client's TCP window with tons of data, there's no reason to pass via the scheduler to send data. Normally it should be synchronous from the stream layer. So here I'm suspecting we're having to do that upon a certain Rx operation that we need to identify.