haproxy
haproxy copied to clipboard
Large NIC ring buffer sizes cause HAProxy v2.7.10 to crash under load
Detailed Description of the Problem
Hi,
I tried to apply some NIC optimizations to all physical interfaces:
# systemd-networkd config section
CombinedChannels=64
RxBufferSize=2047
TxBufferSize=2047
UseAdaptiveRxCoalesce=true
UseAdaptiveTxCoalesce=true
The driver defaults are as follows:
# systemd-networkd config section (mocked)
CombinedChannels=8
RxBufferSize=511
TxBufferSize=511
UseAdaptiveRxCoalesce=false
UseAdaptiveTxCoalesce=false
After I sent traffic to HAProxy in L4 mode it crashed around 10k TCP connections. Log attached below.
So I started to dial everything back to find the change causing the crash. I started with RxBufferSizeand TxBufferSize and set both back to 511. This was immediately successful.
I guess it must be somehow related to the ring buffer sizes since it was the only thing I reverted back to driver defaults.
The interfaces I applied these options to were as follows:
BCM57412 NetXtreme-E 10Gb RDMA Ethernet Controller (uplink)
BCM57412 NetXtreme-E 10Gb RDMA Ethernet Controller (internal bond active)
BCM57412 NetXtreme-E 10Gb RDMA Ethernet Controller (internal bond passive)
Expected Behavior
HAProxy should not crash.
Steps to Reproduce the Behavior
- Set rx/tx ring buffer sizes of all NICs to 2047.
- Put HAProxy under load.
Do you have any idea what may have caused this?
Large ring buffer size. Maybe NIC driver.
Do you have an idea how to solve the issue?
No response
What is your configuration?
Can be provided in sanitized form if needed.
Output of haproxy -vv
HAProxy version 2.7.10-1ppa1~focal 2023/08/12 - https://haproxy.org/
Status: stable branch - will stop receiving fixes around Q1 2024.
Known bugs: http://www.haproxy.org/bugs/bugs-2.7.10.html
Running on: Linux 5.4.0-167-generic #184-Ubuntu SMP Tue Oct 31 09:21:49 UTC 2023 x86_64
Build options :
TARGET = linux-glibc
CPU = generic
CC = cc
CFLAGS = -O2 -g -O2 -fdebug-prefix-map=/build/haproxy-DDO7EW/haproxy-2.7.10=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wall -Wextra -Wundef -Wdeclaration-after-statement -Wfatal-errors -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference -fwrapv -Wno-address-of-packed-member -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers -Wno-cast-function-type -Wno-string-plus-int -Wno-atomic-alignment
OPTIONS = USE_PCRE2=1 USE_PCRE2_JIT=1 USE_OPENSSL=1 USE_LUA=1 USE_SLZ=1 USE_SYSTEMD=1 USE_PROMEX=1
DEBUG = -DDEBUG_STRICT -DDEBUG_MEMORY_POOLS
Feature list : -51DEGREES +ACCEPT4 +BACKTRACE -CLOSEFROM +CPU_AFFINITY +CRYPT_H -DEVICEATLAS +DL -ENGINE +EPOLL -EVPORTS +GETADDRINFO -KQUEUE +LIBCRYPT +LINUX_SPLICE +LINUX_TPROXY +LUA -MEMORY_PROFILING +NETFILTER +NS -OBSOLETE_LINKER +OPENSSL -OPENSSL_WOLFSSL -OT -PCRE +PCRE2 +PCRE2_JIT -PCRE_JIT +POLL +PRCTL -PROCCTL +PROMEX -PTHREAD_EMULATION -QUIC +RT +SHM_OPEN +SLZ -STATIC_PCRE -STATIC_PCRE2 +SYSTEMD +TFO +THREAD +THREAD_DUMP +TPROXY -WURFL -ZLIB
Default settings :
bufsize = 16384, maxrewrite = 1024, maxpollevents = 200
Built with multi-threading support (MAX_TGROUPS=16, MAX_THREADS=256, default=64).
Built with OpenSSL version : OpenSSL 1.1.1f 31 Mar 2020
Running on OpenSSL version : OpenSSL 1.1.1f 31 Mar 2020
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
Built with Lua version : Lua 5.3.3
Built with the Prometheus exporter as a service
Built with network namespace support.
Built with libslz for stateless compression.
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with PCRE2 version : 10.34 2019-11-21
PCRE2 library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with gcc compiler version 9.4.0
Available polling systems :
epoll : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use epoll.
Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
h2 : mode=HTTP side=FE|BE mux=H2 flags=HTX|HOL_RISK|NO_UPG
fcgi : mode=HTTP side=BE mux=FCGI flags=HTX|HOL_RISK|NO_UPG
<default> : mode=HTTP side=FE|BE mux=H1 flags=HTX
h1 : mode=HTTP side=FE|BE mux=H1 flags=HTX|NO_UPG
<default> : mode=TCP side=FE|BE mux=PASS flags=
none : mode=TCP side=FE|BE mux=PASS flags=NO_UPG
Available services : prometheus-exporter
Available filters :
[BWLIM] bwlim-in
[BWLIM] bwlim-out
[CACHE] cache
[COMP] compression
[FCGI] fcgi-app
[SPOE] spoe
[TRACE] trace
Last Outputs and Backtraces
Thread 46 is about to kill the process.
Thread 1 : id=0x7fab4f139180 act=1 glob=0 wq=1 rq=0 tl=1 tlsz=0 rqsz=1
1/1 stuck=0 prof=0 harmless=1 isolated=0
cpu_ns: poll=35140437435 now=35740140085 diff=599702650
curr_task=0
Thread 2 : id=0x7fab4f12d700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=5 rqsz=52
1/2 stuck=0 prof=0 harmless=0 isolated=0
cpu_ns: poll=30085310795 now=30713903365 diff=628592570
curr_task=0x7fa98c1c49a0 (task) calls=15 last=0
fct=0x557cae47a900(process_stream) ctx=0x7fab48056290
strm=0x7fab48056290,4a src=xxx.xxx.xxx.xxx fe=https be=https dst=l7-lb02:8103
txn=(nil),0 txn.req=-,0 txn.rsp=-,0
rqf=8ca021 rqa=0 rpf=8004a020 rpa=0
scf=0x7fab48081c10,CLO,20 scb=0x7fab4808b0e0,EST,21
af=(nil),0 sab=(nil),0
cof=0x7fa9741d06a0,a00c0300:PASS(0x7fab4805a310)/RAW((nil))/tcpv4(10491)
cob=0x7fa98c2548d0,100c0000:PASS(0x7fab480f0d80)/NONE((nil))/NONE(-1)
Thread 3 : id=0x7fab4e7a7700 act=1 glob=0 wq=1 rq=0 tl=1 tlsz=1 rqsz=3
1/3 stuck=0 prof=0 harmless=0 isolated=0
cpu_ns: poll=31560009414 now=32657403563 diff=1097394149
curr_task=0
Thread 4 : id=0x7fab4dfa6700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=14 rqsz=153
1/4 stuck=0 prof=0 harmless=0 isolated=0
cpu_ns: poll=32567643198 now=33829999067 diff=1262355869
curr_task=0x7faa640a62e0 (tasklet) calls=2
fct=0x557cae5351c0(sc_conn_io_cb) ctx=0x7faa640b9880
strm=0x7faa640594f0,4a src=xxx.xxx.xxx.xxx fe=https be=https dst=l7-lb01:8100
txn=(nil),0 txn.req=-,0 txn.rsp=-,0
rqf=848002 rqa=0 rpf=80048202 rpa=0
scf=0x7faa640b9880,EST,0 scb=0x7faa640d0f60,EST,1
af=(nil),0 sab=(nil),0
cof=0x7fa9741950b0,a0000300:PASS(0x7faa64155eb0)/RAW((nil))/tcpv4(37525)
cob=0x7fa9741f90e0,10000300:PASS(0x7faa640ac2d0)/RAW((nil))/tcpv4(40685)
Thread 5 : id=0x7fab4d7a5700 act=0 glob=0 wq=1 rq=0 tl=0 tlsz=0 rqsz=0
1/5 stuck=0 prof=0 harmless=1 isolated=0
cpu_ns: poll=34043386228 now=34596818328 diff=553432100
curr_task=0
Thread 6 : id=0x7fab4cfa4700 act=1 glob=0 wq=1 rq=0 tl=1 tlsz=0 rqsz=1
1/6 stuck=0 prof=0 harmless=1 isolated=0
cpu_ns: poll=29185808183 now=29658429703 diff=472621520
curr_task=0
Thread 7 : id=0x7faaf48bd700 act=0 glob=0 wq=1 rq=0 tl=0 tlsz=0 rqsz=0
1/7 stuck=0 prof=0 harmless=1 isolated=0
cpu_ns: poll=36374454479 now=36810005418 diff=435550939
curr_task=0
Thread 8 : id=0x7faaf1f5a700 act=1 glob=1 wq=1 rq=0 tl=1 tlsz=0 rqsz=34
1/8 stuck=0 prof=0 harmless=0 isolated=0
cpu_ns: poll=29933935253 now=30722250921 diff=788315668
curr_task=0
Thread 9 : id=0x7faaf1759700 act=1 glob=1 wq=1 rq=0 tl=1 tlsz=0 rqsz=3
2/1 stuck=0 prof=0 harmless=0 isolated=0
cpu_ns: poll=32072977736 now=32719832415 diff=646854679
curr_task=0
Thread 10: id=0x7faaf0f58700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=0 rqsz=12
2/2 stuck=0 prof=0 harmless=0 isolated=0
cpu_ns: poll=27665068891 now=29265876719 diff=1600807828
curr_task=0
Thread 11: id=0x7faaf0757700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=14 rqsz=77
2/3 stuck=0 prof=0 harmless=0 isolated=0
cpu_ns: poll=31182937756 now=31823297834 diff=640360078
curr_task=0x7faa5c48b870 (tasklet) calls=1
fct=0x557cae528830(srv_chk_io_cb) ctx=0x7faa5c48b8c0
Thread 12: id=0x7faaeff56700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=11 rqsz=74
2/4 stuck=0 prof=0 harmless=0 isolated=0
cpu_ns: poll=28615865661 now=29144149971 diff=528284310
curr_task=0x557cb0bfec30 (task) calls=1256 last=0
fct=0x557cae529ce0(process_chk) ctx=0x557cb09acf80
Thread 13: id=0x7faaef755700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=14 rqsz=97
2/5 stuck=0 prof=0 harmless=0 isolated=0
cpu_ns: poll=30477909361 now=31382894900 diff=904985539
curr_task=0x7faac4064400 (tasklet) calls=10
fct=0x557cae5351c0(sc_conn_io_cb) ctx=0x7faac40543c0
strm=0x7faac40be950,4b src=xxx.xxx.xxx.xxx fe=https be=https dst=l7-lb01:8114
txn=(nil),0 txn.req=-,0 txn.rsp=-,0
rqf=8c8002 rqa=0 rpf=80048000 rpa=0
scf=0x7faac40543c0,EST,0 scb=0x7faac4063400,EST,1
af=(nil),0 sab=(nil),0
cof=0x7faa5c465730,a0000300:PASS(0x7faac4037bd0)/RAW((nil))/tcpv4(4115)
cob=0x7faa5c368420,10000300:PASS(0x7faac403dc80)/RAW((nil))/tcpv4(7394)
Thread 14: id=0x7faaeef54700 act=1 glob=0 wq=1 rq=0 tl=1 tlsz=2 rqsz=34
2/6 stuck=0 prof=0 harmless=0 isolated=0
cpu_ns: poll=32847947901 now=33520932409 diff=672984508
curr_task=0x7faab40cb170 (tasklet) calls=3
fct=0x557cae5351c0(sc_conn_io_cb) ctx=0x7faab42430a0
strm=0x7faab4298b00,4a src=xxx.xxx.xxx.xxx fe=https be=https dst=l7-lb02:8119
txn=(nil),0 txn.req=-,0 txn.rsp=-,0
rqf=848002 rqa=0 rpf=80048202 rpa=0
scf=0x7faab42430a0,EST,0 scb=0x7faab4072ce0,EST,1
af=(nil),0 sab=(nil),0
cof=0x7faab417f140,a0000300:PASS(0x7faab40c4dc0)/RAW((nil))/tcpv4(38734)
cob=0x7faa5c03d070,10000300:PASS(0x7faab423d700)/RAW((nil))/tcpv4(39338)
>Thread 15: id=0x7faaee753700 act=1 glob=1 wq=1 rq=0 tl=1 tlsz=0 rqsz=5
2/7 stuck=1 prof=0 harmless=0 isolated=0
cpu_ns: poll=32502566741 now=34983581329 diff=2481014588
curr_task=0x7faa54053000 (tasklet) calls=2
fct=0x557cae5aff40(xprt_handshake_io_cb) ctx=0x7faa540cafd0
call trace(14):
| 0x557cae55198f [89 44 24 04 85 c0 75 29]: ha_dump_backtrace+0x3f/0x2fd
| 0x557cae5524b8 [64 8b 75 00 8b 0d ca b9]: debug_handler+0x118/0x27b
| 0x7fab4fb05420 [48 c7 c0 0f 00 00 00 0f]: libpthread:+0x14420
| 0x7fab4fb047dc [48 3d 00 f0 ff ff 77 30]: libpthread:send+0x6c/0xb9
| 0x557cae5a45d7 [48 85 c0 7e 5c 49 29 c5]: main+0x1d1817
| 0x557cae50427f [f6 43 06 10 ba ff ff ff]: conn_ctrl_send+0x7f/0xc8
| 0x557cae5070e7 [85 c0 78 1d 66 01 43 02]: conn_send_proxy+0x107/0x37a
| 0x557cae5affde [85 c0 0f 84 ba 02 00 00]: xprt_handshake_io_cb+0x9e/0x409
Thread 16: id=0x7faaedf52700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=14 rqsz=138
2/8 stuck=0 prof=0 harmless=0 isolated=0
cpu_ns: poll=35151012969 now=36314425558 diff=1163412589
curr_task=0x7fa954061500 (tasklet) calls=2
fct=0x557cae5aff40(xprt_handshake_io_cb) ctx=0x7fa9540489f0
Thread 17: id=0x7faaed751700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=1 rqsz=9
3/1 stuck=0 prof=0 harmless=0 isolated=0
cpu_ns: poll=29967065088 now=30623641327 diff=656576239
curr_task=0x557cb0be83c0 (tasklet) calls=3002
fct=0x557cae5458f0(accept_queue_process) ctx=0x557cae6e2040
>Thread 18: id=0x7faaecf50700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=14 rqsz=133
3/2 stuck=1 prof=0 harmless=0 isolated=0
cpu_ns: poll=31436308086 now=34630164853 diff=3193856767
curr_task=0x7fa994183360 (tasklet) calls=12
fct=0x557cae5351c0(sc_conn_io_cb) ctx=0x7fa9940f3c50
strm=0x7fa9940f3860,4a src=xxx.xxx.xxx.xxx fe=https be=https dst=l7-lb03:8113
txn=(nil),0 txn.req=-,0 txn.rsp=-,0
rqf=8c8002 rqa=0 rpf=80048000 rpa=0
scf=0x7fa9940f3c50,EST,0 scb=0x7fa994198230,EST,1
af=(nil),0 sab=(nil),0
cof=0x7fa9943d1550,a0000300:PASS(0x7fa99418fd60)/RAW((nil))/tcpv4(29982)
cob=0x7fa99443f290,10000300:PASS(0x7fa9940fa3d0)/RAW((nil))/tcpv4(29983)
call trace(19):
| 0x557cae55198f [89 44 24 04 85 c0 75 29]: ha_dump_backtrace+0x3f/0x2fd
| 0x557cae5524b8 [64 8b 75 00 8b 0d ca b9]: debug_handler+0x118/0x27b
| 0x7fab4fb05420 [48 c7 c0 0f 00 00 00 0f]: libpthread:+0x14420
| 0x557cae5be4e0 [f3 0f 1e fa 8b 46 08 83]: wdt_handler
| 0x7fab4fb05420 [48 c7 c0 0f 00 00 00 0f]: libpthread:+0x14420
| 0x7fab4fb047dc [48 3d 00 f0 ff ff 77 30]: libpthread:send+0x6c/0xb9
| 0x557cae5a45d7 [48 85 c0 7e 5c 49 29 c5]: main+0x1d1817
| 0x557cae580a5f [48 85 c0 74 1f 48 8b 4b]: main+0x1adc9f
| 0x557cae533045 [85 c0 0f 8e 53 fe ff ff]: main+0x160285
| 0x557cae5352ef [48 8b 75 20 0f b6 4d 01]: main+0x16252f
| 0x557cae5342e3 [48 8b 55 30 48 8b 4d 28]: main+0x161523
| 0x557cae53505b [48 8b 7b 28 e8 dc 2f f4]: main+0x16229b
| 0x557cae53523e [eb ad 48 89 f7 e8 88 db]: sc_conn_io_cb+0x7e/0x92
Thread 19: id=0x7faaec74f700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=2 rqsz=266
3/3 stuck=0 prof=0 harmless=0 isolated=0
cpu_ns: poll=26895645961 now=28178590141 diff=1282944180
curr_task=0x7fa9a41d1ef0 (task) calls=14 last=0
fct=0x557cae47a900(process_stream) ctx=0x7fa9a406b760
strm=0x7fa9a406b760,4204b src=xxx.xxx.xxx.xxx fe=https be=https dst=l7-lb02:8109
txn=(nil),0 txn.req=-,0 txn.rsp=-,0
rqf=8ca024 rqa=0 rpf=8004a223 rpa=0
scf=0x7fa9a4117160,CLO,20 scb=0x7fa9a40759e0,EST,21
af=(nil),0 sab=(nil),0
cof=0x7fa994263ce0,a0080300:PASS(0x7fa9a413dc00)/RAW((nil))/tcpv4(28320)
cob=0x7fa9a41f2850,100c0000:PASS(0x7fa9a4054d00)/NONE((nil))/NONE(-1)
Thread 20: id=0x7faaebf4e700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=0 rqsz=19
3/4 stuck=0 prof=0 harmless=0 isolated=0
cpu_ns: poll=28657507329 now=29411699549 diff=754192220
curr_task=0x7faa24087e90 (tasklet) calls=22
fct=0x557cae5351c0(sc_conn_io_cb) ctx=0x7faa240e2e30
strm=0x7faa240e2a40,4b src=xxx.xxx.xxx.xxx fe=https be=https dst=l7-lb02:8112
txn=(nil),0 txn.req=-,0 txn.rsp=-,0
rqf=8c8002 rqa=0 rpf=80048000 rpa=0
scf=0x7faa240e2e30,EST,0 scb=0x7faa2409e780,EST,1
af=(nil),0 sab=(nil),0
cof=0x7fa994453900,a0000300:PASS(0x7faa240e35a0)/RAW((nil))/tcpv4(23843)
cob=0x7fa99430cf70,10000300:PASS(0x7faa24067200)/RAW((nil))/tcpv4(30866)
Thread 21: id=0x7faaeb74d700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=2 rqsz=95
3/5 stuck=0 prof=0 harmless=0 isolated=0
cpu_ns: poll=30583900834 now=31454772223 diff=870871389
curr_task=0x7fa9ec1cfe90 (task) calls=1 last=0
fct=0x557cae47a900(process_stream) ctx=0x7fa9ec08fc60
strm=0x7fa9ec08fc60,a src=xxx.xxx.xxx.xxx fe=https-no-proxy be=https-no-proxy dst=l7-lb03:8306
txn=(nil),0 txn.req=-,0 txn.rsp=-,0
rqf=908002 rqa=0 rpf=80000000 rpa=0
scf=0x7fa9ec051430,EST,20 scb=0x7fa9ec090050,ASS,21
af=(nil),0 sab=(nil),0
cof=0x7fa9a41e5fa0,80000300:PASS(0x7fa9ec13a1e0)/RAW((nil))/tcpv4(42871)
cob=0x7fa994413370,10400000:NONE(0x7fa9ec090050)/RAW((nil))/NONE(-1)
Thread 22: id=0x7faaeaf4c700 act=1 glob=1 wq=1 rq=1 tl=1 tlsz=3 rqsz=26
3/6 stuck=0 prof=0 harmless=0 isolated=0
cpu_ns: poll=32649294376 now=33307612305 diff=658317929
curr_task=0x7fa9e41a9c90 (task) calls=1 last=0
fct=0x557cae47a900(process_stream) ctx=0x7fa9e41a98a0
strm=0x7fa9e41a98a0,a src=xxx.xxx.xxx.xxx fe=https be=https dst=l7-lb03:8108
txn=(nil),0 txn.req=-,0 txn.rsp=-,0
rqf=908000 rqa=0 rpf=80000000 rpa=0
scf=0x7fa9e41a9830,EST,20 scb=0x7fa9e41a9d90,ASS,21
af=(nil),0 sab=(nil),0
cof=0x7fa9942fee90,a0000300:PASS(0x7fa9e4132d70)/RAW((nil))/tcpv4(26735)
cob=0x7fa9e41b1d30,11400000:NONE(0x7fa9e41a9d90)/RAW((nil))/NONE(-1)
Thread 23: id=0x7faaea74b700 act=1 glob=1 wq=1 rq=0 tl=1 tlsz=0 rqsz=7
3/7 stuck=0 prof=0 harmless=0 isolated=0
cpu_ns: poll=28531106942 now=29114540051 diff=583433109
curr_task=0x7fa92c046fc0 (tasklet) calls=12
fct=0x557cae5351c0(sc_conn_io_cb) ctx=0x7fa92c041b80
strm=0x7fa92c048a60,4a src=xxx.xxx.xxx.xxx fe=https be=https dst=l7-lb03:8104
txn=(nil),0 txn.req=-,0 txn.rsp=-,0
rqf=8c8002 rqa=0 rpf=80048000 rpa=0
scf=0x7fa92c041b80,EST,0 scb=0x7fa92c0a39e0,EST,1
af=(nil),0 sab=(nil),0
cof=0x7fa99443c3f0,a0000300:PASS(0x7fa92c0412a0)/RAW((nil))/tcpv4(19024)
cob=0x7fa9942dd6b0,10000300:PASS(0x7fa92c06f600)/RAW((nil))/tcpv4(22071)
>Thread 24: id=0x7faae9f4a700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=9 rqsz=77
3/8 stuck=1 prof=0 harmless=0 isolated=0
cpu_ns: poll=32609451867 now=34764522744 diff=2155070877
curr_task=0x7fa9a40d0290 (task) calls=4 last=0
fct=0x557cae47a900(process_stream) ctx=0x7fa978144660
strm=0x7fa978144660,4204a src=xxx.xxx.xxx.xxx fe=https be=https dst=l7-lb03:8113
txn=(nil),0 txn.req=-,0 txn.rsp=-,0
rqf=8ca024 rqa=0 rpf=8004a020 rpa=0
scf=0x7fa9781445f0,CLO,20 scb=0x7fa978144aa0,EST,21
af=(nil),0 sab=(nil),0
cof=0x7fa9943741b0,a0080300:PASS(0x7fa9781430a0)/RAW((nil))/tcpv4(38498)
cob=0x7fa994350600,100c0000:PASS(0x7fa978176110)/NONE((nil))/NONE(-1)
call trace(15):
| 0x557cae55198f [89 44 24 04 85 c0 75 29]: ha_dump_backtrace+0x3f/0x2fd
| 0x557cae5524b8 [64 8b 75 00 8b 0d ca b9]: debug_handler+0x118/0x27b
| 0x7fab4fb05420 [48 c7 c0 0f 00 00 00 0f]: libpthread:+0x14420
| 0x7fab4fb0445b [48 3d 00 f0 ff ff 77 2f]: libpthread:__close+0x3b/0x85
| 0x557cae591fa0 [eb a5 66 66 2e 0f 1f 84]: _fd_delete_orphan+0x1d0/0x1d2
| 0x557cae58961a [c7 43 68 fd ad de fd 5b]: sock_conn_ctrl_close+0x1a/0x23
| 0x557cae580fee [e9 43 ff ff ff 0f 1f 44]: main+0x1ae22e
| 0x557cae5356aa [48 8b 43 20 81 48 18 00]: main+0x1628ea
| 0x557cae47bb39 [41 8b 47 20 e9 ee f2 ff]: process_stream+0x1239/0x3688
Thread 25: id=0x7faae9749700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=2 rqsz=51
4/1 stuck=0 prof=0 harmless=0 isolated=0
rqf=8c8002 rqa=0 rpf=80048000 rpa=0
scf=0x7faa240e2e30,EST,0 scb=0x7faa2409e780,EST,1
af=(nil),0 sab=(nil),0
cof=0x7fa994453900,a0000300:PASS(0x7faa240e35a0)/RAW((nil))/tcpv4(23843)
cob=0x7fa99430cf70,10000300:PASS(0x7faa24067200)/RAW((nil))/tcpv4(30866)
Thread 21: id=0x7faaeb74d700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=2 rqsz=95
3/5 stuck=0 prof=0 harmless=0 isolated=0
cpu_ns: poll=30583900834 now=31454772223 diff=870871389
curr_task=0x7fa9ec1cfe90 (task) calls=1 last=0
fct=0x557cae47a900(process_stream) ctx=0x7fa9ec08fc60
strm=0x7fa9ec08fc60,a src=xxx.xxx.xxx.xxx fe=https-no-proxy be=https-no-proxy dst=l7-lb03:8306
txn=(nil),0 txn.req=-,0 txn.rsp=-,0
rqf=908002 rqa=0 rpf=80000000 rpa=0
scf=0x7fa9ec051430,EST,20 scb=0x7fa9ec090050,ASS,21
af=(nil),0 sab=(nil),0
cof=0x7fa9a41e5fa0,80000300:PASS(0x7fa9ec13a1e0)/RAW((nil))/tcpv4(42871)
cob=0x7fa994413370,10400000:NONE(0x7fa9ec090050)/RAW((nil))/NONE(-1)
Thread 22: id=0x7faaeaf4c700 act=1 glob=1 wq=1 rq=1 tl=1 tlsz=3 rqsz=26
3/6 stuck=0 prof=0 harmless=0 isolated=0
cpu_ns: poll=32649294376 now=33307612305 diff=658317929
curr_task=0x7fa9e41a9c90 (task) calls=1 last=0
fct=0x557cae47a900(process_stream) ctx=0x7fa9e41a98a0
strm=0x7fa9e41a98a0,a src=xxx.xxx.xxx.xxx fe=https be=https dst=l7-lb03:8108
txn=(nil),0 txn.req=-,0 txn.rsp=-,0
rqf=908000 rqa=0 rpf=80000000 rpa=0
scf=0x7fa9e41a9830,EST,20 scb=0x7fa9e41a9d90,ASS,21
af=(nil),0 sab=(nil),0
cof=0x7fa9942fee90,a0000300:PASS(0x7fa9e4132d70)/RAW((nil))/tcpv4(26735)
cob=0x7fa9e41b1d30,11400000:NONE(0x7fa9e41a9d90)/RAW((nil))/NONE(-1)
rqf=848002 rqa=0 rpf=80048202 rpa=0
scf=0x7faab42430a0,EST,0 scb=0x7faab4072ce0,EST,1
af=(nil),0 sab=(nil),0
cof=0x7faab417f140,a0000300:PASS(0x7faab40c4dc0)/RAW((nil))/tcpv4(38734)
cob=0x7faa5c03d070,10000300:PASS(0x7faab423d700)/RAW((nil))/tcpv4(39338)
>Thread 15: id=0x7faaee753700 act=1 glob=1 wq=1 rq=0 tl=1 tlsz=0 rqsz=5
2/7 stuck=1 prof=0 harmless=0 isolated=0
cpu_ns: poll=32502566741 now=34983581329 diff=2481014588
curr_task=0x7faa54053000 (tasklet) calls=2
fct=0x557cae5aff40(xprt_handshake_io_cb) ctx=0x7faa540cafd0
call trace(14):
| 0x557cae55198f [89 44 24 04 85 c0 75 29]: ha_dump_backtrace+0x3f/0x2fd
| 0x557cae5524b8 [64 8b 75 00 8b 0d ca b9]: debug_handler+0x118/0x27b
| 0x7fab4fb05420 [48 c7 c0 0f 00 00 00 0f]: libpthread:+0x14420
| 0x7fab4fb047dc [48 3d 00 f0 ff ff 77 30]: libpthread:send+0x6c/0xb9
| 0x557cae5a45d7 [48 85 c0 7e 5c 49 29 c5]: main+0x1d1817
| 0x557cae50427f [f6 43 06 10 ba ff ff ff]: conn_ctrl_send+0x7f/0xc8
| 0x557cae5070e7 [85 c0 78 1d 66 01 43 02]: conn_send_proxy+0x107/0x37a
| 0x557cae5affde [85 c0 0f 84 ba 02 00 00]: xprt_handshake_io_cb+0x9e/0x409
Thread 16: id=0x7faaedf52700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=14 rqsz=138
2/8 stuck=0 prof=0 harmless=0 isolated=0
cpu_ns: poll=35151012969 now=36314425558 diff=1163412589
curr_task=0x7fa954061500 (tasklet) calls=2
fct=0x557cae5aff40(xprt_handshake_io_cb) ctx=0x7fa9540489f0
Thread 17: id=0x7faaed751700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=1 rqsz=9
3/1 stuck=0 prof=0 harmless=0 isolated=0
cpu_ns: poll=29967065088 now=30623641327 diff=656576239
curr_task=0x557cb0be83c0 (tasklet) calls=3002
fct=0x557cae5458f0(accept_queue_process) ctx=0x557cae6e2040
>Thread 18: id=0x7faaecf50700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=14 rqsz=133
3/2 stuck=1 prof=0 harmless=0 isolated=0
cpu_ns: poll=31436308086 now=34630164853 diff=3193856767
curr_task=0x7fa994183360 (tasklet) calls=12
fct=0x557cae5351c0(sc_conn_io_cb) ctx=0x7fa9940f3c50
strm=0x7fa9940f3860,4a src=xxx.xxx.xxx.xxx fe=https be=https dst=l7-lb03:8113
txn=(nil),0 txn.req=-,0 txn.rsp=-,0
rqf=8c8002 rqa=0 rpf=80048000 rpa=0
scf=0x7fa9940f3c50,EST,0 scb=0x7fa994198230,EST,1
af=(nil),0 sab=(nil),0
cof=0x7fa9943d1550,a0000300:PASS(0x7fa99418fd60)/RAW((nil))/tcpv4(29982)
cob=0x7fa99443f290,10000300:PASS(0x7fa9940fa3d0)/RAW((nil))/tcpv4(29983)
call trace(19):
| 0x557cae55198f [89 44 24 04 85 c0 75 29]: ha_dump_backtrace+0x3f/0x2fd
| 0x557cae5524b8 [64 8b 75 00 8b 0d ca b9]: debug_handler+0x118/0x27b
| 0x7fab4fb05420 [48 c7 c0 0f 00 00 00 0f]: libpthread:+0x14420
| 0x557cae5be4e0 [f3 0f 1e fa 8b 46 08 83]: wdt_handler
| 0x7fab4fb05420 [48 c7 c0 0f 00 00 00 0f]: libpthread:+0x14420
| 0x7fab4fb047dc [48 3d 00 f0 ff ff 77 30]: libpthread:send+0x6c/0xb9
| 0x557cae5a45d7 [48 85 c0 7e 5c 49 29 c5]: main+0x1d1817
| 0x557cae580a5f [48 85 c0 74 1f 48 8b 4b]: main+0x1adc9f
| 0x557cae533045 [85 c0 0f 8e 53 fe ff ff]: main+0x160285
| 0x557cae5352ef [48 8b 75 20 0f b6 4d 01]: main+0x16252f
| 0x557cae5342e3 [48 8b 55 30 48 8b 4d 28]: main+0x161523
| 0x557cae53505b [48 8b 7b 28 e8 dc 2f f4]: main+0x16229b
| 0x557cae53523e [eb ad 48 89 f7 e8 88 db]: sc_conn_io_cb+0x7e/0x92
Thread 19: id=0x7faaec74f700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=2 rqsz=266
3/3 stuck=0 prof=0 harmless=0 isolated=0
cpu_ns: poll=26895645961 now=28178590141 diff=1282944180
curr_task=0x7fa9a41d1ef0 (task) calls=14 last=0
fct=0x557cae47a900(process_stream) ctx=0x7fa9a406b760
strm=0x7fa9a406b760,4204b src=xxx.xxx.xxx.xxx fe=https be=https dst=l7-lb02:8109
txn=(nil),0 txn.req=-,0 txn.rsp=-,0
rqf=8ca024 rqa=0 rpf=8004a223 rpa=0
scf=0x7fa9a4117160,CLO,20 scb=0x7fa9a40759e0,EST,21
af=(nil),0 sab=(nil),0
cof=0x7fa994263ce0,a0080300:PASS(0x7fa9a413dc00)/RAW((nil))/tcpv4(28320)
cob=0x7fa9a41f2850,100c0000:PASS(0x7fa9a4054d00)/NONE((nil))/NONE(-1)
Thread 20: id=0x7faaebf4e700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=0 rqsz=19
3/4 stuck=0 prof=0 harmless=0 isolated=0
cpu_ns: poll=28657507329 now=29411699549 diff=754192220
curr_task=0x7faa24087e90 (tasklet) calls=22
fct=0x557cae5351c0(sc_conn_io_cb) ctx=0x7faa240e2e30
strm=0x7faa240e2a40,4b src=xxx.xxx.xxx.xxx fe=https be=https dst=l7-lb02:8112
txn=(nil),0 txn.req=-,0 txn.rsp=-,0
rqf=8c8002 rqa=0 rpf=80048000 rpa=0
scf=0x7faa240e2e30,EST,0 scb=0x7faa2409e780,EST,1
af=(nil),0 sab=(nil),0
cof=0x7fa994453900,a0000300:PASS(0x7faa240e35a0)/RAW((nil))/tcpv4(23843)
cob=0x7fa99430cf70,10000300:PASS(0x7faa24067200)/RAW((nil))/tcpv4(30866)
Thread 21: id=0x7faaeb74d700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=2 rqsz=95
3/5 stuck=0 prof=0 harmless=0 isolated=0
cpu_ns: poll=30583900834 now=31454772223 diff=870871389
curr_task=0x7fa9ec1cfe90 (task) calls=1 last=0
fct=0x557cae47a900(process_stream) ctx=0x7fa9ec08fc60
strm=0x7fa9ec08fc60,a src=xxx.xxx.xxx.xxx fe=https-no-proxy be=https-no-proxy dst=l7-lb03:8306
txn=(nil),0 txn.req=-,0 txn.rsp=-,0
rqf=908002 rqa=0 rpf=80000000 rpa=0
scf=0x7fa9ec051430,EST,20 scb=0x7fa9ec090050,ASS,21
af=(nil),0 sab=(nil),0
cof=0x7fa9a41e5fa0,80000300:PASS(0x7fa9ec13a1e0)/RAW((nil))/tcpv4(42871)
cob=0x7fa994413370,10400000:NONE(0x7fa9ec090050)/RAW((nil))/NONE(-1)
Thread 22: id=0x7faaeaf4c700 act=1 glob=1 wq=1 rq=1 tl=1 tlsz=3 rqsz=26
3/6 stuck=0 prof=0 harmless=0 isolated=0
cpu_ns: poll=32649294376 now=33307612305 diff=658317929
curr_task=0x7fa9e41a9c90 (task) calls=1 last=0
fct=0x557cae47a900(process_stream) ctx=0x7fa9e41a98a0
strm=0x7fa9e41a98a0,a src=xxx.xxx.xxx.xxx fe=https be=https dst=l7-lb03:8108
txn=(nil),0 txn.req=-,0 txn.rsp=-,0
rqf=908000 rqa=0 rpf=80000000 rpa=0
scf=0x7fa9e41a9830,EST,20 scb=0x7fa9e41a9d90,ASS,21
af=(nil),0 sab=(nil),0
cof=0x7fa9942fee90,a0000300:PASS(0x7fa9e4132d70)/RAW((nil))/tcpv4(26735)
cob=0x7fa9e41b1d30,11400000:NONE(0x7fa9e41a9d90)/RAW((nil))/NONE(-1)
Thread 23: id=0x7faaea74b700 act=1 glob=1 wq=1 rq=0 tl=1 tlsz=0 rqsz=7
3/7 stuck=0 prof=0 harmless=0 isolated=0
cpu_ns: poll=28531106942 now=29114540051 diff=583433109
curr_task=0x7fa92c046fc0 (tasklet) calls=12
fct=0x557cae5351c0(sc_conn_io_cb) ctx=0x7fa92c041b80
strm=0x7fa92c048a60,4a src=xxx.xxx.xxx.xxx fe=https be=https dst=l7-lb03:8104
txn=(nil),0 txn.req=-,0 txn.rsp=-,0
rqf=8c8002 rqa=0 rpf=80048000 rpa=0
scf=0x7fa92c041b80,EST,0 scb=0x7fa92c0a39e0,EST,1
af=(nil),0 sab=(nil),0
cof=0x7fa99443c3f0,a0000300:PASS(0x7fa92c0412a0)/RAW((nil))/tcpv4(19024)
cob=0x7fa9942dd6b0,10000300:PASS(0x7fa92c06f600)/RAW((nil))/tcpv4(22071)
>Thread 24: id=0x7faae9f4a700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=9 rqsz=77
3/8 stuck=1 prof=0 harmless=0 isolated=0
cpu_ns: poll=32609451867 now=34764522744 diff=2155070877
curr_task=0x7fa9a40d0290 (task) calls=4 last=0
fct=0x557cae47a900(process_stream) ctx=0x7fa978144660
strm=0x7fa978144660,4204a src=xxx.xxx.xxx.xxx fe=https be=https dst=l7-lb03:8113
txn=(nil),0 txn.req=-,0 txn.rsp=-,0
rqf=8ca024 rqa=0 rpf=8004a020 rpa=0
scf=0x7fa9781445f0,CLO,20 scb=0x7fa978144aa0,EST,21
af=(nil),0 sab=(nil),0
cof=0x7fa9943741b0,a0080300:PASS(0x7fa9781430a0)/RAW((nil))/tcpv4(38498)
cob=0x7fa994350600,100c0000:PASS(0x7fa978176110)/NONE((nil))/NONE(-1)
call trace(15):
| 0x557cae55198f [89 44 24 04 85 c0 75 29]: ha_dump_backtrace+0x3f/0x2fd
| 0x557cae5524b8 [64 8b 75 00 8b 0d ca b9]: debug_handler+0x118/0x27b
| 0x7fab4fb05420 [48 c7 c0 0f 00 00 00 0f]: libpthread:+0x14420
| 0x7fab4fb0445b [48 3d 00 f0 ff ff 77 2f]: libpthread:__close+0x3b/0x85
| 0x557cae591fa0 [eb a5 66 66 2e 0f 1f 84]: _fd_delete_orphan+0x1d0/0x1d2
| 0x557cae58961a [c7 43 68 fd ad de fd 5b]: sock_conn_ctrl_close+0x1a/0x23
| 0x557cae580fee [e9 43 ff ff ff 0f 1f 44]: main+0x1ae22e
| 0x557cae5356aa [48 8b 43 20 81 48 18 00]: main+0x1628ea
| 0x557cae47bb39 [41 8b 47 20 e9 ee f2 ff]: process_stream+0x1239/0x3688
Thread 25: id=0x7faae9749700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=2 rqsz=51
4/1 stuck=0 prof=0 harmless=0 isolated=0
cpu_ns: poll=30309885213 now=31008851052 diff=698965839
curr_task=0x557cb0bfc9d0 (task) calls=1264 last=0
fct=0x557cae529ce0(process_chk) ctx=0x557cb0993f80
Thread 26: id=0x7faae8f48700 act=1 glob=1 wq=1 rq=0 tl=1 tlsz=0 rqsz=5
4/2 stuck=0 prof=0 harmless=0 isolated=0
cpu_ns: poll=32076695282 now=33325980701 diff=1249285419
curr_task=0
Thread 27: id=0x7faae8747700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=3 rqsz=44
4/3 stuck=0 prof=0 harmless=0 isolated=0
cpu_ns: poll=31268976611 now=32583177200 diff=1314200589
curr_task=0x7fa96c10de60 (task) calls=15 last=0
fct=0x557cae47a900(process_stream) ctx=0x7faaac046e40
strm=0x7faaac046e40,b src=xxx.xxx.xxx.xxx fe=https be=https dst=l7-lb02:8105
txn=(nil),0 txn.req=-,0 txn.rsp=-,0
rqf=8ca021 rqa=0 rpf=8004a020 rpa=0
scf=0x7faaac043c80,CLO,0 scb=0x7faaac06c680,CLO,1
af=(nil),0 sab=(nil),0
cof=0x7fa9b4430930,a00c0300:PASS(0x7faaac06c880)/RAW((nil))/tcpv4(5982)
cob=0x7fa9b406e180,100c0000:PASS(0x7faaac09d810)/NONE((nil))/NONE(-1)
Thread 28: id=0x7faae7f46700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=14 rqsz=93
4/4 stuck=0 prof=0 harmless=0 isolated=0
cpu_ns: poll=28984450573 now=29526614353 diff=542163780
curr_task=0x7faa9c0a6530 (tasklet) calls=2
fct=0x557cae5351c0(sc_conn_io_cb) ctx=0x7faa9c0c2150
Thread 29: id=0x7faae7745700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=0 rqsz=53
4/5 stuck=0 prof=0 harmless=0 isolated=0
cpu_ns: poll=32700955417 now=33600629605 diff=899674188
curr_task=0x7fa9801581b0 (tasklet) calls=5
[NOTICE] (6438) : haproxy version is 2.7.10-1ppa1~focal
[NOTICE] (6438) : path to executable is /usr/sbin/haproxy
[ALERT] (6438) : Current worker (6445) exited with code 134 (Aborted)
[ALERT] (6438) : exit-on-failure: killing every processes with SIGTERM
[WARNING] (6438) : All workers exited. Exiting... (134)
Additional Information
No response
Hi Philipp,
the gdb backtrace of this thread18 would be useful. It's probably the one called "1" in gdb, though not sure. In the worst case, "t a a bt" will show them all. Seeing multiple libpthread entries with send and close doesn't please me much, it makes me think there could be some contention at the libc level maybe because a non-blocking send() is blocked in the system for some reason that should never happen, while holding a lock.
Do you think your system might be occasionally running short of memory ? In my experience, larger rings tend to degrade performance more than they improve. They're necessary when running the network stack in userland, but when you're using the kernel that's not needed at all and maybe it's more difficult for the system to allocate buffers for incoming packets. But blocking a syscall for 3 seconds is extremely shocking, it makes me think the machine is looking for memory everywhere :-/
Do you think your system might be occasionally running short of memory ?
Unlikely. The system has 64GB memory and uses < 15 GB. The metrics during the incident confirm that.
Sadly the coredumps are truncated. I need to tweak settings to make another one.
Is there any news on this issue ?
Hi, I could not reproduce that in a test environment and I can not deliberately crash a productive instance. From my point of view you can close the issue since I can just make do with a lower NIC buffer size. The underlying issue (whatever it is) is probably not fixed though.
OK, closing then, thanks Philipp.