haproxy icon indicating copy to clipboard operation
haproxy copied to clipboard

Large NIC ring buffer sizes cause HAProxy v2.7.10 to crash under load

Open phihos opened this issue 2 years ago • 2 comments

Detailed Description of the Problem

Hi,

I tried to apply some NIC optimizations to all physical interfaces:

# systemd-networkd config section
CombinedChannels=64
RxBufferSize=2047
TxBufferSize=2047
UseAdaptiveRxCoalesce=true
UseAdaptiveTxCoalesce=true

The driver defaults are as follows:

# systemd-networkd config section (mocked)
CombinedChannels=8
RxBufferSize=511
TxBufferSize=511
UseAdaptiveRxCoalesce=false
UseAdaptiveTxCoalesce=false

After I sent traffic to HAProxy in L4 mode it crashed around 10k TCP connections. Log attached below. So I started to dial everything back to find the change causing the crash. I started with RxBufferSizeand TxBufferSize and set both back to 511. This was immediately successful. I guess it must be somehow related to the ring buffer sizes since it was the only thing I reverted back to driver defaults.

The interfaces I applied these options to were as follows:

BCM57412 NetXtreme-E 10Gb RDMA Ethernet Controller (uplink)
BCM57412 NetXtreme-E 10Gb RDMA Ethernet Controller (internal bond active)
BCM57412 NetXtreme-E 10Gb RDMA Ethernet Controller (internal bond passive)

Expected Behavior

HAProxy should not crash.

Steps to Reproduce the Behavior

  1. Set rx/tx ring buffer sizes of all NICs to 2047.
  2. Put HAProxy under load.

Do you have any idea what may have caused this?

Large ring buffer size. Maybe NIC driver.

Do you have an idea how to solve the issue?

No response

What is your configuration?

Can be provided in sanitized form if needed.

Output of haproxy -vv

HAProxy version 2.7.10-1ppa1~focal 2023/08/12 - https://haproxy.org/
Status: stable branch - will stop receiving fixes around Q1 2024.
Known bugs: http://www.haproxy.org/bugs/bugs-2.7.10.html
Running on: Linux 5.4.0-167-generic #184-Ubuntu SMP Tue Oct 31 09:21:49 UTC 2023 x86_64
Build options :
  TARGET  = linux-glibc
  CPU     = generic
  CC      = cc
  CFLAGS  = -O2 -g -O2 -fdebug-prefix-map=/build/haproxy-DDO7EW/haproxy-2.7.10=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wall -Wextra -Wundef -Wdeclaration-after-statement -Wfatal-errors -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference -fwrapv -Wno-address-of-packed-member -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers -Wno-cast-function-type -Wno-string-plus-int -Wno-atomic-alignment
  OPTIONS = USE_PCRE2=1 USE_PCRE2_JIT=1 USE_OPENSSL=1 USE_LUA=1 USE_SLZ=1 USE_SYSTEMD=1 USE_PROMEX=1
  DEBUG   = -DDEBUG_STRICT -DDEBUG_MEMORY_POOLS

Feature list : -51DEGREES +ACCEPT4 +BACKTRACE -CLOSEFROM +CPU_AFFINITY +CRYPT_H -DEVICEATLAS +DL -ENGINE +EPOLL -EVPORTS +GETADDRINFO -KQUEUE +LIBCRYPT +LINUX_SPLICE +LINUX_TPROXY +LUA -MEMORY_PROFILING +NETFILTER +NS -OBSOLETE_LINKER +OPENSSL -OPENSSL_WOLFSSL -OT -PCRE +PCRE2 +PCRE2_JIT -PCRE_JIT +POLL +PRCTL -PROCCTL +PROMEX -PTHREAD_EMULATION -QUIC +RT +SHM_OPEN +SLZ -STATIC_PCRE -STATIC_PCRE2 +SYSTEMD +TFO +THREAD +THREAD_DUMP +TPROXY -WURFL -ZLIB

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_TGROUPS=16, MAX_THREADS=256, default=64).
Built with OpenSSL version : OpenSSL 1.1.1f  31 Mar 2020
Running on OpenSSL version : OpenSSL 1.1.1f  31 Mar 2020
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
Built with Lua version : Lua 5.3.3
Built with the Prometheus exporter as a service
Built with network namespace support.
Built with libslz for stateless compression.
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with PCRE2 version : 10.34 2019-11-21
PCRE2 library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with gcc compiler version 9.4.0

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
         h2 : mode=HTTP  side=FE|BE  mux=H2    flags=HTX|HOL_RISK|NO_UPG
       fcgi : mode=HTTP  side=BE     mux=FCGI  flags=HTX|HOL_RISK|NO_UPG
  <default> : mode=HTTP  side=FE|BE  mux=H1    flags=HTX
         h1 : mode=HTTP  side=FE|BE  mux=H1    flags=HTX|NO_UPG
  <default> : mode=TCP   side=FE|BE  mux=PASS  flags=
       none : mode=TCP   side=FE|BE  mux=PASS  flags=NO_UPG

Available services : prometheus-exporter
Available filters :
	[BWLIM] bwlim-in
	[BWLIM] bwlim-out
	[CACHE] cache
	[COMP] compression
	[FCGI] fcgi-app
	[SPOE] spoe
	[TRACE] trace

Last Outputs and Backtraces

Thread 46 is about to kill the process.
  Thread 1 : id=0x7fab4f139180 act=1 glob=0 wq=1 rq=0 tl=1 tlsz=0 rqsz=1
      1/1    stuck=0 prof=0 harmless=1 isolated=0
             cpu_ns: poll=35140437435 now=35740140085 diff=599702650
             curr_task=0
  Thread 2 : id=0x7fab4f12d700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=5 rqsz=52
      1/2    stuck=0 prof=0 harmless=0 isolated=0
             cpu_ns: poll=30085310795 now=30713903365 diff=628592570
             curr_task=0x7fa98c1c49a0 (task) calls=15 last=0
               fct=0x557cae47a900(process_stream) ctx=0x7fab48056290
             strm=0x7fab48056290,4a src=xxx.xxx.xxx.xxx fe=https be=https dst=l7-lb02:8103
             txn=(nil),0 txn.req=-,0 txn.rsp=-,0
             rqf=8ca021 rqa=0 rpf=8004a020 rpa=0
             scf=0x7fab48081c10,CLO,20 scb=0x7fab4808b0e0,EST,21
             af=(nil),0 sab=(nil),0
             cof=0x7fa9741d06a0,a00c0300:PASS(0x7fab4805a310)/RAW((nil))/tcpv4(10491)
             cob=0x7fa98c2548d0,100c0000:PASS(0x7fab480f0d80)/NONE((nil))/NONE(-1)
  Thread 3 : id=0x7fab4e7a7700 act=1 glob=0 wq=1 rq=0 tl=1 tlsz=1 rqsz=3
      1/3    stuck=0 prof=0 harmless=0 isolated=0
             cpu_ns: poll=31560009414 now=32657403563 diff=1097394149
             curr_task=0
  Thread 4 : id=0x7fab4dfa6700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=14 rqsz=153
      1/4    stuck=0 prof=0 harmless=0 isolated=0
             cpu_ns: poll=32567643198 now=33829999067 diff=1262355869
             curr_task=0x7faa640a62e0 (tasklet) calls=2
               fct=0x557cae5351c0(sc_conn_io_cb) ctx=0x7faa640b9880
             strm=0x7faa640594f0,4a src=xxx.xxx.xxx.xxx fe=https be=https dst=l7-lb01:8100
             txn=(nil),0 txn.req=-,0 txn.rsp=-,0
             rqf=848002 rqa=0 rpf=80048202 rpa=0
             scf=0x7faa640b9880,EST,0 scb=0x7faa640d0f60,EST,1
             af=(nil),0 sab=(nil),0
             cof=0x7fa9741950b0,a0000300:PASS(0x7faa64155eb0)/RAW((nil))/tcpv4(37525)
             cob=0x7fa9741f90e0,10000300:PASS(0x7faa640ac2d0)/RAW((nil))/tcpv4(40685)
  Thread 5 : id=0x7fab4d7a5700 act=0 glob=0 wq=1 rq=0 tl=0 tlsz=0 rqsz=0
      1/5    stuck=0 prof=0 harmless=1 isolated=0
             cpu_ns: poll=34043386228 now=34596818328 diff=553432100
             curr_task=0
  Thread 6 : id=0x7fab4cfa4700 act=1 glob=0 wq=1 rq=0 tl=1 tlsz=0 rqsz=1
      1/6    stuck=0 prof=0 harmless=1 isolated=0
             cpu_ns: poll=29185808183 now=29658429703 diff=472621520
             curr_task=0
  Thread 7 : id=0x7faaf48bd700 act=0 glob=0 wq=1 rq=0 tl=0 tlsz=0 rqsz=0
      1/7    stuck=0 prof=0 harmless=1 isolated=0
             cpu_ns: poll=36374454479 now=36810005418 diff=435550939
             curr_task=0
  Thread 8 : id=0x7faaf1f5a700 act=1 glob=1 wq=1 rq=0 tl=1 tlsz=0 rqsz=34
      1/8    stuck=0 prof=0 harmless=0 isolated=0
             cpu_ns: poll=29933935253 now=30722250921 diff=788315668
             curr_task=0
  Thread 9 : id=0x7faaf1759700 act=1 glob=1 wq=1 rq=0 tl=1 tlsz=0 rqsz=3
      2/1    stuck=0 prof=0 harmless=0 isolated=0
             cpu_ns: poll=32072977736 now=32719832415 diff=646854679
             curr_task=0
  Thread 10: id=0x7faaf0f58700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=0 rqsz=12
      2/2    stuck=0 prof=0 harmless=0 isolated=0
             cpu_ns: poll=27665068891 now=29265876719 diff=1600807828
             curr_task=0
  Thread 11: id=0x7faaf0757700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=14 rqsz=77
      2/3    stuck=0 prof=0 harmless=0 isolated=0
             cpu_ns: poll=31182937756 now=31823297834 diff=640360078
             curr_task=0x7faa5c48b870 (tasklet) calls=1
               fct=0x557cae528830(srv_chk_io_cb) ctx=0x7faa5c48b8c0
  Thread 12: id=0x7faaeff56700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=11 rqsz=74
      2/4    stuck=0 prof=0 harmless=0 isolated=0
             cpu_ns: poll=28615865661 now=29144149971 diff=528284310
             curr_task=0x557cb0bfec30 (task) calls=1256 last=0
               fct=0x557cae529ce0(process_chk) ctx=0x557cb09acf80
  Thread 13: id=0x7faaef755700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=14 rqsz=97
      2/5    stuck=0 prof=0 harmless=0 isolated=0
             cpu_ns: poll=30477909361 now=31382894900 diff=904985539
             curr_task=0x7faac4064400 (tasklet) calls=10
               fct=0x557cae5351c0(sc_conn_io_cb) ctx=0x7faac40543c0
             strm=0x7faac40be950,4b src=xxx.xxx.xxx.xxx fe=https be=https dst=l7-lb01:8114
             txn=(nil),0 txn.req=-,0 txn.rsp=-,0
             rqf=8c8002 rqa=0 rpf=80048000 rpa=0
             scf=0x7faac40543c0,EST,0 scb=0x7faac4063400,EST,1
             af=(nil),0 sab=(nil),0
             cof=0x7faa5c465730,a0000300:PASS(0x7faac4037bd0)/RAW((nil))/tcpv4(4115)
             cob=0x7faa5c368420,10000300:PASS(0x7faac403dc80)/RAW((nil))/tcpv4(7394)
  Thread 14: id=0x7faaeef54700 act=1 glob=0 wq=1 rq=0 tl=1 tlsz=2 rqsz=34
      2/6    stuck=0 prof=0 harmless=0 isolated=0
             cpu_ns: poll=32847947901 now=33520932409 diff=672984508
             curr_task=0x7faab40cb170 (tasklet) calls=3
               fct=0x557cae5351c0(sc_conn_io_cb) ctx=0x7faab42430a0
             strm=0x7faab4298b00,4a src=xxx.xxx.xxx.xxx fe=https be=https dst=l7-lb02:8119
             txn=(nil),0 txn.req=-,0 txn.rsp=-,0
             rqf=848002 rqa=0 rpf=80048202 rpa=0
             scf=0x7faab42430a0,EST,0 scb=0x7faab4072ce0,EST,1
             af=(nil),0 sab=(nil),0
             cof=0x7faab417f140,a0000300:PASS(0x7faab40c4dc0)/RAW((nil))/tcpv4(38734)
             cob=0x7faa5c03d070,10000300:PASS(0x7faab423d700)/RAW((nil))/tcpv4(39338)
 >Thread 15: id=0x7faaee753700 act=1 glob=1 wq=1 rq=0 tl=1 tlsz=0 rqsz=5
      2/7    stuck=1 prof=0 harmless=0 isolated=0
             cpu_ns: poll=32502566741 now=34983581329 diff=2481014588
             curr_task=0x7faa54053000 (tasklet) calls=2
               fct=0x557cae5aff40(xprt_handshake_io_cb) ctx=0x7faa540cafd0
             call trace(14):
             | 0x557cae55198f [89 44 24 04 85 c0 75 29]: ha_dump_backtrace+0x3f/0x2fd
             | 0x557cae5524b8 [64 8b 75 00 8b 0d ca b9]: debug_handler+0x118/0x27b
             | 0x7fab4fb05420 [48 c7 c0 0f 00 00 00 0f]: libpthread:+0x14420
             | 0x7fab4fb047dc [48 3d 00 f0 ff ff 77 30]: libpthread:send+0x6c/0xb9
             | 0x557cae5a45d7 [48 85 c0 7e 5c 49 29 c5]: main+0x1d1817
             | 0x557cae50427f [f6 43 06 10 ba ff ff ff]: conn_ctrl_send+0x7f/0xc8
             | 0x557cae5070e7 [85 c0 78 1d 66 01 43 02]: conn_send_proxy+0x107/0x37a
             | 0x557cae5affde [85 c0 0f 84 ba 02 00 00]: xprt_handshake_io_cb+0x9e/0x409
  Thread 16: id=0x7faaedf52700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=14 rqsz=138
      2/8    stuck=0 prof=0 harmless=0 isolated=0
             cpu_ns: poll=35151012969 now=36314425558 diff=1163412589
             curr_task=0x7fa954061500 (tasklet) calls=2
               fct=0x557cae5aff40(xprt_handshake_io_cb) ctx=0x7fa9540489f0
  Thread 17: id=0x7faaed751700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=1 rqsz=9
      3/1    stuck=0 prof=0 harmless=0 isolated=0
             cpu_ns: poll=29967065088 now=30623641327 diff=656576239
             curr_task=0x557cb0be83c0 (tasklet) calls=3002
               fct=0x557cae5458f0(accept_queue_process) ctx=0x557cae6e2040
 >Thread 18: id=0x7faaecf50700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=14 rqsz=133
      3/2    stuck=1 prof=0 harmless=0 isolated=0
             cpu_ns: poll=31436308086 now=34630164853 diff=3193856767
             curr_task=0x7fa994183360 (tasklet) calls=12
               fct=0x557cae5351c0(sc_conn_io_cb) ctx=0x7fa9940f3c50
             strm=0x7fa9940f3860,4a src=xxx.xxx.xxx.xxx fe=https be=https dst=l7-lb03:8113
             txn=(nil),0 txn.req=-,0 txn.rsp=-,0
             rqf=8c8002 rqa=0 rpf=80048000 rpa=0
             scf=0x7fa9940f3c50,EST,0 scb=0x7fa994198230,EST,1
             af=(nil),0 sab=(nil),0
             cof=0x7fa9943d1550,a0000300:PASS(0x7fa99418fd60)/RAW((nil))/tcpv4(29982)
             cob=0x7fa99443f290,10000300:PASS(0x7fa9940fa3d0)/RAW((nil))/tcpv4(29983)
             call trace(19):
             | 0x557cae55198f [89 44 24 04 85 c0 75 29]: ha_dump_backtrace+0x3f/0x2fd
             | 0x557cae5524b8 [64 8b 75 00 8b 0d ca b9]: debug_handler+0x118/0x27b
             | 0x7fab4fb05420 [48 c7 c0 0f 00 00 00 0f]: libpthread:+0x14420
             | 0x557cae5be4e0 [f3 0f 1e fa 8b 46 08 83]: wdt_handler
             | 0x7fab4fb05420 [48 c7 c0 0f 00 00 00 0f]: libpthread:+0x14420
             | 0x7fab4fb047dc [48 3d 00 f0 ff ff 77 30]: libpthread:send+0x6c/0xb9
             | 0x557cae5a45d7 [48 85 c0 7e 5c 49 29 c5]: main+0x1d1817
             | 0x557cae580a5f [48 85 c0 74 1f 48 8b 4b]: main+0x1adc9f
             | 0x557cae533045 [85 c0 0f 8e 53 fe ff ff]: main+0x160285
             | 0x557cae5352ef [48 8b 75 20 0f b6 4d 01]: main+0x16252f
             | 0x557cae5342e3 [48 8b 55 30 48 8b 4d 28]: main+0x161523
             | 0x557cae53505b [48 8b 7b 28 e8 dc 2f f4]: main+0x16229b
             | 0x557cae53523e [eb ad 48 89 f7 e8 88 db]: sc_conn_io_cb+0x7e/0x92
  Thread 19: id=0x7faaec74f700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=2 rqsz=266
      3/3    stuck=0 prof=0 harmless=0 isolated=0
             cpu_ns: poll=26895645961 now=28178590141 diff=1282944180
             curr_task=0x7fa9a41d1ef0 (task) calls=14 last=0
               fct=0x557cae47a900(process_stream) ctx=0x7fa9a406b760
             strm=0x7fa9a406b760,4204b src=xxx.xxx.xxx.xxx fe=https be=https dst=l7-lb02:8109
             txn=(nil),0 txn.req=-,0 txn.rsp=-,0
             rqf=8ca024 rqa=0 rpf=8004a223 rpa=0
             scf=0x7fa9a4117160,CLO,20 scb=0x7fa9a40759e0,EST,21
             af=(nil),0 sab=(nil),0
             cof=0x7fa994263ce0,a0080300:PASS(0x7fa9a413dc00)/RAW((nil))/tcpv4(28320)
             cob=0x7fa9a41f2850,100c0000:PASS(0x7fa9a4054d00)/NONE((nil))/NONE(-1)
  Thread 20: id=0x7faaebf4e700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=0 rqsz=19
      3/4    stuck=0 prof=0 harmless=0 isolated=0
             cpu_ns: poll=28657507329 now=29411699549 diff=754192220
             curr_task=0x7faa24087e90 (tasklet) calls=22
               fct=0x557cae5351c0(sc_conn_io_cb) ctx=0x7faa240e2e30
             strm=0x7faa240e2a40,4b src=xxx.xxx.xxx.xxx fe=https be=https dst=l7-lb02:8112
             txn=(nil),0 txn.req=-,0 txn.rsp=-,0
             rqf=8c8002 rqa=0 rpf=80048000 rpa=0
             scf=0x7faa240e2e30,EST,0 scb=0x7faa2409e780,EST,1
             af=(nil),0 sab=(nil),0
             cof=0x7fa994453900,a0000300:PASS(0x7faa240e35a0)/RAW((nil))/tcpv4(23843)
             cob=0x7fa99430cf70,10000300:PASS(0x7faa24067200)/RAW((nil))/tcpv4(30866)
  Thread 21: id=0x7faaeb74d700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=2 rqsz=95
      3/5    stuck=0 prof=0 harmless=0 isolated=0
             cpu_ns: poll=30583900834 now=31454772223 diff=870871389
             curr_task=0x7fa9ec1cfe90 (task) calls=1 last=0
               fct=0x557cae47a900(process_stream) ctx=0x7fa9ec08fc60
             strm=0x7fa9ec08fc60,a src=xxx.xxx.xxx.xxx fe=https-no-proxy be=https-no-proxy dst=l7-lb03:8306
             txn=(nil),0 txn.req=-,0 txn.rsp=-,0
             rqf=908002 rqa=0 rpf=80000000 rpa=0
             scf=0x7fa9ec051430,EST,20 scb=0x7fa9ec090050,ASS,21
             af=(nil),0 sab=(nil),0
             cof=0x7fa9a41e5fa0,80000300:PASS(0x7fa9ec13a1e0)/RAW((nil))/tcpv4(42871)
             cob=0x7fa994413370,10400000:NONE(0x7fa9ec090050)/RAW((nil))/NONE(-1)
  Thread 22: id=0x7faaeaf4c700 act=1 glob=1 wq=1 rq=1 tl=1 tlsz=3 rqsz=26
      3/6    stuck=0 prof=0 harmless=0 isolated=0
             cpu_ns: poll=32649294376 now=33307612305 diff=658317929
             curr_task=0x7fa9e41a9c90 (task) calls=1 last=0
               fct=0x557cae47a900(process_stream) ctx=0x7fa9e41a98a0
             strm=0x7fa9e41a98a0,a src=xxx.xxx.xxx.xxx fe=https be=https dst=l7-lb03:8108
             txn=(nil),0 txn.req=-,0 txn.rsp=-,0
             rqf=908000 rqa=0 rpf=80000000 rpa=0
             scf=0x7fa9e41a9830,EST,20 scb=0x7fa9e41a9d90,ASS,21
             af=(nil),0 sab=(nil),0
             cof=0x7fa9942fee90,a0000300:PASS(0x7fa9e4132d70)/RAW((nil))/tcpv4(26735)
             cob=0x7fa9e41b1d30,11400000:NONE(0x7fa9e41a9d90)/RAW((nil))/NONE(-1)
  Thread 23: id=0x7faaea74b700 act=1 glob=1 wq=1 rq=0 tl=1 tlsz=0 rqsz=7
      3/7    stuck=0 prof=0 harmless=0 isolated=0
             cpu_ns: poll=28531106942 now=29114540051 diff=583433109
             curr_task=0x7fa92c046fc0 (tasklet) calls=12
               fct=0x557cae5351c0(sc_conn_io_cb) ctx=0x7fa92c041b80
             strm=0x7fa92c048a60,4a src=xxx.xxx.xxx.xxx fe=https be=https dst=l7-lb03:8104
             txn=(nil),0 txn.req=-,0 txn.rsp=-,0
             rqf=8c8002 rqa=0 rpf=80048000 rpa=0
             scf=0x7fa92c041b80,EST,0 scb=0x7fa92c0a39e0,EST,1
             af=(nil),0 sab=(nil),0
             cof=0x7fa99443c3f0,a0000300:PASS(0x7fa92c0412a0)/RAW((nil))/tcpv4(19024)
             cob=0x7fa9942dd6b0,10000300:PASS(0x7fa92c06f600)/RAW((nil))/tcpv4(22071)
 >Thread 24: id=0x7faae9f4a700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=9 rqsz=77
      3/8    stuck=1 prof=0 harmless=0 isolated=0
             cpu_ns: poll=32609451867 now=34764522744 diff=2155070877
             curr_task=0x7fa9a40d0290 (task) calls=4 last=0
               fct=0x557cae47a900(process_stream) ctx=0x7fa978144660
             strm=0x7fa978144660,4204a src=xxx.xxx.xxx.xxx fe=https be=https dst=l7-lb03:8113
             txn=(nil),0 txn.req=-,0 txn.rsp=-,0
             rqf=8ca024 rqa=0 rpf=8004a020 rpa=0
             scf=0x7fa9781445f0,CLO,20 scb=0x7fa978144aa0,EST,21
             af=(nil),0 sab=(nil),0
             cof=0x7fa9943741b0,a0080300:PASS(0x7fa9781430a0)/RAW((nil))/tcpv4(38498)
             cob=0x7fa994350600,100c0000:PASS(0x7fa978176110)/NONE((nil))/NONE(-1)
             call trace(15):
             | 0x557cae55198f [89 44 24 04 85 c0 75 29]: ha_dump_backtrace+0x3f/0x2fd
             | 0x557cae5524b8 [64 8b 75 00 8b 0d ca b9]: debug_handler+0x118/0x27b
             | 0x7fab4fb05420 [48 c7 c0 0f 00 00 00 0f]: libpthread:+0x14420
             | 0x7fab4fb0445b [48 3d 00 f0 ff ff 77 2f]: libpthread:__close+0x3b/0x85
             | 0x557cae591fa0 [eb a5 66 66 2e 0f 1f 84]: _fd_delete_orphan+0x1d0/0x1d2
             | 0x557cae58961a [c7 43 68 fd ad de fd 5b]: sock_conn_ctrl_close+0x1a/0x23
             | 0x557cae580fee [e9 43 ff ff ff 0f 1f 44]: main+0x1ae22e
             | 0x557cae5356aa [48 8b 43 20 81 48 18 00]: main+0x1628ea
             | 0x557cae47bb39 [41 8b 47 20 e9 ee f2 ff]: process_stream+0x1239/0x3688
  Thread 25: id=0x7faae9749700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=2 rqsz=51
      4/1    stuck=0 prof=0 harmless=0 isolated=0
             rqf=8c8002 rqa=0 rpf=80048000 rpa=0
             scf=0x7faa240e2e30,EST,0 scb=0x7faa2409e780,EST,1
             af=(nil),0 sab=(nil),0
             cof=0x7fa994453900,a0000300:PASS(0x7faa240e35a0)/RAW((nil))/tcpv4(23843)
             cob=0x7fa99430cf70,10000300:PASS(0x7faa24067200)/RAW((nil))/tcpv4(30866)
  Thread 21: id=0x7faaeb74d700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=2 rqsz=95
      3/5    stuck=0 prof=0 harmless=0 isolated=0
             cpu_ns: poll=30583900834 now=31454772223 diff=870871389
             curr_task=0x7fa9ec1cfe90 (task) calls=1 last=0
               fct=0x557cae47a900(process_stream) ctx=0x7fa9ec08fc60
             strm=0x7fa9ec08fc60,a src=xxx.xxx.xxx.xxx fe=https-no-proxy be=https-no-proxy dst=l7-lb03:8306
             txn=(nil),0 txn.req=-,0 txn.rsp=-,0
             rqf=908002 rqa=0 rpf=80000000 rpa=0
             scf=0x7fa9ec051430,EST,20 scb=0x7fa9ec090050,ASS,21
             af=(nil),0 sab=(nil),0
             cof=0x7fa9a41e5fa0,80000300:PASS(0x7fa9ec13a1e0)/RAW((nil))/tcpv4(42871)
             cob=0x7fa994413370,10400000:NONE(0x7fa9ec090050)/RAW((nil))/NONE(-1)
  Thread 22: id=0x7faaeaf4c700 act=1 glob=1 wq=1 rq=1 tl=1 tlsz=3 rqsz=26
      3/6    stuck=0 prof=0 harmless=0 isolated=0
             cpu_ns: poll=32649294376 now=33307612305 diff=658317929
             curr_task=0x7fa9e41a9c90 (task) calls=1 last=0
               fct=0x557cae47a900(process_stream) ctx=0x7fa9e41a98a0
             strm=0x7fa9e41a98a0,a src=xxx.xxx.xxx.xxx fe=https be=https dst=l7-lb03:8108
             txn=(nil),0 txn.req=-,0 txn.rsp=-,0
             rqf=908000 rqa=0 rpf=80000000 rpa=0
             scf=0x7fa9e41a9830,EST,20 scb=0x7fa9e41a9d90,ASS,21
             af=(nil),0 sab=(nil),0
             cof=0x7fa9942fee90,a0000300:PASS(0x7fa9e4132d70)/RAW((nil))/tcpv4(26735)
             cob=0x7fa9e41b1d30,11400000:NONE(0x7fa9e41a9d90)/RAW((nil))/NONE(-1)
             rqf=848002 rqa=0 rpf=80048202 rpa=0
             scf=0x7faab42430a0,EST,0 scb=0x7faab4072ce0,EST,1
             af=(nil),0 sab=(nil),0
             cof=0x7faab417f140,a0000300:PASS(0x7faab40c4dc0)/RAW((nil))/tcpv4(38734)
             cob=0x7faa5c03d070,10000300:PASS(0x7faab423d700)/RAW((nil))/tcpv4(39338)
 >Thread 15: id=0x7faaee753700 act=1 glob=1 wq=1 rq=0 tl=1 tlsz=0 rqsz=5
      2/7    stuck=1 prof=0 harmless=0 isolated=0
             cpu_ns: poll=32502566741 now=34983581329 diff=2481014588
             curr_task=0x7faa54053000 (tasklet) calls=2
               fct=0x557cae5aff40(xprt_handshake_io_cb) ctx=0x7faa540cafd0
             call trace(14):
             | 0x557cae55198f [89 44 24 04 85 c0 75 29]: ha_dump_backtrace+0x3f/0x2fd
             | 0x557cae5524b8 [64 8b 75 00 8b 0d ca b9]: debug_handler+0x118/0x27b
             | 0x7fab4fb05420 [48 c7 c0 0f 00 00 00 0f]: libpthread:+0x14420
             | 0x7fab4fb047dc [48 3d 00 f0 ff ff 77 30]: libpthread:send+0x6c/0xb9
             | 0x557cae5a45d7 [48 85 c0 7e 5c 49 29 c5]: main+0x1d1817
             | 0x557cae50427f [f6 43 06 10 ba ff ff ff]: conn_ctrl_send+0x7f/0xc8
             | 0x557cae5070e7 [85 c0 78 1d 66 01 43 02]: conn_send_proxy+0x107/0x37a
             | 0x557cae5affde [85 c0 0f 84 ba 02 00 00]: xprt_handshake_io_cb+0x9e/0x409
  Thread 16: id=0x7faaedf52700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=14 rqsz=138
      2/8    stuck=0 prof=0 harmless=0 isolated=0
             cpu_ns: poll=35151012969 now=36314425558 diff=1163412589
             curr_task=0x7fa954061500 (tasklet) calls=2
               fct=0x557cae5aff40(xprt_handshake_io_cb) ctx=0x7fa9540489f0
  Thread 17: id=0x7faaed751700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=1 rqsz=9
      3/1    stuck=0 prof=0 harmless=0 isolated=0
             cpu_ns: poll=29967065088 now=30623641327 diff=656576239
             curr_task=0x557cb0be83c0 (tasklet) calls=3002
               fct=0x557cae5458f0(accept_queue_process) ctx=0x557cae6e2040
 >Thread 18: id=0x7faaecf50700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=14 rqsz=133
      3/2    stuck=1 prof=0 harmless=0 isolated=0
             cpu_ns: poll=31436308086 now=34630164853 diff=3193856767
             curr_task=0x7fa994183360 (tasklet) calls=12
               fct=0x557cae5351c0(sc_conn_io_cb) ctx=0x7fa9940f3c50
             strm=0x7fa9940f3860,4a src=xxx.xxx.xxx.xxx fe=https be=https dst=l7-lb03:8113
             txn=(nil),0 txn.req=-,0 txn.rsp=-,0
             rqf=8c8002 rqa=0 rpf=80048000 rpa=0
             scf=0x7fa9940f3c50,EST,0 scb=0x7fa994198230,EST,1
             af=(nil),0 sab=(nil),0
             cof=0x7fa9943d1550,a0000300:PASS(0x7fa99418fd60)/RAW((nil))/tcpv4(29982)
             cob=0x7fa99443f290,10000300:PASS(0x7fa9940fa3d0)/RAW((nil))/tcpv4(29983)
             call trace(19):
             | 0x557cae55198f [89 44 24 04 85 c0 75 29]: ha_dump_backtrace+0x3f/0x2fd
             | 0x557cae5524b8 [64 8b 75 00 8b 0d ca b9]: debug_handler+0x118/0x27b
             | 0x7fab4fb05420 [48 c7 c0 0f 00 00 00 0f]: libpthread:+0x14420
             | 0x557cae5be4e0 [f3 0f 1e fa 8b 46 08 83]: wdt_handler
             | 0x7fab4fb05420 [48 c7 c0 0f 00 00 00 0f]: libpthread:+0x14420
             | 0x7fab4fb047dc [48 3d 00 f0 ff ff 77 30]: libpthread:send+0x6c/0xb9
             | 0x557cae5a45d7 [48 85 c0 7e 5c 49 29 c5]: main+0x1d1817
             | 0x557cae580a5f [48 85 c0 74 1f 48 8b 4b]: main+0x1adc9f
             | 0x557cae533045 [85 c0 0f 8e 53 fe ff ff]: main+0x160285
             | 0x557cae5352ef [48 8b 75 20 0f b6 4d 01]: main+0x16252f
             | 0x557cae5342e3 [48 8b 55 30 48 8b 4d 28]: main+0x161523
             | 0x557cae53505b [48 8b 7b 28 e8 dc 2f f4]: main+0x16229b
             | 0x557cae53523e [eb ad 48 89 f7 e8 88 db]: sc_conn_io_cb+0x7e/0x92
  Thread 19: id=0x7faaec74f700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=2 rqsz=266
      3/3    stuck=0 prof=0 harmless=0 isolated=0
             cpu_ns: poll=26895645961 now=28178590141 diff=1282944180
             curr_task=0x7fa9a41d1ef0 (task) calls=14 last=0
               fct=0x557cae47a900(process_stream) ctx=0x7fa9a406b760
             strm=0x7fa9a406b760,4204b src=xxx.xxx.xxx.xxx fe=https be=https dst=l7-lb02:8109
             txn=(nil),0 txn.req=-,0 txn.rsp=-,0
             rqf=8ca024 rqa=0 rpf=8004a223 rpa=0
             scf=0x7fa9a4117160,CLO,20 scb=0x7fa9a40759e0,EST,21
             af=(nil),0 sab=(nil),0
             cof=0x7fa994263ce0,a0080300:PASS(0x7fa9a413dc00)/RAW((nil))/tcpv4(28320)
             cob=0x7fa9a41f2850,100c0000:PASS(0x7fa9a4054d00)/NONE((nil))/NONE(-1)
  Thread 20: id=0x7faaebf4e700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=0 rqsz=19
      3/4    stuck=0 prof=0 harmless=0 isolated=0
             cpu_ns: poll=28657507329 now=29411699549 diff=754192220
             curr_task=0x7faa24087e90 (tasklet) calls=22
               fct=0x557cae5351c0(sc_conn_io_cb) ctx=0x7faa240e2e30
             strm=0x7faa240e2a40,4b src=xxx.xxx.xxx.xxx fe=https be=https dst=l7-lb02:8112
             txn=(nil),0 txn.req=-,0 txn.rsp=-,0
             rqf=8c8002 rqa=0 rpf=80048000 rpa=0
             scf=0x7faa240e2e30,EST,0 scb=0x7faa2409e780,EST,1
             af=(nil),0 sab=(nil),0
             cof=0x7fa994453900,a0000300:PASS(0x7faa240e35a0)/RAW((nil))/tcpv4(23843)
             cob=0x7fa99430cf70,10000300:PASS(0x7faa24067200)/RAW((nil))/tcpv4(30866)
  Thread 21: id=0x7faaeb74d700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=2 rqsz=95
      3/5    stuck=0 prof=0 harmless=0 isolated=0
             cpu_ns: poll=30583900834 now=31454772223 diff=870871389
             curr_task=0x7fa9ec1cfe90 (task) calls=1 last=0
               fct=0x557cae47a900(process_stream) ctx=0x7fa9ec08fc60
             strm=0x7fa9ec08fc60,a src=xxx.xxx.xxx.xxx fe=https-no-proxy be=https-no-proxy dst=l7-lb03:8306
             txn=(nil),0 txn.req=-,0 txn.rsp=-,0
             rqf=908002 rqa=0 rpf=80000000 rpa=0
             scf=0x7fa9ec051430,EST,20 scb=0x7fa9ec090050,ASS,21
             af=(nil),0 sab=(nil),0
             cof=0x7fa9a41e5fa0,80000300:PASS(0x7fa9ec13a1e0)/RAW((nil))/tcpv4(42871)
             cob=0x7fa994413370,10400000:NONE(0x7fa9ec090050)/RAW((nil))/NONE(-1)
  Thread 22: id=0x7faaeaf4c700 act=1 glob=1 wq=1 rq=1 tl=1 tlsz=3 rqsz=26
      3/6    stuck=0 prof=0 harmless=0 isolated=0
             cpu_ns: poll=32649294376 now=33307612305 diff=658317929
             curr_task=0x7fa9e41a9c90 (task) calls=1 last=0
               fct=0x557cae47a900(process_stream) ctx=0x7fa9e41a98a0
             strm=0x7fa9e41a98a0,a src=xxx.xxx.xxx.xxx fe=https be=https dst=l7-lb03:8108
             txn=(nil),0 txn.req=-,0 txn.rsp=-,0
             rqf=908000 rqa=0 rpf=80000000 rpa=0
             scf=0x7fa9e41a9830,EST,20 scb=0x7fa9e41a9d90,ASS,21
             af=(nil),0 sab=(nil),0
             cof=0x7fa9942fee90,a0000300:PASS(0x7fa9e4132d70)/RAW((nil))/tcpv4(26735)
             cob=0x7fa9e41b1d30,11400000:NONE(0x7fa9e41a9d90)/RAW((nil))/NONE(-1)
  Thread 23: id=0x7faaea74b700 act=1 glob=1 wq=1 rq=0 tl=1 tlsz=0 rqsz=7
      3/7    stuck=0 prof=0 harmless=0 isolated=0
             cpu_ns: poll=28531106942 now=29114540051 diff=583433109
             curr_task=0x7fa92c046fc0 (tasklet) calls=12
               fct=0x557cae5351c0(sc_conn_io_cb) ctx=0x7fa92c041b80
             strm=0x7fa92c048a60,4a src=xxx.xxx.xxx.xxx fe=https be=https dst=l7-lb03:8104
             txn=(nil),0 txn.req=-,0 txn.rsp=-,0
             rqf=8c8002 rqa=0 rpf=80048000 rpa=0
             scf=0x7fa92c041b80,EST,0 scb=0x7fa92c0a39e0,EST,1
             af=(nil),0 sab=(nil),0
             cof=0x7fa99443c3f0,a0000300:PASS(0x7fa92c0412a0)/RAW((nil))/tcpv4(19024)
             cob=0x7fa9942dd6b0,10000300:PASS(0x7fa92c06f600)/RAW((nil))/tcpv4(22071)
 >Thread 24: id=0x7faae9f4a700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=9 rqsz=77
      3/8    stuck=1 prof=0 harmless=0 isolated=0
             cpu_ns: poll=32609451867 now=34764522744 diff=2155070877
             curr_task=0x7fa9a40d0290 (task) calls=4 last=0
               fct=0x557cae47a900(process_stream) ctx=0x7fa978144660
             strm=0x7fa978144660,4204a src=xxx.xxx.xxx.xxx fe=https be=https dst=l7-lb03:8113
             txn=(nil),0 txn.req=-,0 txn.rsp=-,0
             rqf=8ca024 rqa=0 rpf=8004a020 rpa=0
             scf=0x7fa9781445f0,CLO,20 scb=0x7fa978144aa0,EST,21
             af=(nil),0 sab=(nil),0
             cof=0x7fa9943741b0,a0080300:PASS(0x7fa9781430a0)/RAW((nil))/tcpv4(38498)
             cob=0x7fa994350600,100c0000:PASS(0x7fa978176110)/NONE((nil))/NONE(-1)
             call trace(15):
             | 0x557cae55198f [89 44 24 04 85 c0 75 29]: ha_dump_backtrace+0x3f/0x2fd
             | 0x557cae5524b8 [64 8b 75 00 8b 0d ca b9]: debug_handler+0x118/0x27b
             | 0x7fab4fb05420 [48 c7 c0 0f 00 00 00 0f]: libpthread:+0x14420
             | 0x7fab4fb0445b [48 3d 00 f0 ff ff 77 2f]: libpthread:__close+0x3b/0x85
             | 0x557cae591fa0 [eb a5 66 66 2e 0f 1f 84]: _fd_delete_orphan+0x1d0/0x1d2
             | 0x557cae58961a [c7 43 68 fd ad de fd 5b]: sock_conn_ctrl_close+0x1a/0x23
             | 0x557cae580fee [e9 43 ff ff ff 0f 1f 44]: main+0x1ae22e
             | 0x557cae5356aa [48 8b 43 20 81 48 18 00]: main+0x1628ea
             | 0x557cae47bb39 [41 8b 47 20 e9 ee f2 ff]: process_stream+0x1239/0x3688
  Thread 25: id=0x7faae9749700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=2 rqsz=51
      4/1    stuck=0 prof=0 harmless=0 isolated=0
             cpu_ns: poll=30309885213 now=31008851052 diff=698965839
             curr_task=0x557cb0bfc9d0 (task) calls=1264 last=0
               fct=0x557cae529ce0(process_chk) ctx=0x557cb0993f80
  Thread 26: id=0x7faae8f48700 act=1 glob=1 wq=1 rq=0 tl=1 tlsz=0 rqsz=5
      4/2    stuck=0 prof=0 harmless=0 isolated=0
             cpu_ns: poll=32076695282 now=33325980701 diff=1249285419
             curr_task=0
  Thread 27: id=0x7faae8747700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=3 rqsz=44
      4/3    stuck=0 prof=0 harmless=0 isolated=0
             cpu_ns: poll=31268976611 now=32583177200 diff=1314200589
             curr_task=0x7fa96c10de60 (task) calls=15 last=0
               fct=0x557cae47a900(process_stream) ctx=0x7faaac046e40
             strm=0x7faaac046e40,b src=xxx.xxx.xxx.xxx fe=https be=https dst=l7-lb02:8105
             txn=(nil),0 txn.req=-,0 txn.rsp=-,0
             rqf=8ca021 rqa=0 rpf=8004a020 rpa=0
             scf=0x7faaac043c80,CLO,0 scb=0x7faaac06c680,CLO,1
             af=(nil),0 sab=(nil),0
             cof=0x7fa9b4430930,a00c0300:PASS(0x7faaac06c880)/RAW((nil))/tcpv4(5982)
             cob=0x7fa9b406e180,100c0000:PASS(0x7faaac09d810)/NONE((nil))/NONE(-1)
  Thread 28: id=0x7faae7f46700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=14 rqsz=93
      4/4    stuck=0 prof=0 harmless=0 isolated=0
             cpu_ns: poll=28984450573 now=29526614353 diff=542163780
             curr_task=0x7faa9c0a6530 (tasklet) calls=2
               fct=0x557cae5351c0(sc_conn_io_cb) ctx=0x7faa9c0c2150
  Thread 29: id=0x7faae7745700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=0 rqsz=53
      4/5    stuck=0 prof=0 harmless=0 isolated=0
             cpu_ns: poll=32700955417 now=33600629605 diff=899674188
             curr_task=0x7fa9801581b0 (tasklet) calls=5
[NOTICE]   (6438) : haproxy version is 2.7.10-1ppa1~focal
[NOTICE]   (6438) : path to executable is /usr/sbin/haproxy
[ALERT]    (6438) : Current worker (6445) exited with code 134 (Aborted)
[ALERT]    (6438) : exit-on-failure: killing every processes with SIGTERM
[WARNING]  (6438) : All workers exited. Exiting... (134)

Additional Information

No response

phihos avatar Nov 28 '23 19:11 phihos

Hi Philipp,

the gdb backtrace of this thread18 would be useful. It's probably the one called "1" in gdb, though not sure. In the worst case, "t a a bt" will show them all. Seeing multiple libpthread entries with send and close doesn't please me much, it makes me think there could be some contention at the libc level maybe because a non-blocking send() is blocked in the system for some reason that should never happen, while holding a lock.

Do you think your system might be occasionally running short of memory ? In my experience, larger rings tend to degrade performance more than they improve. They're necessary when running the network stack in userland, but when you're using the kernel that's not needed at all and maybe it's more difficult for the system to allocate buffers for incoming packets. But blocking a syscall for 3 seconds is extremely shocking, it makes me think the machine is looking for memory everywhere :-/

wtarreau avatar Nov 29 '23 09:11 wtarreau

Do you think your system might be occasionally running short of memory ?

Unlikely. The system has 64GB memory and uses < 15 GB. The metrics during the incident confirm that.

Sadly the coredumps are truncated. I need to tweak settings to make another one.

phihos avatar Nov 29 '23 18:11 phihos

Is there any news on this issue ?

capflam avatar Feb 29 '24 15:02 capflam

Hi, I could not reproduce that in a test environment and I can not deliberately crash a productive instance. From my point of view you can close the issue since I can just make do with a lower NIC buffer size. The underlying issue (whatever it is) is probably not fixed though.

phihos avatar Mar 03 '24 21:03 phihos

OK, closing then, thanks Philipp.

wtarreau avatar Mar 09 '24 15:03 wtarreau