HAProxy crashes with Segmentation fault err

Open mieszko4 opened this issue 8 months ago • 19 comments

Detailed Description of the Problem

HAProxy crashes with a segmentation fault after Runtime API commands are executed through socat.

Expected Behavior

HAProxy should not crash.

Steps to Reproduce the Behavior

  1. Execute the following one by one, 10 times: add server NEW_SERVER weight 10 maxconn 100000 on-marked-down shutdown-sessions check;set NEW_SERVER state ready;enable health NEW_SERVER
  2. Have 1000 active WebSocket connections
  3. Repeat step (1): execute the following one by one, 10 times: add server NEW_SERVER weight 10 maxconn 100000 on-marked-down shutdown-sessions check;set NEW_SERVER state ready;enable health NEW_SERVER
  4. Execute the following one by one, 10 times: set server OLD_SERVER state drain
  5. Execute the following one by one, 10 times: set server OLD_SERVER state maint;del server OLD_SERVER

HAProxy crashes during step (5), after the 5th execution.
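For anyone trying to reproduce this without socat, the command batches from the steps above can be sketched in Python. This is only an illustration, not the script that was actually used: the function names are hypothetical, the backend name `sonic_workers` is taken from the configuration below, the server address argument is a placeholder (the steps above omit it, but the documented `add server` syntax expects one), and the sketch uses the documented `set server ... state ready` form. It assumes the stats socket is reachable on TCP port 9901 as configured.

```python
import socket

BACKEND = "sonic_workers"  # backend name from the posted configuration


def add_server_batch(name, addr):
    """Build the semicolon-separated batch used in steps (1) and (3).

    `addr` (for example "10.0.0.1:8080") is a hypothetical placeholder.
    """
    target = f"{BACKEND}/{name}"
    return ";".join([
        f"add server {target} {addr} weight 10 maxconn 100000 "
        "on-marked-down shutdown-sessions check",
        f"set server {target} state ready",
        f"enable health {target}",
    ])


def drain_command(name):
    """Build the command used in step (4)."""
    return f"set server {BACKEND}/{name} state drain"


def delete_batch(name):
    """Build the batch used in step (5)."""
    target = f"{BACKEND}/{name}"
    return f"set server {target} state maint;del server {target}"


def send(batch, host="127.0.0.1", port=9901, timeout=5.0):
    """Send one line to the Runtime API over TCP and return the reply.

    In non-interactive mode HAProxy closes the connection after replying,
    so reading until EOF collects the whole response.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(batch.encode("ascii") + b"\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", "replace")
```

Step (5) would then correspond to calling `send(delete_batch(name))` once per old server, with the WebSocket connections from step (2) still open.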

Do you have any idea what may have caused this?

The crash does not happen all the time. It tends to happen when there are more active WebSocket connections.

This looks like it is related to the bug fix "BUG/MEDIUM: server/cli: don't delete a dynamic server that has streams", which was applied in 2.9-dev6 but not in the version I am using.

However, the description of that fix says:

Indeed, when the server option "on-marked-down shutdown-sessions" is not used, server streams are not purged when srv enters maintenance mode.

But I am adding the dynamic server with on-marked-down shutdown-sessions, so I am not sure whether that bug fix applies.

It looks to me as though set server OLD_SERVER state maint does not always purge all of the server's streams. When that happens, calling del server OLD_SERVER crashes HAProxy.
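If that hypothesis holds, one possible client-side mitigation is to check the server's current session count before issuing del server. The sketch below is an assumption-laden illustration, not a confirmed fix: it parses the CSV text returned by the Runtime API `show stat` command and reads the `scur` (current sessions) column for one server. The function names are hypothetical, and a zero `scur` may not guarantee that no stream still references the server.

```python
import csv
import io


def current_sessions(show_stat_csv, backend, server):
    """Return the `scur` value for backend/server, or None if not listed.

    `show_stat_csv` is the raw text returned by `show stat`, whose header
    line starts with "# pxname,svname,...".
    """
    text = show_stat_csv.lstrip()
    if text.startswith("# "):
        text = text[2:]  # drop the comment marker so the header parses
    for row in csv.DictReader(io.StringIO(text)):
        if row.get("pxname") == backend and row.get("svname") == server:
            return int(row["scur"])
    return None


def safe_to_delete(show_stat_csv, backend, server):
    """True only when the server is listed and reports zero sessions."""
    return current_sessions(show_stat_csv, backend, server) == 0
```

A caller could poll `show stat` after set server ... state maint and send del server only once `safe_to_delete` returns True, giving up after some timeout of its own choosing.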

Do you have an idea how to solve the issue?

No response

What is your configuration?

global
    log stdout format raw local0
    nbthread 1

    stats socket ipv4@*:9901 level admin

defaults
    mode http
    timeout connect 5s
    timeout client 30s
    timeout server 30s

    log global
    option httplog

frontend sonic_alb
    bind *:9902

    http-request set-var(txn.room_id,ifnotexists) url_param(token),word(2,.),ub64dec,json_query('$.room_id','int')
    
    acl is_root path eq /
    acl has_room_id var(txn.room_id) -m int gt 0

    http-request reject unless is_root has_room_id
    default_backend sonic_workers

backend sonic_workers
    balance roundrobin

    stick on var(txn.room_id)

    stick-table type integer size 456m srvkey name

    option httpchk
    http-check send meth GET uri /health
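For readers unfamiliar with the converter chain in the frontend's set-var rule, here is a rough Python equivalent of what it appears to compute: take the second dot-separated part of the token URL parameter, base64url-decode it, and read room_id from the resulting JSON. This is an approximation for illustration only; the function name is hypothetical, and it assumes a well-formed three-part token with a JSON object as the second part.

```python
import base64
import json


def room_id_from_token(token):
    """Approximate word(2,.),ub64dec,json_query('$.room_id','int').

    Returns the integer room_id, or None if it cannot be extracted.
    """
    parts = token.split(".")
    if len(parts) < 2:
        return None
    payload = parts[1]
    # base64url tokens usually omit padding; restore it before decoding.
    padded = payload + "=" * (-len(payload) % 4)
    try:
        data = json.loads(base64.urlsafe_b64decode(padded))
    except ValueError:  # covers base64, UTF-8 and JSON decoding errors
        return None
    if not isinstance(data, dict):
        return None
    value = data.get("room_id")
    if isinstance(value, int) and not isinstance(value, bool):
        return value
    return None
```

The has_room_id ACL then corresponds to checking that the returned value is greater than 0, and the stick rule keys on that same value.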

Output of haproxy -vv

HAProxy version 2.8.3-86e043a 2023/09/07 - https://haproxy.org/
Status: long-term supported branch - will stop receiving fixes around Q2 2028.
Known bugs: http://www.haproxy.org/bugs/bugs-2.8.3.html
Running on: Linux 6.2.0-1014-aws #14~22.04.1-Ubuntu SMP Thu Oct  5 22:43:45 UTC 2023 x86_64
Build options :
  TARGET  = linux-glibc
  CPU     = generic
  CC      = cc
  CFLAGS  = -O2 -g -Wall -Wextra -Wundef -Wdeclaration-after-statement -Wfatal-errors -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference -fwrapv -Wno-address-of-packed-member -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers -Wno-cast-function-type -Wno-string-plus-int -Wno-atomic-alignment
  OPTIONS = USE_GETADDRINFO=1 USE_OPENSSL=1 USE_LUA=1 USE_PROMEX=1 USE_PCRE2=1 USE_PCRE2_JIT=1
  DEBUG   = -DDEBUG_STRICT -DDEBUG_MEMORY_POOLS

Feature list : -51DEGREES +ACCEPT4 +BACKTRACE -CLOSEFROM +CPU_AFFINITY +CRYPT_H -DEVICEATLAS +DL -ENGINE +EPOLL -EVPORTS +GETADDRINFO -KQUEUE -LIBATOMIC +LIBCRYPT +LINUX_CAP +LINUX_SPLICE +LINUX_TPROXY +LUA +MATH -MEMORY_PROFILING +NETFILTER +NS -OBSOLETE_LINKER +OPENSSL -OPENSSL_WOLFSSL -OT -PCRE +PCRE2 +PCRE2_JIT -PCRE_JIT +POLL +PRCTL -PROCCTL +PROMEX -PTHREAD_EMULATION -QUIC +RT +SHM_OPEN +SLZ +SSL -STATIC_PCRE -STATIC_PCRE2 -SYSTEMD +TFO +THREAD +THREAD_DUMP +TPROXY -WURFL -ZLIB

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_TGROUPS=16, MAX_THREADS=256, default=2).
Built with OpenSSL version : OpenSSL 1.1.1w  11 Sep 2023
Running on OpenSSL version : OpenSSL 1.1.1w  11 Sep 2023
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
Built with Lua version : Lua 5.3.3
Built with the Prometheus exporter as a service
Built with network namespace support.
Built with libslz for stateless compression.
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with PCRE2 version : 10.36 2020-12-04
PCRE2 library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with gcc compiler version 10.2.1 20210110

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
         h2 : mode=HTTP  side=FE|BE  mux=H2    flags=HTX|HOL_RISK|NO_UPG
       fcgi : mode=HTTP  side=BE     mux=FCGI  flags=HTX|HOL_RISK|NO_UPG
  <default> : mode=HTTP  side=FE|BE  mux=H1    flags=HTX
         h1 : mode=HTTP  side=FE|BE  mux=H1    flags=HTX|NO_UPG
  <default> : mode=TCP   side=FE|BE  mux=PASS  flags=
       none : mode=TCP   side=FE|BE  mux=PASS  flags=NO_UPG

Available services : prometheus-exporter
Available filters :
	[BWLIM] bwlim-in
	[BWLIM] bwlim-out
	[CACHE] cache
	[COMP] compression
	[FCGI] fcgi-app
	[SPOE] spoe
	[TRACE] trace

Last Outputs and Backtraces

2023-10-24 10:50:37.539+00	172.44.XX.YYY:59067 [24/Oct/2023:10:50:01.086] sonic_alb sonic_workers/i-0d142a1496ec5b010 0/0/1/1/36227 101 10480 - - ---- 890/890/887/40/0 0/0 "GET /?token=TOKEN&isAccepted=1&appState=active&clientVersion=4.13.2 HTTP/1.1"			
2023-10-24 10:50:37.539+00	172.44.XX.YYY:46619 [24/Oct/2023:10:43:28.740] sonic_alb sonic_workers/i-0cb99edf08d14b2f1 0/0/1/2/428622 101 138613 - - ---- 888/888/886/104/0 0/0 "GET /?token=TOKEN&isAccepted=1&appState=active&clientVersion=4.13.2 HTTP/1.1"		
2023-10-24 10:50:38.540+00	[NOTICE]   (1) : haproxy version is 2.8.3-86e043a		
2023-10-24 10:50:38.540+00	[NOTICE]   (1) : path to executable is /usr/local/sbin/haproxy		
2023-10-24 10:50:38.540+00	[ALERT]    (1) : Current worker (8) exited with code 139 (Segmentation fault)		
2023-10-24 10:50:38.540+00	[ALERT]    (1) : exit-on-failure: killing every processes with SIGTERM		
2023-10-24 10:50:38.540+00	[WARNING]  (1) : All workers exited. Exiting... (139)

Additional Information

HAProxy is run in Docker: Docker version 24.0.6, build ed223bc
Docker is run on EC2: Ubuntu 22.04

mieszko4, Oct 25 '23 00:10