
HAProxy 2.9.4 Randomly Crashes

Open JB0925 opened this issue 1 year ago • 6 comments

Detailed Description of the Problem

We have recently tried to upgrade to HAProxy 2.9.4. Upon doing so, we noticed that HAProxy randomly crashed roughly once per day. We have not noticed this behavior in the version we are upgrading from, which is 2.8.0.

Expected Behavior

HAProxy should not crash.

Steps to Reproduce the Behavior

We cannot reproduce the behavior.

Do you have any idea what may have caused this?

HAProxy 2.9.4 has only been deployed on two hosts in our staging environment. These hosts do not receive any traffic, except for some clients sending health check requests. To serve each health check request, HAProxy calls the SPOA to verify whether it is up. From the stack trace, it seems that some cleanup related to the SPOA is being done, but we are not exactly sure what is causing the crash or whether it is related.

Do you have an idea how to solve the issue?

No response

What is your configuration?

global
    log ring@log_buffer len 65535 local3
    stats timeout 2m

    close-spread-time 10m

    hard-stop-after 11m

    no-memory-trimming

    set-dumpable
    strict-limits
    maxconn 1000000

    master-worker
    nbthread 64
    
    chroot ./chroot_empty
    user nobody
    group nobody
       
       
    stats socket ./haproxy.sock mode 770 user app group app level admin

    program my_spoa
    command /app/sidecar /app/sidecar.config


ring log_buffer
    description "Buffer for logs"

    server s1 127.0.0.1:2222

    format rfc3164
    maxlen 65535
    size 2048000000
    timeout connect 1s
    timeout server 1s
defaults
    timeout connect 66s
    timeout client 65s
    timeout server 60s
    mode http

    option httpchk GET /admin

    default-server alpn h2,http/1.1

    no option logasap
    log global

resolvers systemdns
    accepted_payload_size 65535
    parse-resolv-conf
    timeout resolve 3s
       
       

frontend stats
    bind ipv4@:5555,ipv6@:5555
    stats enable
    stats uri /stats
    stats refresh 10s


frontend public
    bind ipv4@:4444,ipv6@:4444
    bind ipv4@:6789,ipv6@:6789 accept-proxy
    bind ipv4@:7777,ipv6@:7777 ssl crt /app/certs alpn http/1.1
    bind ipv4@:8888,ipv6@:8888 accept-proxy ssl crt /app/certs alpn http/1.1
    option forwardfor

    filter spoe engine primary_spoa config /app/my_spoa.conf

    acl is_admin_endpoint path -i -m str /admin

    acl is_req_from_localhost src 127.0.0.1

    use_backend admin-healthcheck if is_admin_endpoint
       
    default_backend DefaultBackend


backend DefaultBackend
   server server1 127.0.0.1:2345


backend admin-healthcheck
    
    filter spoe engine my_spoa config /app/my_spoa.conf
    http-request send-spoe-group my_spoa adminhc
    http-request return status 404 content-type text/plain string BAD if !{ var(txn.my_spoa.adminhc.status) -m str GOOD }
       
    http-request return status 200 content-type text/plain string GOOD if { str(status),map(/app/hc.map) -i -m str GOOD }
    http-request return status 404 content-type text/plain string BAD

Output of haproxy -vv

HAProxy version 2.9.4-efc6b7-1 2024/01/31 - https://haproxy.org/
Status: stable branch - will stop receiving fixes around Q1 2025.
Known bugs: http://www.haproxy.org/bugs/bugs-2.9.4.html
Running on: Linux 5.15.125.1-2.cm2 #1 SMP Fri Aug 11 02:17:58 UTC 2023 x86_64
Build options :
  TARGET  = linux-glibc
  CPU     = generic
  CC      = cc
  CFLAGS  = -O2 -g -Wall -Wextra -Wundef -Wdeclaration-after-statement -Wfatal-errors -Wtype-limits -fwrapv -Wno-address-of-packed-member -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers -Wno-cast-function-type -Wno-string-plus-int -Wno-atomic-alignment
  OPTIONS = USE_OPENSSL=1 USE_LUA=1 USE_PCRE=1
  DEBUG   = -DDEBUG_STRICT -DDEBUG_MEMORY_POOLS

Feature list : -51DEGREES +ACCEPT4 +BACKTRACE -CLOSEFROM +CPU_AFFINITY +CRYPT_H -DEVICEATLAS +DL -ENGINE +EPOLL -EVPORTS +GETADDRINFO -KQUEUE -LIBATOMIC +LIBCRYPT +LINUX_CAP +LINUX_SPLICE +LINUX_TPROXY +LUA +MATH -MEMORY_PROFILING +NETFILTER +NS -OBSOLETE_LINKER +OPENSSL -OPENSSL_AWSLC -OPENSSL_WOLFSSL -OT +PCRE -PCRE2 -PCRE2_JIT -PCRE_JIT +POLL +PRCTL -PROCCTL -PROMEX -PTHREAD_EMULATION -QUIC -QUIC_OPENSSL_COMPAT +RT +SHM_OPEN +SLZ +SSL -STATIC_PCRE -STATIC_PCRE2 -SYSTEMD +TFO +THREAD +THREAD_DUMP +TPROXY -WURFL -ZLIB

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_TGROUPS=16, MAX_THREADS=256, default=64).
Built with OpenSSL version : OpenSSL 1.1.1k  25 Mar 2021
Running on OpenSSL version : OpenSSL 1.1.1k  FIPS 25 Mar 2021
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
Built with Lua version : Lua 5.4.4
Built with network namespace support.
Built with libslz for stateless compression.
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with PCRE version : 8.32 2012-11-30
Running on PCRE version : 8.45 2021-06-15
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Encrypted password support via crypt(3): yes
Built with gcc compiler version 4.8.5 20150623 (Red Hat 4.8.5-44)

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
         h2 : mode=HTTP  side=FE|BE  mux=H2    flags=HTX|HOL_RISK|NO_UPG
       fcgi : mode=HTTP  side=BE     mux=FCGI  flags=HTX|HOL_RISK|NO_UPG
  <default> : mode=HTTP  side=FE|BE  mux=H1    flags=HTX
         h1 : mode=HTTP  side=FE|BE  mux=H1    flags=HTX|NO_UPG
  <default> : mode=TCP   side=FE|BE  mux=PASS  flags=
       none : mode=TCP   side=FE|BE  mux=PASS  flags=NO_UPG

Available services : none

Available filters :
	[BWLIM] bwlim-in
	[BWLIM] bwlim-out
	[CACHE] cache
	[COMP] compression
	[FCGI] fcgi-app
	[SPOE] spoe
	[TRACE] trace

Last Outputs and Backtraces

We do not see any error messages in our logs. Below is the output of `bt full`. Please see the attached file in the next section for the full output of `t a a bt full`.



warning: Can't open file /dev/zero (deleted) during file-backed mapping note processing
[New LWP 3102656]
[New LWP 3102618]
[New LWP 3102619]
[New LWP 3102614]
[New LWP 3102613]
[New LWP 3102643]
[New LWP 3102623]
[New LWP 3102622]
[New LWP 3102615]
[New LWP 3102624]
[New LWP 3102625]
[New LWP 3102626]
[New LWP 3102621]
[New LWP 3102627]
[New LWP 3102628]
[New LWP 3102631]
[New LWP 3102629]
[New LWP 3102620]
[New LWP 3102617]
[New LWP 3102632]
[New LWP 3102633]
[New LWP 3102634]
[New LWP 3102635]
[New LWP 3102637]
[New LWP 3102630]
[New LWP 3102638]
[New LWP 3102640]
[New LWP 3102641]
[New LWP 3102644]
[New LWP 3102645]
[New LWP 3102646]
[New LWP 3102648]
[New LWP 3102647]
[New LWP 3102650]
[New LWP 3102649]
[New LWP 3102651]
[New LWP 3102658]
[New LWP 3102639]
[New LWP 3102653]
[New LWP 3102662]
[New LWP 3102659]
[New LWP 3102654]
[New LWP 3102667]
[New LWP 3102657]
[New LWP 3102655]
[New LWP 3102661]
[New LWP 3102669]
[New LWP 3102670]
[New LWP 3102671]
[New LWP 3102675]
[New LWP 3102672]
[New LWP 3102676]
[New LWP 3102666]
[New LWP 3102616]
[New LWP 3102664]
[New LWP 3102660]
[New LWP 3102674]
[New LWP 3102673]
[New LWP 3102665]
[New LWP 3102663]
[New LWP 3102668]
[New LWP 3102652]
[New LWP 3102642]
[New LWP 3102636]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/libthread_db.so.1".
Program terminated with signal SIGABRT, Aborted.
#0  0x00007bac4e185e6c in __pthread_kill_implementation () from /lib/libc.so.6
[Current thread is 1 (Thread 0x7baba5520640 (LWP 3102656))]
(gdb) bt full
#0  0x00007bac4e185e6c in __pthread_kill_implementation () from /lib/libc.so.6
No symbol table info available.
#1  0x00007bac4e13ad86 in raise () from /lib/libc.so.6
No symbol table info available.
#2  0x00007bac4e1257e5 in abort () from /lib/libc.so.6
No symbol table info available.
#3  0x00000000005b43cd in ha_panic () at src/debug.c:620
        old = 0x0
        thr = <optimized out>
#4  <signal handler called>
No symbol table info available.
#5  pl_wait_new_long (lock=lock@entry=0x22b3a80, prev=prev@entry=1)
    at include/import/plock.h:216
        loops = 202 '\312'
        m = <optimized out>
        curr = 203
#6  0x00000000005d63c1 in pool_flush (pool=0x22b37c0) at src/pool.c:765
        next = <optimized out>
        temp = <optimized out>
        down = <optimized out>
        bucket = <optimized out>
#7  0x000000000051fe12 in __task_free (t=<optimized out>)
    at include/haproxy/task.h:628
        __ptr = <optimized out>
#8  task_destroy (t=0x7baa9005d380) at include/haproxy/task.h:651
No locals.
#9  spoe_release_appctx (appctx=<optimized out>) at src/flt_spoe.c:1300
        spoe_appctx = 0x7baa9005d080
        agent = 0x11bcdabf0
        ctx = <optimized out>
        back = <optimized out>
        __func__ = "spoe_release_appctx"
#10 0x000000000060f1d1 in appctx_shut (appctx=0x7baa9005d160)
    at src/applet.c:361
        __FUNCTION__ = "appctx_shut"
#11 0x000000000059272c in sc_app_shut_applet (sc=0x7baa9005d500)
    at src/stconn.c:931
        ic = <optimized out>
        oc = <optimized out>
        __func__ = "sc_app_shut_applet"
#12 0x00000000004d8527 in sc_shutdown (sc=0x7baa9005d500)
    at include/haproxy/sc_strm.h:448
No locals.
#13 process_stream (t=<optimized out>, context=0x7baa9005d560, 
    state=<optimized out>) at src/stream.c:2479
        srv = <optimized out>
        s = 0x7baa9005d560
        sess = <optimized out>
        scf_flags = 295010
        scb_flags = 393251
        rqf_last = 8683520
        rpf_last = 2147778817
        rq_prod_last = 8
        rq_cons_last = 10
        rp_cons_last = 8
        rp_prod_last = 10
        req_ana_back = 0
        res_ana_back = 0
        req = 0x7baa9005d580
        res = 0x7baa9005d5c8
        scf = <optimized out>
        scb = <optimized out>
        rate = <optimized out>
        resync_stconns = <optimized out>
        resync_request = <optimized out>
        resync_response = <optimized out>
#14 0x00000000005c05f0 in run_tasks_from_lists (
    budgets=budgets@entry=0x7baba54fd330) at src/task.c:632
        process = <optimized out>
        tl_queues = <optimized out>
        t = 0x7baa9005d920
        budget_mask = 15 '\017'
        profile_entry = 0x0
        done = 0
        queue = 1
        state = 1028
        ctx = <optimized out>
        __func__ = "run_tasks_from_lists"
#15 0x00000000005c0dd1 in process_runnable_tasks () at src/task.c:876
        tt = 0xb7ce80 <ha_thread_ctx+22016>
        lrq = <optimized out>
        grq = <optimized out>
        t = <optimized out>
        max = {0, 33, 0, 0}
        max_total = <optimized out>
        tmp_list = <optimized out>
        queue = <optimized out>
        max_processed = <optimized out>
        lpicked = <optimized out>
        gpicked = <optimized out>
        heavy_queued = 1
        budget = <optimized out>
--Type <RET> for more, q to quit, c to continue without paging--
        not_done_yet = <optimized out>
#16 0x000000000058b362 in run_poll_loop () at src/haproxy.c:3049
        next = <optimized out>
        wake = <optimized out>
        __func__ = "run_poll_loop"
#17 0x000000000058b9a6 in run_thread_poll_loop (data=<optimized out>)
    at src/haproxy.c:3251
        ptaf = <optimized out>
        ptif = 0x93bbb0 <per_thread_init_list>
        ptdf = <optimized out>
        ptff = <optimized out>
        init_left = 0
        init_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, 
            __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {
              __prev = 0x0, __next = 0x0}}, 
          __size = '\000' <repeats 39 times>, __align = 0}
        init_cond = {__data = {__lock = 883, __futex = 0, __total_seq = 829, 
            __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x6c, 
            __nwaiters = 0, __broadcast_seq = 0}, 
          __size = "s\003\000\000\000\000\000\000=\003", '\000' <repeats 22 times>, "l", '\000' <repeats 14 times>, __align = 883}
#18 0x00007bac4e1841d2 in start_thread () from /lib/libc.so.6
No symbol table info available.
#19 0x00007bac4e2060b0 in clone3 () from /lib/libc.so.6
No symbol table info available.

Additional Information

haproxy -vv of the older version that is stable:

HAProxy version 2.8.0-b2537a-17 2023/12/07 - https://haproxy.org/
Status: long-term supported branch - will stop receiving fixes around Q2 2028.
Known bugs: http://www.haproxy.org/bugs/bugs-2.8.0.html
Running on: Linux 5.15.125.1-2.cm2 #1 SMP Fri Aug 11 02:17:58 UTC 2023 x86_64
Build options :
  TARGET  = linux-glibc
  CPU     = generic
  CC      = cc
  CFLAGS  = -O2 -g -Wall -Wextra -Wundef -Wdeclaration-after-statement -Wfatal-errors -Wtype-limits -fwrapv -Wno-address-of-packed-member -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers -Wno-cast-function-type -Wno-string-plus-int -Wno-atomic-alignment
  OPTIONS = USE_OPENSSL=1 USE_LUA=1 USE_PCRE=1
  DEBUG   = -DDEBUG_STRICT -DDEBUG_MEMORY_POOLS

Feature list : -51DEGREES +ACCEPT4 +BACKTRACE -CLOSEFROM +CPU_AFFINITY +CRYPT_H -DEVICEATLAS +DL -ENGINE +EPOLL -EVPORTS +GETADDRINFO -KQUEUE -LIBATOMIC +LIBCRYPT +LINUX_SPLICE +LINUX_TPROXY +LUA +MATH -MEMORY_PROFILING +NETFILTER +NS -OBSOLETE_LINKER +OPENSSL -OPENSSL_WOLFSSL -OT +PCRE -PCRE2 -PCRE2_JIT -PCRE_JIT +POLL +PRCTL -PROCCTL -PROMEX -PTHREAD_EMULATION -QUIC +RT +SHM_OPEN +SLZ +SSL -STATIC_PCRE -STATIC_PCRE2 -SYSTEMD +TFO +THREAD +THREAD_DUMP +TPROXY -WURFL -ZLIB

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_TGROUPS=16, MAX_THREADS=256, default=64).
Built with OpenSSL version : OpenSSL 1.1.1k  25 Mar 2021
Running on OpenSSL version : OpenSSL 1.1.1k  FIPS 25 Mar 2021
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
Built with Lua version : Lua 5.4.4
Built with network namespace support.
Built with libslz for stateless compression.
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with PCRE version : 8.32 2012-11-30
Running on PCRE version : 8.45 2021-06-15
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Encrypted password support via crypt(3): yes
Built with gcc compiler version 4.8.5 20150623 (Red Hat 4.8.5-44)

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
         h2 : mode=HTTP  side=FE|BE  mux=H2    flags=HTX|HOL_RISK|NO_UPG
       fcgi : mode=HTTP  side=BE     mux=FCGI  flags=HTX|HOL_RISK|NO_UPG
  <default> : mode=HTTP  side=FE|BE  mux=H1    flags=HTX
         h1 : mode=HTTP  side=FE|BE  mux=H1    flags=HTX|NO_UPG
  <default> : mode=TCP   side=FE|BE  mux=PASS  flags=
       none : mode=TCP   side=FE|BE  mux=PASS  flags=NO_UPG

Available services : none

Available filters :
	[BWLIM] bwlim-in
	[BWLIM] bwlim-out
	[CACHE] cache
	[COMP] compression
	[FCGI] fcgi-app
	[SPOE] spoe
	[TRACE] trace

We have applied one local patch in 2.9.4. That patch is currently under discussion on the mailing list here.

Finally, we have also tried to reproduce the crash by sending the health check requests mentioned above, but that does not trigger it. gdb.txt

JB0925 avatar Feb 07 '24 18:02 JB0925

OK, so apparently the process is crashing due to ha_panic() being called as a result of thread contention.

According to your trace and version, nearly all threads are stuck here (45 out of 64):

#1  0x00000000005d63c1 in pool_flush (pool=0x22b37c0) at src/pool.c:765
        next = <optimized out>
        temp = <optimized out>
        down = <optimized out>
        bucket = <optimized out>
#2  0x00000000004d9d00 in __task_free (t=<optimized out>) at include/haproxy/task.h:628
        __ptr = <optimized out>

Which corresponds to this:

static inline void __task_free(struct task *t)
{
        if (t == th_ctx->current) {
                th_ctx->current = NULL;
                __ha_barrier_store();
        }
        BUG_ON(task_in_wq(t) || task_in_rq(t));

        BUG_ON((ulong)t->caller & 1);
#ifdef DEBUG_TASK
        HA_ATOMIC_STORE(&t->debug.prev_caller, HA_ATOMIC_LOAD(&t->caller));
#endif
        HA_ATOMIC_STORE(&t->caller, (void*)1); // make sure to crash if used after free

        pool_free(pool_head_task, t);
        th_ctx->nb_tasks--;
        if (unlikely(stopping))
                pool_flush(pool_head_task); /* ======= line 628: STUCK HERE */
}

So this would suggest that the contention occurs during soft-stop. Do you know if your haproxy process is restarted, either manually or from a script, every once in a while? You mentioned that it crashes roughly once a day, so maybe this coincides with a restart performed once per day?

Now, to explain why the crash occurs in 2.9.4 and not in the last 2.8 version you tested (which exact 2.8 version was it, by the way?): maybe this has something to do with 72c23bd4cd3ee3a79955fbdd601206a9fa7d19eb (the most recent spoe code change), or with the recent changes around the pools code that reduce contention with multiple buckets, introduced in 2.9.

Perhaps in this case some spoe resources are slowly piling up during process runtime, and when soft-stop occurs, all those resources are scheduled for cleanup at once, which is too much for haproxy to keep up with?

Darlelet avatar Feb 08 '24 09:02 Darlelet

We were able to somewhat reproduce the crash by restarting haproxy and observing the crash during shutdown.

Not every restart caused the crash, though.

  • If I do another restart within 30 minutes of the previous one, there is no crash. But if I restart after a couple of hours, then there is a crash.
  • Hosts without the spoa do not seem to crash on restart, but hosts with the spoa did crash on restart after running for a couple of hours. So it seems the pile-up is related to the spoa being present.

We plan to run this spoa at a much higher request rate, so this issue seems quite concerning.

Perhaps in this case some spoe resources are slowly piling up during process runtime

I am wondering why we would have such an "excessive" pile-up that is too much for haproxy to keep up with, and what it is that we are doing wrong. Do you have any suggestions on things we should try in order to understand the pile-up, e.g. any commands we can run before restarting, so that we can apply the right fix?

I am trying to compare the "show pools" output of two hosts, one without the spoa and one with it. I can see that some pools show a big difference, but I am not sure at what point the build-up can be termed "excessive". (screenshot attached: Screenshot 2024-02-08 at 7 19 58 PM)

15ljindal avatar Feb 09 '24 01:02 15ljindal

There's something really odd here. It's not expected to spend that much time on a pool entry like this, particularly in this version, where entries are sharded to further reduce contention.

I'm just wondering if we don't have a bug here in pool_flush() in case multiple threads call it at once. Indeed, the do { ... } while ((ret = xchg(BUSY)) == BUSY) may occasionally replace a NULL with a BUSY if the value changes between the test and the xchg() call. But the following test restores it to NULL only if ret is not null, so unless I'm missing something, we may occasionally leave a BUSY in place of a NULL, and this has been possible since commit 2a4523f6f ("BUG/MAJOR: pools: fix possible race with free() in the lockless variant"), merged in 2.5.

I need to study this more carefully to make sure I'm not missing anything. As Aurélien mentioned, it's possible that a recent change puts a bit more stress on some pool_free() calls and more easily triggered the issue.

wtarreau avatar Feb 10 '24 10:02 wtarreau

I'll likely merge the attached patch; you may want to try applying it. pool.diff.txt

wtarreau avatar Feb 10 '24 10:02 wtarreau

OK, that was it. I could also reproduce it by modifying the code, and I can confirm the fix works. I've merged it now and will backport it. In the meantime, you can work around this by disabling the global pools (they're usually disabled, except on some rare distros where it's unknown whether malloc() is fast). For this, just add -dMno-global on the command line and you won't hit that bug anymore.

wtarreau avatar Feb 10 '24 11:02 wtarreau

Hey @wtarreau and @Darlelet, I wanted to confirm that this fix is indeed working, and also to thank you for the time and effort you put into it, as well as for the quick response. We really appreciate your efforts.

JB0925 avatar Feb 16 '24 14:02 JB0925