haproxy
haproxy copied to clipboard
infinite loop when shutting down
Detailed Description of the Problem
this was originally observed on a haproxy 2.6 deployment in OpenShift. When shutting down, we observe a race condition where the thread with tid=0 waits for other threads to complete, while some remaining threads will loop and take 100% CPU doing syscalls epoll_wait and clock_gettime
Expected Behavior
haproxy should shut down cleanly
Steps to Reproduce the Behavior
- as this race condition seems rare, we apply this patch to trigger it
diff --git a/src/haproxy.c b/src/haproxy.c
index be8b587cc..61ac22d85 100644
--- a/src/haproxy.c
+++ b/src/haproxy.c
@@ -3069,6 +3069,7 @@ void run_poll_loop()
_HA_ATOMIC_OR(&th_ctx->flags, TH_FL_SLEEPING);
_HA_ATOMIC_AND(&th_ctx->flags, ~TH_FL_NOTIFIED);
__ha_barrier_atomic_store();
+ if (!signal_queue_len) sleep(tid);
if (thread_has_tasks())
-
make TARGET=linux-glibc USE_OPENSSL=0 USE_ZLIB=0 USE_PCRE=0 USE_LUA=0 USE_EPOLL=1
-
./haproxy -f examples/quick-test.cfg -p ${XDG_RUNTIME_DIR}/haproxy.pid -d
-
while sleep 1 ; do ps -p `pgrep haproxy` -T -o tid,wchan; done
- wait for the threads to say "ep_poll"
-
pkill -USR1 haproxy
- wait until there are 4 threads remaining
-
pkill -USR1 haproxy
- feel the laptop heating and burning my legs
Do you have any idea what may have caused this?
I believe some sort of race condition
Do you have an idea how to solve the issue?
No response
What is your configuration?
haproxy from main branch
Output of haproxy -vv
HAProxy version 3.0-dev7-ab7f05-82 2024/04/17 - https://haproxy.org/
Status: development branch - not safe for use in production.
Known bugs: https://github.com/haproxy/haproxy/issues?q=is:issue+is:open
Running on: Linux 5.14.0-440.frigo.el9.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Apr 16 16:02:50 UTC 2024 x86_64
Build options :
TARGET = linux-glibc
CC = cc
CFLAGS = -O2 -g -fwrapv
OPTIONS = USE_EPOLL=1 USE_OPENSSL=0 USE_LUA=0 USE_ZLIB=0 USE_PCRE=0
DEBUG =
Feature list : -51DEGREES +ACCEPT4 +BACKTRACE -CLOSEFROM +CPU_AFFINITY +CRYPT_H -DEVICEATLAS +DL -ENGINE +EPOLL -EVPORTS +GETADDRINFO -KQUEUE -LIBATOMIC +LIBCRYPT +LINUX_CAP +LINUX_SPLICE +LINUX_TPROXY -LUA -MATH -MEMORY_PROFILING +NETFILTER +NS -OBSOLETE_LINKER -OPENSSL -OPENSSL_AWSLC -OPENSSL_WOLFSSL -OT -PCRE -PCRE2 -PCRE2_JIT -PCRE_JIT +POLL +PRCTL -PROCCTL -PROMEX -PTHREAD_EMULATION -QUIC -QUIC_OPENSSL_COMPAT +RT +SHM_OPEN +SLZ -SSL -STATIC_PCRE -STATIC_PCRE2 +SYSTEMD +TFO +THREAD +THREAD_DUMP +TPROXY -WURFL -ZLIB
Default settings :
bufsize = 16384, maxrewrite = 1024, maxpollevents = 200
Built with multi-threading support (MAX_TGROUPS=16, MAX_THREADS=256, default=9).
Built with network namespace support.
Built with libslz for stateless compression.
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built without PCRE or PCRE2 support (using libc's regex instead)
Encrypted password support via crypt(3): yes
Built with gcc compiler version 11.4.1 20231218 (Red Hat 11.4.1-3)
Available polling systems :
epoll : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use epoll.
Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
h2 : mode=HTTP side=FE|BE mux=H2 flags=HTX|HOL_RISK|NO_UPG
fcgi : mode=HTTP side=BE mux=FCGI flags=HTX|HOL_RISK|NO_UPG
<default> : mode=HTTP side=FE|BE mux=H1 flags=HTX
h1 : mode=HTTP side=FE|BE mux=H1 flags=HTX|NO_UPG
<default> : mode=TCP side=FE|BE mux=PASS flags=
none : mode=TCP side=FE|BE mux=PASS flags=NO_UPG
Available services : none
Available filters :
[BWLIM] bwlim-in
[BWLIM] bwlim-out [CACHE] cache
[COMP] compression
[FCGI] fcgi-app
[SPOE] spoe
[TRACE] trace
Last Outputs and Backtraces
(gdb) thread apply all bt
Thread 4 (Thread 0x7ff0ee7fb640 (LWP 18420) "haproxy"):
#0 0x00007ffdd55e2b95 in ?? ()
#1 0x00007ff0facd3f4d in __GI___clock_gettime (clock_id=clock_id@entry=3, tp=tp@entry=0x7ff0ee7f0b50) at ../sysdeps/unix/sysv/linux/clock_gettime.c:42
#2 0x00000000005ca064 in now_cpu_time () at src/clock.c:91
#3 clock_entering_poll () at src/clock.c:402
#4 0x0000000000423f58 in _do_poll (p=<optimized out>, exp=<optimized out>, wake=<optimized out>) at src/ev_epoll.c:227
#5 0x000000000052b506 in run_poll_loop () at src/haproxy.c:3131
#6 0x000000000052bbb7 in run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:3273
#7 0x00007ff0fac89c02 in start_thread (arg=<optimized out>) at pthread_create.c:443
#8 0x00007ff0fad0ec40 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
Thread 3 (Thread 0x7ff0eeffc640 (LWP 18419) "haproxy"):
#0 0x00007ffdd55e2b95 in ?? ()
#1 0x00007ff0facd3f4d in __GI___clock_gettime (clock_id=clock_id@entry=3, tp=tp@entry=0x7ff0eeff1b50) at ../sysdeps/unix/sysv/linux/clock_gettime.c:42
#2 0x00000000005ca064 in now_cpu_time () at src/clock.c:91
#3 clock_entering_poll () at src/clock.c:402
#4 0x0000000000423f58 in _do_poll (p=<optimized out>, exp=<optimized out>, wake=<optimized out>) at src/ev_epoll.c:227
#5 0x000000000052b506 in run_poll_loop () at src/haproxy.c:3131
#6 0x000000000052bbb7 in run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:3273
#7 0x00007ff0fac89c02 in start_thread (arg=<optimized out>) at pthread_create.c:443
#8 0x00007ff0fad0ec40 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
Thread 2 (Thread 0x7ff0f89d9640 (LWP 18416) "haproxy"):
#0 0x00007ffdd55e2b95 in ?? ()
#1 0x00007ff0facd3f4d in __GI___clock_gettime (clock_id=clock_id@entry=3, tp=tp@entry=0x7ff0f89ceb50) at ../sysdeps/unix/sysv/linux/clock_gettime.c:42
#2 0x00000000005ca064 in now_cpu_time () at src/clock.c:91
#3 clock_entering_poll () at src/clock.c:402
#4 0x0000000000423f58 in _do_poll (p=<optimized out>, exp=<optimized out>, wake=<optimized out>) at src/ev_epoll.c:227
#5 0x000000000052b506 in run_poll_loop () at src/haproxy.c:3131
#6 0x000000000052bbb7 in run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:3273
#7 0x00007ff0fac89c02 in start_thread (arg=<optimized out>) at pthread_create.c:443
#8 0x00007ff0fad0ec40 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
Thread 1 (Thread 0x7ff0faf8d940 (LWP 18412) "haproxy"):
#0 __futex_abstimed_wait_common64 (private=128, cancel=true, abstime=0x0, op=265, expected=18416, futex_word=0x7ff0f89d9910) at futex-internal.c:57
#1 __futex_abstimed_wait_common (futex_word=futex_word@entry=0x7ff0f89d9910, expected=18416, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=128, cancel=cancel@entry=true) at futex-internal.c:87
#2 0x00007ff0fac867ff in __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7ff0f89d9910, expected=<optimized out>, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=128) at futex-internal.c:139
#3 0x00007ff0fac8b6d3 in __pthread_clockjoin_ex (threadid=140672939955776, thread_return=thread_return@entry=0x0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, block=block@entry=true) at pthread_join_common.c:102
#4 0x00007ff0fac8b5a3 in ___pthread_join (threadid=<optimized out>, thread_return=thread_return@entry=0x0) at pthread_join.c:25
#5 0x00000000005a97c3 in wait_for_threads_completion () at src/thread.c:264
#6 0x0000000000420b07 in main (argc=<optimized out>, argv=<optimized out>) at src/haproxy.c:3991
(gdb) p signal_queue_len
$1 = 1
Additional Information
$ sudo perf stat -e 'syscalls:sys_enter_epoll_wait' -e 'syscalls:sys_enter_clock_gettime' -p `pgrep haproxy` -I 1000
# time counts unit events
1.001077725 2,960,256 syscalls:sys_enter_epoll_wait
1.001077725 5,920,362 syscalls:sys_enter_clock_gettime
2.002425052 3,001,638 syscalls:sys_enter_epoll_wait
2.002425052 6,003,275 syscalls:sys_enter_clock_gettime
3.003617281 2,951,947 syscalls:sys_enter_epoll_wait
3.003617281 5,903,900 syscalls:sys_enter_clock_gettime
4.004911210 3,030,400 syscalls:sys_enter_epoll_wait
4.004911210 6,060,803 syscalls:sys_enter_clock_gettime
5.006179439 3,028,230 syscalls:sys_enter_epoll_wait
5.006179439 6,056,459 syscalls:sys_enter_clock_gettime
^C 5.289030721 854,586 syscalls:sys_enter_epoll_wait
5.289030721 1,709,160 syscalls:sys_enter_clock_gettime
Hi Freedge !
Thanks a lot for reporting this !
I try to reproduce, just to confirm you observe the same with the 3.0-dev7-ab7f05-82 2024/04/17 ?
Many thanks in advance, Regards,
so for my reproducer I am using the main branch which is commit ab7f05daba5d640b558465eb505e9909b7a5af14 at the moment and indeed reading "3.0-dev7-ab7f05-82" as version. but of course I had to add one line with a sleep to make it fail as intended, I can work on a reproducer with just gdb or stap if you believe there's some value with it..
On OpenShift the problem is reproduced with haproxy26-2.6.13-2.rhaos4.14.el8. The perf stat and gdb backtrace is identical there. It was overall, visible because of the high "system" CPU consumption on the node.
Ok, thanks for details !
Steps that you've provided and a patch to have a stable behavior are completely sufficient for the moment, thanks for this !
Hi Freedge !
Could you share with us, please your glibc version, please (ldd --version) ?
Regards,
on my centos VM: ldd (GNU libc) 2.34 from glibc-common-2.34-105.el9.x86_64
Hi Freedge !
Could you try, please, to compile haproxy with this flag: DEBUG=-DDEBUG_THREAD and reproduce the infinite loop:
make TARGET=linux-glibc USE_OPENSSL=0 USE_ZLIB=0 USE_PCRE=0 USE_LUA=0 USE_EPOLL=1 DEBUG=-DDEBUG_THREAD
This will enable the dump of lock statistics to stderr by the main thread in wait_for_threads_completion () here. This could provide more insights.
Kind regards,
hi! This should be trivial to reproduce. As per the code you link, the show_lock_stats will only log once the pthread_join completes which is not the case.
but I can trigger this from gdb if you feel it can be any useful:
3 Thread 0x7f63327fb640 (LWP 15143) "haproxy" 0x00007ffde9b82b95 in ?? ()
(gdb) bt
#0 __futex_abstimed_wait_common64 (private=128, cancel=true, abstime=0x0, op=265, expected=15142, futex_word=0x7f6332ffc910) at futex-internal.c:57
#1 __futex_abstimed_wait_common (futex_word=futex_word@entry=0x7f6332ffc910, expected=15142, clockid=clockid@entry=0, abstime=abstime@entry=0x0,
private=private@entry=128, cancel=cancel@entry=true) at futex-internal.c:87
#2 0x00007f633ee867ff in __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7f6332ffc910, expected=<optimized out>, clockid=clockid@entry=0,
abstime=abstime@entry=0x0, private=private@entry=128) at futex-internal.c:139
#3 0x00007f633ee8b6d3 in __pthread_clockjoin_ex (threadid=140064034113088, thread_return=thread_return@entry=0x0, clockid=clockid@entry=0, abstime=abstime@entry=0x0,
block=block@entry=true) at pthread_join_common.c:102
#4 0x00007f633ee8b5a3 in ___pthread_join (threadid=<optimized out>, thread_return=thread_return@entry=0x0) at pthread_join.c:25
#5 0x00000000005a0973 in wait_for_threads_completion () at src/thread.c:264
#6 0x0000000000421b97 in main (argc=<optimized out>, argv=<optimized out>) at src/haproxy.c:3993
(gdb) up
#1 __futex_abstimed_wait_common (futex_word=futex_word@entry=0x7f6332ffc910, expected=15142, clockid=clockid@entry=0, abstime=abstime@entry=0x0,
private=private@entry=128, cancel=cancel@entry=true) at futex-internal.c:87
87 err = __futex_abstimed_wait_common64 (futex_word, expected, op, abstime,
(gdb)
#2 0x00007f633ee867ff in __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7f6332ffc910, expected=<optimized out>, clockid=clockid@entry=0,
abstime=abstime@entry=0x0, private=private@entry=128) at futex-internal.c:139
139 return __futex_abstimed_wait_common (futex_word, expected, clockid,
(gdb)
#3 0x00007f633ee8b6d3 in __pthread_clockjoin_ex (threadid=140064034113088, thread_return=thread_return@entry=0x0, clockid=clockid@entry=0, abstime=abstime@entry=0x0,
block=block@entry=true) at pthread_join_common.c:102
Downloading source file /usr/src/debug/glibc-2.34-105.el9.x86_64/nptl/pthread_join_common.c...
102 int ret = __futex_abstimed_wait_cancelable64 (
(gdb)
#4 0x00007f633ee8b5a3 in ___pthread_join (threadid=<optimized out>, thread_return=thread_return@entry=0x0) at pthread_join.c:25
Downloading source file /usr/src/debug/glibc-2.34-105.el9.x86_64/nptl/pthread_join.c...
25 return __pthread_clockjoin_ex (threadid, thread_return, 0 /* Ignored */,
(gdb) up
#5 0x00000000005a0973 in wait_for_threads_completion () at src/thread.c:264
264 pthread_join(ha_pthread[i], NULL);
(gdb) list
259 {
260 int i;
261
262 /* Wait the end of other threads */
263 for (i = 1; i < global.nbthread; i++)
264 pthread_join(ha_pthread[i], NULL);
265
266 #if defined(DEBUG_THREAD) || defined(DEBUG_FULL)
267 show_lock_stats();
268 #endif
(gdb) p show_lock_stats()
$1 = void
Available filters :
[BWLIM] bwlim-in
[BWLIM] bwlim-out
[CACHE] cache
[COMP] compression
[FCGI] fcgi-app
[SPOE] spoe
[TRACE] trace
Using epoll() as the polling mechanism.
[WARNING] (15135) : Proxy p stopped (cumulated conns: FE: 0, BE: 0).
Stats about Lock TASK_RQ: not used
Stats about Lock TASK_WQ: not used
Stats about Lock LISTENER:
# write lock : 2
# write unlock: 2 (0)
# wait time for write : 0.000 msec
# wait time for write/lock: 100.000 nsec
Stats about Lock PROXY:
# write lock : 4
# write unlock: 4 (0)
# wait time for write : 0.000 msec
# wait time for write/lock: 100.000 nsec
Stats about Lock SERVER: not used
Stats about Lock LBPRM: not used
Stats about Lock SIGNALS: not used
Stats about Lock STK_TABLE: not used
Stats about Lock STK_SESS: not used
Stats about Lock APPLETS: not used
Stats about Lock PEER: not used
Stats about Lock SHCTX: not used
Stats about Lock SSL: not used
Stats about Lock SSL_GEN_CERTS: not used
Stats about Lock PATREF: not used
Stats about Lock PATEXP:
# write lock : 2
# write unlock: 2 (0)
# wait time for write : 0.000 msec
# wait time for write/lock: 50.000 nsec
Stats about Lock VARS: not used
Stats about Lock COMP_POOL: not used
Stats about Lock LUA: not used
Stats about Lock NOTIF: not used
Stats about Lock SPOE_APPLET: not used
Stats about Lock DNS:
# write lock : 1
# write unlock: 1 (0)
# wait time for write : 0.000 msec
# wait time for write/lock: 0.000 nsec
Stats about Lock PID_LIST: not used
Stats about Lock EMAIL_ALERTS: not used
Stats about Lock PIPES: not used
Stats about Lock TLSKEYS_REF: not used
Stats about Lock AUTH: not used
Stats about Lock RING: not used
Stats about Lock DICT: not used
Stats about Lock PROTO: # write lock : 11
# write unlock: 11 (0)
# wait time for write : 0.001 msec
# wait time for write/lock: 54.545 nsec
Stats about Lock QUEUE: not used
Stats about Lock CKCH: not used
Stats about Lock SNI: not used
Stats about Lock SSL_SERVER: not used
Stats about Lock SFT: not used
Stats about Lock IDLE_CONNS: not used
Stats about Lock OCSP: not used
Stats about Lock QC_CID: not used
Stats about Lock CACHE: not used
Stats about Lock OTHER: not used
Stats about Lock DEBUG1: not used
Stats about Lock DEBUG2: not used
Stats about Lock DEBUG3: not used
Stats about Lock DEBUG4: not used
Stats about Lock DEBUG5: not used
Hi Freedge !
Could you, please, try this patch below and provide us a feedback.
$ git diff
diff --git a/src/haproxy.c b/src/haproxy.c
index 8cc897e75..6850d243e 100644
--- a/src/haproxy.c
+++ b/src/haproxy.c
@@ -3087,7 +3087,7 @@ void run_poll_loop()
if (thread_has_tasks()) {
activity[tid].wake_tasks++;
_HA_ATOMIC_AND(&th_ctx->flags, ~TH_FL_SLEEPING);
- } else if (signal_queue_len) {
+ } else if (signal_queue_len && tid == 0) {
/* this check is required after setting TH_FL_SLEEPING to avoid
* a race with wakeup on signals using wake_threads() */
_HA_ATOMIC_AND(&th_ctx->flags, ~TH_FL_SLEEPING);
Kind regards,
hi, I tried and this indeed fixes the infinite loop.
hi, why is this closed as completed?
can you reopen this? @wlallemand as maintainer of master-worker part maybe?
The fix was merged and will be back-ported soon. So, I suppose we can close this one now.
@vkssv please don't close the tickets until the fixes are backported in all branches that are concerned.