Deadlock issue when setting maxconn frontend
Detailed Description of the Problem
When resuming a listener by setting maxconn back to a value higher than 0, the thread gets stuck in a deadlock.
Expected Behavior
The listener should resume
Steps to Reproduce the Behavior
set maxconn frontend <fe_name> 0
<send request to frontend>
set maxconn frontend <fe_name> 10
Do you have any idea what may have caused this?
It seems commit https://github.com/haproxy/haproxy/commit/001328873c352e5e4b1df0dcc8facaf2fc1408aa introduced the issue by trying to get the proxy lock in resume_listener.
The problem is that dequeue_proxy_listeners is called in cli_parse_set_maxconn_frontend (https://github.com/haproxy/haproxy/blob/99521abd59a255538f2f9a64d3379c31aef5a630/src/proxy.c#L3044) while the proxy lock is already held. It nevertheless assumes the lock is not held and passes 0 as lpx to resume_listener here: https://github.com/haproxy/haproxy/blob/469fa479501f4807d9983ca46618aba3c4ec8cb7/src/listener.c#L613
Do you have an idea how to solve the issue?
The lock status needs to be passed through dequeue_proxy_listeners down to resume_listener. The code has changed considerably since then, but the issue still exists in the master branch. There is just another layer in between now, relax_listener.
https://github.com/haproxy/haproxy/pull/2724
What is your configuration?
global
    log stdout format raw local0
    stats socket 127.0.0.1:9999 level admin
    stats timeout 2m
    nbthread 1
    maxconn 10000

defaults
    timeout client 30
    timeout connect 10
    timeout server 30
    log global

frontend stats
    maxconn 10
    backlog 8192
    mode http
    bind 127.0.0.1:8081
    default_backend stats

frontend http
    maxconn 10
    mode http
    bind 127.0.0.1:8080
    default_backend testbe

backend stats
    mode http
    stats enable
    stats uri /stats
    stats refresh 1s
    stats show-legends
    stats admin if TRUE

backend testbe
    mode http
    timeout queue 1m
    http-request return status 200 content-type "text/plain" string "TeeHee"
Output of haproxy -vv
-
Last Outputs and Backtraces
[NOTICE] (204635) : New worker (204637) forked
[NOTICE] (204635) : Loading success.
Connect from 127.0.0.1:50556 to 127.0.0.1:8080 (http/HTTP)
Connect from 127.0.0.1:50558 to 127.0.0.1:8080 (http/HTTP)
Thread 1 is about to kill the process.
*>Thread 1 : id=0x7e4a72c7a400 act=1 glob=0 wq=0 rq=0 tl=0 tlsz=0 rqsz=0
1/1 stuck=1 prof=0 harmless=0 wantrdv=0
cpu_ns: poll=2987816 now=3002756954 diff=2999769138
curr_task=0x59610d3f8f90 (task) calls=1 last=0
fct=0x5960deae1350(task_run_applet) ctx=0x59610d3fd6e0(<CLI>)
strm=0x59610d2ec320,8 src=127.0.0.1 fe=GLOBAL be=GLOBAL dst=<CLI>
txn=(nil),0 txn.req=-,0 txn.rsp=-,0
rqf=808000 rqa=0 rpf=80008000 rpa=0
scf=0x59610d1d4f40,EST,200 scb=0x59610d3c23a0,EST,1
af=(nil),0 sab=0x59610d3fd6e0,4
cof=0x59610d3f9180,300:PASS(0x59610d3f9540)/RAW((nil))/tcpv4(11)
cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
call trace(22):
| 0x5960dea97b3f [85 c0 75 2d 48 8b 84 24]: ha_dump_backtrace+0x3f/0x311
| 0x5960dea9864e [48 8b 05 bb 34 1e 00 48]: debug_handler+0x6e/0x10b
| 0x7e4a72a42520 [48 c7 c0 0f 00 00 00 0f]: libc:+0x42520
| 0x7e4a72a969fc [41 89 c5 41 f7 dd 3d 00]: libc:pthread_kill+0x12c/0x16a
| 0x7e4a72a42476 [85 c0 75 06 5d c3 0f 1f]: libc:raise+0x16/0x31
| 0x5960dea96ef7 [64 48 8b 53 10 64 48 8b]: main+0x175567
| 0x5960dea96f4c [0f 1f 40 00 f3 0f 1e fa]: main+0x1755bc
| 0x7e4a72a42520 [48 c7 c0 0f 00 00 00 0f]: libc:+0x42520
| 0x5960dea8ca66 [e9 05 ff ff ff 0f b6 43]: resume_listener+0x156/0x248
| 0x5960dea8cd25 [eb a9 66 0f 1f 84 00 00]: dequeue_proxy_listeners+0x75/0xa5
| 0x5960dea584e7 [eb 92 0f 1f 80 00 00 00]: main+0x136b57
| 0x5960dea479f9 [85 c0 0f 85 95 00 00 00]: main+0x126069
| 0x5960dea481f4 [49 8b 47 10 48 63 54 24]: main+0x126864
| 0x5960deae1498 [8b 53 04 48 8b 43 28 f6]: task_run_applet+0x148/0x680
[NOTICE] (204635) : haproxy version is 2.6.12-f588462
[NOTICE] (204635) : path to executable is ./haproxy
[ALERT] (204635) : Current worker (204637) exited with code 134 (Aborted)
[ALERT] (204635) : exit-on-failure: killing every processes with SIGTERM
[WARNING] (204635) : All workers exited. Exiting... (134)
Additional Information
I found the issue in the Debian 2.6.12 build, though I can't figure out how these Debian version numbers correlate to this repo. The first tag where I could find the issue was v2.7.0. The fix for v2.7.0 is here: https://github.com/ixopay/haproxy/tree/fix_dequeue_proxy_listeners_deadlock_v2.7.0
I quickly reviewed your patch and it looks good to me, though I have not tested it. Could you make a clean patch following the CONTRIBUTING rules?
I'll try to make a clean patch tomorrow.
All versions from 2.4 up to master are indeed affected. Thanks for the report and the analysis.
I sent a patch to the mailing list.
Many thanks!