
Deadlock issue when setting maxconn frontend

Status: Open. Olliferdl opened this issue 1 year ago • 5 comments

Detailed Description of the Problem

When a listener is resumed by setting the frontend's maxconn back to a value greater than 0, the thread gets stuck in a deadlock.

Expected Behavior

The listener should resume accepting connections.

Steps to Reproduce the Behavior

1. set maxconn frontend <fe_name> 0
2. Send a request to the frontend.
3. set maxconn frontend <fe_name> 10
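The steps above can be scripted against the TCP stats socket. This is a minimal C sketch, assuming the configuration shown further down (stats socket on 127.0.0.1:9999 at admin level, frontend "http" on 127.0.0.1:8080). The helpers build_maxconn_cmd() and send_cli_cmd() are hypothetical names for illustration, not part of HAProxy.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Formats the runtime API command "set maxconn frontend <fe> <n>" plus a
 * trailing newline. Returns snprintf()'s result (length without the NUL). */
static int build_maxconn_cmd(char *buf, size_t len, const char *fe, int maxconn)
{
    return snprintf(buf, len, "set maxconn frontend %s %d\n", fe, maxconn);
}

/* Sends one command to a TCP stats socket and prints whatever comes back
 * until the peer closes the connection. Returns 0 on success, -1 on error. */
static int send_cli_cmd(const char *ip, unsigned short port, const char *cmd)
{
    struct sockaddr_in sa;
    char reply[256];
    ssize_t n;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &sa.sin_addr) != 1 ||
        connect(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0 ||
        write(fd, cmd, strlen(cmd)) < 0) {
        close(fd);
        return -1;
    }
    /* Without "prompt", the CLI answers a single command and closes. */
    while ((n = read(fd, reply, sizeof(reply) - 1)) > 0) {
        reply[n] = '\0';
        fputs(reply, stdout);
    }
    close(fd);
    return 0;
}
```

Usage: build the command for value 0 and pass it to send_cli_cmd("127.0.0.1", 9999, cmd), send any HTTP request to 127.0.0.1:8080 (for example with curl), then repeat with value 10. On an affected build the last command should get no reply, because the CLI task is stuck until the watchdog aborts the worker.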

Do you have any idea what may have caused this?

It seems that commit https://github.com/haproxy/haproxy/commit/001328873c352e5e4b1df0dcc8facaf2fc1408aa introduced the issue by making resume_listener() take the proxy lock.

The problem is that cli_parse_set_maxconn_frontend() (https://github.com/haproxy/haproxy/blob/99521abd59a255538f2f9a64d3379c31aef5a630/src/proxy.c#L3044) calls dequeue_proxy_listeners() while already holding the proxy lock, but dequeue_proxy_listeners() assumes the lock is not held and passes 0 as lpx to resume_listener() here: https://github.com/haproxy/haproxy/blob/469fa479501f4807d9983ca46618aba3c4ec8cb7/src/listener.c#L613. resume_listener() then tries to take a lock that the same thread already holds, so the thread never makes progress.
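To illustrate the failure mode, here is a minimal C sketch of the same pattern. It is an analogy, not HAProxy code: the function names are hypothetical, and a POSIX error-checking mutex stands in for HAProxy's own non-recursive proxy lock, so that the second lock attempt returns EDEADLK instead of hanging until the watchdog kills the process.

```c
#define _POSIX_C_SOURCE 200809L /* for PTHREAD_MUTEX_ERRORCHECK */
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the proxy lock. The error-checking type makes a
 * relock by the owning thread fail with EDEADLK rather than block forever. */
static pthread_mutex_t proxy_lock;

static void proxy_lock_init(void)
{
    pthread_mutexattr_t attr;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&proxy_lock, &attr);
    pthread_mutexattr_destroy(&attr);
}

/* Plays the role of resume_listener(l, 0): lpx == 0 means "lock not held",
 * so it always tries to take the proxy lock. Returns the lock result. */
static int resume_listener_sketch(void)
{
    int ret = pthread_mutex_lock(&proxy_lock);

    if (ret == 0)
        pthread_mutex_unlock(&proxy_lock);
    return ret;
}

/* Plays the role of the CLI handler: it holds the proxy lock while the
 * queued listeners are dequeued and resumed. */
static int cli_set_maxconn_sketch(void)
{
    int ret;

    pthread_mutex_lock(&proxy_lock);
    ret = resume_listener_sketch();   /* second lock on the same thread */
    pthread_mutex_unlock(&proxy_lock);
    return ret;
}
```

With a normal (non-error-checking) mutex or a spinlock, the second lock call would simply block forever instead, which matches the stuck=1 thread in the backtrace.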

Do you have an idea how to solve the issue?

The lock status needs to be passed from dequeue_proxy_listeners() down to resume_listener(). The surrounding code seems to have changed a lot since then, but the issue still exists in the master branch too; there is just an additional layer in between now, relax_listener().
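A minimal C sketch of that approach, written as a stand-alone analogy with hypothetical function names and a POSIX error-checking mutex in place of HAProxy's lock; the actual patch may differ in detail. The caller's lock state travels as an lpx flag through every layer, and the lock is only taken where lpx is 0.

```c
#define _POSIX_C_SOURCE 200809L /* for PTHREAD_MUTEX_ERRORCHECK */
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the proxy lock (error-checking, so a mistaken
 * same-thread relock shows up as EDEADLK instead of a hang). */
static pthread_mutex_t proxy_lock;

static void proxy_lock_init(void)
{
    pthread_mutexattr_t attr;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&proxy_lock, &attr);
    pthread_mutexattr_destroy(&attr);
}

/* lpx != 0 means "the caller already holds proxy_lock". */
static int resume_listener_sketch(int lpx)
{
    int ret;

    if (!lpx) {
        ret = pthread_mutex_lock(&proxy_lock);
        if (ret != 0)
            return ret;
    }
    /* ... the listener would be re-enabled here ... */
    if (!lpx)
        pthread_mutex_unlock(&proxy_lock);
    return 0;
}

/* Intermediate layer, like relax_listener() on master: forwards lpx. */
static int relax_listener_sketch(int lpx)
{
    return resume_listener_sketch(lpx);
}

/* Like dequeue_proxy_listeners(), but told whether the lock is held.
 * A single queued listener is assumed for brevity. */
static int dequeue_proxy_listeners_sketch(int lpx)
{
    return relax_listener_sketch(lpx);
}

/* CLI handler: it already holds the proxy lock, so it passes lpx = 1. */
static int cli_set_maxconn_sketch(void)
{
    int ret;

    pthread_mutex_lock(&proxy_lock);
    ret = dequeue_proxy_listeners_sketch(1);
    pthread_mutex_unlock(&proxy_lock);
    return ret;
}
```

Callers that do not hold the lock keep passing 0, so only the CLI path changes behavior.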

https://github.com/haproxy/haproxy/pull/2724

What is your configuration?

global
    log stdout format raw local0
    stats socket 127.0.0.1:9999 level admin
    stats timeout 2m
    nbthread 1
    maxconn 10000

defaults
    timeout client 30
    timeout connect 10
    timeout server 30
    log global

frontend stats
    maxconn 10
    backlog 8192
    mode http
    bind 127.0.0.1:8081
    default_backend stats

frontend http
    maxconn 10
    mode http
    bind 127.0.0.1:8080
    default_backend testbe

backend stats
    mode http
    stats enable
    stats uri /stats
    stats refresh 1s
    stats show-legends
    stats admin if TRUE

backend testbe
    mode http
    timeout queue 1m
    http-request return status 200 content-type "text/plain" string "TeeHee"

Output of haproxy -vv

-

Last Outputs and Backtraces

[NOTICE]   (204635) : New worker (204637) forked
[NOTICE]   (204635) : Loading success.
Connect from 127.0.0.1:50556 to 127.0.0.1:8080 (http/HTTP)
Connect from 127.0.0.1:50558 to 127.0.0.1:8080 (http/HTTP)
Thread 1 is about to kill the process.
*>Thread 1 : id=0x7e4a72c7a400 act=1 glob=0 wq=0 rq=0 tl=0 tlsz=0 rqsz=0
      1/1    stuck=1 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=2987816 now=3002756954 diff=2999769138
             curr_task=0x59610d3f8f90 (task) calls=1 last=0
               fct=0x5960deae1350(task_run_applet) ctx=0x59610d3fd6e0(<CLI>)
             strm=0x59610d2ec320,8 src=127.0.0.1 fe=GLOBAL be=GLOBAL dst=<CLI>
             txn=(nil),0 txn.req=-,0 txn.rsp=-,0
             rqf=808000 rqa=0 rpf=80008000 rpa=0
             scf=0x59610d1d4f40,EST,200 scb=0x59610d3c23a0,EST,1
             af=(nil),0 sab=0x59610d3fd6e0,4
             cof=0x59610d3f9180,300:PASS(0x59610d3f9540)/RAW((nil))/tcpv4(11)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)

             call trace(22):
             | 0x5960dea97b3f [85 c0 75 2d 48 8b 84 24]: ha_dump_backtrace+0x3f/0x311
             | 0x5960dea9864e [48 8b 05 bb 34 1e 00 48]: debug_handler+0x6e/0x10b
             | 0x7e4a72a42520 [48 c7 c0 0f 00 00 00 0f]: libc:+0x42520
             | 0x7e4a72a969fc [41 89 c5 41 f7 dd 3d 00]: libc:pthread_kill+0x12c/0x16a
             | 0x7e4a72a42476 [85 c0 75 06 5d c3 0f 1f]: libc:raise+0x16/0x31
             | 0x5960dea96ef7 [64 48 8b 53 10 64 48 8b]: main+0x175567
             | 0x5960dea96f4c [0f 1f 40 00 f3 0f 1e fa]: main+0x1755bc
             | 0x7e4a72a42520 [48 c7 c0 0f 00 00 00 0f]: libc:+0x42520
             | 0x5960dea8ca66 [e9 05 ff ff ff 0f b6 43]: resume_listener+0x156/0x248
             | 0x5960dea8cd25 [eb a9 66 0f 1f 84 00 00]: dequeue_proxy_listeners+0x75/0xa5
             | 0x5960dea584e7 [eb 92 0f 1f 80 00 00 00]: main+0x136b57
             | 0x5960dea479f9 [85 c0 0f 85 95 00 00 00]: main+0x126069
             | 0x5960dea481f4 [49 8b 47 10 48 63 54 24]: main+0x126864
             | 0x5960deae1498 [8b 53 04 48 8b 43 28 f6]: task_run_applet+0x148/0x680
[NOTICE]   (204635) : haproxy version is 2.6.12-f588462
[NOTICE]   (204635) : path to executable is ./haproxy
[ALERT]    (204635) : Current worker (204637) exited with code 134 (Aborted)
[ALERT]    (204635) : exit-on-failure: killing every processes with SIGTERM
[WARNING]  (204635) : All workers exited. Exiting... (134)

Additional Information

I found the issue in the Debian 2.6.12 build, but I can't figure out how the Debian version numbers correlate to this repository. The first tag where I could find the issue was v2.7.0. The fix for v2.7.0 is here: https://github.com/ixopay/haproxy/tree/fix_dequeue_proxy_listeners_deadlock_v2.7.0

Olliferdl, Sep 24 '24 13:09

I quickly reviewed your patch and it looks good to me (but I have not tested it). Could you make a clean patch following the CONTRIBUTING rules?

capflam, Sep 24 '24 13:09

I'll try to make a clean patch tomorrow.

Olliferdl, Sep 24 '24 13:09

All versions from 2.4 up to master are indeed affected. Thanks for the report and the analysis.

Darlelet, Sep 24 '24 14:09

I sent a patch to the mailing list.

Olliferdl, Sep 25 '24 09:09

Many thanks !

capflam, Sep 25 '24 15:09