Segfault in Lua / "in_table"

Open lwimmer opened this issue 3 years ago • 10 comments

Detailed Description of the Problem

After upgrading from 1.8 to 2.4 we've observed occasional segfaults. After some time we've now finally gotten a coredump.

The crash seems to happen in our Lua code with the call to txn.c:in_table.

I can also reproduce this crash easily with the latest commit on the development branch (2.7-dev5-439be5-83 2022/09/12). With 1.8 I cannot reproduce the crash.

Luckily I was able to come up with a minimal configuration which allows me to reproduce this crash easily (see below).

Expected Behavior

No segmentation fault.

Steps to Reproduce the Behavior

Start HAProxy with the config below and send any request:

$ curl http://localhost:8080
curl: (52) Empty reply from server

Do you have any idea what may have caused this?

No response

Do you have an idea how to solve the issue?

No response

What is your configuration?

# crash.lua:
core.register_action("crash", { "http-req" }, function(txn)
  txn.c:in_table("test")
end);

# haproxy.cfg:
global
    lua-load crash.lua

frontend  main
    mode http
    bind *:8080
    stick-table type string len 50 size 1000 expire 1m
    http-request lua.crash

Output of haproxy -vv

Linux name 5.15.0-47-generic #51-Ubuntu SMP Thu Aug 11 07:51:15 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

HAProxy version 2.7-dev5-439be5-83 2022/09/12 - https://haproxy.org/
Status: development branch - not safe for use in production.
Known bugs: https://github.com/haproxy/haproxy/issues?q=is:issue+is:open
Running on: Linux 5.15.0-47-generic #51-Ubuntu SMP Thu Aug 11 07:51:15 UTC 2022 x86_64
Build options :
  TARGET  = linux-glibc
  CPU     = generic
  CC      = cc
  CFLAGS  = -O2 -g -O0 -Wall -Wextra -Wundef -Wdeclaration-after-statement -Wfatal-errors -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference -fwrapv -Wno-address-of-packed-member -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers -Wno-cast-function-type -Wno-string-plus-int -Wno-atomic-alignment
  OPTIONS = USE_LUA=1
  DEBUG   = -DDEBUG_STRICT -DDEBUG_MEMORY_POOLS

Feature list : +EPOLL -KQUEUE +NETFILTER -PCRE -PCRE_JIT -PCRE2 -PCRE2_JIT +POLL +THREAD -PTHREAD_EMULATION +BACKTRACE -STATIC_PCRE -STATIC_PCRE2 +TPROXY +LINUX_TPROXY +LINUX_SPLICE +LIBCRYPT +CRYPT_H -ENGINE +GETADDRINFO -OPENSSL +LUA +ACCEPT4 -CLOSEFROM -ZLIB +SLZ +CPU_AFFINITY +TFO +NS +DL +RT -DEVICEATLAS -51DEGREES -WURFL -SYSTEMD -OBSOLETE_LINKER +PRCTL -PROCCTL +THREAD_DUMP -EVPORTS -OT -QUIC -PROMEX -MEMORY_PROFILING

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_TGROUPS=16, MAX_THREADS=256, default=8).
Built with Lua version : Lua 5.4.4
Built with network namespace support.
Support for malloc_trim() is enabled.
Built with libslz for stateless compression.
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built without PCRE or PCRE2 support (using libc's regex instead)
Encrypted password support via crypt(3): yes
Built with gcc compiler version 11.2.0

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
         h2 : mode=HTTP  side=FE|BE  mux=H2    flags=HTX|HOL_RISK|NO_UPG
       fcgi : mode=HTTP  side=BE     mux=FCGI  flags=HTX|HOL_RISK|NO_UPG
         h1 : mode=HTTP  side=FE|BE  mux=H1    flags=HTX|NO_UPG
  <default> : mode=HTTP  side=FE|BE  mux=H1    flags=HTX
       none : mode=TCP   side=FE|BE  mux=PASS  flags=NO_UPG
  <default> : mode=TCP   side=FE|BE  mux=PASS  flags=

Available services : none

Available filters :
	[BWLIM] bwlim-in
	[BWLIM] bwlim-out
	[CACHE] cache
	[COMP] compression
	[FCGI] fcgi-app
	[SPOE] spoe
	[TRACE] trace

Last Outputs and Backtraces

# Last output:
[NOTICE]   (423503) : haproxy version is 2.7-dev5-439be5-83
[NOTICE]   (423503) : path to executable is ./haproxy
[WARNING]  (423503) : config : missing timeouts for frontend 'main'.
   | While not properly invalid, you will certainly encounter various problems
   | with such a configuration. To fix this, please ensure that all following
   | timeouts are set to a non-zero value: 'client', 'connect', 'server'.
Segmentation fault (core dumped)

# gdb:
$ gdb haproxy core.haproxy.423503 
GNU gdb (Ubuntu 12.0.90-0ubuntu1) 12.0.90
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from haproxy...
[New LWP 423503]
[New LWP 423508]
[New LWP 423505]
[New LWP 423509]
[New LWP 423506]
[New LWP 423507]
[New LWP 423510]
[New LWP 423511]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `./haproxy -f haproxy2.cfg'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000056551966d714 in sample_convert (sample=0x7fff1fe0c840, req_type=446250792) at include/haproxy/sample.h:70
70		if (!sample_casts[sample->data.type][req_type])
[Current thread is 1 (Thread 0x7f2dd40e2f00 (LWP 423503))]
(gdb) t a a bt full

Thread 8 (Thread 0x7f2dc27fc640 (LWP 423511)):
#0  0x00007f2dd42f3fde in epoll_wait (epfd=24, events=0x7f2db0025660, maxevents=200, timeout=369) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
        sc_ret = -4
        sc_cancel_oldtype = 0
        sc_ret = <optimized out>
#1  0x0000565519579b65 in _do_poll (p=0x565519920800 <cur_poller>, exp=860791628, wake=0) at src/ev_epoll.c:232
        timeout = 369
        status = -1031813568
        fd = -1
        count = 22101
        updt_idx = 0
        wait_time = 369
        old_fd = -1
#2  0x00005655197467da in run_poll_loop () at src/haproxy.c:2878
        next = 860791628
        wake = 0
        __func__ = <optimized out>
#3  0x0000565519746ab1 in run_thread_poll_loop (data=0x565519b7e4c0 <ha_thread_info+448>) at src/haproxy.c:2996
        ptaf = 0x56551991d540 <per_thread_alloc_list>
        ptif = 0x56551991d550 <per_thread_init_list>
        ptdf = 0x0
        ptff = 0x0
        init_left = 0
        init_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}
        init_cond = {__data = {__wseq = {__value64 = 31, __value32 = {__low = 31, __high = 0}}, __g1_start = {__value64 = 27, __value32 = {__low = 27, __high = 0}}, __g_refs = {0, 0}, __g_size = {0, 0}, __g1_orig_size = 8, __wrefs = 0, __g_signals = {0, 0}}, __size = "\037\000\000\000\000\000\000\000\033", '\000' <repeats 23 times>, "\b", '\000' <repeats 14 times>, __align = 31}
#4  0x00007f2dd4262b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140733728214944, 3697673187556459400, 139834513409600, 0, 139834809526352, 140733728215296, -3671767599692512376, -3671719712237550712}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#5  0x00007f2dd42f4a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
No locals.

Thread 7 (Thread 0x7f2dc2ffd640 (LWP 423510)):
#0  0x00007f2dd42f3fde in epoll_wait (epfd=27, events=0x7f2da4025660, maxevents=200, timeout=60000) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
        sc_ret = -4
        sc_cancel_oldtype = 0
        sc_ret = <optimized out>
#1  0x0000565519579b65 in _do_poll (p=0x565519920800 <cur_poller>, exp=0, wake=0) at src/ev_epoll.c:232
        timeout = 60000
        status = -1023420864
        fd = -1
        count = 0
        updt_idx = 2
        wait_time = 60000
        old_fd = 4
#2  0x00005655197467da in run_poll_loop () at src/haproxy.c:2878
        next = 0
        wake = 0
        __func__ = <optimized out>
#3  0x0000565519746ab1 in run_thread_poll_loop (data=0x565519b7e480 <ha_thread_info+384>) at src/haproxy.c:2996
        ptaf = 0x56551991d540 <per_thread_alloc_list>
        ptif = 0x56551991d550 <per_thread_init_list>
        ptdf = 0x0
        ptff = 0x0
        init_left = 0
        init_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}
        init_cond = {__data = {__wseq = {__value64 = 31, __value32 = {__low = 31, __high = 0}}, __g1_start = {__value64 = 27, __value32 = {__low = 27, __high = 0}}, __g_refs = {0, 0}, __g_size = {0, 0}, __g1_orig_size = 8, __wrefs = 0, __g_signals = {0, 0}}, __size = "\037\000\000\000\000\000\000\000\033", '\000' <repeats 23 times>, "\b", '\000' <repeats 14 times>, __align = 31}
#4  0x00007f2dd4262b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140733728214944, 3697673187556459400, 139834521802304, 0, 139834809526352, 140733728215296, -3671766501791497336, -3671719712237550712}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#5  0x00007f2dd42f4a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
No locals.

Thread 6 (Thread 0x7f2dd0f91640 (LWP 423507)):
#0  0x00007f2dd42f3fde in epoll_wait (epfd=13, events=0x7f2db4025660, maxevents=200, timeout=60000) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
        sc_ret = -4
        sc_cancel_oldtype = 0
        sc_ret = <optimized out>
#1  0x0000565519579b65 in _do_poll (p=0x565519920800 <cur_poller>, exp=0, wake=0) at src/ev_epoll.c:232
        timeout = 60000
        status = -788982208
        fd = -1
        count = 0
        updt_idx = 2
        wait_time = 60000
        old_fd = 4
#2  0x00005655197467da in run_poll_loop () at src/haproxy.c:2878
        next = 0
        wake = 0
        __func__ = <optimized out>
#3  0x0000565519746ab1 in run_thread_poll_loop (data=0x565519b7e3c0 <ha_thread_info+192>) at src/haproxy.c:2996
        ptaf = 0x56551991d540 <per_thread_alloc_list>
        ptif = 0x56551991d550 <per_thread_init_list>
        ptdf = 0x0
        ptff = 0x0
        init_left = 0
        init_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}
        init_cond = {__data = {__wseq = {__value64 = 31, __value32 = {__low = 31, __high = 0}}, __g1_start = {__value64 = 27, __value32 = {__low = 27, __high = 0}}, __g_refs = {0, 0}, __g_size = {0, 0}, __g1_orig_size = 8, __wrefs = 0, __g_signals = {0, 0}}, __size = "\037\000\000\000\000\000\000\000\033", '\000' <repeats 23 times>, "\b", '\000' <repeats 14 times>, __align = 31}
#4  0x00007f2dd4262b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140733728214944, 3697673187556459400, 139834756240960, 0, 139834809526352, 140733728215296, -3671726874275740792, -3671719712237550712}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#5  0x00007f2dd42f4a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
No locals.

Thread 5 (Thread 0x7f2dd1792640 (LWP 423506)):
#0  0x00007f2dd42f3fde in epoll_wait (epfd=10, events=0x7f2dbc025660, maxevents=200, timeout=60000) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
        sc_ret = -4
        sc_cancel_oldtype = 0
        sc_ret = <optimized out>
#1  0x0000565519579b65 in _do_poll (p=0x565519920800 <cur_poller>, exp=0, wake=0) at src/ev_epoll.c:232
        timeout = 60000
        status = -780589504
        fd = -1
        count = 0
        updt_idx = 2
        wait_time = 60000
        old_fd = 4
#2  0x00005655197467da in run_poll_loop () at src/haproxy.c:2878
        next = 0
        wake = 0
        __func__ = <optimized out>
#3  0x0000565519746ab1 in run_thread_poll_loop (data=0x565519b7e380 <ha_thread_info+128>) at src/haproxy.c:2996
        ptaf = 0x56551991d540 <per_thread_alloc_list>
        ptif = 0x56551991d550 <per_thread_init_list>
        ptdf = 0x0
        ptff = 0x0
        init_left = 0
        init_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}
        init_cond = {__data = {__wseq = {__value64 = 31, __value32 = {__low = 31, __high = 0}}, __g1_start = {__value64 = 27, __value32 = {__low = 27, __high = 0}}, __g_refs = {0, 0}, __g_size = {0, 0}, __g1_orig_size = 8, __wrefs = 0, __g_signals = {0, 0}}, __size = "\037\000\000\000\000\000\000\000\033", '\000' <repeats 23 times>, "\b", '\000' <repeats 14 times>, __align = 31}
#4  0x00007f2dd4262b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140733728214944, 3697673187556459400, 139834764633664, 0, 139834809526352, 140733728215296, -3671730172273753208, -3671719712237550712}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#5  0x00007f2dd42f4a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
No locals.

Thread 4 (Thread 0x7f2dc37fe640 (LWP 423509)):
#0  0x00007f2dd42f3fde in epoll_wait (epfd=21, events=0x7f2dac025660, maxevents=200, timeout=60000) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
        sc_ret = -4
        sc_cancel_oldtype = 0
        sc_ret = <optimized out>
#1  0x0000565519579b65 in _do_poll (p=0x565519920800 <cur_poller>, exp=0, wake=0) at src/ev_epoll.c:232
        timeout = 60000
        status = -1015028160
        fd = -1
        count = 0
        updt_idx = 2
        wait_time = 60000
        old_fd = 4
#2  0x00005655197467da in run_poll_loop () at src/haproxy.c:2878
        next = 0
        wake = 0
        __func__ = <optimized out>
#3  0x0000565519746ab1 in run_thread_poll_loop (data=0x565519b7e440 <ha_thread_info+320>) at src/haproxy.c:2996
        ptaf = 0x56551991d540 <per_thread_alloc_list>
        ptif = 0x56551991d550 <per_thread_init_list>
        ptdf = 0x0
        ptff = 0x0
        init_left = 0
        init_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}
        init_cond = {__data = {__wseq = {__value64 = 31, __value32 = {__low = 31, __high = 0}}, __g1_start = {__value64 = 27, __value32 = {__low = 27, __high = 0}}, __g_refs = {0, 0}, __g_size = {0, 0}, __g1_orig_size = 8, __wrefs = 0, __g_signals = {0, 0}}, __size = "\037\000\000\000\000\000\000\000\033", '\000' <repeats 23 times>, "\b", '\000' <repeats 14 times>, __align = 31}
#4  0x00007f2dd4262b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140733728214944, 3697673187556459400, 139834530195008, 0, 139834809526352, 140733728215296, -3671769799789509752, -3671719712237550712}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#5  0x00007f2dd42f4a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
No locals.

Thread 3 (Thread 0x7f2dd40d7640 (LWP 423505)):
#0  0x00007f2dd42f3fde in epoll_wait (epfd=7, events=0x7f2dcc0266a0, maxevents=200, timeout=60000) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
        sc_ret = -4
        sc_cancel_oldtype = 0
        sc_ret = <optimized out>
#1  0x0000565519579b65 in _do_poll (p=0x565519920800 <cur_poller>, exp=0, wake=0) at src/ev_epoll.c:232
        timeout = 60000
        status = -737315264
        fd = -1
        count = 0
        updt_idx = 2
        wait_time = 60000
        old_fd = 4
#2  0x00005655197467da in run_poll_loop () at src/haproxy.c:2878
        next = 0
        wake = 0
        __func__ = <optimized out>
#3  0x0000565519746ab1 in run_thread_poll_loop (data=0x565519b7e340 <ha_thread_info+64>) at src/haproxy.c:2996
        ptaf = 0x56551991d540 <per_thread_alloc_list>
        ptif = 0x56551991d550 <per_thread_init_list>
        ptdf = 0x7f2dd4262850 <start_thread>
        ptff = 0x7fff1fe0d0a0
        init_left = 0
        init_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}
        init_cond = {__data = {__wseq = {__value64 = 31, __value32 = {__low = 31, __high = 0}}, __g1_start = {__value64 = 27, __value32 = {__low = 27, __high = 0}}, __g_refs = {0, 0}, __g_size = {0, 0}, __g1_orig_size = 8, __wrefs = 0, __g_signals = {0, 0}}, __size = "\037\000\000\000\000\000\000\000\033", '\000' <repeats 23 times>, "\b", '\000' <repeats 14 times>, __align = 31}
#4  0x00007f2dd4262b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140733728214944, 3697673187556459400, 139834807907904, 0, 139834809526352, 140733728215296, -3671719893880142968, -3671719712237550712}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#5  0x00007f2dd42f4a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
No locals.

Thread 2 (Thread 0x7f2dc3fff640 (LWP 423508)):
#0  0x00007f2dd42f3fde in epoll_wait (epfd=16, events=0x7f2db8025660, maxevents=200, timeout=369) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
        sc_ret = -4
        sc_cancel_oldtype = 0
        sc_ret = <optimized out>
#1  0x0000565519579b65 in _do_poll (p=0x565519920800 <cur_poller>, exp=860791628, wake=0) at src/ev_epoll.c:232
        timeout = 369
        status = -1006635456
        fd = -1
        count = 22101
        updt_idx = 0
        wait_time = 369
        old_fd = -1
#2  0x00005655197467da in run_poll_loop () at src/haproxy.c:2878
        next = 860791628
        wake = 0
        __func__ = <optimized out>
#3  0x0000565519746ab1 in run_thread_poll_loop (data=0x565519b7e400 <ha_thread_info+256>) at src/haproxy.c:2996
        ptaf = 0x56551991d540 <per_thread_alloc_list>
        ptif = 0x56551991d550 <per_thread_init_list>
        ptdf = 0x0
        ptff = 0x0
        init_left = 0
        init_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}
        init_cond = {__data = {__wseq = {__value64 = 31, __value32 = {__low = 31, __high = 0}}, __g1_start = {__value64 = 27, __value32 = {__low = 27, __high = 0}}, __g_refs = {0, 0}, __g_size = {0, 0}, __g1_orig_size = 8, __wrefs = 0, __g_signals = {0, 0}}, __size = "\037\000\000\000\000\000\000\000\033", '\000' <repeats 23 times>, "\b", '\000' <repeats 14 times>, __align = 31}
#4  0x00007f2dd4262b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140733728214944, 3697673187556459400, 139834538587712, 0, 139834809526352, 140733728215296, -3671768697593527416, -3671719712237550712}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#5  0x00007f2dd42f4a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
No locals.

Thread 1 (Thread 0x7f2dd40e2f00 (LWP 423503)):
#0  0x000056551966d714 in sample_convert (sample=0x7fff1fe0c840, req_type=446250792) at include/haproxy/sample.h:70
No locals.
#1  0x0000565519670261 in smp_to_stkey (smp=0x7fff1fe0c840, t=0x56551a993e80) at src/stick_table.c:1028
No locals.
#2  0x00005655196707cc in sample_conv_in_table (arg_p=0x7fff1fe0c8e0, smp=0x7fff1fe0c840, private=0x0) at src/stick_table.c:1236
        t = 0x56551a993e80
        key = 0x56551a993e80
        ts = 0x7fff1fe0c840
#3  0x0000565519588475 in hlua_run_sample_conv (L=0x56551ab05058) at src/hlua.c:4282
        hsmp = 0x56551ab05718
        conv = 0x565519910680 <sample_conv_kws+16>
        args = {{type = 11 '\v', unresolved = 0 '\000', type_flags = 0 '\000', data = {sint = 94923518459520, str = {size = 94923518459520, area = 0x0, data = 0, head = 0}, ipv4 = {s_addr = 446250624}, ipv6 = {__in6_u = {__u6_addr8 = "\200>\231\032UV\000\000\000\000\000\000\000\000\000", __u6_addr16 = {16000, 6809, 22101, 0, 0, 0, 0, 0}, __u6_addr32 = {446250624, 22101, 0, 0}}}, prx = 0x56551a993e80, srv = 0x56551a993e80, t = 0x56551a993e80, usr = 0x56551a993e80, map = 0x56551a993e80, reg = 0x56551a993e80, fid = {ids = 0x56551a993e80, sz = 0}, var = {name_hash = 94923518459520, scope = SCOPE_SESS}, ptr = 0x56551a993e80}}, {type = 0 '\000', unresolved = 0 '\000', type_flags = 0 '\000', data = {sint = 0, str = {size = 0, area = 0x0, data = 0, head = 0}, ipv4 = {s_addr = 0}, ipv6 = {__in6_u = {__u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}, prx = 0x0, srv = 0x0, t = 0x0, usr = 0x0, map = 0x0, reg = 0x0, fid = {ids = 0x0, sz = 0}, var = {name_hash = 0, scope = SCOPE_SESS}, ptr = 0x0}} <repeats 12 times>}
        i = 0
        smp = {flags = 128, data = {type = 6, u = {sint = 5, ipv4 = {s_addr = 5}, ipv6 = {__in6_u = {__u6_addr8 = "\005\000\000\000\000\000\000\000\330\"\231\032UV\000", __u6_addr16 = {5, 0, 0, 0, 8920, 6809, 22101, 0}, __u6_addr32 = {5, 0, 446243544, 22101}}}, str = {size = 5, area = 0x56551a9922d8 "test", data = 4, head = 0}, meth = {meth = HTTP_METH_DELETE, str = {size = 94923518452440, area = 0x4 <error: Cannot access memory at address 0x4>, data = 0, head = 0}}}}, ctx = {p = 0x0, i = 0, ll = 0, d = 0, a = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}}, px = 0x56551a993e80, sess = 0x56551ab04680, strm = 0x56551ab049b0, opt = 0}
#4  0x00007f2dd4406b95 in ?? () from /lib/x86_64-linux-gnu/liblua5.4.so.0
No symbol table info available.
#5  0x00007f2dd4413702 in ?? () from /lib/x86_64-linux-gnu/liblua5.4.so.0
No symbol table info available.
#6  0x00007f2dd44029a3 in ?? () from /lib/x86_64-linux-gnu/liblua5.4.so.0
No symbol table info available.
#7  0x00007f2dd4403e00 in lua_resume () from /lib/x86_64-linux-gnu/liblua5.4.so.0
No symbol table info available.
#8  0x0000565519581e21 in hlua_ctx_resume (lua=0x56551ab04fa0, yield_allowed=1) at src/hlua.c:1456
        nres = 447760816
        ret = 22101
        msg = 0x56551ab05058 "pƠ\032UV"
        trace = 0x7fff1fe0d588 "\225\362\340\037\377\177"
        __func__ = <optimized out>
#9  0x0000565519595ad7 in hlua_action (rule=0x56551a991da0, px=0x56551a993e80, sess=0x56551ab04680, s=0x56551ab049b0, flags=2) at src/hlua.c:9155
        arg = 0x56551a991e80
        hflags = 32
        dir = 0
        act_ret = 0
        error = 0x1000000010000 <error: Cannot access memory at address 0x1000000010000>
#10 0x000056551964ded5 in http_req_get_intercept_rule (px=0x56551a993e80, def_rules=0x0, rules=0x56551a993ec8, s=0x56551ab049b0) at src/http_ana.c:2840
        sess = 0x56551ab04680
        txn = 0x56551ab04eb0
        rule = 0x56551a991da0
        rule_ret = HTTP_RULE_RES_CONT
        act_opts = 2
#11 0x0000565519646fb5 in http_process_req_common (s=0x56551ab049b0, req=0x56551ab049d0, an_bit=16, px=0x56551a993e80) at src/http_ana.c:379
        def_rules = 0x0
        rules = 0x56551a993ec8
        sess = 0x56551ab04680
        txn = 0x56551ab04eb0
        msg = 0x56551ab04ec0
        htx = 0x56551a9ffef0
        rule = 0x56551a9fbe40
        verdict = 22101
        conn = 0x7f2db0026080
#12 0x0000565519628210 in process_stream (t=0x56551ab045d0, context=0x56551ab049b0, state=260) at src/stream.c:1998
        max_loops = 199
        ana_list = 48
        ana_back = 48
        flags = 1078984706
        srv = 0x0
        s = 0x56551ab049b0
        sess = 0x56551ab04680
        rqf_last = 1077936128
        rpf_last = 2147483648
        rq_prod_last = 8
        rq_cons_last = 0
        rp_cons_last = 8
        rp_prod_last = 0
        req_ana_back = 0
        req = 0x56551ab049d0
        res = 0x56551ab04a30
        scf = 0x56551aae2d80
        scb = 0x56551ab04df0
        rate = 0
#13 0x00005655197995c2 in run_tasks_from_lists (budgets=0x7fff1fe0d250) at src/task.c:626
        process = 0x565519626e9c <process_stream>
        tl_queues = 0x565519b82770 <ha_thread_ctx+112>
        t = 0x56551ab045d0
        budget_mask = 15 '\017'
        profile_entry = 0x0
        done = 0
        queue = 1
        state = 260
        ctx = 0x56551ab049b0
        __func__ = <optimized out>
#14 0x0000565519799f85 in process_runnable_tasks () at src/task.c:855
        tt = 0x565519b82700 <ha_thread_ctx>
        lrq = 0x0
        grq = 0x0
        t = 0x56551ab045d0
        default_weights = {64, 48, 16, 1}
        max = {0, 90, 0, 0}
        max_total = 48
        tmp_list = 0x0
        queue = 4
        max_processed = 91
        lpicked = 1
        gpicked = 0
        heavy_queued = 1
        budget = 91
#15 0x0000565519746400 in run_poll_loop () at src/haproxy.c:2807
        next = 860791628
        wake = 0
        __func__ = <optimized out>
#16 0x0000565519746ab1 in run_thread_poll_loop (data=0x565519b7e300 <ha_thread_info>) at src/haproxy.c:2996
        ptaf = 0x56551991d540 <per_thread_alloc_list>
        ptif = 0x56551991d550 <per_thread_init_list>
        ptdf = 0x7fff1fe0d310
        ptff = 0x0
        init_left = 0
        init_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}
        init_cond = {__data = {__wseq = {__value64 = 31, __value32 = {__low = 31, __high = 0}}, __g1_start = {__value64 = 27, __value32 = {__low = 27, __high = 0}}, __g_refs = {0, 0}, __g_size = {0, 0}, __g1_orig_size = 8, __wrefs = 0, __g_signals = {0, 0}}, __size = "\037\000\000\000\000\000\000\000\033", '\000' <repeats 23 times>, "\b", '\000' <repeats 14 times>, __align = 31}
#17 0x0000565519748329 in main (argc=3, argv=0x7fff1fe0d588) at src/haproxy.c:3645
        err = 0
        retry = 200
        limit = {rlim_cur = 1048575, rlim_max = 1048576}
        pidfd = -1
        intovf = 4
(gdb)

Additional Information

No response

lwimmer • Sep 12 '22 20:09

Hi,

OK, there's definitely a problem here. The two patches below should address it; any chance you can test them?

Thanks!

0001-BUG-MEDIUM-lua-Get-the-arguments-for-running-sample_.txt
0002-BUG-MEDIUM-lua-handle-stick-table-implicit-arguments.txt

cognet • Sep 12 '22 22:09

Wow, you guys are amazing! So far I've never waited more than 24 hours for a patch :smile:

This definitely fixes my local test setup.

Could you send me patches that apply to the latest 2.4? Then I could run more extensive tests in our development environment, where we are currently running 2.4.18. It's much easier for me to get a patched 2.4 version running there.

lwimmer • Sep 13 '22 05:09

In your example lua script, shouldn't txn.c:in_table() take two arguments? First one being input data and second one being the table where to search for the data?

Applying 0001-BUG-MEDIUM-lua-Get-the-arguments-for-running-sample_.txt to the 2.4 series breaks this behavior (the input argument is no longer accepted).

Darlelet • Sep 13 '22 07:09

> In your example lua script, shouldn't txn.c:in_table() take two arguments? First one being input data and second one being the table where to search for the data?

Yes, you are correct! The second argument was lost during the creation of the example it seems. But I can also reproduce the Segmentation Fault with two arguments (without the patch of course).

lwimmer • Sep 13 '22 07:09

> But I can also reproduce the Segmentation Fault with two arguments (without the patch of course).

No, sorry. I spoke too soon. With 2 arguments I cannot reproduce the crash.

lwimmer • Sep 13 '22 07:09

Yes, so it was actually a bug in our Lua code where we had a call to in_table with one argument, which should have had two arguments. We did not notice this, because the code path was not in use anymore. We just noticed the segfault after the upgrade to 2.4, because there were still some requests coming in, which still triggered this code path to run, although it was obsolete.
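
A minimal sketch of the two-argument form, assuming the frontend and stick table from the reproduction config above (a stick table declared inside a proxy is referenced by that proxy's name, here `main`). The helper name `key_in_table`, the action name `lookup`, and the `if core` guard that lets the file load outside HAProxy are illustrative and not part of the original report.

```lua
-- Hypothetical helper: look up `key` in the stick table `table_name`,
-- always naming the table explicitly (first argument: input data,
-- second argument: table to search).
local function key_in_table(txn, key, table_name)
  return txn.c:in_table(key, table_name)
end

-- Register the action only when running inside HAProxy, where the
-- global `core` exists.
if core then
  core.register_action("lookup", { "http-req" }, function(txn)
    -- "main" is the frontend that declares the stick-table in haproxy.cfg.
    key_in_table(txn, "test", "main")
  end)
end
```

With this sketch the frontend would call the action as `http-request lua.lookup`.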

lwimmer • Sep 13 '22 07:09

This is great news! Regarding the crash: even if only one argument is provided to in_table, it should not cause the main process to crash. It seems that implicit arguments for stick tables have been supported for a while now.

For this to work, however, a stick table must be declared within the proxy that calls the Lua function on http-req.

@cognet's patch 0002-BUG-MEDIUM-lua-handle-stick-table-implicit-arguments.txt seems to be required for this to work on the latest HAProxy code because of a change in the stick-table code, as he explained in the commit message.

But if no table is declared within the proxy that triggers the Lua code, we still have a crash in sample_conv_in_table() (stick_table.c).

The following patch adds an extra check in sample_conv_in_table() to prevent a crash when in_table() is called without a table argument and no table exists within the calling proxy: 0001-BUG-MEDIUM-stick-table-in_table_check_for_empty_sticktable.txt

I believe both 0002-BUG-MEDIUM-lua-handle-stick-table-implicit-arguments.txt from @cognet and 0001-BUG-MEDIUM-stick-table-in_table_check_for_empty_sticktable.txt are required to make this call safe without breaking existing behavior.

Darlelet • Sep 13 '22 09:09

> Yes, so it was actually a bug in our Lua code where we had a call to in_table with one argument, which should have had two arguments. We did not notice this, because the code path was not in use anymore. We just noticed the segfault after the upgrade to 2.4, because there were still some requests coming in, which still triggered this code path to run, although it was obsolete.

Let me clarify my previous message, I was not totally right. Depending on your usage of in_table, it might not be a bug in your Lua code:

It seems that either calling in_table with explicit table name or without table name (in which case table is inherited from calling proxy, if there is one) are valid use cases.

But here we witness a regression only affecting the case where you call in_table with one argument. The proposed patches should fix the regression.
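
The two call forms described above can be sketched as follows. This is an illustration only: the helper `in_table_lookup` is hypothetical, and on builds affected by this regression the one-argument form is the call that triggers the crash, so it should only be relied on once the fixes are in place.

```lua
-- Hypothetical helper covering both call forms discussed in this thread.
--   table_name given -> explicit form: txn.c:in_table(key, table_name)
--   table_name nil   -> implicit form: txn.c:in_table(key), which uses the
--                       stick table of the proxy running the action, if any.
local function in_table_lookup(txn, key, table_name)
  if table_name ~= nil then
    return txn.c:in_table(key, table_name)
  end
  -- Implicit form: this is the call that segfaulted in the report above.
  return txn.c:in_table(key)
end
```

Passing the table name explicitly avoids depending on the implicit lookup; the reporter could not reproduce the crash with two arguments.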

So you're left with two options:

Darlelet • Sep 13 '22 09:09

> Let me clarify my previous message, I was not totally right. Depending on your usage of in_table, it might not be a bug in your Lua code:
>
> It seems that either calling in_table with explicit table name or without table name (in which case table is inherited from calling proxy, if there is one) are valid use cases.

Thanks for the clarification! We've already solved our problem by simply removing the obsolete code :)

So we don't need the patch right now and everything works for us currently. Thank you!

lwimmer • Sep 13 '22 10:09

Happy to learn that you solved it! :)

Please leave the issue open: we'll wait for @cognet to come back so we can fix the bug for good.

Darlelet • Sep 13 '22 10:09

Guys, please, I got lost here. What's the status of this issue now? Are patches still needed or not, and if so, which ones?

wtarreau • Oct 03 '22 13:10

Thank you for your update, Olivier.

This third patch is required to make 0002 handle the case where no table is declared within the proxy: 0003-BUG-MEDIUM-hlua-hlua_lua2arg_check-handle-NULL-stick.patch.txt

(Or maybe @cognet could amend his 0002 patch to make backporting easier; I'm fine with that.)

Also, it seems that the commit message of 0001 was not properly amended ("lua: Get the arguments for running sample_conv right." is no longer relevant).

Darlelet • Oct 03 '22 15:10

Oops, nice catch. Yeah, your patch makes sense; I'll merge it with mine.

cognet • Oct 03 '22 15:10

Thank you guys, now merged!

wtarreau • Oct 03 '22 17:10