freeradius-server
freeradius-server copied to clipboard
[defect]: ASSERT FAILED src/lib/server/trunk.c[1051]: 0: Trunk request 0 invalid transition INIT -> UNASSIGNED
What type of defect/bug is this?
Crash or memory corruption (segv, abort, etc...)
How can the issue be reproduced?
Use rlm_ldap
Log output from the FreeRADIUS daemon
The problem can not be reproduced with "radiusd -X" (only when running with multiple threads):
Mon Jul 4 13:29:48 2022: Info : Network 0 - Starting
Mon Jul 4 13:29:48 2022: Info : Worker 0 - Starting
Mon Jul 4 13:29:48 2022: Info : Worker 1 - Starting
Mon Jul 4 13:29:48 2022: Error : Failed setting connection option new TLS context: Unknown error
Mon Jul 4 13:29:48 2022: Error : ldap - [2] - Connection initialisation failed
Mon Jul 4 13:29:53 2022: Info : rlm_ldap - [3] Trunk connection changed state INIT -> CONNECTING
Mon Jul 4 13:29:53 2022: Info : rlm_ldap (ldap) - Directory type: OpenLDAP
Mon Jul 4 13:29:53 2022: Info : rlm_ldap (ldap) - Directory supports RFC 4533
Mon Jul 4 13:29:53 2022: Info : Scheduler created successfully with 1 networks and 2 workers
Mon Jul 4 13:29:53 2022: Info : post-suid-down capabilities: = cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,35,36+ep
Mon Jul 4 13:29:53 2022: Info : Ready to process requests
Mon Jul 4 13:30:01 2022: Info : rlm_ldap - [4] Trunk connection changed state INIT -> CONNECTING
Mon Jul 4 13:30:01 2022: Info : rlm_ldap (ldap) - Directory type: OpenLDAP
Mon Jul 4 13:30:01 2022: Info : rlm_ldap (ldap) - Directory supports RFC 4533
Mon Jul 4 13:30:03 2022: Info : rlm_ldap - [3] Trunk connection changed state ACTIVE -> CLOSED
Mon Jul 4 13:30:06 2022: Info : rlm_ldap - [4] Trunk connection changed state ACTIVE -> CLOSED
Mon Jul 4 13:30:22 2022: Info : rlm_ldap - [5] Trunk connection changed state INIT -> CONNECTING
Mon Jul 4 13:30:32 2022: Info : rlm_ldap - [6] Trunk connection changed state INIT -> CONNECTING
Mon Jul 4 13:30:35 2022: Info : rlm_ldap - [5] Trunk connection changed state ACTIVE -> CLOSED
Mon Jul 4 13:30:35 2022: Info : rlm_ldap - [6] Trunk connection changed state ACTIVE -> CLOSED
ASSERT FAILED src/lib/server/trunk.c[1051]: 0: Trunk request 0 invalid transition INIT -> UNASSIGNED
CAUGHT SIGNAL: Aborted
Backtrace of last 21 frames:
/usr/lib64/freeradius/libfreeradius-util.so(fr_fault+0xe8)[0x7f528c366036]
/usr/lib64/freeradius/libfreeradius-util.so(_fr_assert_fail+0xf1)[0x7f528c366c26]
/usr/lib64/freeradius/libfreeradius-server.so(+0x918af)[0x7f528bc6e8af]
/usr/lib64/freeradius/libfreeradius-server.so(fr_trunk_request_signal_cancel+0x208)[0x7f528bc72c60]
/usr/lib64/freeradius/libfreeradius-ldap.so(+0x7d08)[0x7f5284fb5d08]
/usr/lib64/freeradius/libfreeradius-unlang.so(+0x1f580)[0x7f528b947580]
/usr/lib64/freeradius/libfreeradius-unlang.so(unlang_interpret_signal+0x151)[0x7f528b94cbf1]
/usr/lib64/freeradius/libfreeradius-io.so(+0x1ec7c)[0x7f528b711c7c]
/usr/lib64/freeradius/libfreeradius-io.so(+0x1ed64)[0x7f528b711d64]
/usr/lib64/freeradius/libfreeradius-util.so(fr_event_timer_run+0x139)[0x7f528c389d70]
/usr/lib64/freeradius/libfreeradius-util.so(fr_event_service+0x702)[0x7f528c38a745]
/usr/lib64/freeradius/libfreeradius-unlang.so(unlang_interpret_synchronous+0x228)[0x7f528b94eeb0]
/usr/lib64/freeradius/libfreeradius-ldap.so(fr_ldap_trunk_search+0x1e7)[0x7f5284fb5f50]
/usr/lib64/freeradius/rlm_ldap.so(rlm_ldap_find_user+0x324)[0x7f5285658e08]
/usr/lib64/freeradius/rlm_ldap.so(+0x8ac4)[0x7f5285652ac4]
/usr/lib64/freeradius/libfreeradius-unlang.so(+0x2b973)[0x7f528b953973]
/usr/lib64/freeradius/libfreeradius-unlang.so(unlang_interpret+0x683)[0x7f528b94a22b]
/usr/lib64/freeradius/libfreeradius-io.so(fr_worker+0x2f5)[0x7f528b7145a0]
/usr/lib64/freeradius/libfreeradius-io.so(+0x1b76f)[0x7f528b70e76f]
/lib64/libpthread.so.0(+0x7ea5)[0x7f5289bb5ea5]
/lib64/libc.so.6(clone+0x6d)[0x7f5289467b0d]
Relevant log output from client utilities
No response
Backtrace from LLDB or GDB
No response
Another one (similar code path and the same reproduction scenario: a lot of requests with LDAP authz coming in). But this time query->treq is NULL
src/lib/server/trunk.c:2052: Type mismatch: name[NULL] expected[fr_trunk_request_t]
(gdb) bt
#0 0x00007ffff514e4fb in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:36
#1 0x00007ffff7b082de in fr_fault (sig=6) at src/lib/util/debug.c:1051
#2 0x00007ffff7b083b8 in _fr_talloc_fault (
reason=0x7fffd809ed80 "src/lib/server/trunk.c:2052: Type mismatch: name[NULL] expected[fr_trunk_request_t]") at src/lib/util/debug.c:1093
#3 0x00007ffff652f3d5 in talloc_abort_type_mismatch (expected=<optimized out>, name=<optimized out>,
location=<optimized out>) at ../../talloc.c:1584
#4 _talloc_get_type_abort (ptr=0x0, name=0x7ffff74712ed "fr_trunk_request_t", location=<optimized out>)
at ../../talloc.c:1601
#5 0x00007ffff74139d7 in fr_trunk_request_signal_cancel (treq=0x0) at src/lib/server/trunk.c:2052
#6 0x00007fffeb341d08 in ldap_trunk_query_cancel (request=0x7fffdba93010, action=FR_SIGNAL_CANCEL,
uctx=0x7fffd9faf920) at src/lib/ldap/base.c:616
#7 0x00007ffff70e6205 in unlang_function_signal (request=0x7fffdba93010, frame=0x7fffdba933e8,
action=FR_SIGNAL_CANCEL) at src/lib/unlang/function.c:80
#8 0x00007ffff70eb876 in frame_signal (limit=0, action=FR_SIGNAL_CANCEL, request=0x7fffdba93010)
at src/lib/unlang/interpret.c:1028
#9 unlang_interpret_signal (request=0x7fffdba93010, action=FR_SIGNAL_CANCEL)
at src/lib/unlang/interpret.c:1082
#10 0x00007ffff6eb0c7c in worker_stop_request (request_p=0x7fffe780b5c8) at src/lib/io/worker.c:383
#11 0x00007ffff6eb0d64 in worker_max_request_time (el=0x7fffd8004c80, when=..., uctx=0x7fffd800d500)
at src/lib/io/worker.c:419
#12 0x00007ffff7b2bfc2 in fr_event_timer_run (el=0x7fffd8004c80, when=0x7fffe780b670)
at src/lib/util/event.c:2146
#13 0x00007ffff7b2c997 in fr_event_service (el=0x7fffd8004c80) at src/lib/util/event.c:2473
#14 0x00007ffff70edb35 in unlang_interpret_synchronous (el=0x7fffd8004c80, request=0x7fffdba93010)
at src/lib/unlang/interpret_synchronous.c:236
#15 0x00007fffeb341f50 in fr_ldap_trunk_search (p_result=0x7fffe780c9c4, ctx=0x7fffdba96210,
out=0x7fffe780b888, request=0x7fffdba93010, ttrunk=0x7fffd802e460, base_dn=0x1566c30 "dc=viasat,dc=com",
scope=2, filter=0x7fffe780bcb0 "(cn=1,6,00:a0:bc:6c:7d:3a)", attrs=0x7fffe780c578,
serverctrls=0x7fffe780b890, clientctrls=0x0, is_async=false) at src/lib/ldap/base.c:723
#16 0x00007fffeb9e4eb4 in rlm_ldap_find_user (inst=0x120daf0, request=0x7fffdba93010, ttrunk=0x7fffd802e460,
attrs=0x7fffe780c578, force=true, result=0x7fffe780c9b0, handle=0x7fffe780c9b8, rcode=0x7fffe780c9c4)
at src/modules/rlm_ldap/user.c:118
#17 0x00007fffeb9deb70 in mod_authorize (p_result=0x7fffdba96228, mctx=0x7fffe780ca50,
request=0x7fffdba93010) at src/modules/rlm_ldap/rlm_ldap.c:1166
#18 0x00007ffff70f25f8 in unlang_module (p_result=0x7fffdba931dc, request=0x7fffdba93010,
frame=0x7fffdba933a8) at src/lib/unlang/module.c:924
#19 0x00007ffff70e8eb0 in frame_eval (priority=0x7fffdba931d8, result=0x7fffdba931dc, frame=0x7fffdba933a8,
request=0x7fffdba93010) at src/lib/unlang/interpret.c:500
#20 unlang_interpret (request=0x7fffdba93010) at src/lib/unlang/interpret.c:684
#21 0x00007ffff6eb35a0 in worker_run_request (start=..., worker=0x7fffd800d500) at src/lib/io/worker.c:1184
#22 fr_worker (worker=0x7fffd800d500) at src/lib/io/worker.c:1369
#23 0x00007ffff6ead76f in fr_schedule_worker_thread (arg=0x1672320) at src/lib/io/schedule.c:242
#24 0x00007ffff5146ea5 in start_thread (arg=0x7fffe7839700) at pthread_create.c:307
#25 0x00007ffff49f8b0d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb) frame 6
#6 0x00007fffeb341d08 in ldap_trunk_query_cancel (request=0x7fffdba93010, action=FR_SIGNAL_CANCEL,
uctx=0x7fffd9faf920) at src/lib/ldap/base.c:616
616 fr_trunk_request_signal_cancel(query->treq);
(gdb) print *query
$1 = {node = {left = 0x7ffff7dd3040 <sentinel>, right = 0x7ffff7dd3040 <sentinel>,
parent = 0x7ffff7dd3040 <sentinel>, data = 0x7fffd9faf920, colour = BLACK, being_freed = false},
ldap_url = 0x0, dn = 0x1566c30 "dc=viasat,dc=com", {search = {attrs = 0x7fffe780c578, scope = 2,
filter = 0x7fffe780bcb0 "(cn=1,6,00:a0:bc:6c:7d:3a)"}, mods = 0x7fffe780c578},
type = LDAP_REQUEST_SEARCH, serverctrls = {{control = 0x0, freeit = false}, {control = 0x0,
freeit = false}, {control = 0x0, freeit = false}, {control = 0x0, freeit = false}, {control = 0x0,
freeit = false}, {control = 0x0, freeit = false}, {control = 0x0, freeit = false}, {control = 0x0,
freeit = false}, {control = 0x0, freeit = false}, {control = 0x0, freeit = false}}, clientctrls = {{
control = 0x0, freeit = false}, {control = 0x0, freeit = false}, {control = 0x0, freeit = false}, {
control = 0x0, freeit = false}, {control = 0x0, freeit = false}, {control = 0x0, freeit = false}, {
control = 0x0, freeit = false}, {control = 0x0, freeit = false}, {control = 0x0, freeit = false}, {
control = 0x0, freeit = false}}, msgid = 147, treq = 0x0, ldap_conn = 0x7fffd9e6e5b0, ev = 0x0,
referral_urls = 0x0, referrals = {offset = 0, type = 0x0, entry = {prev = 0x0, next = 0x0},
num_elements = 0}, referral_depth = 0, referral = 0x0, parser = 0x0, result = 0x7fffd1210c10,
ret = LDAP_RESULT_SUCCESS}
(gdb) print *query->ldap_conn
$2 = {handle = 0x7fffdbf84f50, serverctrls = {{control = 0x0, freeit = false} <repeats 11 times>},
clientctrls = {{control = 0x0, freeit = false} <repeats 11 times>}, serverctrls_cnt = 0,
clientctrls_cnt = 0, directory = 0x0, config = 0x7fffd802e498, conn = 0x7fffd19cc570,
state = FR_LDAP_STATE_RUN, fd = 8, queries = 0x7fffd3235cb0, uctx = 0x0}
(gdb)
Another one crash:
ASSERT FAILED /home/ubuntu/work/viasat/freeradius-server/src/lib/unlang/interpret_synchronous.c[65]: request->parent || (request->log.unlang_indent == 0): Request 15635 bad log indentation - expected 0 got 10
CAUGHT SIGNAL: Aborted
Backtrace of last 18 frames:
/usr/lib64/freeradius/libfreeradius-util.so(fr_fault+0x1ba)[0x7f854dafc90a]
/usr/lib64/freeradius/libfreeradius-util.so(_fr_assert_fail+0xf4)[0x7f854dafcbc4]
/usr/lib64/freeradius/libfreeradius-unlang.so(+0x26284)[0x7f854d0f6284]
/usr/lib64/freeradius/libfreeradius-unlang.so(unlang_interpret_request_done+0x98)[0x7f854d0f46db]
/usr/lib64/freeradius/libfreeradius-unlang.so(unlang_interpret_signal+0x263)[0x7f854d0f4988]
/usr/lib64/freeradius/libfreeradius-io.so(+0x19365)[0x7f854ceb6365]
/usr/lib64/freeradius/libfreeradius-util.so(fr_event_timer_run+0x7f)[0x7f854db1934f]
/usr/lib64/freeradius/libfreeradius-util.so(fr_event_service+0x504)[0x7f854db19bd4]
/usr/lib64/freeradius/libfreeradius-unlang.so(unlang_interpret_synchronous+0x228)[0x7f854d0f6b35]
/usr/lib64/freeradius/libfreeradius-ldap.so(fr_ldap_trunk_search+0x14b)[0x7f8545553e9b]
/usr/lib64/freeradius/rlm_ldap.so(rlm_ldap_find_user+0x188)[0x7f8545bf02c8]
/usr/lib64/freeradius/rlm_ldap.so(+0x77f3)[0x7f8545beb7f3]
/usr/lib64/freeradius/libfreeradius-unlang.so(+0x2b5f8)[0x7f854d0fb5f8]
/usr/lib64/freeradius/libfreeradius-unlang.so(unlang_interpret+0x683)[0x7f854d0f1eb0]
/usr/lib64/freeradius/libfreeradius-io.so(fr_worker+0x18d)[0x7f854ceb7bfd]
/usr/lib64/freeradius/libfreeradius-io.so(+0x1563f)[0x7f854ceb263f]
/lib64/libpthread.so.0(+0x7ea5)[0x7f854b151ea5]
/lib64/libc.so.6(clone+0x6d)[0x7f854aa03b0d]
No panic action set
For the request->log.unlang_indent
assertion, the short-term fix is to just delete it. It's a sanity check on indentation for the current request. It doesn't affect how the server operates once the request is freed.
However, it should be fixed. We'd need to know what modules are being run in that section. All of the current code checks for indentation levels being correct, and all of the tests pass. So it's difficult to tell what's different about your system.
@alandekok I think it's just looking at the free'd memory and the root cause of all ASSERT'ions here is the same. I added more logging and it seems that in case of rlm_ldap_find_user() we are using two request cancellation handlers: _ldap_search_sync_timeout() and ldap_trunk_query_cancel():
Info : _ldap_search_sync_timeout - query 0x7fffdd5ac130, treq = 0x7fffdc328390
Info : rlm_ldap - [10] Trunk connection changed state ACTIVE -> CLOSED
Info : rlm_ldap - [9] Trunk connection changed state ACTIVE -> CLOSED
Info : rlm_ldap - [11] Trunk connection changed state INIT -> CONNECTING
Info : rlm_ldap - [11] Trunk connection changed state ACTIVE -> CLOSED
Error : Network 0 - max_outstanding reached - dropping packet: No memory available in ring buffer
Error : Network 0 - max_outstanding reached - dropping packet: No memory available in ring buffer
Error : Network 0 - max_outstanding reached - dropping packet: No memory available in ring buffer
Error : Network 0 - max_outstanding reached - dropping packet: No memory available in ring buffer
Error : Network 0 - max_outstanding reached - dropping packet: No memory available in ring buffer
Error : Network 0 - max_outstanding reached - dropping packet: No memory available in ring buffer
Error : Network 0 - max_outstanding reached - dropping packet: No memory available in ring buffer
Error : Network 0 - max_outstanding reached - dropping packet: No memory available in ring buffer
Error : Network 0 - max_outstanding reached - dropping packet: No memory available in ring buffer
Error : Network 0 - max_outstanding reached - dropping packet: No memory available in ring buffer
Error : Network 0 - max_outstanding reached - dropping packet: No memory available in ring buffer
Error : Network 0 - max_outstanding reached - dropping packet: No memory available in ring buffer
Error : Network 0 - max_outstanding reached - dropping packet: No memory available in ring buffer
Error : Network 0 - max_outstanding reached - dropping packet: No memory available in ring buffer
Error : Network 0 - max_outstanding reached - dropping packet: No memory available in ring buffer
Error : Network 0 - max_outstanding reached - dropping packet: No memory available in ring buffer
Error : Network 0 - max_outstanding reached - dropping packet: No memory available in ring buffer
Error : Network 0 - max_outstanding reached - dropping packet: No memory available in ring buffer
Error : Network 0 - max_outstanding reached - dropping packet: No memory available in ring buffer
Error : Network 0 - max_outstanding reached - dropping packet: No memory available in ring buffer
Error : Network 0 - max_outstanding reached - dropping packet: No memory available in ring buffer
Error : Network 0 - max_outstanding reached - dropping packet: No memory available in ring buffer
Error : Network 0 - max_outstanding reached - dropping packet: No memory available in ring buffer
Error : Network 0 - max_outstanding reached - dropping packet: No memory available in ring buffer
Error : Network 0 - max_outstanding reached - dropping packet: No memory available in ring buffer
Error : Network 0 - max_outstanding reached - dropping packet: No memory available in ring buffer
Error : Network 0 - max_outstanding reached - dropping packet: No memory available in ring buffer
Error : Network 0 - max_outstanding reached - dropping packet: No memory available in ring buffer
Error : Network 0 - max_outstanding reached - dropping packet: No memory available in ring buffer
Error : Network 0 - max_outstanding reached - dropping packet: No memory available in ring buffer
Info : ldap_trunk_query_cancel - query 0x7fffdd5ac130, treq = 0x7fffdc328390, action = 2
Bad talloc magic value - unknown value
talloc abort: Bad talloc magic value - unknown value
CAUGHT SIGNAL: Aborted
Backtrace of last 19 frames:
/usr/lib64/freeradius/libfreeradius-util.so(fr_fault+0x1ba)[0x7ffff7b1f90a]
/usr/lib64/freeradius/libfreeradius-util.so(+0x27acf)[0x7ffff7b1facf]
/lib64/libtalloc.so.2(_talloc_get_type_abort+0xd0)[0x7ffff655d420]
/usr/lib64/freeradius/libfreeradius-server.so(fr_trunk_request_signal_cancel+0x1e)[0x7ffff7432f9e]
/usr/lib64/freeradius/libfreeradius-unlang.so(+0x1f205)[0x7ffff7112205]
/usr/lib64/freeradius/libfreeradius-unlang.so(unlang_interpret_signal+0x151)[0x7ffff7117876]
/usr/lib64/freeradius/libfreeradius-io.so(+0x19365)[0x7ffff6ed9365]
/usr/lib64/freeradius/libfreeradius-util.so(fr_event_timer_run+0x7f)[0x7ffff7b3c34f]
/usr/lib64/freeradius/libfreeradius-util.so(fr_event_service+0x504)[0x7ffff7b3cbd4]
/usr/lib64/freeradius/libfreeradius-unlang.so(unlang_interpret_synchronous+0x228)[0x7ffff7119b35]
/usr/lib64/freeradius/libfreeradius-ldap.so(fr_ldap_trunk_search+0x14b)[0x7fffeb54ce9b]
/usr/lib64/freeradius/rlm_ldap.so(rlm_ldap_find_user+0x188)[0x7fffebbe92c8]
/usr/lib64/freeradius/rlm_ldap.so(+0x77f3)[0x7fffebbe47f3]
/usr/lib64/freeradius/libfreeradius-unlang.so(+0x2b5f8)[0x7ffff711e5f8]
/usr/lib64/freeradius/libfreeradius-unlang.so(unlang_interpret+0x683)[0x7ffff7114eb0]
/usr/lib64/freeradius/libfreeradius-io.so(fr_worker+0x18d)[0x7ffff6edabfd]
/usr/lib64/freeradius/libfreeradius-io.so(+0x1563f)[0x7ffff6ed563f]
/lib64/libpthread.so.0(+0x7ea5)[0x7ffff5174ea5]
/lib64/libc.so.6(clone+0x6d)[0x7ffff4a26b0d]
No panic action set
I think we should not use the latter (ldap_trunk_query_cancel) if request is not async:
action = unlang_function_push(request, is_async ? NULL : ldap_trunk_query_start, ldap_trunk_query_results,
ldap_trunk_query_cancel, is_async ? UNLANG_SUB_FRAME : UNLANG_TOP_FRAME, query);
if (action == UNLANG_ACTION_FAIL) goto error;
*out = query;
/*
* Hack until everything is async
*/
if (!is_async) {
fr_event_timer_t const *ev = NULL;
fr_time_delta_t timeout = ttrunk->config.res_timeout;
/*
* Add an event that'll send a cancellation request
* to the request.
*/
if (fr_time_delta_ispos(timeout)) {
if (fr_event_timer_in(ctx, unlang_interpret_event_list(request), &ev, timeout,
_ldap_search_sync_timeout, query) < 0) goto error;
}
For the
request->log.unlang_indent
assertion, the short-term fix is to just delete it. It's a sanity check on indentation for the current request. It doesn't affect how the server operates once the request is freed.However, it should be fixed. We'd need to know what modules are being run in that section. All of the current code checks for indentation levels being correct, and all of the tests pass. So it's difficult to tell what's different about your system.
@alandekok I realized this one is a different one (not related to ldap timeouts) - I filed a separate issue about that: https://github.com/FreeRADIUS/freeradius-server/issues/4622
What information do you need about modules and section? I can share recv handler unlang source
I believe I found the reason for these use-after-free:
in fr_ldap_trunk_search()
and fr_ldap_trunk_modify()
fr_ldap_query_t
is allocated with ctx = unlang_interpret_frame_talloc_ctx(request)
(frame context).
Then ldap_trunk_query_cancel()
can be called in this context:
#0 ldap_trunk_query_cancel (request=0x7fffcbeabb50, action=FR_SIGNAL_CANCEL, uctx=0x7fffd9f1a0a0)
at /home/ubuntu/work/viasat/freeradius-server/src/lib/ldap/base.c:612
#1 0x00007ffff70f31e6 in frame_signal (limit=0, action=FR_SIGNAL_CANCEL, request=0x7fffcbeabb50)
at /usr/src/debug/freeradius-server-4.0.0/src/lib/unlang/interpret.c:1028
#2 unlang_interpret_signal (request=request@entry=0x7fffcbeabb50, action=action@entry=FR_SIGNAL_CANCEL)
at /usr/src/debug/freeradius-server-4.0.0/src/lib/unlang/interpret.c:1082
#3 0x00007ffff6eba365 in worker_stop_request (request_p=<synthetic pointer>)
at /usr/src/debug/freeradius-server-4.0.0/src/lib/io/worker.c:419
#4 worker_max_request_time (el=<optimized out>, when=..., uctx=<optimized out>)
at /usr/src/debug/freeradius-server-4.0.0/src/lib/io/worker.c:419
#5 0x00007ffff7b27865 in fr_event_timer_run (el=0x7fffd8004c80, when=0x7fffe760da50)
at /home/ubuntu/work/viasat/freeradius-server/src/lib/util/event.c:2160
#6 0x00007ffff7b2823a in fr_event_service (el=0x7fffd8004c80)
at /home/ubuntu/work/viasat/freeradius-server/src/lib/util/event.c:2487
#7 0x00007ffff70f40b6 in unlang_interpret_synchronous (el=0x7fffd8004c80, request=0x7fffcbeabb50)
at /usr/src/debug/freeradius-server-4.0.0/src/lib/unlang/interpret_synchronous.c:236
#8 0x00007fffeb34306b in fr_ldap_trunk_search (p_result=0x7fffe760e5dc, ctx=0x7fffcbeaed50,
out=0x7fffe760dd18, request=0x7fffcbeabb50, ttrunk=0x7fffd802e460, base_dn=0x15759d0 "dc=viasat,dc=com",
scope=2, filter=0x7fffe760dd30 "(cn=1,6,00:a0:bc:6c:7d:3a)", attrs=0x7fffe760e608,
serverctrls=0x7fffe760dd20, clientctrls=0x0, is_async=false)
at /home/ubuntu/work/viasat/freeradius-server/src/lib/ldap/base.c:729
#9 0x00007fffeb9e42c8 in rlm_ldap_find_user (inst=inst@entry=0x1211790,
request=request@entry=0x7fffcbeabb50, ttrunk=ttrunk@entry=0x7fffd802e460,
attrs=attrs@entry=0x7fffe760e608, force=force@entry=true, result=result@entry=0x7fffe760e5e8,
handle=0x7fffe760e5e0, rcode=0x7fffe760e5dc)
at /usr/src/debug/freeradius-server-4.0.0/src/modules/rlm_ldap/user.c:118
#10 0x00007fffeb9df7f3 in mod_authorize (p_result=0x7fffcbeaed68, mctx=0x7fffe760eeb0,
request=0x7fffcbeabb50) at /usr/src/debug/freeradius-server-4.0.0/src/modules/rlm_ldap/rlm_ldap.c:1166
#11 0x00007ffff70f6fd1 in unlang_module (p_result=0x7fffcbeabd1c, request=0x7fffcbeabb50,
frame=0x7fffcbeabee8) at /usr/src/debug/freeradius-server-4.0.0/src/lib/unlang/module.c:924
#12 0x00007ffff70e1169 in frame_eval (priority=0x7fffcbeabd18, result=0x7fffcbeabd1c, frame=0x7fffcbeabee8,
request=0x7fffcbeabb50) at /usr/src/debug/freeradius-server-4.0.0/src/lib/unlang/interpret.c:500
frame_signal()
then will call frame_cleanup()
for all the frames in stack and this will deallocate fr_ldap_query_t
.
However ldap_trunk_request_mux()
and ldap_request_cancel_mux()
can be called after this point.
ldap_trunk_request_mux()
crash:
Info : rlm_ldap - [82] Trunk connection changed state ACTIVE -> CLOSED
Info : rlm_ldap - [83] Trunk connection changed state INIT -> CONNECTING
Bad talloc magic value - unknown value
talloc abort: Bad talloc magic value - unknown value
CAUGHT SIGNAL: Aborted
Backtrace of last 17 frames:
/usr/lib64/freeradius/libfreeradius-util.so(fr_fault+0x1ba)[0x7ffff7b1ebfa]
/usr/lib64/freeradius/libfreeradius-util.so(+0x27dbf)[0x7ffff7b1edbf]
/lib64/libtalloc.so.2(_talloc_get_type_abort+0xd0)[0x7ffff6573420]
/usr/lib64/freeradius/libfreeradius-ldap.so(+0xbf12)[0x7fffeb54bf12]
/usr/lib64/freeradius/libfreeradius-server.so(fr_trunk_connection_signal_writable+0x121)[0x7ffff742b151]
/usr/lib64/freeradius/libfreeradius-ldap.so(+0xbc48)[0x7fffeb54bc48]
/usr/lib64/freeradius/libfreeradius-util.so(fr_event_service+0x40e)[0x7ffff7b3c1ce]
/usr/lib64/freeradius/libfreeradius-unlang.so(unlang_interpret_synchronous+0x276)[0x7ffff71290b6]
/usr/lib64/freeradius/libfreeradius-ldap.so(fr_ldap_trunk_search+0x1ed)[0x7fffeb547ff1]
/usr/lib64/freeradius/rlm_ldap.so(rlm_ldap_find_user+0x188)[0x7fffebbe92c8]
/usr/lib64/freeradius/rlm_ldap.so(+0x77f3)[0x7fffebbe47f3]
/usr/lib64/freeradius/libfreeradius-unlang.so(+0x22fd1)[0x7ffff712bfd1]
/usr/lib64/freeradius/libfreeradius-unlang.so(unlang_interpret+0x459)[0x7ffff7116169]
/usr/lib64/freeradius/libfreeradius-io.so(fr_worker+0x18d)[0x7ffff6ef0bfd]
/usr/lib64/freeradius/libfreeradius-io.so(+0x1563f)[0x7ffff6eeb63f]
/lib64/libpthread.so.0(+0x7ea5)[0x7ffff518aea5]
/lib64/libc.so.6(clone+0x6d)[0x7ffff4a3cb0d]
No panic action set
Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffe7841700 (LWP 25709)]
0x00007ffff51924fb in raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:36
36 return INLINE_SYSCALL (tgkill, 3, pid, THREAD_GETMEM (THREAD_SELF, tid),
Missing separate debuginfos, use: debuginfo-install hiredis-0.12.1-2.el7.x86_64 nwkrad-openssl-libs-1.1.1k-1.el7.x86_64
(gdb) bt
#0 0x00007ffff51924fb in raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:36
#1 0x00007ffff7b1ebcb in fr_fault (sig=sig@entry=6)
at /usr/src/debug/freeradius-server-4.0.0/src/lib/util/debug.c:1054
#2 0x00007ffff7b1edbf in _fr_talloc_fault (reason=<optimized out>)
at /usr/src/debug/freeradius-server-4.0.0/src/lib/util/debug.c:1096
#3 0x00007ffff6573420 in talloc_abort_unknown_value () at ../../talloc.c:513
#4 talloc_chunk_from_ptr (ptr=0x7fffdae082c0) at ../../talloc.c:529
#5 __talloc_get_name (ptr=0x7fffdae082c0) at ../../talloc.c:1539
#6 _talloc_get_type_abort (ptr=0x7fffdae082c0, name=0x7fffeb55fc51 "fr_ldap_query_t",
location=0x7fffeb5601a8 "/home/ubuntu/work/viasat/freeradius-server/src/lib/ldap/connection.c:646")
at ../../talloc.c:1596
#7 0x00007fffeb54bf12 in ldap_trunk_request_mux (el=0x7fffd8004c80, tconn=0x7fffdb2819e0,
conn=0x7fffc9cfb040, uctx=0x7fffd802e460)
at /home/ubuntu/work/viasat/freeradius-server/src/lib/ldap/connection.c:646
#8 0x00007ffff742b151 in trunk_connection_writable (tconn=0x7fffdb2819e0)
at /usr/src/debug/freeradius-server-4.0.0/src/lib/server/trunk.c:2851
#9 fr_trunk_connection_signal_writable (tconn=0x7fffdb2819e0)
at /usr/src/debug/freeradius-server-4.0.0/src/lib/server/trunk.c:3770
#10 0x00007fffeb54bc48 in ldap_conn_writable (el=0x7fffd8004c80, fd=7, flags=1, uctx=0x7fffdb2819e0)
at /home/ubuntu/work/viasat/freeradius-server/src/lib/ldap/connection.c:529
#11 0x00007ffff7b3c1ce in fr_event_service (el=0x7fffd8004c80)
at /usr/src/debug/freeradius-server-4.0.0/src/lib/util/event.c:2432
#12 fr_event_service (el=el@entry=0x7fffd8004c80)
at /usr/src/debug/freeradius-server-4.0.0/src/lib/util/event.c:2285
#13 0x00007ffff71290b6 in unlang_interpret_synchronous (el=0x7fffd8004c80, request=0x7fffdae12800)
at /usr/src/debug/freeradius-server-4.0.0/src/lib/unlang/interpret_synchronous.c:236
#14 0x00007fffeb547ff1 in fr_ldap_trunk_search (p_result=0x7fffe78141dc, ctx=0x7fffdae15a00,
out=0x7fffe7813918, request=0x7fffdae12800, ttrunk=0x7fffd802e460, base_dn=0x158a010 "dc=viasat,dc=com",
scope=2, filter=0x7fffe7813930 "(cn=1,6,00:a0:bc:6c:7d:3a)", attrs=0x7fffe7814208,
serverctrls=0x7fffe7813920, clientctrls=0x0, is_async=false)
at /home/ubuntu/work/viasat/freeradius-server/src/lib/ldap/base.c:728
#15 0x00007fffebbe92c8 in rlm_ldap_find_user (inst=inst@entry=0x121f4d0,
request=request@entry=0x7fffdae12800, ttrunk=ttrunk@entry=0x7fffd802e460,
attrs=attrs@entry=0x7fffe7814208, force=force@entry=true, result=result@entry=0x7fffe78141e8,
handle=0x7fffe78141e0, rcode=0x7fffe78141dc)
at /usr/src/debug/freeradius-server-4.0.0/src/modules/rlm_ldap/user.c:118
#16 0x00007fffebbe47f3 in mod_authorize (p_result=0x7fffdae15a18, mctx=0x7fffe7814ab0,
request=0x7fffdae12800) at /usr/src/debug/freeradius-server-4.0.0/src/modules/rlm_ldap/rlm_ldap.c:1166
#17 0x00007ffff712bfd1 in unlang_module (p_result=0x7fffdae129cc, request=0x7fffdae12800,
frame=0x7fffdae12b98) at /usr/src/debug/freeradius-server-4.0.0/src/lib/unlang/module.c:924
#18 0x00007ffff7116169 in frame_eval (priority=0x7fffdae129c8, result=0x7fffdae129cc, frame=0x7fffdae12b98,
request=0x7fffdae12800) at /usr/src/debug/freeradius-server-4.0.0/src/lib/unlang/interpret.c:500
#19 unlang_interpret (request=request@entry=0x7fffdae12800)
at /usr/src/debug/freeradius-server-4.0.0/src/lib/unlang/interpret.c:684
#20 0x00007ffff6ef0bfd in worker_run_request (start=..., worker=0x7fffd800d500)
at /usr/src/debug/freeradius-server-4.0.0/src/lib/io/worker.c:1184
#21 fr_worker (worker=0x7fffd800d500) at /usr/src/debug/freeradius-server-4.0.0/src/lib/io/worker.c:1369
#22 0x00007ffff6eeb63f in fr_schedule_worker_thread (arg=<optimized out>)
at /usr/src/debug/freeradius-server-4.0.0/src/lib/io/schedule.c:242
#23 0x00007ffff518aea5 in start_thread (arg=0x7fffe7841700) at pthread_create.c:307
#24 0x00007ffff4a3cb0d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb)
ldap_request_cancel_mux()
crash:
Bad talloc magic value - unknown value
talloc abort: Bad talloc magic value - unknown value
CAUGHT SIGNAL: Aborted
Backtrace of last 17 frames:
/usr/lib64/freeradius/libfreeradius-util.so(fr_fault+0x1ba)[0x7ffff7b1ebfa]
/usr/lib64/freeradius/libfreeradius-util.so(+0x27dbf)[0x7ffff7b1edbf]
/lib64/libtalloc.so.2(_talloc_get_type_abort+0xd0)[0x7ffff6573420]
/usr/lib64/freeradius/libfreeradius-ldap.so(+0xbba1)[0x7fffeb54cba1]
/usr/lib64/freeradius/libfreeradius-server.so(fr_trunk_connection_signal_writable+0xc5)[0x7ffff742b0f5]
/usr/lib64/freeradius/libfreeradius-ldap.so(+0xbc84)[0x7fffeb54cc84]
/usr/lib64/freeradius/libfreeradius-util.so(fr_event_service+0x40e)[0x7ffff7b3c1ce]
/usr/lib64/freeradius/libfreeradius-unlang.so(unlang_interpret_synchronous+0x276)[0x7ffff71290b6]
/usr/lib64/freeradius/libfreeradius-ldap.so(fr_ldap_trunk_search+0x1dd)[0x7fffeb549004]
/usr/lib64/freeradius/rlm_ldap.so(rlm_ldap_find_user+0x188)[0x7fffebbe92c8]
/usr/lib64/freeradius/rlm_ldap.so(+0x77f3)[0x7fffebbe47f3]
/usr/lib64/freeradius/libfreeradius-unlang.so(+0x22fd1)[0x7ffff712bfd1]
/usr/lib64/freeradius/libfreeradius-unlang.so(unlang_interpret+0x459)[0x7ffff7116169]
/usr/lib64/freeradius/libfreeradius-io.so(fr_worker+0x18d)[0x7ffff6ef0bfd]
/usr/lib64/freeradius/libfreeradius-io.so(+0x1563f)[0x7ffff6eeb63f]
/lib64/libpthread.so.0(+0x7ea5)[0x7ffff517aea5]
/lib64/libc.so.6(clone+0x6d)[0x7ffff4a2cb0d]
No panic action set
Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffe7041700 (LWP 17182)]
0x00007ffff51824fb in raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:36
36 return INLINE_SYSCALL (tgkill, 3, pid, THREAD_GETMEM (THREAD_SELF, tid),
Missing separate debuginfos, use: debuginfo-install hiredis-0.12.1-2.el7.x86_64 nwkrad-openssl-libs-1.1.1k-1.el7.x86_64
(gdb) bt
#0 0x00007ffff51824fb in raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:36
#1 0x00007ffff7b1ebcb in fr_fault (sig=sig@entry=6)
at /usr/src/debug/freeradius-server-4.0.0/src/lib/util/debug.c:1054
#2 0x00007ffff7b1edbf in _fr_talloc_fault (reason=<optimized out>)
at /usr/src/debug/freeradius-server-4.0.0/src/lib/util/debug.c:1096
#3 0x00007ffff6573420 in talloc_abort_unknown_value () at ../../talloc.c:513
#4 talloc_chunk_from_ptr (ptr=0x7fffd2fa48b0) at ../../talloc.c:529
#5 __talloc_get_name (ptr=0x7fffd2fa48b0) at ../../talloc.c:1539
#6 _talloc_get_type_abort (ptr=0x7fffd2fa48b0, name=0x7fffeb560bd1 "fr_ldap_query_t",
location=0x7fffeb560e58 "/home/ubuntu/work/viasat/freeradius-server/src/lib/ldap/connection.c:486")
at ../../talloc.c:1596
#7 0x00007fffeb54cba1 in ldap_request_cancel_mux (el=0x7fffdc004c80, tconn=0x7fffdea7c5e0,
conn=0x7fffd2645320, uctx=0x7fffdc02e460)
at /home/ubuntu/work/viasat/freeradius-server/src/lib/ldap/connection.c:486
#8 0x00007ffff742b0f5 in trunk_connection_writable (tconn=0x7fffdea7c5e0)
at /usr/src/debug/freeradius-server-4.0.0/src/lib/server/trunk.c:2846
#9 fr_trunk_connection_signal_writable (tconn=0x7fffdea7c5e0)
at /usr/src/debug/freeradius-server-4.0.0/src/lib/server/trunk.c:3770
#10 0x00007fffeb54cc84 in ldap_conn_writable (el=0x7fffdc004c80, fd=48, flags=1, uctx=0x7fffdea7c5e0)
at /home/ubuntu/work/viasat/freeradius-server/src/lib/ldap/connection.c:529
#11 0x00007ffff7b3c1ce in fr_event_service (el=0x7fffdc004c80)
at /usr/src/debug/freeradius-server-4.0.0/src/lib/util/event.c:2432
#12 fr_event_service (el=el@entry=0x7fffdc004c80)
at /usr/src/debug/freeradius-server-4.0.0/src/lib/util/event.c:2285
#13 0x00007ffff71290b6 in unlang_interpret_synchronous (el=0x7fffdc004c80, request=0x7fffde90a840)
at /usr/src/debug/freeradius-server-4.0.0/src/lib/unlang/interpret_synchronous.c:236
#14 0x00007fffeb549004 in fr_ldap_trunk_search (p_result=0x7fffe70135dc, ctx=0x7fffde90da40,
out=0x7fffe7012d18, request=0x7fffde90a840, ttrunk=0x7fffdc02e460, base_dn=0x1589fe0 "dc=viasat,dc=com",
scope=2, filter=0x7fffe7012d30 "(cn=1,6,00:a0:bc:6c:7d:3a)", attrs=0x7fffe7013608,
serverctrls=0x7fffe7012d20, clientctrls=0x0, is_async=false)
at /home/ubuntu/work/viasat/freeradius-server/src/lib/ldap/base.c:727
#15 0x00007fffebbe92c8 in rlm_ldap_find_user (inst=inst@entry=0x121f4a0,
request=request@entry=0x7fffde90a840, ttrunk=ttrunk@entry=0x7fffdc02e460,
attrs=attrs@entry=0x7fffe7013608, force=force@entry=true, result=result@entry=0x7fffe70135e8,
handle=0x7fffe70135e0, rcode=0x7fffe70135dc)
at /usr/src/debug/freeradius-server-4.0.0/src/modules/rlm_ldap/user.c:118
#16 0x00007fffebbe47f3 in mod_authorize (p_result=0x7fffde90da58, mctx=0x7fffe7013eb0,
request=0x7fffde90a840) at /usr/src/debug/freeradius-server-4.0.0/src/modules/rlm_ldap/rlm_ldap.c:1166
#17 0x00007ffff712bfd1 in unlang_module (p_result=0x7fffde90aa0c, request=0x7fffde90a840,
frame=0x7fffde90abd8) at /usr/src/debug/freeradius-server-4.0.0/src/lib/unlang/module.c:924
#18 0x00007ffff7116169 in frame_eval (priority=0x7fffde90aa08, result=0x7fffde90aa0c, frame=0x7fffde90abd8,
request=0x7fffde90a840) at /usr/src/debug/freeradius-server-4.0.0/src/lib/unlang/interpret.c:500
#19 unlang_interpret (request=request@entry=0x7fffde90a840)
at /usr/src/debug/freeradius-server-4.0.0/src/lib/unlang/interpret.c:684
#20 0x00007ffff6ef0bfd in worker_run_request (start=..., worker=0x7fffdc00d500)
at /usr/src/debug/freeradius-server-4.0.0/src/lib/io/worker.c:1184
#21 fr_worker (worker=0x7fffdc00d500) at /usr/src/debug/freeradius-server-4.0.0/src/lib/io/worker.c:1369
#22 0x00007ffff6eeb63f in fr_schedule_worker_thread (arg=<optimized out>)
at /usr/src/debug/freeradius-server-4.0.0/src/lib/io/schedule.c:242
#23 0x00007ffff517aea5 in start_thread (arg=0x7fffe7041700) at pthread_create.c:307
#24 0x00007ffff4a2cb0d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb)
Unfortunately, we can't execute the cancellation synchronously because we're waiting on the write I/O handler for the socket to be called.
Reparenting the fr_ldap_query_t
is probably a bad idea, because although there probably won't be the performance penalties you'd usually have with reparenting (It's O(N) where N is the number of chunk siblings), we would leave a big chunk of the talloc pool allocated to the unlang stack reserved. As the request_t
s go back in a free list, the next network request to use that request_t
would start allocating memory in the pool past the point where the ldap_query_t
was allocated from. We wouldn't leak memory, but it'd significantly bloat it, and if we're already experiencing cancellations we're probably already dealing with thousands of requests in a semi-stuck state.
I think the best thing to do here, is create a common structure with just the msgid
, and stick it at the start of the ldap_query_t
. When we need to cancel a request, we allocate one of those structures in the context of the treq
, and assign it as a new preq
. Cancellation code then just casts the preq
to the common structure and everyone's happy. The talloc_get_type_abort
call would need to be swapped out to just an assert checking the chunk name.
An alternative would just be to dup the entire fr_ldap_query_t
but that seems wasteful...
We'd also need to immediately remove the fr_ldap_query_t
from ldap_conn->queries
, not sure how that's done at the moment.
@ndptech what do you think of the proposed fix?
@arr2036 @ndptech Re: initial bug report
There are several issues in lib/ldap:
- two cancellation handlers in
fr_ldap_trunk_search()
: why do we need to set_ldap_search_sync_timeout()
handler at all? This was added in 2a47fc01ba326431171eb1e53d64d0c009facf44.fr_ldap_trunk_modify()
does not have this code btw.
What I see (with added logging) is that first _ldap_search_sync_timeout()
is called (e.g. for treq->state == FR_TRUNK_REQUEST_STATE_BACKLOG
), which calls fr_trunk_request_signal_cancel()
which calls fr_trunk_request_free()
which overwrites *treq
(sets treq->pub.state
to FR_TRUNK_REQUEST_STATE_INIT
and treq->preq
to NULL
).
Then worker_max_request_time()
is fired which calls ldap_trunk_query_cancel()
which calls fr_trunk_request_signal_cancel()
again which dies with "invalid transition INIT -> UNASSIGNED".
If I patch this by not using second cancellation handler (e.g. not setting _ldap_search_sync_timeout()
) I get query->treq is NULL
issue as described above.
- Looks like
query->treq == NULL
issue is caused by a race condition:ldap_trunk_request_demux()
may resetquery->treq
toNULL
(I assume it has executed becausequery->ret == LDAP_RESULT_SUCCESS
)
I think we should cancel timeout events (how do I do this?) if the query was cancelled or processed successfully (this should also help with issue 1).
For now I patched this just by exiting ldap_trunk_query_cancel()
if query->treq == NULL
.
However with this patch I get use-after-free issue as described above
- Use-after-free for
fr_ldap_query_t
: this happens whenquery->treq->state == FR_TRUNK_REQUEST_STATE_SENT
whenldap_trunk_query_cancel()
is called. In this casefr_trunk_request_signal_cancel()
does not deallocatefr_trunk_request_t
but the query gets deallocated whenframe_cleanup()
cleans up the stack frames (fr_ldap_query_t
is allocated in a context of a stack frame), so then whenldap_request_cancel_mux()
is called,treq->preq
points to freed memory.
If I patch this one by using fr_trunk
talloc context (the same lifetime as fr_trunk_request_t
) I still get use-after-free in ldap_request_cancel_mux()
- This use-after-free in
ldap_request_cancel_mux()
occurs becausetreq
is enteredFR_TRUNK_REQUEST_STATE_CANCEL
inldap_trunk_query_cancel()
and placed totconn->cancel
. Thenldap_request_cancel_mux()
gets thistreq
fromtconn->cancel
and callsfr_trunk_request_signal_cancel_complete()
. Butfr_trunk_request_signal_cancel_complete()
callstrunk_request_remove_from_conn()
(viatrunk_request_enter_cancel_complete()
) only fortreq->state == FR_TRUNK_REQUEST_STATE_CANCEL_SENT
. So thistreq
is not removed fromtconn->cancel
andfr_trunk_connection_pop_cancellation()
returns the samefr_trunk_request_t
on the nextwhile
iteration inldap_request_cancel_mux()
but itstreq->preq
is already free'd on previous iteration.
I should note that with the patches from #4628 I no longer see the crashes and aborts. But after several million requests with LDAP authorization all the workers got stuck and do not process incoming requests.
I do not have analysis for this yet. Some observations are:
...
Info : rlm_ldap - [377] Trunk connection changed state ACTIVE -> CLOSED
Info : rlm_ldap - [379] Trunk connection changed state INIT -> CONNECTING
Info : global - ldap_trunk_query_cancel: query 0x7fff1095d050, treq = 0x7fffdd8cfe30, action = 2
Info : global - ldap_trunk_query_cancel: query 0x7fff1095d050, treq = 0x7fffdd8cfe30, treq->state = 16
Info : rlm_ldap - [378] Trunk connection changed state ACTIVE -> CLOSED
Info : rlm_ldap - [379] Trunk connection changed state ACTIVE -> CLOSED
Info : rlm_ldap - [380] Trunk connection changed state INIT -> CONNECTING
Info : rlm_ldap - [380] Trunk connection changed state ACTIVE -> CLOSED
Info : rlm_ldap - [381] Trunk connection changed state INIT -> CONNECTING
Info : rlm_ldap - [381] Trunk connection changed state ACTIVE -> CLOSED
<--- this is the point workers stopped processing incoming requests
Error : Network 0 - max_outstanding reached - dropping packet: No memory available in ring buffer
Error : Network 0 - max_outstanding reached - dropping packet: No memory available in ring buffer
Error : Network 0 - max_outstanding reached - dropping packet: No memory available in ring buffer
Error : Network 0 - max_outstanding reached - dropping packet: No memory available in ring buffer
Error : Network 0 - max_outstanding reached - dropping packet: No memory available in ring buffer
Error : Network 0 - max_outstanding reached - dropping packet: No memory available in ring buffer
Error : Network 0 - max_outstanding reached - dropping packet: No memory available in ring buffer
Error : Network 0 - max_outstanding reached - dropping packet: No memory available in ring buffer
If after this I set fr_debug_lvl=2 and send a single request I get this log:
Fri Jul 22 20:28:28 2022: Debug : Main loop waking up in 0.999995504 seconds
Fri Jul 22 20:28:29 2022: Debug : Main loop waking up in 0.999995312 seconds
Fri Jul 22 20:28:29 2022: Debug : Network 0 - 1 event(s) pending
Fri Jul 22 20:28:29 2022: Debug : Network 0 - Reading data from FD 42
Fri Jul 22 20:28:29 2022: Debug : proto_dhcpv4_udp - Received Discover XID 30bcceb1 length 326 dhcpv4_udp serv
er 127.0.0.1 port 6767 on interface lo
Fri Jul 22 20:28:29 2022: Debug : Network 0 - Read 326 byte(s) from FD 42
Fri Jul 22 20:28:29 2022: Debug : Network 0 - Gathering events - will wait
Fri Jul 22 20:28:29 2022: Debug : 1 event(s) pending
Fri Jul 22 20:28:29 2022: Debug : Worker 1 - Channel data-to-responder
Fri Jul 22 20:28:29 2022: Debug : Worker 1 - Received request 1564028
Fri Jul 22 20:28:29 2022: Debug : (2163624) proto_dhcpv4 decode packet
Fri Jul 22 20:28:29 2022: Debug : (2163624) 0000: 01 01 06 00 30 bc ce b1 00 00 00 00 00 00 00 00
Fri Jul 22 20:28:29 2022: Debug : (2163624) 0010: 00 00 00 00 00 00 00 00 00 00 00 00 01 01 bc 11
Fri Jul 22 20:28:29 2022: Debug : (2163624) 0020: 22 33 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Fri Jul 22 20:28:29 2022: Debug : (2163624) 0030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Fri Jul 22 20:28:29 2022: Debug : (2163624) 0040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Fri Jul 22 20:28:29 2022: Debug : (2163624) 0050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Fri Jul 22 20:28:29 2022: Debug : (2163624) 0060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Fri Jul 22 20:28:29 2022: Debug : (2163624) 0070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Fri Jul 22 20:28:29 2022: Debug : (2163624) 0080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Fri Jul 22 20:28:29 2022: Debug : (2163624) 0090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Fri Jul 22 20:28:29 2022: Debug : (2163624) 00a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Fri Jul 22 20:28:29 2022: Debug : (2163624) 00b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Fri Jul 22 20:28:29 2022: Debug : (2163624) 00c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Fri Jul 22 20:28:29 2022: Debug : (2163624) 00d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Fri Jul 22 20:28:29 2022: Debug : (2163624) 00e0: 00 00 00 00 00 00 00 00 00 00 00 00 63 82 53 63
Fri Jul 22 20:28:29 2022: Debug : (2163624) 00f0: 35 01 01 3d 06 01 01 bc 11 22 33 37 05 2b 29 01
Fri Jul 22 20:28:29 2022: Debug : (2163624) 0100: 03 06 39 02 05 dc 3c 09 76 69 61 73 61 74 31 2e
Fri Jul 22 20:28:29 2022: Debug : (2163624) 0110: 30 52 31 06 1f 30 31 30 31 42 43 31 31 32 32 33
Fri Jul 22 20:28:29 2022: Debug : (2163624) 0120: 33 40 61 75 74 2e 72 65 73 2e 76 69 61 73 61 74
Fri Jul 22 20:28:29 2022: Debug : (2163624) 0130: 2e 63 6f 6d 02 06 01 01 bc 11 22 33 01 06 00 a0
Fri Jul 22 20:28:29 2022: Debug : (2163624) 0140: bc 33 22 11 ff 00
Fri Jul 22 20:28:29 2022: Debug : Worker 1 - Resetting cleanup timer to +30
Fri Jul 22 20:28:29 2022: Debug : No pending requests (1 yielded)
Fri Jul 22 20:28:29 2022: Debug : Gathering events - will wait
Fri Jul 22 20:28:29 2022: Debug : 1 event(s) pending
Fri Jul 22 20:28:29 2022: ERROR : ERROR : (2163623) Request has reached max_request_time - signalling it to s
top
Fri Jul 22 20:28:29 2022: Debug : (2163623) Done request
Fri Jul 22 20:28:29 2022: Debug : (2163623) Finished request
Fri Jul 22 20:28:29 2022: Debug : No pending requests (1 yielded)
Fri Jul 22 20:28:29 2022: Debug : Gathering events - will wait
Fri Jul 22 20:28:29 2022: Debug : Network 0 - 1 event(s) pending
Fri Jul 22 20:28:29 2022: Debug : Network 0 - Channel data-to-requestor
Fri Jul 22 20:28:29 2022: Debug : proto_dhcpv4_udp - cleaning up request in 5.000000s
Fri Jul 22 20:28:29 2022: Debug : Network 0 - Gathering events - will wait
Fri Jul 22 20:28:30 2022: Debug : Main loop waking up in 0.999994897 seconds
Fri Jul 22 20:28:31 2022: Debug : Main loop waking up in 0.999995215 seconds
Fri Jul 22 20:28:32 2022: Debug : Main loop waking up in 0.999995753 seconds
Fri Jul 22 20:28:33 2022: Debug : Main loop waking up in 0.999995094 seconds
Fri Jul 22 20:28:34 2022: Debug : Main loop waking up in 0.999995212 seconds
Fri Jul 22 20:28:34 2022: Debug : Network 0 - 1 event(s) pending
Fri Jul 22 20:28:34 2022: Debug : TIMER - proto_dhcpv4_udp - cleanup delay
Fri Jul 22 20:28:34 2022: Debug : Network 0 - Gathering events - will wait
Fri Jul 22 20:28:35 2022: Debug : Main loop waking up in 0.999995259 seconds
Fri Jul 22 20:28:36 2022: Debug : Main loop waking up in 0.999995316 seconds
Some gdb output:
(gdb) info thr [115/98469]
Id Target Id Frame
8 Thread 0x7fffe6e3c700 (LWP 20906) "radiusd" 0x00007ffff49ece9f in __GI_ppoll (fds=0x7fffe6e0d8f4,
nfds=1, timeout=<optimized out>, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:56
7 Thread 0x7fffe763d700 (LWP 20905) "radiusd" 0x00007ffff49f80e3 in epoll_wait ()
at ../sysdeps/unix/syscall-template.S:81
6 Thread 0x7fffe7e3e700 (LWP 20904) "radiusd" 0x00007ffff49f80e3 in epoll_wait ()
at ../sysdeps/unix/syscall-template.S:81
5 Thread 0x7ffff1e3a700 (LWP 20903) "libkqueue_mon" 0x00007ffff493058a in do_sigwaitinfo (
info=0x7ffff1e0d010, set=0x7ffff1e0cf90) at ../sysdeps/unix/sysv/linux/sigwaitinfo.c:54
4 Thread 0x7ffff1639700 (LWP 20902) "libkqueue_wait" 0x00007ffff493058a in do_sigwaitinfo (
info=0x7ffff160bfe0, set=0x7ffff160bf60) at ../sysdeps/unix/sysv/linux/sigwaitinfo.c:54
* 1 Thread 0x7ffff7fdf580 (LWP 20895) "radiusd" 0x00007ffff49ece9f in __GI_ppoll (fds=0x7fffffffe1e4,
nfds=1, timeout=<optimized out>, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:56
(gdb) thr 8
[Switching to thread 8 (Thread 0x7fffe6e3c700 (LWP 20906))]
#0 0x00007ffff49ece9f in __GI_ppoll (fds=0x7fffe6e0d8f4, nfds=1, timeout=<optimized out>, sigmask=0x0)
at ../sysdeps/unix/sysv/linux/ppoll.c:56
56 result = INLINE_SYSCALL (ppoll, 5, fds, nfds, timeout, sigmask,
(gdb) bt
#0 0x00007ffff49ece9f in __GI_ppoll (fds=0x7fffe6e0d8f4, nfds=1, timeout=<optimized out>, sigmask=0x0)
at ../sysdeps/unix/sysv/linux/ppoll.c:56
#1 0x00007ffff6323021 in linux_kevent_wait_hires (kq=0x7fffdc00b220, timeout=0x7fffe6e0dab0)
at /usr/src/debug/libkqueue-2.6.1/src_0/linux/platform.c:635
#2 0x00007ffff632317f in linux_kevent_wait (kq=0x7fffdc00b220, nevents=256, ts=0x7fffe6e0dab0)
at /usr/src/debug/libkqueue-2.6.1/src_0/linux/platform.c:668
#3 0x00007ffff631d145 in kevent (kqfd=36, changelist=0x7ffff65353a0 <null_kev>, nchanges=0,
eventlist=0x7fffdc004d58, nevents=256, timeout=0x7fffe6e0dab0)
at /usr/src/debug/libkqueue-2.6.1/src_0/common/kevent.c:445
#4 0x00007ffff7b2788f in fr_event_corral (el=0x7fffdc004c80, now=..., wait=true)
at /home/ubuntu/work/viasat/freeradius-server/src/lib/util/event.c:2245
#5 0x00007ffff70f4088 in unlang_interpret_synchronous (el=0x7fffdc004c80, request=0x7fff118c8470)
at src/freeradius-devel/util/time.h:143
#6 0x00007fffeb343f35 in fr_ldap_trunk_search (p_result=0x7fffe6e0e5dc, ctx=0x7fff118cb670,
out=0x7fffe6e0dd18, request=0x7fff118c8470, ttrunk=0x7fffdc02e460, base_dn=0x1575990 "dc=viasat,dc=com",
scope=2, filter=0x7fffe6e0dd30 "(cn=1,6,00:a0:bc:6c:7d:3a)", attrs=0x7fffe6e0e608,
serverctrls=0x7fffe6e0dd20, clientctrls=0x0, is_async=false)
at /home/ubuntu/work/viasat/freeradius-server/src/lib/ldap/base.c:705
#7 0x00007fffeb9e42c8 in rlm_ldap_find_user (inst=inst@entry=0x1211750,
request=request@entry=0x7fff118c8470, ttrunk=ttrunk@entry=0x7fffdc02e460,
attrs=attrs@entry=0x7fffe6e0e608, force=force@entry=true, result=result@entry=0x7fffe6e0e5e8,
handle=0x7fffe6e0e5e0, rcode=0x7fffe6e0e5dc)
at /usr/src/debug/freeradius-server-4.0.0/src/modules/rlm_ldap/user.c:118
#8 0x00007fffeb9df7f3 in mod_authorize (p_result=0x7fff118cb688, mctx=0x7fffe6e0eeb0,
request=0x7fff118c8470) at /usr/src/debug/freeradius-server-4.0.0/src/modules/rlm_ldap/rlm_ldap.c:1166
#9 0x00007ffff70f6fd1 in unlang_module (p_result=0x7fff118c863c, request=0x7fff118c8470,
frame=0x7fff118c8808) at /usr/src/debug/freeradius-server-4.0.0/src/lib/unlang/module.c:924
#10 0x00007ffff70e1169 in frame_eval (priority=0x7fff118c8638, result=0x7fff118c863c, frame=0x7fff118c8808,
request=0x7fff118c8470) at /usr/src/debug/freeradius-server-4.0.0/src/lib/unlang/interpret.c:500
#11 unlang_interpret (request=request@entry=0x7fff118c8470)
at /usr/src/debug/freeradius-server-4.0.0/src/lib/unlang/interpret.c:684
#12 0x00007ffff6ebbbfd in worker_run_request (start=..., worker=0x7fffdc00d500)
at /usr/src/debug/freeradius-server-4.0.0/src/lib/io/worker.c:1184
#13 fr_worker (worker=0x7fffdc00d500) at /usr/src/debug/freeradius-server-4.0.0/src/lib/io/worker.c:1369
#14 0x00007ffff6eb663f in fr_schedule_worker_thread (arg=<optimized out>)
---Type <return> to continue, or q <return> to quit---q
at /usr/src/debug/freeradius-server-Quit
(gdb) fr 6
#6 0x00007fffeb343f35 in fr_ldap_trunk_search (p_result=0x7fffe6e0e5dc, ctx=0x7fff118cb670,
out=0x7fffe6e0dd18, request=0x7fff118c8470, ttrunk=0x7fffdc02e460, base_dn=0x1575990 "dc=viasat,dc=com",
scope=2, filter=0x7fffe6e0dd30 "(cn=1,6,00:a0:bc:6c:7d:3a)", attrs=0x7fffe6e0e608,
serverctrls=0x7fffe6e0dd20, clientctrls=0x0, is_async=false)
at /home/ubuntu/work/viasat/freeradius-server/src/lib/ldap/base.c:705
(gdb) print *ttrunk
$15 = {node = {left = 0x4242424242424242, right = 0x4242424242424242, parent = 0x4242424242424242,
data = 0x4242424242424242, colour = (unknown: 1111638594), being_freed = 66},
uri = 0x4242424242424242 <Address 0x4242424242424242 out of bounds>,
bind_dn = 0x4242424242424242 <Address 0x4242424242424242 out of bounds>, config = {
name = 0x4242424242424242 <Address 0x4242424242424242 out of bounds>,
server = 0x4242424242424242 <Address 0x4242424242424242 out of bounds>, server_str = 0x4242424242424242,
port = 16962, admin_identity = 0x4242424242424242 <Address 0x4242424242424242 out of bounds>,
admin_password = 0x4242424242424242 <Address 0x4242424242424242 out of bounds>, admin_sasl = {
mech = 0x4242424242424242 <Address 0x4242424242424242 out of bounds>,
proxy = 0x4242424242424242 <Address 0x4242424242424242 out of bounds>,
realm = 0x4242424242424242 <Address 0x4242424242424242 out of bounds>},
sasl_secprops = 0x4242424242424242 <Address 0x4242424242424242 out of bounds>, dereference = 1111638594,
dereference_str = 0x4242424242424242 <Address 0x4242424242424242 out of bounds>, chase_referrals = 66,
chase_referrals_unset = 66, use_referral_credentials = 66, referral_depth = 16962, rebind = 66,
tls_mode = 1111638594, start_tls = 66,
tls_ca_file = 0x4242424242424242 <Address 0x4242424242424242 out of bounds>,
tls_ca_path = 0x4242424242424242 <Address 0x4242424242424242 out of bounds>,
tls_certificate_file = 0x4242424242424242 <Address 0x4242424242424242 out of bounds>,
tls_private_key_file = 0x4242424242424242 <Address 0x4242424242424242 out of bounds>,
tls_require_cert_str = 0x4242424242424242 <Address 0x4242424242424242 out of bounds>,
tls_require_cert = 1111638594,
tls_min_version_str = 0x4242424242424242 <Address 0x4242424242424242 out of bounds>,
tls_min_version = 1111638594, keepalive_idle = {value = 4774451407313060418},
keepalive_probes = 1111638594, keepalive_interval = {value = 4774451407313060418}, srv_timelimit = {
value = 4774451407313060418}, res_timeout = {value = 4774451407313060418}, net_timeout = {
value = 4774451407313060418}, tls_handshake_timeout = {value = 4774451407313060418},
reconnection_delay = {value = 4774451407313060418}, idle_timeout = {value = 4774451407313060418}},
directory = 0x4242424242424242, trunk = 0x4242424242424242, t = 0x4242424242424242,
ev = 0x4242424242424242}
(gdb)
So worker thread is still doing fr_ldap_trunk_search()
but its ttrunk
has already been free'd.
This is the issue the timeout event was added to prevent. I was seeing exactly the same deadlocks on a production server.
This is the issue the timeout event was added to prevent. I was seeing exactly the same deadlocks on a production server.
@arr2036 which issue? Seems that the link is omitted
@arr2036 if I am not mistaken most of the issues here are fixed now except use-after-free for fr_ldap_query_t
.
If I am not mistaken you mentioned you will provide a better fix for that (than changing allocation context - the patch we are currently using in our staging to fix that issue, but it is not 100% correct).
Do you have any updates on this?