freeradius-server
freeradius-server copied to clipboard
[defect]: Crash of v3.2.x with TLS-PSK enabled home_server
What type of defect/bug is this?
Crash or memory corruption (segv, abort, etc...)
How can the issue be reproduced?
Minimal configuration in which the crash happens is:
home_server tls {
ipaddr = 127.0.0.1
port = 2084
type = auth
proto = tcp
status_check = none
nonblock = yes
tls {
psk_identity = "myidentity"
psk_hexphrase = "00000000000000000000000000000000"
tls_min_version = "1.3"
tls_max_version = "1.3"
fragment_size = 8192
cipher_list = "DEFAULT"
connect_timeout = 30
}
}
home_server_pool tls {
type = fail-over
home_server = tls
}
realm tls.test {
auth_pool = tls
}
When I send a request (any auth, regardless of it coming in via RADIUS or RadSec or TLS-PSK) for a @tls.test user it gets authenticated when everything aligns (TLS-PSK works, TLS version, identity and psk). My peer is radsecproxy. If the psk_identity is wrong (just any other string), or the psk_hexphrase, or I enable tls_min_version = "1.2" (which radsecproxy doesn't support if I'm not mistaken) or the radsecproxy peer is configured with tls context, FreeRADIUS crashes. Backtrace provided.
Log output from the FreeRADIUS daemon
(0) suffix: Looking up realm "tls.test" for User-Name = "[email protected]"
(0) suffix: Found realm "tls.test"
Waking up in 0.3 seconds.
(0) suffix: Adding Stripped-User-Name = "test"
(0) suffix: Adding Realm = "tls.test"
(0) suffix: Proxying request from user test to realm tls.test
(0) suffix: Preparing to proxy authentication request to realm "tls.test"
(0) [suffix] = updated
(0) eap: No EAP-Message, not doing EAP
(0) [eap] = noop
(0) [files] = noop
(0) [expiration] = noop
(0) [logintime] = noop
(0) [pap] = noop
(0) } # authorize = updated
(0) Starting proxy to home server 127.0.0.1 port 2084
(0) server default {
(0) }
(TLS) Trying new outgoing proxy connection to proxy (0.0.0.0, 0) -> home_server (127.0.0.1, 2084)
Requiring Server certificate
(0) (TLS) RADIUS/TLS - Handshake state - before SSL initialization
(0) (TLS) RADIUS/TLS - Handshake state - Client before SSL initialization
(0) (TLS) RADIUS/TLS - send TLS 1.3 Handshake, ClientHello
(0) (TLS) RADIUS/TLS - Handshake state - Client SSLv3/TLS write client hello
(0) (TLS) RADIUS/TLS - recv TLS 1.3 Alert, fatal decode_error
(0) (TLS) RADIUS/TLS - The client is informing us that there is a failure inside the TLS protocol exchange.
(0) ERROR: (TLS) RADIUS/TLS - Alert read:fatal:decode error
(0) (TLS) RADIUS/TLS - Client : Need to read more data: error
tls: (TLS) Failed in connecting TLS session.: error:0A00041A:SSL routines::tlsv1 alert decode error
(TLS) Failed opening connection on proxy socket 'proxy (0.0.0.0, 0) -> home_server (127.0.0.1, 2084)'
(0) Failed to insert request into the proxy list
(0) There was no response configured: rejecting request
(0) Using Post-Auth-Type Reject
(0) # Executing group from file /opt/freeradius/etc/raddb/sites-enabled/default
(0) Post-Auth-Type REJECT {
(0) attr_filter.access_reject: EXPAND %{User-Name}
(0) attr_filter.access_reject: --> [email protected]
(0) attr_filter.access_reject: Matched entry DEFAULT at line 11
(0) [attr_filter.access_reject] = updated
(0) [eap] = noop
(0) policy remove_reply_message_if_eap {
(0) if (&reply:EAP-Message && &reply:Reply-Message) {
(0) if (&reply:EAP-Message && &reply:Reply-Message) -> FALSE
(0) else {
(0) [noop] = noop
(0) } # else = noop
(0) } # policy remove_reply_message_if_eap = noop
(0) } # Post-Auth-Type REJECT = updated
(0) Sent Access-Reject Id 0 from 0.0.0.0:2083 to 145.100.181.186:63560 length 20
(0) (TLS) RADIUS/TLS - send TLS 1.3 Handshake, Finished
(0) Finished request
Thread 2 waiting to be assigned a request
Waking up in 4.6 seconds.
Marking home server 127.0.0.1 port 2084 as dead.
Bad talloc magic value - unknown value
talloc abort: Bad talloc magic value - unknown value
Backtrace of last 16 frames:
/opt/freeradius/lib/libfreeradius-radius.so(+0xd04c)[0x7ffff7f4204c]
/lib/x86_64-linux-gnu/libtalloc.so.2(+0x3b08)[0x7ffff79f6b08]
/opt/freeradius/sbin/radiusd(+0x4f21b)[0x5555555a321b]
/opt/freeradius/lib/libfreeradius-radius.so(+0x2b0e1)[0x7ffff7f600e1]
/opt/freeradius/lib/libfreeradius-radius.so(rbtree_walk+0xfa)[0x7ffff7f602b5]
/opt/freeradius/sbin/radiusd(mark_home_server_dead+0x114)[0x5555555a33a8]
/opt/freeradius/sbin/radiusd(+0x47407)[0x55555559b407]
/opt/freeradius/sbin/radiusd(+0x48520)[0x55555559c520]
/opt/freeradius/sbin/radiusd(+0x46e51)[0x55555559ae51]
/opt/freeradius/lib/libfreeradius-radius.so(fr_event_run+0x147)[0x7ffff7f69f78]
/opt/freeradius/lib/libfreeradius-radius.so(fr_event_loop+0x452)[0x7ffff7f6aa1a]
/opt/freeradius/sbin/radiusd(radius_event_process+0x26)[0x5555555a8148]
/opt/freeradius/sbin/radiusd(main+0xd04)[0x55555558fa30]
/lib/x86_64-linux-gnu/libc.so.6(+0x2724a)[0x7ffff77fd24a]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85)[0x7ffff77fd305]
/opt/freeradius/sbin/radiusd(_start+0x21)[0x5555555696f1]
Thread 1 "radiusd" received signal SIGABRT, Aborted.
__pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
44 ./nptl/pthread_kill.c: No such file or directory.
Relevant log output from client utilities
No response
Backtrace from LLDB or GDB
(gdb) bt full
#0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
tid = <optimized out>
ret = 0
pd = <optimized out>
old_mask = {__val = {0}}
ret = <optimized out>
#1 0x00007ffff7860e8f in __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
No locals.
#2 0x00007ffff7811fb2 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
ret = <optimized out>
#3 0x00007ffff77fc472 in __GI_abort () at ./stdlib/abort.c:79
save_stage = 1
act = {__sigaction_handler = {sa_handler = 0x20, sa_sigaction = 0x20}, sa_mask = {__val = {0 <repeats 13 times>, 140737488345824, 16230630494067845376, 140737488348392}}, sa_flags = -140525456,
sa_restorer = 0x7fffffffdef0}
#4 0x00007ffff7f4208f in _fr_talloc_fault_simple (reason=0x7ffff79fc070 "Bad talloc magic value - unknown value") at src/lib/debug.c:850
No locals.
#5 0x00007ffff79f6b08 in ?? () from /lib/x86_64-linux-gnu/libtalloc.so.2
No symbol table info available.
#6 0x00005555555a321b in eol_home_listener (ctx=0x0, data=0x7fffec00c640) at src/main/process.c:4188
this = 0x7ffff786ee8f <__GI___libc_free+111>
#7 0x00007ffff7f600e1 in walk_delete_order (tree=0x5555558eeff0, compare=0x5555555a31ee <eol_home_listener>, context=0x0) at src/lib/rbtree.c:657
solid = 0x7ffff7f802e0 <sentinel>
x = 0x7fffec00c9f0
rcode = 0
#8 0x00007ffff7f602b5 in rbtree_walk (tree=0x5555558eeff0, order=RBTREE_DELETE_ORDER, compare=0x5555555a31ee <eol_home_listener>, context=0x0) at src/lib/rbtree.c:717
rcode = 0
#9 0x00005555555a33a8 in mark_home_server_dead (home=0x5555557edf60, when=0x7fffffffe0e0, down=false) at src/main/process.c:4239
previous_state = 5
buffer = "127.0.0.1\000\372\367\377\177\000\000\000\340\377\377\377\177\000\000b\246\373\367\377\177\000\000\240\246\233UUU\000\000\260\227\233UUU\000\000\000\000\000\000\256\002\000\000BP]UUU\000\000\260\227\233UUU\000\000\260\227\233UUU\000\000\000\000\000\000\000\000\000\000\200K}\367\377\177\000\000\260\340\377\377\377\177\000\000\337y\372\367\377\177\000\000`\340\377\377\377\177\000\000\260\227\233UUU\000"
#10 0x000055555559b407 in request_done (request=0x5555559b97b0, original=10) at src/main/process.c:740
home = 0x5555557edf60
i = 0
now = {tv_sec = 1718806577, tv_usec = 757183}
when = {tv_sec = 140737488347408, tv_usec = 140737353775583}
action = 10
#11 0x000055555559c520 in request_cleanup_delay (request=0x5555559b97b0, action=4) at src/main/process.c:1265
when = {tv_sec = 1718806577, tv_usec = 752120}
now = {tv_sec = 1718806577, tv_usec = 757183}
__FUNCTION__ = "request_cleanup_delay"
#12 0x000055555559ae51 in request_timer (ctx=0x5555559b97b0) at src/main/process.c:585
request = 0x5555559b97b0
action = 4
#13 0x00007ffff7f69f78 in fr_event_run (el=0x5555558f6ee0, when=0x7fffffffe2f0) at src/lib/event.c:319
callback = 0x55555559adfe <request_timer>
ctx = 0x5555559b97b0
ev = 0x5555559d6810
#14 0x00007ffff7f6aa1a in fr_event_loop (el=0x5555558f6ee0) at src/lib/event.c:689
i = 10
rcode = 0
when = {tv_sec = 1718806577, tv_usec = 757183}
--Type <RET> for more, q to quit, c to continue without paging--
wake = 0x7fffffffe2f0
read_fds = {fds_bits = {0 <repeats 16 times>}}
write_fds = {fds_bits = {0 <repeats 16 times>}}
#15 0x00005555555a8148 in radius_event_process () at src/main/process.c:6494
No locals.
#16 0x000055555558fa30 in main (argc=3, argv=0x7fffffffe4f8) at src/main/radiusd.c:641
rcode = 0
status = 0
argval = -1
spawn_flag = true
display_version = false
flag = 0
from_child = {-1, -1}
p = 0x7fffffffe75c "/radiusd"
state = 0x5555555ef2a0 <global_state>
autofree = 0x5555555f0300
This should be resolved by the latest commit on the 3.2 branch.
Thanks; I later realized this was the same crash as I found with RADIUS/TLS certificate-based home_servers (without TLS-PSK).
The crash seem to be resolved for both scenarios, but there's still something odd.
Before your fixes, I found that b9af38ad700a7f5f25f98ef9cbb979b6ad3071e9 with the !this->nonblock on line 3596 replaced by false also gave me a stable result for RADIUS/TLS, happily proxying millions of requests per hour:
https://github.com/FreeRADIUS/freeradius-server/blob/b9af38ad700a7f5f25f98ef9cbb979b6ad3071e9/src/main/listen.c#L3596
... and in fact there is a peer where this commit+patch does connect, but your fixes up to fb9693faaca028f61ddbf3119c89dbe25c69d8fb still don't connect. I don't control the peer so I cannot see what's on the other side,
This one does work:
(TLS) Trying new outgoing proxy connection to proxy (0.0.0.0, 0) -> home_server (192.168.1.78, 2083)
Requiring Server certificate
(0) (TLS) RADIUS/TLS - Handshake state - before SSL initialization
(0) (TLS) RADIUS/TLS - Handshake state - Client before SSL initialization
(0) (TLS) RADIUS/TLS - send TLS 1.2 Handshake, ClientHello
(0) (TLS) RADIUS/TLS - Handshake state - Client SSLv3/TLS write client hello
(0) (TLS) RADIUS/TLS - Client : Need to read more data: SSLv3/TLS write client hello
(0) (TLS) RADIUS/TLS - tls_new_client_session WANT_READ
Listening on proxy (10.0.0.34, 35993) -> home_server (192.168.1.78, 2083)
(0) Proxying request to home server 192.168.1.78 port 2083 (TLS) timeout 4.000000
Waking up in 0.3 seconds.
(0) Sent Access-Request Id 241 from 10.0.0.34:35993 to 192.168.1.78:2083 length 139
...
(0) Proxy-State = 0x30
(0) (TLS) RADIUS/TLS - Client : Need to read more data: SSLv3/TLS write client hello
Thread 1 waiting to be assigned a request
(0) (TLS) RADIUS/TLS - Handshake state - Client SSLv3/TLS write client hello
(0) (TLS) RADIUS/TLS - recv TLS 1.2 Handshake, ServerHello
(0) (TLS) RADIUS/TLS - Client : Need to read more data: SSLv3/TLS read server hello
Waking up in 0.3 seconds.
(0) (TLS) RADIUS/TLS - Handshake state - Client SSLv3/TLS read server hello
(0) (TLS) RADIUS/TLS - recv TLS 1.2 Handshake, Certificate
(0) (TLS) RADIUS/TLS - Creating attributes from 2 certificate in chain
(0) TLS-Cert-Serial := "01"
...
and so on, but this (on fb9693faaca028f61ddbf3119c89dbe25c69d8fb) doesn't connect:
(TLS) Trying new outgoing proxy connection to proxy (0.0.0.0, 0) -> home_server (192.168.1.78, 2083)
Requiring Server certificate
(0) (TLS) RADIUS/TLS - Handshake state - before SSL initialization
(0) (TLS) RADIUS/TLS - Handshake state - Client before SSL initialization
(0) (TLS) RADIUS/TLS - send TLS 1.2 Handshake, ClientHello
(0) (TLS) RADIUS/TLS - Client : Need to write more data: SSLv3/TLS write client hello
(0) (TLS) RADIUS/TLS - tls_new_client_session WANT_WRITE
(0) Proxying request to home server 192.168.1.78 port 2083 (TLS) timeout 4.000000
(0) Sent Access-Request Id 184 from 110.0.0.34:41391 to 192.168.1.78:2083 length 139
...
(0) (TLS) RADIUS/TLS - Client : Need to write more data: SSLv3/TLS write client hello
...
(0) ERROR: Failing proxied request for user "[email protected]", due to lack of any response from home server 192.168.1.78 port 2083
...
(0) Sent Access-Reject Id 0 from 10.0.0.34:1645 to 172.0.7.129:62565 length 38
(0) Finished request
Thread 3 waiting to be assigned a request
Waking up in 4.6 seconds.
(0) Cleaning up request packet ID 0 with timestamp +4 due to cleanup_delay was reached
Waking up in 591.4 seconds.
... after a litle while
(0) (TLS) RADIUS/TLS - Handshake state - Client SSLv3/TLS write client hello
(0) (TLS) RADIUS/TLS - Client : Need to read more data: SSLv3/TLS write client hello
tls: (TLS) System call (I/O) error (-1)
(TLS) Closing connection
... shutting down socket proxy (10.0.0.34, 41391) -> home_server (192.168.1.78, 2083) (1 of 128)
... cleaning up socket proxy (10.0.0.34, 41391) -> home_server (192.168.1.78, 2083)
Closing TLS socket to home server
Ready to process requests
So, maybe this is caused by one of the other commits between b9af38ad700a7f5f25f98ef9cbb979b6ad3071e9 and fb9693faaca028f61ddbf3119c89dbe25c69d8fb?
Hi, it seems to fail already when I cherry-pick f440863 on top of b9af38ad700a7f5f25f98ef9cbb979b6ad3071e9, also when adding fb9693f
To be clear; it connects to a lot of other peers. I happen to know this peer is FreeRADIUS, but I don't know what version. And it's annoying, that https://github.com/FreeRADIUS/freeradius-server/commit/b9af38ad700a7f5f25f98ef9cbb979b6ad3071e9 still works but only if I change the !this->nonblock to false (that made it stable overall)
Yep, this in fact makes it connect happily again - but exactly how I got b9af38ad700a7f5f25f98ef9cbb979b6ad3071e9 already stable.
diff --git a/src/main/listen.c b/src/main/listen.c
index 7456f2cf3c..1de27b5233 100644
--- a/src/main/listen.c
+++ b/src/main/listen.c
@@ -3597,7 +3597,7 @@ rad_listen_t *proxy_new_listener(TALLOC_CTX *ctx, home_server_t *home, uint16_t
this->fd = fr_socket_client_tcp(&home->src_ipaddr,
&home->ipaddr, home->port,
#ifdef WITH_TLS
- this->nonblock
+ false
#else
false
#endif
I think this issue is fixed now. We're spent a lot of time in 2024 cleaning up all of the TLS things :(