freeradius-server
freeradius-server copied to clipboard
cbtls_info() to report "Need to write more data" when called on exit …
…of a handshake function
The current code does seem to miss to cater for the cases when a TLS handshake exits and there is data left that needs either to be extracted from the BIO and/or sent to the other side. Differentiating and logging such cases within cbtls_info() might aid in further debugging any stalled handshakes followed up by timing out requests we experience with the up to date 3.2.x head.
It may also be useful to check for SSL_want_write() in src/main/tls_listen.c A sample patch is attached.
It may also be good to apply a similar patch to proxy_tls_read() or `proxy_tls_send()
Thanks for having a look at this. I don't seem to get more info with patches applied but the server seems to stall some time after attempting a RadSec connection to a home_server. I'm providing more info from the previous non-patched run. This is an "SSL_connect() returned WANT_WRITE" with "(TLS) has 0 bytes in the buffer" situation. TLS handshake magic?
AFAIU, for a BIO_read() to be called from tls_handshake_send() there currently have to be some un-encrypted data in 'clean_in'. The condition in tls_handshake_send() looks like this: if (ssn->clean_in.used > 0) {
excerpt from a debug log follows:
... Tue Jun 18 10:40:13 2024 : Debug: (41) proxy: Trying to allocate ID (0/2) Tue Jun 18 10:40:13 2024 : Debug: (41) proxy: Trying to open a new listener to the home server Tue Jun 18 10:40:13 2024 : Debug: (TLS) Trying new outgoing proxy connection to proxy (0.0.0.0, 0) -> home_server (a.b.c.d, 2083) Tue Jun 18 10:40:13 2024 : Debug: Requiring Server certificate Tue Jun 18 10:40:13 2024 : Debug: (0) (TLS) RADIUS/TLS - Handshake state [PINIT] - before SSL initialization (0) Tue Jun 18 10:40:13 2024 : Debug: (0) (TLS) RADIUS/TLS - Handshake state [PINIT] - Client before SSL initialization (0) Tue Jun 18 10:40:13 2024 : Debug: (0) (TLS) RADIUS/TLS - send TLS 1.2 Handshake, ClientHello Tue Jun 18 10:40:13 2024 : Debug: (0) (TLS) RADIUS/TLS - Client : Need to write more data: SSLv3/TLS write client hello Tue Jun 18 10:40:13 2024 : Debug: Opened new proxy socket 'proxy (e.f.g.h, 60419) -> home_server (a.b.c.d, 2083)' Tue Jun 18 10:40:13 2024 : Debug: (41) proxy: Trying to allocate ID (1/2) Tue Jun 18 10:40:13 2024 : Debug: (41) proxy: request is now in proxy hash Tue Jun 18 10:40:13 2024 : Debug: (41) proxy: allocating destination a.b.c.d port 2083 - Id 70 Tue Jun 18 10:40:13 2024 : Debug: (41) Proxying request to home server a.b.c.d port 2083 (TLS) timeout 30.000000 Tue Jun 18 10:40:13 2024 : Debug: (41) Sent Access-Request Id 70 from e.f.g.h:60419 to a.b.c.d:2083 length 1988 ... Tue Jun 18 10:40:13 2024 : Debug: (0) (TLS) RADIUS/TLS - Client : Need to write more data: SSLv3/TLS write client hello Tue Jun 18 10:40:13 2024 : Debug: (TLS) SSL_connect() returned WANT_WRITE Tue Jun 18 10:40:13 2024 : Debug: (41) (TLS) has 0 bytes in the buffer Tue Jun 18 10:40:13 2024 : Debug: (41) (TLS) Saving 1988 bytes of RADIUS traffic for later (total 1988) Tue Jun 18 10:40:13 2024 : Debug: Thread 3 waiting to be assigned a request Tue Jun 18 10:40:13 2024 : Debug: Waking up in 0.3 seconds. Tue Jun 18 10:40:13 2024 : Debug: Listening on proxy (e.f.g.h, 60419) -> home_server (a.b.c.d, 2083) Tue Jun 18 10:40:13 2024 : Debug: Waking up in 0.3 seconds. Tue Jun 18 10:40:13 2024 : Debug: (41) Expecting proxy response no later than 29.670994 seconds from now Tue Jun 18 10:40:43 2024 : Debug: (41) No proxy response, giving up on request and marking it done Tue Jun 18 10:40:43 2024 : Proxy: Marking home server a.b.c.d port 2083 as zombie (it has not responded in 30.000000 seconds). Tue Jun 18 10:40:43 2024 : ERROR: (41) Failing proxied request for user "[email protected]", due to lack of any response from home server a.b.c.d port 2083 ...
On Jun 19, 2024, at 12:39 PM, martinsta @.***> wrote:
Thanks for having a look at this. I don't seem to get more info with patches applied but the server seems to stall some time after attempting a RadSec connection to a home_server. I'm providing more info from the previous non-patched run. This is an "SSL_connect() returned WANT_WRITE" with "(TLS) has 0 bytes in the buffer" situation. TLS handshake magic?
Yes. There appears to be data in SSL which we need to write out.
I've attached a patch from the current v3.2.x branch. It should replace the previous patch for src/main/tls_listen.c
I hope this works. Looking at the code and OpenSSL, it should be the correct fix,
On Wed, Jun 19, 2024 at 10:57:53AM -0700, Alan DeKok wrote: Yes. There appears to be data in SSL which we need to write out.
I've attached a patch from the current v3.2.x branch. It should replace the previous patch for src/main/tls_listen.c
I hope this works. Looking at the code and OpenSSL, it should be the correct fix,
Thanks, I'd be happy to test the new patch. I'm afraid the attachment's gone astray.
Thanks for another round. With the patch applied and BIO_read() getting called the server reports "Debug: (TLS) SSL_connect() writing 0 bytes to the network".
What's that? SSL_connect() left in handshake, SSL_get_error() returns WANT_WRITE and there is no data available to read from BIO? Does it need to retry? Or is this a different king of issue?
30 seconds later the original request timeouts, gets rejected, the server then logs "Debug: Proxy SSL socket has data to read" (possibly also in relation to another request) and locks up.
excerpt from a debug log follows:
Wed Jun 19 23:12:13 2024 : Debug: (TLS) Trying new outgoing proxy connection to proxy (0.0.0.0, 0) -> home_server (a.b.c.d, 2083) Wed Jun 19 23:12:13 2024 : Debug: Requiring Server certificate Wed Jun 19 23:12:13 2024 : Debug: (0) (TLS) RADIUS/TLS - Handshake state [PINIT] - before SSL initialization (0) Wed Jun 19 23:12:13 2024 : Debug: (0) (TLS) RADIUS/TLS - Handshake state [PINIT] - Client before SSL initialization (0) Wed Jun 19 23:12:13 2024 : Debug: (0) (TLS) RADIUS/TLS - send TLS 1.2 Handshake, ClientHello Wed Jun 19 23:12:13 2024 : Debug: (0) (TLS) RADIUS/TLS - Client : Need to write more data: SSLv3/TLS write client hello Wed Jun 19 23:12:13 2024 : Debug: Opened new proxy socket 'proxy (e.f.g.h, 36579) -> home_server (a.b.c.d, 2083)' Wed Jun 19 23:12:13 2024 : Debug: (22) proxy: Trying to allocate ID (1/2) Wed Jun 19 23:12:13 2024 : Debug: (22) proxy: request is now in proxy hash Wed Jun 19 23:12:13 2024 : Debug: (22) proxy: allocating destination a.b.c.d port 2083 - Id 54 Wed Jun 19 23:12:13 2024 : Debug: (22) Proxying request to home server a.b.c.d port 2083 (TLS) timeout 30.000000 Wed Jun 19 23:12:13 2024 : Debug: (22) Sent Access-Request Id 54 from e.f.g.h:36579 to a.b.c.d:2083 length 1988 ... Wed Jun 19 23:12:13 2024 : Debug: (0) (TLS) RADIUS/TLS - Client : Need to write more data: SSLv3/TLS write client hello Wed Jun 19 23:12:13 2024 : Debug: (TLS) SSL_connect() returned WANT_WRITE Wed Jun 19 23:12:13 2024 : Debug: (TLS) SSL_connect() writing 0 bytes to the network Wed Jun 19 23:12:13 2024 : Debug: (22) (TLS) has 0 bytes in the buffer Wed Jun 19 23:12:13 2024 : Debug: (22) (TLS) Saving 1988 bytes of RADIUS traffic for later (total 1988) Wed Jun 19 23:12:13 2024 : Debug: Thread 2 waiting to be assigned a request Wed Jun 19 23:12:13 2024 : Debug: Listening on proxy (e.f.g.h, 36579) -> home_server (a.b.c.d, 2083) Wed Jun 19 23:12:13 2024 : Debug: (22) Expecting proxy response no later than 29.671495 seconds from now Wed Jun 19 23:12:43 2024 : Debug: (22) No proxy response, giving up on request and marking it done Wed Jun 19 23:12:43 2024 : Proxy: Marking home server a.b.c.d port 2083 as zombie (it has not responded in 30.000000 seconds). Wed Jun 19 23:12:43 2024 : ERROR: (22) Failing proxied request for user "[email protected]", due to lack of any response from home server a.b.c.d port 2083 ... Wed Jun 19 23:12:43 2024 : Debug: Proxy SSL socket has data to read
On Jun 20, 2024, at 7:07 AM, martinsta @.***> wrote:
Thanks for another round. With the patch applied and BIO_read() getting called the server reports "Debug: (TLS) SSL_connect() writing 0 bytes to the network".
That's annoying.
What's that? SSL_connect() left in handshake, SSL_get_error() returns WANT_WRITE and there is no data available to read from BIO? Does it need to retry? Or is this a different king of issue?
WANT_WRITE means that OpenSSL wants to write more data to the network.
But when we call BIO_read() to get data from OpenSSL, it returns "no more data".
So I don't know what's going on.
30 seconds later the original request timeouts, gets rejected, the server then logs "Debug: Proxy SSL socket has data to read" (possibly also in relation to another request) and locks up.
That's unfortunate.
You can edit src/main/tls_listen.c, in the proxy_tls_recv() function. Maybe have it print out how much data was read from the network.
And also instrument proxy_tls_read() to see what's going on there, and why it is likely returning "no data".
The main issue is that I can't reproduce the issue here, so it's very hard to debug it.