kamailio Memory usage increases every time tls.reload is executed

Description

We are using Kamailio 5.7.4 on Debian 12 (from http://deb.kamailio.org/kamailio57) together with rtpengine as an edge proxy for our clients. The instance terminates SIP/TLS (with client certificates) and forwards the SIP traffic to internal systems.

After some days we get errors like this:

tls_complete_init(): tls: ssl bug #1491 workaround: not enough memory for safe operation: shm=7318616 threshold1=8912896

At first we thought Kamailio simply didn't have enough memory, so we doubled it.

But after a few days the log message (and the user issues) occurred again.

So we monitored the shmmem statistics and found that used and max_used grow constantly until they reach the limit.

As mentioned, we use client certificates and therefore also the CRL feature. A systemd timer fetches the CRL every hour and runs 'kamcmd tls.reload' when finished.

Our tls.cfg looks like this:

[server:default]
method = TLSv1.2+
private_key = /etc/letsencrypt/live/hostname.de/privkey.pem
certificate = /etc/letsencrypt/live/hostname.de/fullchain.pem
ca_list = /etc/kamailio/ca_list.pem
ca_path = /etc/kamailio/ca_list.pem
crl = /etc/kamailio/combined.crl.pem
verify_certificate = yes
require_certificate = yes

[client:default]
verify_certificate = yes
require_certificate = yes

After some testing we found that every time tls.reload is executed, Kamailio consumes a bit more shared memory. Eventually all of the memory is consumed, which causes issues for our users.

See the following example:

[0][root@edgar-dev:~]# while true ; do /usr/sbin/kamcmd tls.reload ; /usr/sbin/kamcmd core.shmmem ; sleep 1 ; done
Ok. TLS configuration reloaded.
{
	total: 268435456
	free: 223001520
	used: 41352552
	real_used: 45433936
	max_used: 45445968
	fragments: 73
}
Ok. TLS configuration reloaded.
{
	total: 268435456
	free: 222377960
	used: 41975592
	real_used: 46057496
	max_used: 46069232
	fragments: 78
}
Ok. TLS configuration reloaded.
{
	total: 268435456
	free: 221748664
	used: 42604992
	real_used: 46686792
	max_used: 46698080
	fragments: 77
}
Ok. TLS configuration reloaded.
{
	total: 268435456
	free: 221110832
	used: 43242408
	real_used: 47324624
	max_used: 47335608
	fragments: 81
}
^C
[130][root@edgar-dev:~]#
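
To put a number on the growth per reload, the used value can be pulled out of the core.shmmem output before and after a reload. This is only a minimal sketch, assuming the output format shown above; shm_field and shm_delta are hypothetical helper names, not part of kamcmd:

```shell
#!/bin/sh
# Extract one numeric field (e.g. "used") from `kamcmd core.shmmem` output
# read on stdin. The match is exact, so "used" does not match "real_used".
shm_field() {
    awk -v key="$1:" '$1 == key { print $2; exit }'
}

# Print the difference in bytes between two samples (after - before).
shm_delta() {
    echo $(( $2 - $1 ))
}

# Usage (needs a running Kamailio, hence left commented out):
#   before=$(kamcmd core.shmmem | shm_field used)
#   kamcmd tls.reload >/dev/null
#   after=$(kamcmd core.shmmem | shm_field used)
#   echo "tls.reload grew used shm by $(shm_delta "$before" "$after") bytes"
```

Applied to the first two samples above, this gives 41975592 - 41352552 = 623040 bytes for a single reload.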

Troubleshooting

Reproduction

Every time tls.reload is called, the memory consumption grows.

Debugging Data

If you let me know what would be interesting for tracking this down, I am happy to provide logs/debugging data!

Log Messages

If you let me know what would be interesting for tracking this down, I am happy to provide logs/debugging data!

SIP Traffic

SIP traffic doesn't seem to be relevant here.

Possible Solutions

Call tls.reload less often, or restart Kamailio before the memory is exhausted ;)

Additional Information

version: kamailio 5.7.4 (x86_64/linux) 
flags: USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, USE_RAW_SOCKS, DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MMAP, PKG_MALLOC, MEM_JOIN_FREE, Q_MALLOC, F_MALLOC, TLSF_MALLOC, DBG_SR_MEMORY, USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER, USE_NAPTR, USE_DST_BLOCKLIST, HAVE_RESOLV_RES, TLS_PTHREAD_MUTEX_SHARED
ADAPTIVE_WAIT_LOOPS 1024, MAX_RECV_BUFFER_SIZE 262144, MAX_URI_SIZE 1024, BUF_SIZE 65535, DEFAULT PKG_SIZE 8MB
poll method support: poll, epoll_lt, epoll_et, sigio_rt, select.
id: unknown 
compiled with gcc 12.2.0

Operating System:

* Debian GNU/Linux 12 (bookworm)
* Linux edgar-dev 6.1.0-20-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.85-1 (2024-04-11) x86_64 GNU/Linux

Apr 24 '24 08:04 denzs

I just realized that I forgot to mention: in addition to the logged error message, our clients start to get connection issues as well, so we have to restart Kamailio asap in that case.

May 07 '24 06:05 denzs

@denzs do you have a monitoring tool? Prometheus + Grafana graphs?

May 07 '24 07:05 sergey-safarov

This part probably has to be reviewed. First, the TLS reload was initially designed to be done rather rarely, when the certificates expire. The CRL feature has also not seen much use; at least in my experience so far, most deployments use server-side certificates only.

Furthermore, I am not sure whether old certificates can be cleared right away after the reload: existing connections are not closed, and there might still be references to their certificates.

Are you doing the reload only if the content of the CRL or certificate files has changed? Or is the reload done regardless?

May 07 '24 07:05 miconda

@sergey-safarov yes we do :)

@miconda at the moment we do the tls.reload unconditionally and quite frequently to ensure the CRLs are up to date. Of course we can check whether the CRL changed, but from my point of view that would only delay the necessary restart of Kamailio.
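
For reference, a conditional reload along the lines discussed here could look roughly like the sketch below. It compares the freshly fetched CRL with the installed one and only calls tls.reload when they differ. The function names and the RELOAD_CMD variable are made up for illustration, and the paths in the usage comment are assumptions based on the tls.cfg above. As noted, this would only slow the growth down, not fix the leak:

```shell
#!/bin/sh
# Return 0 (true) when the freshly fetched CRL differs from the installed one,
# or when no CRL is installed yet.
crl_changed() {
    new="$1"; cur="$2"
    if [ ! -f "$cur" ]; then
        return 0
    fi
    ! cmp -s "$new" "$cur"
}

# Install the new CRL and trigger the reload only if the content changed.
# RELOAD_CMD (hypothetical) can be overridden to try this without Kamailio.
update_crl() {
    new="$1"; cur="$2"
    if crl_changed "$new" "$cur"; then
        cp "$new" "$cur" && ${RELOAD_CMD:-kamcmd tls.reload}
    else
        echo "CRL unchanged, skipping tls.reload"
    fi
}

# Usage (the path of the downloaded file is an assumption):
#   update_crl /tmp/combined.crl.pem.new /etc/kamailio/combined.crl.pem
```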

May 07 '24 08:05 denzs

This screenshot is from our dev environment (with no TLS clients connected) running:

while true ; do /usr/sbin/kamcmd tls.reload ; /usr/sbin/kamcmd tls.reload ; sleep 0.5 ; done

Watching the core.shmmem output in parallel looks like this:

Ok. TLS configuration reloaded.
{
	total: 268435456
	free: 1894256
	used: 262444424
	real_used: 266541200
	max_used: 266550968
	fragments: 85
}
error: 500 - Error while fixing TLS configuration (consult server log)
{
	total: 268435456
	free: 1208784
	used: 263491296
	real_used: 267226672
	max_used: 268435208
	fragments: 11749
}
Ok. TLS configuration reloaded.
{
	total: 268435456
	free: -9223372036854776
	used: 267589696
	real_used: 271686888
	max_used: 271696928
	fragments: 87
}

May 07 '24 08:05 denzs

Could you compare it with a graph from our server covering the last 60 days and about 25 WebRTC clients?

This server runs Kamailio 5.7.2 with a Let's Encrypt server certificate. The certificate is reloaded once every two months. We do not use a CRL. To avoid reloading the certificate too often, we compare the currently used certificates with the latest ones using commands like:

    # Dry run: list the files that differ between the source and target directories.
    rsync -l --recursive --info=name --dry-run "${LECRTSDIR}" "${LETARGETDIR}" >"${CHKUPDLOG}"
    # Synchronize the certificates only if the dry run reported differences.
    if [ ! -s "${CHKUPDLOG}" ]; then
        echo "Check updates. No changes required"
    else
        echo "Has new certificates. Start sync"
        rsync -azlcv --recursive --delete --info=name "${LECRTSDIR}" "${LETARGETDIR}" >"${SYNCLOG}"
    fi
    rm -f "${CHKUPDLOG}"

May 07 '24 19:05 sergey-safarov

The problem actually started after we added the CRL a few weeks ago; without the CRL there was no such behaviour. And of course there are plenty of options to mitigate the issue, or rather to reduce its likelihood, by doing fewer reloads: lowering the reload frequency and/or checking whether the CRL changed at all.

Anyhow, I thought raising an issue makes sense, because from my point of view there is definitely a memory leak when using tls.reload in combination with a CRL.

May 08 '24 05:05 denzs

If it happens only when a CRL is added, it does indeed look like an issue in this code path. After all, using a CRL is probably quite rare.

May 08 '24 07:05 henningw

After some time debugging, I could replicate this issue of memory increase when using a CRL and tls.reload.

The memory statistics below were printed periodically while the following loop was running, and they point to one possible issue:

while true ; do /usr/sbin/kamcmd tls.reload ; /usr/sbin/kamcmd tls.reload ; sleep 0.5 ; done

INFO: qm_sums: qm_sums():  count=  5288 size=    183440 bytes from tls: tls_init.c: ser_realloc(372)
INFO: qm_sums: qm_sums():  count= 17378 size=   1275712 bytes from tls: tls_init.c: ser_malloc(364)
---
INFO: qm_sums: qm_sums():  count=  5341 size=    242768 bytes from tls: tls_init.c: ser_realloc(372)
INFO: qm_sums: qm_sums():  count= 17325 size=   1381936 bytes from tls: tls_init.c: ser_malloc(364)
---
INFO: qm_sums: qm_sums():  count=  5331 size=    248544 bytes from tls: tls_init.c: ser_realloc(372)
INFO: qm_sums: qm_sums():  count= 17335 size=   1422112 bytes from tls: tls_init.c: ser_malloc(364)
---
INFO: qm_sums: qm_sums():  count=  5360 size=    290560 bytes from tls: tls_init.c: ser_realloc(372)
INFO: qm_sums: qm_sums():  count= 17306 size=   1466000 bytes from tls: tls_init.c: ser_malloc(364)

Memory here increases until we exhaust the shared memory max allocation and then tls.reload fails.

Some notes: when using tls.reload without a CRL, I didn't see any notable increase in memory usage. The allocations noted above stay steady at around:

count=  9415 size=    948432 bytes from tls: tls_init.c: ser_malloc(364)
count=  1011 size=    151408 bytes from tls: tls_init.c: ser_realloc(372)

May 09 '24 09:05 xkaraman

This issue is stale because it has been open 6 weeks with no activity. Remove stale label or comment or this will be closed in 2 weeks.

Jun 21 '24 02:06 github-actions[bot]

Although it is quite easy to monitor and work around this issue, I still think it is a valid bug :)
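
As an illustration of the monitoring side, something like the following could be run from cron or a systemd timer. It is just a sketch: check_shm_free is a hypothetical name, and the value 8912896 in the usage comment is simply the threshold1 from the log message in the description, used here as an example minimum:

```shell
#!/bin/sh
# Read `kamcmd core.shmmem` output on stdin and compare the "free" value with
# a minimum given in bytes. Prints a status line and returns 0 when enough
# memory is free, 1 when free is below the minimum, 2 when nothing was parsed.
check_shm_free() {
    min="$1"
    free=$(awk '$1 == "free:" { print $2; exit }')
    if [ -z "$free" ]; then
        echo "UNKNOWN: no free value found"
        return 2
    fi
    if [ "$free" -lt "$min" ]; then
        echo "CRITICAL: free shm $free below $min"
        return 1
    fi
    echo "OK: free shm $free"
}

# Usage (needs a running Kamailio, hence left commented out):
#   kamcmd core.shmmem | check_shm_free 8912896 || echo "time to restart Kamailio"
```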

Jun 21 '24 06:06 denzs

Just for reference, this was discussed on the developer list, thread: https://lists.kamailio.org/mailman3/hyperkitty/list/[email protected]/message/AJMGLWJNQGA6C7SKLVQEXI5RFRRRWBN2/

Jun 21 '24 06:06 henningw

This issue is stale because it has been open 6 weeks with no activity. Remove stale label or comment or this will be closed in 2 weeks.

Aug 03 '24 02:08 github-actions[bot]

Is there any news on, or any intention of, merging the branch from xkaraman? :)

Aug 05 '24 07:08 denzs

Hey @denzs,

Sorry, it's been some time since I last checked this.

There was a discussion about introducing a parameter for this change. I will try to implement it asap, so I can create a PR for this and restart the discussion!

Thanks for your patience, Xenofon

Aug 07 '24 15:08 xkaraman

@xkaraman thank you so much! I did not want to rush you, I just wanted to prevent this issue from being auto-closed :)

Aug 08 '24 06:08 denzs

Hey @denzs,

I have just created https://github.com/kamailio/kamailio/pull/3972 for this.

Can you maybe check whether Kamailio still functions as intended (other than the tls.reload) with the new shared context stuff?

After applying the patch, set the new tls parameter enable_shared_ctx to 1 in the config file and you are good to go.

Any feedback is welcome!

Sep 09 '24 15:09 xkaraman

@xkaraman thank you so much for taking care of this! :)

I tested your branch on our dev instance, the normal functions are doing fine so far :+1:

But switching enable_shared_ctx from 0 to 1 only seems to delay the memory leak:

The first 5 minutes are with enable_shared_ctx=0 and the rest with enable_shared_ctx=1. During the last 5 minutes I stopped the tls.reload to see if memory consumption would decrease again, but that is not the case.

Tested with: while true ; do /usr/sbin/kamcmd tls.reload ; sleep 0.5 ; done

Sep 12 '24 07:09 denzs

Hey @denzs,

Thanks for testing this out.

As we discussed on the mailing list and as also noted in the PR, this patch is indeed not adequate to fix the actual problem. I was trying to lower the memory usage and hoped that the increase would not really be noticeable any more (clearly not the case from your report).

The problem seems to be in SSL_CTX_load_verify_locations and how it is used in load_crl(). I will keep digging and see if there is something to be done to actually free the memory.

Just for reference, which OpenSSL version are you testing this with?

Sep 12 '24 08:09 xkaraman

@xkaraman thanks for your feedback :) It is a Debian 12 system with:

ii  libssl-dev:amd64                  3.0.14-1~deb12u2                    amd64        Secure Sockets Layer toolkit - development files
ii  libssl3:amd64                     3.0.14-1~deb12u2                    amd64        Secure Sockets Layer toolkit - shared libraries
ii  openssl                           3.0.14-1~deb12u2                    amd64        Secure Sockets Layer toolkit - cryptographic utility

Sep 12 '24 09:09 denzs

This issue is stale because it has been open 6 weeks with no activity. Remove stale label or comment or this will be closed in 2 weeks.

Oct 25 '24 03:10 github-actions[bot]

Just a 'ping' to prevent the bot from closing the issue.. :)

Oct 25 '24 06:10 denzs
