Title: Memory leak after ~48h when using duplicate-cn with TLS renegotiation (reneg-sec) enabled
Describe the bug
The OpenVPN server process memory usage increases continuously (trend consistent with exponential growth) over approximately 48 hours while the number of connected clients remains stable (~80 clients). The server process becomes unresponsive (the management interface stops responding) and must be restarted. No explicit errors appear in the logs prior to the loss of responsiveness.
To Reproduce
- Build and run OpenVPN 2.6.14 on an Alpine Linux 3.21 virtual machine (musl libc), compiled from the official release tarball with options similar to:

  ```sh
  ./configure --enable-lzo --enable-lz4 --enable-plugins --enable-port-share --enable-iproute2
  ```

  Runtime library versions (per environment banners): OpenSSL 3.3.5, LZO 2.10, LZ4 1.9.x.
- Server configuration includes (abridged):

  ```
  port 1194
  proto udp
  dev tun
  server 172.27.224.0 255.255.240.0
  topology subnet
  duplicate-cn
  ifconfig-pool-persist /var/lib/openvpn/ipp.txt
  keepalive 10 120
  reneg-sec 3600
  mssfix 1420
  client-to-client
  data-ciphers AES-256-GCM:AES-256-CBC:CHACHA20-POLY1305:AES-128-GCM
  data-ciphers-fallback AES-256-GCM
  auth SHA256
  tls-version-min 1.2
  tls-version-max 1.3
  tls-crypt /etc/openvpn/pki/tls-crypt.key
  sndbuf 0
  rcvbuf 0
  fast-io
  user openvpn
  group openvpn
  persist-key
  persist-tun
  verb 3
  management 0.0.0.0 7505
  ```

  (Startup warning observed: `--ifconfig-pool-persist will not work with --duplicate-cn.`)
- Maintain approximately 80 clients connected continuously using the same client certificate (so `duplicate-cn` is in effect).
- Monitor the server process memory over time using standard host tools:

  ```sh
  ps -o pid,comm,rss | grep openvpn   # BusyBox ps; RSS in KB
  ```

  or equivalent host-level monitoring.
- After ~24–48 hours, observe continuous memory growth (trend consistent with exponential increase) while the client count remains steady. The server process becomes unresponsive and requires a restart.
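For longer-term tracking than a one-off `ps`, a minimal shell sketch can sample `VmRSS` from `/proc` (assumptions: the daemon runs as PID 1, as in this container setup; the helper name and file paths are illustrative):

```sh
# Minimal monitoring sketch (assumption: daemon is PID 1, as in this setup).
# Extracts VmRSS, reported in kB, from a /proc/<pid>/status-style file.
sample_rss() {
    awk '/^VmRSS:/ {print $2}' "$1"
}

# Demo against a synthetic status file; real use would read /proc/1/status:
printf 'VmSize:\t33688 kB\nVmRSS:\t31408 kB\n' > /tmp/fake-status
sample_rss /tmp/fake-status   # prints 31408

# A periodic sampler could then be as simple as:
#   while :; do
#       echo "$(date -u +%FT%TZ),$(sample_rss /proc/1/status)"; sleep 60
#   done
```

Logging one sample per minute gives a time series that distinguishes steady linear growth from the exponential trend described above.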
Expected behavior
Server memory usage should remain stable over long uptimes with a steady number of connected clients. Normal client activity (connect/disconnect, TLS handshakes) should not cause unbounded memory growth or loss of responsiveness.
Version information (please complete the following information):
- OS: Alpine Linux 3.21 (musl) running on a virtual machine
- Architecture: aarch64
- OpenVPN version: 2.6.14 (built from official release tarball)
- OpenSSL: 3.3.5
- Peer version (if relevant): OpenVPN 2.6.14
- Transport: UDP
Additional context
- The number of connected clients is stable (~80) during the observation window; the increase in memory is not due to added clients.
- Logs show regular client activity and repeated `MULTI: Learn:` entries for existing sessions; no explicit errors precede the loss of responsiveness.
- The management interface (bound to 0.0.0.0:7505) becomes non-responsive before the restart.
- The issue reproduces consistently under the conditions described.
This is hard to diagnose without additional details. We have similar setups that run for days and weeks without becoming unresponsive.
You also seem to have two issues that may or may not be related: the memory leak and the management interface becoming unresponsive. Also, what kind of memory leak/growth are we talking about here?
What exactly are the symptoms of the management interface no longer being responsive? Do you stop getting notifications from OpenVPN, or does the management interface stop responding to your commands?
Providing some of the log might help us to at least have some guesses.
repeated MULTI: Learn: entries for existing sessions
Could you give some more details what the repeated log entries are? Having them repeat for ongoing session is rather unexpected.
data-ciphers-fallback AES-256-GCM
This is nonsense. You are telling OpenVPN to fall back to a cipher that the clients lacking cipher negotiation do not support.
My gut feeling is that something in duplicate-cn is not being cleaned up properly on reconnecting with the same cert over and over.
One of these edge cases that should have never been implemented in the first place...
It might also be something like a lot of connects/disconnects from the management interface that cause the issue. We need more information to understand what is going on.
Update – consolidated data (IPs redacted, no configuration changes)
Keeping the same process running to preserve state; no restart or config change has been made. The next crash normally happens after ~48h of uptime. If it does, I’ll capture the same datasets right before and after the freeze to document how the management interface behaves.
🖥️ Host / build context
Linux ... 6.12.46 #1 SMP Fri Oct 3 18:06:14 UTC 2025 aarch64 Linux
OpenVPN 2.6.14 aarch64-unknown-linux-musl [SSL (OpenSSL)] [LZO] [LZ4] [EPOLL] [MH/PKTINFO] [AEAD]
library versions: OpenSSL 3.3.5 30 Sep 2025, LZO 2.10
Compile-time options include (abridged): enable_lz4=yes, enable_lzo=yes, enable_management=yes, enable_iproute2=yes
Alpine Linux 3.21.5 (musl 1.2.5)
📈 Process memory and runtime (~19h uptime, same process)
ps -o pid,comm,rss,vsz,etime | grep openvpn
1 openvpn 30m 32m 19h33
/proc/1/status (abridged)
VmSize: 33688 kB
VmRSS: 31408 kB
VmData: 26516 kB
Threads: 1
/proc/1/limits (abridged)
Max resident set unlimited
Max processes unlimited
Max open files 65536 / 1048576
🧵 Threads / LWPs snapshot
/proc/1/status → Threads: 1
ps -T | wc -l → 5973
(Shows a single thread reported in /proc, but nearly 6000 LWPs visible via ps -T.)
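The discrepancy between the two numbers can be sketched as follows (using `/proc/self` so the snippet is self-contained; the daemon's actual PID, 1 in this setup, would be substituted in practice):

```sh
# Sketch contrasting per-process and system-wide thread counts; /proc/self
# keeps the example self-contained (substitute the daemon's PID in practice).
per_process=$(ls -1 /proc/self/task | wc -l)
echo "threads of this process: $per_process"

# By contrast, BusyBox `ps -T` lists every LWP of every process on the host,
# so `ps -T | wc -l` counts all threads system-wide (plus a header line),
# not just OpenVPN's -- which would explain a figure near 6000.
```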
⚙️ Management interface (IPs and real addresses redacted)
status
Updated: 2025-11-05 14:07:44
GLOBAL STATS
Max bcast/mcast queue length,98
END
load-stats
SUCCESS: nclients=93, bytesin=484940838, bytesout=244827065
- Client count stable (~93).
- Routing table and connection times unchanged since the previous day.
- When the issue occurs (around the 48h mark), the management socket typically stops replying to `status` or `load-stats` commands; I will confirm this on the next occurrence.
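A bounded probe of the management socket might look like this sketch (assumptions: a `nc` binary is available and the listener is reachable at the given address; wrapping it in `timeout` turns a hung socket into a visible failure instead of a blocked session):

```sh
# Sketch: send two management commands and exit, bounded to 5 seconds so a
# hung management interface is reported rather than blocking forever.
# Address/port are assumptions matching the config in this report.
probe_mgmt() {
    printf 'load-stats\nquit\n' | timeout 5 nc "$1" "$2" \
        || echo "management interface at $1:$2 did not respond"
}

probe_mgmt 127.0.0.1 7505
```

Running this from cron alongside the memory sampling would pin down whether the management interface stops responding before, after, or at the same time as the memory spike.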
🔁 Recent TLS activity (redacted for IPs)
During this period, many peers show periodic TLS: soft reset events followed by VERIFY OK and full revalidation:
2025-11-05 14:01:54 [CN=grpXXXX] TLS: soft reset sec=3262/3262
2025-11-05 14:02:24 [CN=grpXXXX] TLS: soft reset sec=3510/3510
2025-11-05 14:02:29 [CN=grpXXXX] TLS: soft reset sec=3344/3343
2025-11-05 14:02:37 [CN=grpXXXX] TLS: soft reset sec=3320/3319
2025-11-05 14:02:40 [CN=grpXXXX] TLS: soft reset sec=3511/3510
...
Each reset is followed by `VERIFY OK` and `Control Channel: TLSv1.3, cipher TLS_AES_256_GCM_SHA384`.
🔁 Repeated MULTI: Learn entries (sanitized examples)
Several peers show recurring IP learn messages for sessions that stay connected:
2025-11-05 13:36:xx [CN=grpXXXX] MULTI: primary virtual IP ...: 172.27.224.xx [real addr redacted]
2025-11-05 14:07:xx [CN=grpXXXX] MULTI: primary virtual IP ...: 172.27.224.xx [real addr redacted]
These repeat for the same CN and internal IP even though no explicit reconnect is logged.
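One way to quantify the claim is to count those lines per CN, sketched below against the sanitized `[CN=...]` log format quoted in this report (the helper name and embedded sample data are illustrative; a real run would read the actual server log):

```sh
# Sketch: count "MULTI: primary virtual IP" lines per CN. The [CN=...]
# prefix matches the sanitized format quoted above; feed the real log file
# (path is deployment-specific) instead of this embedded sample.
count_primary_vip() {
    grep 'MULTI: primary virtual IP' "$1" \
        | sed 's/.*\[CN=\([^]]*\)\].*/\1/' \
        | sort | uniq -c | sort -rn
}

# Embedded sample in the sanitized format from this report:
cat > /tmp/ovpn-sample.log <<'EOF'
2025-11-05 13:36:01 [CN=grpAAAA] MULTI: primary virtual IP for grpAAAA: 172.27.224.10
2025-11-05 14:07:02 [CN=grpAAAA] MULTI: primary virtual IP for grpAAAA: 172.27.224.10
2025-11-05 14:07:03 [CN=grpBBBB] MULTI: primary virtual IP for grpBBBB: 172.27.224.11
EOF
count_primary_vip /tmp/ovpn-sample.log
```

Any CN with a count above 1, without an intervening reconnect in the log, would substantiate the "repeated for existing sessions" observation.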
Summary
- Memory and thread count continue to grow while the number of connected clients remains constant (~93).
- Management interface currently responds normally; typically becomes unresponsive after ~48h.
- Multiple periodic TLS soft resets and repeated `MULTI: Learn` entries for existing sessions observed.
- I'll provide updated metrics and management output at the next unresponsive event.
thread count continue to grow
This is very peculiar. OpenVPN itself is single-threaded. And by the LWP thread count, you mean something like `ps -auxT | grep openvpn`, right? I wonder what the 6k threads on your system are.
2025-11-05 13:36:xx [CN=grpXXXX] MULTI: primary virtual IP ...: 172.27.224.xx [real addr redacted]
There is only one occurrence of that string in OpenVPN (`"MULTI: primary virtual IP for %s: %s"`), and that is in `multi_client_connect_late_setup`, which gets executed only once per client connect.
So this should not be logged more than once per client.
Could you show the full server config and logs?
Are you using any plugins in OpenVPN?
Thanks @schwabe for the clarification regarding data-ciphers-fallback. You're absolutely right — I had originally added it a while ago when testing a few older client devices, but all current clients now support GCM.
I’ve removed the fallback directive from the configuration and will redeploy it after the next crash occurs (to keep the current process running until then). Appreciate the recommendation!
Thanks, @schwabe — here is a complete, sanitized follow-up with the concrete data you asked for. No config changes were made; I’m keeping the current process running to preserve state.
1) Thread count clarification (per-process vs system-wide)
My earlier “~6k threads” came from a system-wide BusyBox ps -T (my bad).
Per-process LWP/thread counts for the OpenVPN daemon are:
/proc/1/status → Threads: 1
ls -1 /proc/1/task | wc -l → 1
So OpenVPN itself is single-threaded here, as expected.
2) Server configuration (abridged, sanitized)
No plugins and no hook scripts are used.
port 1194
proto udp
dev tun
ca /etc/openvpn/pki/ca.crt
cert /etc/openvpn/pki/issued/server.crt
key /etc/openvpn/pki/private/server.key
dh none
server 172.27.224.0 255.255.240.0
ifconfig-pool-persist /var/lib/openvpn/ipp.txt
topology subnet
push "topology subnet"
mssfix 1420
push "dhcp-option DNS 172.20.0.10"
push "route 172.20.0.0 255.255.0.0"
push "route 10.10.0.0 255.255.0.0"
client-to-client
duplicate-cn
keepalive 10 120
allow-compression no
tls-crypt /etc/openvpn/pki/tls-crypt.key
data-ciphers AES-256-GCM:AES-256-CBC:CHACHA20-POLY1305:AES-128-GCM
# data-ciphers-fallback was present earlier for legacy tests; removed and will redeploy later
auth SHA256
tls-version-min 1.2
tls-version-max 1.3
tls-groups secp384r1:secp521r1:secp256r1:x25519
reneg-sec 3600
tls-cert-profile preferred
tls-cipher TLS-ECDHE-ECDSA-WITH-AES-256-GCM-SHA384:TLS-ECDHE-RSA-WITH-AES-256-GCM-SHA384:TLS-ECDHE-ECDSA-WITH-CHACHA20-POLY1305-SHA256:TLS-ECDHE-RSA-WITH-CHACHA20-POLY1305-SHA256
remote-cert-tls client
sndbuf 0
rcvbuf 0
fast-io
user openvpn
group openvpn
persist-key
persist-tun
management 0.0.0.0 7505
verb 3
explicit-exit-notify 1
Plugins/scripts: none (no --plugin, no client-connect, client-disconnect, learn-address, etc.).
3) Build / OS context
Linux … 6.12.46 aarch64
OpenVPN 2.6.14 aarch64-unknown-linux-musl
library versions: OpenSSL 3.3.5, LZO 2.10
Alpine Linux 3.21.5 (musl 1.2.5)
4) Management snapshots (status 3 + load-stats, sanitized)
(Real/remote addresses redacted; counts/fields kept.)
TITLE OpenVPN 2.6.14 aarch64-unknown-linux-musl
TIME 2025-11-05 14:51:59
# CLIENT_LIST excerpt (CN, Virtual Address, Bytes, Connected Since, Cipher)
CLIENT_LIST grp2601w-prod 172.27.224.xx … 2025-11-04/05 … AES-128-GCM
… (93 clients total; all show Data Channel Cipher = AES-128-GCM)
# ROUTING_TABLE excerpt (virtual + CN only)
ROUTING_TABLE 172.27.224.xx grp2601w-prod [redacted] 2025-11-05 14:51:xx
… (entries present for all connected clients)
GLOBAL_STATS
Max bcast/mcast queue length 98
dco_enabled 0
END
load-stats
SUCCESS: nclients=93, bytesin=499927261, bytesout=255713146
5) Recent logs (sanitized)
TLS soft resets: I see periodic TLS: soft reset events for multiple peers, each followed by full VERIFY OK and TLS 1.3 control channel lines:
2025-11-05 14:43:37 [CN=grpXXXX] TLS: soft reset sec=3309/3309
2025-11-05 14:43:37 [CN=grpXXXX] VERIFY OK … ; Control Channel: TLSv1.3 TLS_AES_256_GCM_SHA384
2025-11-05 14:43:47 [CN=grpXXXX] TLS: soft reset sec=3424/3424
2025-11-05 14:43:55 [CN=grpXXXX] TLS: soft reset sec=3304/3304
2025-11-05 14:44:55 [CN=grpXXXX] TLS: soft reset sec=3307/3307
2025-11-05 14:45:09 [CN=grpXXXX] TLS: soft reset sec=3465/3465
… (similar pattern within the same window)
MULTI around a reconnect: A clear reconnect shows the expected sequence once per connect:
2025-11-05 14:48:27 [CN=grpXXXX] tls_multi_process: initial untrusted session promoted to trusted
2025-11-05 14:48:27 [CN=grpXXXX] MULTI_sva: pool returned IPv4=172.27.224.xx
2025-11-05 14:48:27 [CN=grpXXXX] MULTI: Learn: 172.27.224.xx -> [redacted]
2025-11-05 14:48:27 [CN=grpXXXX] MULTI: primary virtual IP for [CN]: 172.27.224.xx
Regarding your note that "MULTI: primary virtual IP …" is only logged once per client connect: understood. In this recent sample, I only see it once for that reconnect. I will keep watching to see whether the same CN ever logs another `primary virtual IP` line without an intervening reconnect indication, and will post that if it occurs.
Additional factual observations (logs + metrics + periodicity)
- Precise timestamp (UTC / local): at 2025-11-10 03:06:25 UTC (local Nov 09 22:06:26), a burst of consecutive events occurred:

  ```
  2025-11-10 03:06:25 SENT CONTROL [grp2601w-prod]: 'RESTART' (status=1)
  2025-11-10 03:06:25 SENT CONTROL [grp2601w-prod]: 'RESTART' (status=1)
  2025-11-10 03:06:25 SENT CONTROL [grp2601w-prod]: 'RESTART' (status=1)
  2025-11-10 03:06:25 SENT CONTROL [grp2601w-prod]: 'RESTART' (status=1)
  2025-11-10 03:06:25 event_wait : Interrupted system call (fd=-1,code=4)
  ```

  In the minute leading up to the burst, multiple UDP connection refusals were logged:

  ```
  2025-11-10 03:06:21 read UDPv4 [ECONNREFUSED]: Connection refused (fd=6,code=111)
  2025-11-10 03:06:11 read UDPv4 [ECONNREFUSED]: Connection refused (fd=6,code=111)
  2025-11-10 03:06:01 read UDPv4 [ECONNREFUSED]: Connection refused (fd=6,code=111)
  2025-11-10 03:05:51 read UDPv4 [ECONNREFUSED]: Connection refused (fd=6,code=111)
  2025-11-10 03:05:41 read UDPv4 [ECONNREFUSED]: Connection refused (fd=6,code=111)
  2025-11-10 03:05:30 read UDPv4 [ECONNREFUSED]: Connection refused (fd=6,code=111)
  ```
- Metrics behavior (2-day window):
  - Memory usage starts at around 27 MiB and increases gradually up to 168 MiB before resetting every 24 hours.
  - CPU usage also increases, following an exponential curve, with peaks becoming more frequent right before each restart.
  - Other metrics (storage, network) remain stable and show no anomalies at the time of reset.
- Periodicity: this pattern now occurs every 24 hours with the current load (~80 devices connected). Earlier in the summer, when fewer devices were connected, the interval was approximately 48 hours.
I’ve also shared the detailed metrics and graphs privately (CPU/memory curves and raw data) in case they help with further analysis.
The "TLS: soft reset" messages happen because you ordered them - reneg-sec 3600 does exactly this: start a TLS renegotiation every 3600 seconds (+/- a bit of randomness).
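For illustration only (not a recommendation), the interval is a per-side setting; the effective renegotiation period is the lower of the client's and server's values, so changing it on one side alone does not stop renegotiation:

```
# Sketch: tuning time-based renegotiation (values illustrative).
reneg-sec 7200   # renegotiate keys every 2 h instead of the default 3600 s
# reneg-sec 0    # disable the time-based trigger on this side entirely
```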
The "burst of identical events" (RESTART) is basically `explicit-exit-notify 1` triggering, telling the clients that the server is going to exit now, once per client. Why this happens would be shown in the few lines just above the burst that you did not include. So with these incomplete logs, we can't really do much analysis.