DCO peer inflation: Status files show 70-150% more entries than non-DCO despite equal load
Environment:
- OpenVPN 2.6.14 with DCO enabled
- Linux kernel 6.14.0 with ovpn-dco kernel module
- Production setup: 50+ OpenVPN server processes per machine
- dnsproxy forwarding traffic using random 127.0.x.x source IPs (censorship circumvention)
Problem: The DCO server consistently shows 70-150% more peer entries in its status files than an identical non-DCO server with the same load distribution:
- DCO server: 1,282 unique clients (after deduplication)
- Non-DCO server: 741 unique clients
- Inflation: 541 extra entries (73%)
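One way to gather numbers like these across many processes is to parse each process's status file and compare the total number of CLIENT_LIST entries with the number of distinct common names. The minimal sketch below assumes `status-version 2` output (comma-separated, with a `HEADER,CLIENT_LIST,...` line naming the columns) and deduplicates on the `Common Name` column. The function names are illustrative helpers, not part of OpenVPN, and the deduplication key used for the figures above may differ.

```python
import csv
import io


def parse_client_list(status_text):
    """Parse CLIENT_LIST rows of a status-version 2 file into dicts keyed by column name."""
    header = None
    rows = []
    for fields in csv.reader(io.StringIO(status_text)):
        if len(fields) >= 2 and fields[0] == "HEADER" and fields[1] == "CLIENT_LIST":
            header = fields[2:]  # column names follow "HEADER,CLIENT_LIST"
        elif fields and fields[0] == "CLIENT_LIST" and header is not None:
            rows.append(dict(zip(header, fields[1:])))
    return rows


def summarize(status_texts):
    """Return (total CLIENT_LIST entries, distinct common names) across all status files."""
    total = 0
    names = set()
    for text in status_texts:
        for row in parse_client_list(text):
            total += 1
            names.add(row["Common Name"])
    return total, len(names)


def inflation_pct(dco_clients, baseline_clients):
    """Extra entries on the DCO server as a percentage of the non-DCO baseline."""
    return 100.0 * (dco_clients - baseline_clients) / baseline_clients


if __name__ == "__main__":
    # Trimmed sample files from two hypothetical processes (real files have more columns).
    proc_a = ("HEADER,CLIENT_LIST,Common Name,Real Address\n"
              "CLIENT_LIST,alice,127.0.3.7:40000\n"
              "CLIENT_LIST,bob,127.0.9.1:40001\n")
    proc_b = ("HEADER,CLIENT_LIST,Common Name,Real Address\n"
              "CLIENT_LIST,alice,127.0.5.2:40002\n")
    print(summarize([proc_a, proc_b]))         # (3, 2): alice appears in two processes
    print(round(inflation_pct(1282, 741), 1))  # 73.0, matching the figure above
```

Because the columns are looked up by the names in the HEADER line, the parser does not depend on their exact order.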
Root Cause Analysis: When clients disconnect from process A and reconnect to process B, the old entry in process A is not removed:
- DCO: Old entries persist indefinitely in status files → accumulation over time
- Non-DCO: Old entries are cleaned up properly → 1:1 ratio maintained
Evidence: An example client appearing in 3 different OpenVPN processes:
- server-72581: connected 05:38:25 (still in status file after 2+ hours)
- server-91967: connected 05:56:00 (still in status file after 1.5+ hours)
- server-91970: connected 06:30:16 (current active connection)
All 3 entries remain in respective status files. With non-DCO, only the most recent connection appears.
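Entries like these can be flagged by grouping all status-file entries by common name and treating every entry except the newest as a stale candidate, which mirrors what the non-DCO server shows. The sketch below is a diagnostic heuristic only, assuming one legitimate connection per common name (it would misreport setups using `duplicate-cn`); it should not drive automatic cleanup, since that approach already removed active connections. The name `client-x` is a placeholder.

```python
from collections import defaultdict
from datetime import time


def find_stale_entries(entries):
    """entries: (server, common_name, connected_since) tuples from all status files.
    Returns every entry except the most recent one per common name."""
    by_client = defaultdict(list)
    for server, cn, since in entries:
        by_client[cn].append((since, server))
    stale = []
    for cn, conns in by_client.items():
        conns.sort()  # oldest connection first
        stale.extend((server, cn, since) for since, server in conns[:-1])
    return stale


if __name__ == "__main__":
    # The three entries from the example above; "client-x" is a placeholder name.
    entries = [
        ("server-91970", "client-x", time.fromisoformat("06:30:16")),
        ("server-72581", "client-x", time.fromisoformat("05:38:25")),
        ("server-91967", "client-x", time.fromisoformat("05:56:00")),
    ]
    # Prints the server-72581 and server-91967 entries; server-91970 is kept as live.
    for server, cn, since in find_stale_entries(entries):
        print(server, cn, since)
```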
Keepalive Configuration: keepalive 25 180
The server-side timeout should be 360 seconds (180 × 2), but old entries never expire.
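The 360-second figure follows from how the `--keepalive` helper expands, as described in the openvpn(8) manual: in server mode the local `ping-restart` is doubled, while the original values are pushed to clients. The function below only models that expansion for checking the arithmetic; it is an illustration, not OpenVPN code, and it says nothing about how the DCO module applies the timeout.

```python
def expand_keepalive(interval, timeout, server_mode):
    """Model the directives that 'keepalive <interval> <timeout>' expands to."""
    if server_mode:
        return {
            "ping": interval,
            "ping-restart": timeout * 2,  # server side uses twice the timeout
            "push": [f"ping {interval}", f"ping-restart {timeout}"],
        }
    return {"ping": interval, "ping-restart": timeout, "push": []}


if __name__ == "__main__":
    print(expand_keepalive(25, 180, server_mode=True)["ping-restart"])  # 360
```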
Log Analysis (2-hour window):
- New connections created: 634
- DEL_PEER notifications received: 283
- Gap: 351 peers never sent expiry notification
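The counts above can be reproduced by matching log lines against two patterns. In the sketch below the patterns and sample lines are hypothetical placeholders that must be adapted to the actual log messages. Note also that the gap is an upper bound on missing notifications: peers still legitimately connected at the end of the window are counted in it too.

```python
import re


def count_events(lines, new_pattern, del_pattern):
    """Return (new connections, DEL_PEER notifications, gap) for log lines
    already restricted to the analysis window."""
    new_re = re.compile(new_pattern)
    del_re = re.compile(del_pattern)
    new = sum(1 for line in lines if new_re.search(line))
    deleted = sum(1 for line in lines if del_re.search(line))
    return new, deleted, new - deleted


if __name__ == "__main__":
    # Placeholder log lines and patterns; adapt both to the real log format.
    log = [
        "... NEW_CONNECTION_MARKER peer 1",
        "... NEW_CONNECTION_MARKER peer 2",
        "... DEL_PEER peer 1",
    ]
    print(count_events(log, r"NEW_CONNECTION_MARKER", r"DEL_PEER"))  # (2, 1, 1)
```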
This suggests that the DCO kernel module is not triggering keepalive expiry for all disconnected peers.
What We've Tried:
- Periodic cleanup of orphaned instances - Failed: Either found nothing to clean or removed active connections
- Duplicate detection at instance creation - Failed: Common name not available until after TLS handshake completes
- Duplicate detection after TLS handshake - Partial success: Prevents within-process duplicates, but doesn't fix cross-process inflation (the main problem)
Current Status:
- Within-process duplicates: Fixed (0 duplicates found in same process)
- Cross-process stale entries: Not fixed (70% inflation persists)
Question for OpenVPN Team: Why would DCO fail to send DEL_PEER notifications for roughly half of the disconnected peers, causing stale entries to persist indefinitely in userspace status files? Is this a known limitation of the DCO keepalive mechanism, or is there a configuration/implementation issue we're missing?
Any guidance on how to ensure proper cleanup of stale DCO peer entries would be greatly appreciated.
This is a known issue (https://github.com/OpenVPN/openvpn/issues/900).
Supposedly it is now fixed with 2.7_rc3 and the new DCO kernel module.
The fix could be backported to 2.6 and the old kernel module, but nobody has done the full work yet (for lack of time, not lack of interest). Part of it involves backporting infrastructure (https://github.com/OpenVPN/openvpn/issues/883) and then backporting commit 7791f5358a5574d4ef1bd27e2d52300c9d98bd72 from master to release/2.6.
So if you have time to work on the code, you could check whether the referenced commits get you to a working state (which would also help us move this forward).