
Potential Performance Bottlenecks in ovpn-dco

Open alloc33 opened this issue 1 year ago • 4 comments

Hi OpenVPN DCO team,

I’ve been analyzing DCO performance and came across potential bottlenecks related to the two main workqueues in the implementation:

1. Crypto workqueue (TX/RX encryption/decryption)
2. Event workqueue

From my understanding, these workqueues handle most of the kernel-side processing, including packet encryption/decryption and event processing. However, I noticed that per-peer workqueues were not implemented, which raises a few concerns about performance under load.

Questions & concerns:

1. Why was a single shared workqueue model chosen instead of per-peer workqueues?
   - Was it due to complexity, resource efficiency, or another reason?
   - Would per-peer workqueues reduce contention, especially in high-throughput scenarios?
2. Could the current shared workqueue model be contributing to performance bottlenecks?
   - In scenarios with high numbers of concurrent peers, could event processing or crypto operations stall due to queue saturation?
   - Are there any known cases where workqueue delays affect throughput or latency?
3. Would introducing per-peer workqueues be a viable optimization?
   - If not, what would be the primary challenges in implementing them?
   - Are there alternative strategies to mitigate contention in the current model?


alloc33 avatar Feb 12 '25 06:02 alloc33

Hi @alloc33 and thank you very much for your report! First of all I am impressed by the amount of research you have done. OTOH I must inform you that this repository is in maintenance mode and it won't see any new feature development.

Our effort moved a while ago to https://github.com/OpenVPN/ovpn-net-next, where a new-generation DCO (now called 'ovpn') is being developed and pushed upstream.

(You can see the latest patchset sent to the netdev mailing list for review at https://patchwork.kernel.org/project/netdevbpf/list/?series=932484)

There are no per-peer workqueues yet (to keep the first prototype simple); that is something we've planned to work on after the merge, but nothing prevents us from starting on it already. This said, we first need to evaluate whether it can truly improve performance or not (gut feeling is that it will help).

However, there are no ptr-rings anymore. The only remaining queues are the internal crypto one, which may be used when the crypto engine decides to go async, and the NAPI queue, but the latter speaks for itself.

ordex avatar Feb 12 '25 08:02 ordex

@ordex I would be happy to help with per-peer workqueues. Not sure if it would make sense to use no-std Rust for it, but I'll give it a try :) I'll do some research first.

alloc33 avatar Feb 13 '25 10:02 alloc33

no-std Rust? In any case, please be aware that in ovpn there are no queues for incoming or outgoing packets. So be mindful about where you think we should add them ☺️ You can open a ticket in that repo and get the discussion started there. Thanks!

ordex avatar Feb 13 '25 10:02 ordex

@ordex can confirm the per-peer concurrency approach was beneficial. Using Rust's no-std concurrency primitives (via rust-for-linux), we re-architected the pipeline on a per-peer basis. In lower-concurrency scenarios (3 clients), we measured a 9% throughput gain, and we also scaled more efficiently under heavier loads (50 clients).

alloc33 avatar Mar 08 '25 06:03 alloc33

Care to send this change to the mailing list? openvpn-devel + netdev would be appreciated for sure. I am closing this ticket in any case, as this is not something we will be able to merge into ovpn-dco.

Feel free to re-open the discussion on ovpn-net-next if needed.

ordex avatar Oct 17 '25 12:10 ordex