
VIRTIO Slow performance compared to Linux/FreeBSD

Open Thalhammer opened this issue 4 months ago • 20 comments

Important notices

Before you add a new report, we ask you kindly to acknowledge the following:

  • [X] I have read the contributing guidelines at https://github.com/opnsense/src/blob/master/CONTRIBUTING.md
  • [X] I am convinced that my issue is new after having checked both open and closed issues at https://github.com/opnsense/src/issues?q=is%3Aissue

Describe the bug

VirtIO interfaces perform vastly slower than they should, for no obvious reason.

To Reproduce

Steps to reproduce the behavior:

  1. Install OPNsense on a KVM hypervisor like Proxmox
  2. Add a VirtIO interface
  3. Install iperf3 on the VM and the host
  4. Measure throughput to the VirtIO interface (see the command sketch below)
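
For reference, a minimal pair of iperf3 invocations for steps 3 and 4 might look like this (the address 10.0.0.1 is a placeholder, not taken from the report):

# on the OPNsense VM: start the iperf3 server
iperf3 -s
# on the Proxmox host: run the client against the VM's VirtIO address for 30 seconds
iperf3 -c 10.0.0.1 -t 30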

Expected behavior

OPNsense is capable of performing similarly to Linux or stock FreeBSD. On my system that is around 25 Gbit/s (Linux) or 22 Gbit/s (FreeBSD) without any changes/tweaks inside the respective VMs and with a default MTU of 1500.

Describe alternatives you considered

Right now my best bet is switching to a Linux-based router like OpenWrt or VyOS, but I am used to OPNsense and pfSense, so I'd prefer to avoid that.


Additional context

The issue is present on both OPNsense and pfSense. While performing the speed test, neither the host nor the VM shows excessive CPU usage, not even on a single core. I have been told that increasing the MTU might improve performance, but that's not an option for me because I have devices in my network that don't support it. In addition, both Linux and stock FreeBSD perform well even without an increased MTU. The issue is not new to Proxmox 9 or OPNsense 25; it has existed for as long as I have used OPNsense/pfSense (at least 5 years now), but soon my internet connection will become fast enough for it to be a bottleneck.

Environment

OPNsense 25.7.1 (amd64)
CPU: Intel Xeon E5-2697 v3 (2 sockets with 14C/28T each; the VM is allocated 4 cores)
Network: VirtIO
Hypervisor: Proxmox 9.0.4 (kernel 6.14.8-2-pve, QEMU 10.0.2)

Thalhammer avatar Aug 15 '25 20:08 Thalhammer

Hello @Thalhammer

Below are some starting ideas I would try if I had the same problem; maybe you have done them already.

Firstly, what speed do you get with OPNsense?

Check that the basics like multiqueue and vhost are enabled.
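
On the Proxmox side, multiqueue is configured per NIC; a sketch using Proxmox's qm CLI (VM ID 100 and bridge vmbr0 are placeholders, and note that omitting macaddr= makes Proxmox generate a new MAC for the NIC):

# enable 4 VirtIO queue pairs on the VM's first NIC; vhost is used by default for virtio-net
qm set 100 --net0 virtio,bridge=vmbr0,queues=4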

Disable everything and re-enable things one by one to find the potential culprit: pf, Suricata, traffic shaping...

Also, increase the mbuf cluster counts.

sopex avatar Aug 19 '25 09:08 sopex

Firstly, what speed do you get with OPNsense?

I thought I had mentioned that; I must have removed it in an edit. I get ~2.2 Gbit/s with OPNsense.

Check that the basics like multiqueue and vhost are enabled.

I did enable multiqueue on the host, but OPNsense/pfSense can't use VirtIO multiqueue because it's incompatible with ALTQ. However, this is unlikely to be the issue, because stock FreeBSD performs about the same with multiqueue disabled as with it enabled.
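
For what it's worth, one way to check how many queue pairs vtnet actually negotiated in a FreeBSD guest (sysctl names can vary between FreeBSD versions, so treat this as a sketch):

# the attach messages report the negotiated queue count
dmesg | grep -i vtnet
# vtnet also exposes its queue-pair state via sysctl
sysctl dev.vtnet.0 | grep -i vq_pairs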

Disable everything and re-enable things one by one to find the potential culprit: pf, Suricata, traffic shaping...

The OPNsense VM I used for testing was a brand-new install: no extra firewall rules, no plugins, no traffic shaping enabled. I did try various combinations of hardware checksum/TSO/LRO settings, but they either changed nothing or made things worse.

Also, increase the mbuf cluster counts.

How do I do that?

Thalhammer avatar Aug 19 '25 12:08 Thalhammer

Also, increase the mbuf cluster counts.

How do I do that?

For the purposes of the test, it's worth killing pf and turning the box into a plain router.

https://forum.opnsense.org/index.php?topic=39450.0
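
A minimal sketch of that test with pfctl (note that OPNsense may re-enable pf on the next configuration change, so this only holds for the duration of the test):

# disable the packet filter, run the iperf3 tests, then re-enable it
pfctl -d
# ... run the measurements ...
pfctl -e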

mbuf:

kern.ipc.nmbclusters=
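
A hedged sketch of inspecting and raising that tunable; the value below is purely illustrative:

# check the current value
sysctl kern.ipc.nmbclusters
# raise it at runtime (as root); 1000000 is an illustrative value, not a recommendation
sysctl kern.ipc.nmbclusters=1000000
# to persist it across reboots on OPNsense, add it under System > Settings > Tunables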

sopex avatar Aug 19 '25 12:08 sopex

I ran the tests again with the firewall disabled, and I do get higher performance of around 4.2 Gbit/s. However, this is still nowhere near the values seen on Linux. I have configured multiqueue to 4 (and judging by dmesg, it does in fact use the queues, unlike pfSense).

As a side note: since this router is supposed to serve as the main router/firewall of my home, running it with packet filtering disabled is not really an option beyond testing.

I then increased the mentioned nmbclusters value from its initial 251318 to 1005272, but that did not give any measurable improvement: 2.85 Gbit/s with the firewall enabled and 2.95 Gbit/s with the firewall disabled. Interestingly, the latter is lower than what I got with the default value and the firewall disabled.

Thalhammer avatar Aug 20 '25 08:08 Thalhammer

A number of patches improving vtnet(4) have now gone into the stable branch, but I am not sure where this leads us. Watch out for kernel changes in 25.7.6 or 25.7.7.

fichtner avatar Oct 10 '25 11:10 fichtner

@fichtner I think something is still missing — Timo Völker has worked on this issue: https://reviews.freebsd.org/p/timo.voelker_fh-muenster.de/

These two commits should also be included:

https://reviews.freebsd.org/rGbcb298fa9e23c1192c5707086a67d3b396186abc — improves deferred checksum computation in TCP/UDP/SCTP and ensures proper handling for forwarded packets when checksum offloading is used.

https://reviews.freebsd.org/rG187ee62c71f2be62870f26ae98de865e330121be — fixes incorrect UDP checksum handling in dhclient, especially in virtualized environments with offloading enabled.

Jimbambuli avatar Oct 10 '25 19:10 Jimbambuli

Last time I checked these were not cherry-picked to stable/14 on the other end. We try to stay away from dev-only commits without a small and precise fix at hand.

fichtner avatar Oct 18 '25 08:10 fichtner

Last time I checked these were not cherry-picked to stable/14 on the other end. We try to stay away from dev-only commits without a small and precise fix at hand.

Actually, these changes have already been cherry-picked to stable/14:

dd8b6e2c

  • cherry-picked as b7a3a3a (sctp, tcp, udp: improve deferred computation of checksums)

ffd956a

  • cherry-picked as a2cbb41 (dhclient: improve UDP checksum handling)

So these commits are already part of the stable/14 branch.

Jimbambuli avatar Oct 22 '25 03:10 Jimbambuli

@Jimbambuli thanks for noting. As I said, I haven't looked at it since the last cherry-pick, so I'm confident the next round of cherry-picks will pick it up. We're releasing 25.7.5 today, and since it will not ship a new kernel, the cherry-pick timing can be spread out to 25.7.6, with possibly a test kernel later this week here if you want.

fichtner avatar Oct 22 '25 05:10 fichtner

Thanks, @fichtner. I’d really appreciate a test kernel later this week to check if the fix works as expected. I’m also curious to see how the performance turns out.

Jimbambuli avatar Oct 23 '25 05:10 Jimbambuli

@Jimbambuli latest master as of https://github.com/opnsense/src/commits/64f4e2bdf5b336899b2

# opnsense-update -zbkr 25.7.5-next

Cheers, Franco

fichtner avatar Oct 23 '25 08:10 fichtner

2927ebde300190c may be a crash/panic candidate in ip_tryforward()

fichtner avatar Oct 23 '25 15:10 fichtner

Revoked "25.7.5-next" kernel and will republish a new build soon.

fichtner avatar Oct 23 '25 15:10 fichtner

New base/kernel is based on https://github.com/opnsense/src/commits/714aab4

# opnsense-update -zbkr 25.7.5-next2

Cheers, Franco

fichtner avatar Oct 23 '25 20:10 fichtner

Running the "25.7.5-next2" kernel on an N100 bare-metal box for a few days now, and it performs very well. No crashes, no weirdness; performance on the i226-V is very solid.

Device: N100
OS: OPNsense
Spec Code: Intel S2363L58/SRKTU
Device: 0x125c
Subvendor: 0x8086
Part #: I226-V rev.4
Queues: 4 Rx/Tx

Regards, S.

SeimusS avatar Oct 27 '25 10:10 SeimusS

@SeimusS great, thanks for the feedback ❤️

fichtner avatar Oct 27 '25 10:10 fichtner

No problems either on an opnsense virtual machine (Proxmox) with an Intel I210 Gigabit network card.

cosmedd avatar Oct 27 '25 14:10 cosmedd

For me, it’s also running without any problems. My setup is as follows: OPNsense VM on Proxmox with 2 vtnet network interfaces. Hardware offload (rxcsum and txcsum) is enabled and working.
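
For anyone wanting to verify the same on their box, the offload flags show up in the interface's options line (vtnet0 is an assumed name; yours may differ):

# look for RXCSUM and TXCSUM in the options= line
ifconfig vtnet0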

Jimbambuli avatar Oct 28 '25 08:10 Jimbambuli

Thanks all. One thing that I wonder about: how has performance increased?

Cheers, Franco

fichtner avatar Oct 28 '25 09:10 fichtner

Can't comment on the virtualized front, but as far as bare metal goes:

When testing with iperf3 on a 2x1G LAGG, the throughput across the ports seems more consistent. Not sure if that's the new kernel's doing or something related to my meddling with tunables...

Regards, S.

SeimusS avatar Oct 28 '25 10:10 SeimusS