[vioscsi] Fix SendSRB regression and refactor for optimum performance [viostor] Backport SendSRB improvements
Fixes a regression in SendSRB of [vioscsi].
Related issues: #756 #623 #907 ... [likely others]
The regression was introduced in #684, between https://github.com/virtio-win/kvm-guest-drivers-windows/commit/7dc052d55057c2a4dc2f64639d2f5ad6de34c4af and https://github.com/virtio-win/kvm-guest-drivers-windows/commit/fdf56ddfb44028b5295d2bdf20a30f5d90b2f13e, specifically by commits https://github.com/virtio-win/kvm-guest-drivers-windows/commit/f1338bb7edd712bb9c3cf9c9b122e60a3f518408 and https://github.com/virtio-win/kvm-guest-drivers-windows/commit/fdf56ddfb44028b5295d2bdf20a30f5d90b2f13e (considered together).
PR includes:
- SendSRB fix
- Refactoring for optimum performance
- WPP instrumentation improvements
- SendSRB improvements backported into [viostor]
- Minor cleanup
The WPP improvements include:
- Macro optimisation
- Exit fn() insurance
- Prettify trace output
- New macros for FORCEDINLINE functions (sketched below)
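As a rough illustration of the macro changes above (a sketch only: the macro names, the `TraceEvent()` backend and the flag/level plumbing are placeholders, not the PR's actual WPP-generated trace.h), the paired entry/exit pattern and the forced-inline variants look something like this:

```c
/*
 * Illustrative sketch only -- not the PR's actual trace.h.
 * TraceEvent() is a stand-in for the WPP-generated trace call.
 */
#include <stdio.h>

#define TRACE_LEVEL_VERBOSE 5
#define TraceEvent(Level, Fmt, ...) printf(Fmt "\n", __VA_ARGS__)

/* Paired entry/exit macros, plus dedicated variants for forced-inline
 * helpers so the trace output names the inlined function, not its caller. */
#define ENTER_FN()      TraceEvent(TRACE_LEVEL_VERBOSE, " --> %s", __FUNCTION__)
#define EXIT_FN()       TraceEvent(TRACE_LEVEL_VERBOSE, " <-- %s", __FUNCTION__)
#define ENTER_INL_FN()  TraceEvent(TRACE_LEVEL_VERBOSE, " --> INLINE %s", __FUNCTION__)
#define EXIT_INL_FN()   TraceEvent(TRACE_LEVEL_VERBOSE, " <-- INLINE %s", __FUNCTION__)

/* "Exit fn() insurance": every return path, including early-outs,
 * carries a matching EXIT_*_FN() so entry/exit traces stay balanced. */
static inline int SendSrbSketch(void *srb)
{
    ENTER_INL_FN();
    if (srb == NULL) {
        EXIT_INL_FN();   /* early-out path still traces its exit */
        return 0;
    }
    /* ... add the request to the virtqueue and notify the device ... */
    EXIT_INL_FN();
    return 1;
}
```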
Look out for 1x TODO + 2x FIXME... RFC please.
Tested on:
- QEMU emulator version 9.0.2 (pve-qemu-kvm_9.0.2-2) ONLY
- Linux 6.8.12-1-pve #1 SMP PREEMPT_DYNAMIC PMX 6.8.12-1 (2024-08-05T16:17Z) x86_64 GNU/Linux
- Hypervisor: Intel-based ONLY
- Guest: Windows Server 2019 (Win10 build) ONLY
- Directory backing ONLY
- Proxmox "VirtIO SCSI Single" with iothreads (multiple HBAs) was used mostly (see the illustrative config after this list)
- Limited testing done on single HBA with multiple LUNs (`iothread=0`)
- AIO = `threads` ONLY
- UNTESTED on `io_uring` or `native` AIO (suspect this fix will help there too)
- Local storage backings (`raw` and `qcow`) ONLY, untested on networked backings
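For reference, the Proxmox disk setup described above corresponds to a VM config along these lines (a sketch only: the VM ID, storage ID and disk file names are placeholders; only `virtio-scsi-single`, `iothread=1` and `aio=threads` reflect what was actually tested):

```
# /etc/pve/qemu-server/<vmid>.conf -- illustrative fragment only
scsihw: virtio-scsi-single
scsi0: local:100/vm-100-disk-0.raw,iothread=1,aio=threads
scsi1: local:100/vm-100-disk-1.qcow2,iothread=1,aio=threads
```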
YMMV with other platforms and AIO implementations (but I suspect they will be fine).
Performance is significantly higher, even when compared to version 100.85.104.20800 in virtio-win package 0.1.208. I suspect this is mainly because this fix unlocks other improvements made when the regression was introduced.
Occasional throughput of 12+ GB/s was achieved in the guest (64 KiB blocks, 32 outstanding queues and 16 threads).
Sequential reads performed at just below the backing storage's maximum throughput, at 7 GB/s (1 MiB blocks, 8 outstanding queues and 1 thread). Random reads came in at 1.1 GB/s, 270K IOPS at 1850 µs latency (4 KiB blocks, 32 outstanding queues and 16 threads). Write performance was approximately 93% of read performance.
I tested on a variety of local storage media, but the above numbers are for an NVMe SSD rated at 7.3 GB/s read / 6 GB/s write and 1M IOPS R/W. So throughput was as expected, but IOPS were only ~27% of the drive's rating.... It will be interesting to see what numbers come up on a decently sized NVMe RAID0/10 JBOD...! Certainly beware of running on old crusty spindles..! 8^d
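The PR text doesn't name the benchmark tool, but the block size / outstanding I/O / thread combinations read like diskspd (or a front end that wraps it). Purely as a hypothetical way to reproduce workloads of the same shape (the tool choice, test file path and file size are assumptions, not from the PR):

```
:: Hypothetical diskspd runs approximating the workloads described above.
:: Sequential read: 1 MiB blocks, 8 outstanding I/Os per thread, 1 thread
diskspd.exe -c32G -d60 -b1M -o8 -t1 -Sh -L D:\iotest.dat

:: Random read: 4 KiB blocks, 32 outstanding I/Os per thread, 16 threads
diskspd.exe -c32G -d60 -b4K -r -o32 -t16 -Sh -L D:\iotest.dat
```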
It is also worth mentioning that the ETW overhead when running an active trace with all flags set is about 8%.
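For context, an "active trace with all flags set" on a WPP-instrumented driver is typically collected with something like the following (a sketch only: the session name, output path and control GUID are placeholders; the driver's real WPP control GUID lives in its trace header):

```
:: Illustrative only -- substitute the driver's actual WPP control GUID.
tracelog -start vioscsi_trace -guid #00000000-0000-0000-0000-000000000000 -f C:\traces\vioscsi.etl -flag 0xFFFFFFFF -level 5
:: ... run the workload ...
tracelog -stop vioscsi_trace
```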
Freedom for Windows guests once held captive...! 8^D
cc: @vrozenfe @JonKohler @sb-ntnx @frwbr @foxmox
Related external issues (at least confounded by this regression):
https://bugzilla.proxmox.com/show_bug.cgi?id=4295 https://bugzilla.kernel.org/show_bug.cgi?id=199727 ... [likely others]