
[vioscsi] Fix SendSRB regression and refactor for optimum performance [viostor] Backport SendSRB improvements

Open · benyamin-codez opened this issue 6 months ago · 40 comments

Fixes a regression in SendSRB of [vioscsi].

Related issues: #756 #623 #907 ... [likely others]

The regression was introduced in #684, between https://github.com/virtio-win/kvm-guest-drivers-windows/commit/7dc052d55057c2a4dc2f64639d2f5ad6de34c4af and https://github.com/virtio-win/kvm-guest-drivers-windows/commit/fdf56ddfb44028b5295d2bdf20a30f5d90b2f13e. Specifically, it is due to commits https://github.com/virtio-win/kvm-guest-drivers-windows/commit/f1338bb7edd712bb9c3cf9c9b122e60a3f518408 and https://github.com/virtio-win/kvm-guest-drivers-windows/commit/fdf56ddfb44028b5295d2bdf20a30f5d90b2f13e (considered together).

PR includes:

  1. SendSRB fix (a simplified sketch of the send path follows this list)
  2. Refactoring for optimum performance
  3. WPP instrumentation improvements
  4. SendSRB improvements backported into [viostor]
  5. Minor cleanup

The WPP improvements include:

  1. Macro optimisation
  2. Exit fn() insurance
  3. Prettify trace output
  4. New macros for FORCEDINLINE functions (see the sketch after this list)

Look out for 1x TODO + 2x FIXME... RFC please.

Tested on:

  • QEMU emulator version 9.0.2 (pve-qemu-kvm_9.0.2-2) ONLY
  • Linux 6.8.12-1-pve #1 SMP PREEMPT_DYNAMIC PMX 6.8.12-1 (2024-08-05T16:17Z) x86_64 GNU/Linux
  • Hypervisor: Intel-based ONLY
  • Guest: Windows Server 2019 (Win10 build) ONLY
  • Directory backing ONLY
  • Promox "VirtIO SCSI Single" with iothreads (multiple HBAs) was used mostly
  • Limited testing done on single HBA with multiple LUNs (iothread=0)
  • AIO = threads ONLY; UNTESTED on io_uring or native AIO (I suspect this fix will help there too)
  • Local storage backings (raw and qcow) ONLY, untested on networked backings

YMMV with other platforms and AIO implementations (but I suspect they will be fine).

Performance is significantly higher, even when compared to version 100.85.104.20800 in virtio-win package 0.1.208. I suspect this is mainly because the fix unlocks other improvements that were made when the regression was introduced.

Occasional throughput of 12+ GB/s was achieved in the guest (64 KiB blocks, 32 outstanding queues and 16 threads).

Sequential reads performed at just below the backing storage's maximum throughput, at 7 GB/s (1 MiB blocks, 8 outstanding queues and 1 thread). Random reads came in at 1.1 GB/s and 270K IOPS at 1850 µs latency (4 KiB blocks, 32 outstanding queues and 16 threads). Write performance was approximately 93% of read performance.
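The post doesn't name the benchmark tool, but the queue/thread notation matches common CrystalDiskMark presets (SEQ1M Q8T1, RND4K Q32T16). A roughly equivalent diskspd invocation for the random-read case might look like the following; the target file path and size are placeholders:

```
diskspd.exe -b4K -o32 -t16 -r -d30 -Sh -L -c16G C:\test\testfile.dat
```

Here -o32 is outstanding I/Os per thread, -t16 is the thread count, and -Sh disables software and hardware write caching so the virtio path, rather than the guest cache, is what gets measured.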

I tested on a variety of local storage media, but the above numbers are for an NVMe SSD rated at 7.3 GB/s read / 6 GB/s write and 1M IOPS R/W. So throughput was as expected, but IOPS were only ~27% of the rated figure... It will be interesting to see what numbers come up on a decently sized NVMe RAID0/10 JBOD...! Certainly beware of running on old crusty spindles..! 8^d

It is also worth mentioning that the ETW overhead when running an active trace with all flags set is about 8%.
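For anyone wanting to reproduce that overhead measurement: WPP traces are typically captured with tracelog from the WDK. A hedged example, with the session name, provider GUID and output file as placeholders:

```
tracelog -start VioScsiWpp -guid #<vioscsi-WPP-provider-GUID> -f vioscsi.etl -level 5 -flag 0xFFFF
rem ... run the workload under trace ...
tracelog -stop VioScsiWpp
```

The -flag 0xFFFF value corresponds to the "all flags set" case the 8% figure refers to.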

Freedom for Windows guests once held captive...! 8^D

cc: @vrozenfe @JonKohler @sb-ntnx @frwbr @foxmox

Related external issues (at least confounded by this regression):

https://bugzilla.proxmox.com/show_bug.cgi?id=4295
https://bugzilla.kernel.org/show_bug.cgi?id=199727
... [likely others]

benyamin-codez · Aug 20 '24 03:08