kvm-guest-drivers-windows

[virtio-fs] Suspected memory leak

Open SimonFair opened this issue 1 year ago • 71 comments

Describe the bug: This is a user report received on the Unraid Forums.

To Reproduce: Running a VM with virtiofs mappings.

Expected behavior: No memory leakage.

Screenshots: image image

Host:

  • Distro: Unraid/openSUSE
  • 6.1.63/Unknown
  • QEMU version 7.2/8.1.2
  • libvirt version 8.7/Unknown

VM:

  • Windows version 10/11
  • Which driver has a problem virtiofs or winfsp
  • Driver version or commit hash that was used to build the driver see images

Additional context: Mmdi is found in pooltag.txt so you actually have to use xperf and wpa for further debugging. Following the method described there, I captured a snapshot of memory growth, opened it in wpa, loaded symbols, and expanded the Mmdi pool to find stack references to winfsp-x64.dll and virtiofs.exe. So there's the smoking gun: one of these drivers is the culprit.

I upgraded to the latest versions of WinFSP (2.0) and Virtio-win guest tools (0.1.240) and the leak is still active.

SimonFair avatar Nov 29 '23 20:11 SimonFair

I've experienced this exact non-paged pool leak under Windows 11 using the latest released Virtio and WinFSP drivers. I've also tested with the latest released rust virtiofsd. When transferring files, the non-paged pool grows and the memory is never released. Looking with Poolmon, it's always the Mmdi tag ballooning in memory usage.

mackid1993 avatar Nov 29 '23 20:11 mackid1993

I have been running a Windows 10 guest under KVM for about 1 year now, and I have experienced this memory leak for the entire time. I am passing through a GPU and a PCIE USB card, and I also mount a few host folders on the guest using VirtioFS. When there is heavy disk I/O on these VioFS folders, the RAM usage of the guest starts increasing rapidly until it eventually reaches 100% and runs out of swap and then crashes. The rate at which the RAM depletes varies with the amount of disk I/O on the VioFS folders. In the worst case (when backup program is running scanning all files), the RAM usage increases about 1M per second and the crash occurs in about 4 hours (16G of RAM allocated to guest). In order for the backup to complete, I have to reboot the guest multiple times to avoid the system crashing.
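As a rough cross-check of the figures above, here is a minimal sketch of the arithmetic. It assumes that "16G" and "1M" are binary units, that the leak rate is steady, and that the whole of guest RAM is available to the leak; the function name is purely illustrative.

```python
def hours_until_exhausted(ram_bytes: float, leak_bytes_per_second: float) -> float:
    """Return how many hours a steady leak needs to consume ram_bytes."""
    if leak_bytes_per_second <= 0:
        raise ValueError("leak rate must be positive")
    return ram_bytes / leak_bytes_per_second / 3600


GIB = 1024 ** 3
MIB = 1024 ** 2

# Figures from the report: 16G of guest RAM, about 1M leaked per second.
# Assumption: binary units, and no other consumer of guest memory.
worst_case_hours = hours_until_exhausted(16 * GIB, 1 * MIB)
print(f"{worst_case_hours:.2f} hours")  # prints "4.55 hours"
```

This upper bound of roughly four and a half hours is in line with the reported crash after about 4 hours, since the guest never has the full 16G free to begin with.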

I found multiple sources that mentioned KVM memory ballooning causes memory leaks when used in combination with GPU passthrough. I set in the XML config and I disabled blnsvr.exe on the guest. This did not help.

I also tried disabling the virtio serial device. This also did nothing.

I followed Microsoft's guide to track down kernel memory leaks using poolmon. The memory is going to a Non-paged pool with the tag "Mmdi" and the description "MDLs for physical memory allocation".

I provided the debug info in the screenshot above, tracing the Mmdi pool growth to either virtiofs.exe or winfsp-x64.dll. I can assist with any further debug information required.

christophocles avatar Nov 30 '23 06:11 christophocles

@SimonFair Thank you for your report. Maybe I am missing something, but where is the growth? BTW: to be a leak, the growth needs to be consistent (not jumps, which might be related to temporary allocations).
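One way to picture the difference between consistent growth and a temporary jump is the heuristic below. It is only a sketch: the function name, the 0.9 threshold, and the sample readings are all hypothetical and are not taken from poolmon or from this thread.

```python
from typing import Sequence


def looks_like_leak(samples: Sequence[int], min_rising_fraction: float = 0.9) -> bool:
    """Heuristic: usage rises in nearly every interval and never drops back to the start."""
    if len(samples) < 3:
        return False  # too few points to tell a trend from a jump
    deltas = [later - earlier for earlier, later in zip(samples, samples[1:])]
    rising = sum(1 for delta in deltas if delta > 0)
    never_returns = min(samples[1:]) > samples[0]
    return never_returns and rising / len(deltas) >= min_rising_fraction


# Hypothetical pool sizes in MB, sampled at a fixed interval.
steady = [5, 65, 125, 185, 245]  # grows in every interval
spike = [5, 300, 6, 5, 5]        # temporary allocation that is freed again

print(looks_like_leak(steady))  # True
print(looks_like_leak(spike))   # False
```

Real readings would also need to be taken after the workload stops, because a leak is only confirmed when the memory stays allocated once the I/O has ended.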

Can you show "before" and "after" allocation counts? Also what's the amount of memory actually allocated?

In any case, it is worth investigating.

YanVugenfirer avatar Nov 30 '23 11:11 YanVugenfirer

@YanVugenfirer The growth is consistent while a transfer is occurring. It's in the non-paged pool. It can be observed with Poolmon by looking at the Mmdi tag, as @christophocles explained. When using backup software it grows extremely quickly and will keep growing until it runs out of memory. Stopping the transfer will not free any memory. Only rebooting the VM will.

mackid1993 avatar Nov 30 '23 15:11 mackid1993

@YanVugenfirer here's a screenshot of poolmon showing the kernel memory pool usage after a few hours. The nonpaged pool Mmdi grows continuously, unbounded, until the system crashes. The growth is accelerated when there is a lot of disk read/write activity on the virtiofs shares. The list is sorted by bytes allocated, and Mmdi is the highest with 4.2 GB.

poolmon mmdi

And here is poolmon immediately after rebooting the guest. Mmdi is only 4.6 MB.

poolmon mmdi fresh boot

Here is another capture of the Mmdi growth using xperf and wpa. This capture is 6 minutes, with 225MB of memory allocations.

wpa mmdi capture 6min

I am not sure if this bug report should go to this project or to WinFSP. Both seem to be involved with the Mmdi allocations.

christophocles avatar Dec 01 '23 23:12 christophocles

@christophocles Thanks a lot! We will take a look and investigate.

YanVugenfirer avatar Dec 03 '23 08:12 YanVugenfirer

I tried to reproduce this issue with the latest rust virtiofsd and virtio driver (242) on a Win11 guest, but could not reproduce it.

  1. Mounted one virtiofs shared dir.
  2. Ran fio in the shared dir: "C:\Program Files (x86)\fio\fio\fio.exe" --name=stress --filename=Z:/test_file --ioengine=windowsaio --rw=write --direct=1 --size=1G --iodepth=256 --numjobs=128 --runtime=180000 --thread --bs=64k
  3. Monitored with poolmon.exe, but there was no memory leak. poolman-fio2

@SimonFair Could you share what I/O operations you run in your environment? Thanks in advance.

xiagao avatar Dec 05 '23 10:12 xiagao

@SimonFair are you using Rust virtiofsd?

YanVugenfirer avatar Dec 05 '23 11:12 YanVugenfirer

I've tested this on rust Virtiofsd under Unraid and had the Mmdi leak. Perhaps @christophocles has more insight. I believe he used a different distro per our conversation on the Unraid forums and may be able to share what occurred on that platform.

mackid1993 avatar Dec 05 '23 15:12 mackid1993

@xiagao The latest drivers we were able to get were .240. How do we test with .242? Can you provide a binary for us to test with?

mackid1993 avatar Dec 05 '23 15:12 mackid1993

@SimonFair are you using Rust virtiofsd?

@YanVugenfirer The bug report originated from my system, and others on the Unraid forums have reported the same issue. Yes, I am using Rust virtiofsd 1.7.2 which is the version currently packaged on openSUSE Tumbleweed.

@xiagao I am also using virtio-win driver version 0.240 since that is the latest binary release. I have Visual Studio and the driver toolkits installed, so my environment is set up to compile newer drivers from source if needed for testing. Tonight I will spin up a fresh Win10 VM and try to reproduce the leak again myself, with the minimum required steps. It's possible that other features of my specific system are interacting to trigger the memory leak (e.g. PCI-E passthrough?). If I am able to reproduce the leak on a new VM, I will post detailed steps to reproduce.

christophocles avatar Dec 05 '23 16:12 christophocles

@christophocles @SimonFair

@YanVugenfirer The bug report originated from my system, and others on the Unraid forums have reported the same issue. Yes, I am using Rust virtiofsd 1.7.2 which is the version currently packaged on openSUSE Tumbleweed.

The latest Rust virtiofsd is 1.8.0. Please try it.

kostyanf14 avatar Dec 05 '23 16:12 kostyanf14

@kostyanf14 I ran virtiofsd 1.8.0 on Unraid and ran into the same memory leak.

mackid1993 avatar Dec 05 '23 16:12 mackid1993

2. "C:\Program Files (x86)\fio\fio\fio.exe" --name=stress --filename=Z:/test_file --ioengine=windowsaio --rw=write --direct=1 --size=1G --iodepth=256 --numjobs=128 --runtime=180000 --thread --bs=64k

Where is fio.exe? I only have C:\Program Files\Virtio-Win\VioFS\virtiofs.exe

mackid1993 avatar Dec 05 '23 17:12 mackid1993

  2. "C:\Program Files (x86)\fio\fio\fio.exe" --name=stress --filename=Z:/test_file --ioengine=windowsaio --rw=write --direct=1 --size=1G --iodepth=256 --numjobs=128 --runtime=180000 --thread --bs=64k

Where is fio.exe? I only have C:\Program Files\Virtio-Win\VioFS\virtiofs.exe

Hi, you can find the fio binary at https://fio.readthedocs.io/en/latest/fio_doc.html .

xiagao avatar Dec 06 '23 01:12 xiagao

@kostyanf14 I ran virtiofsd 1.8.0 on Unraid and ran into the same memory leak.

Could you share what I/O test you ran on the shared folder? I will also try some other tools, such as iozone and IOmeter.

xiagao avatar Dec 06 '23 01:12 xiagao

What always does it for me is a free trial of Backblaze Personal Backup and letting it back up my large media library stored on a VirtioFS mount. That will cause Mmdi to grow very quickly.

mackid1993 avatar Dec 06 '23 01:12 mackid1993

I should also add I use this batch script to mount several Unraid shares as different drive letters:

"C:\Program Files (x86)\WinFsp\bin\launchctl-x64.exe" start virtiofs viofsJ Tag1 J:
"C:\Program Files (x86)\WinFsp\bin\launchctl-x64.exe" start virtiofs viofsl Tag2 l:
"C:\Program Files (x86)\WinFsp\bin\launchctl-x64.exe" start virtiofs viofsM Tag3 m:
"C:\Program Files (x86)\WinFsp\bin\launchctl-x64.exe" start virtiofs viofsS Tag4 s:
"C:\Program Files (x86)\WinFsp\bin\launchctl-x64.exe" start virtiofs viofsT Tag5 T:

I previously ran: "C:\Program Files (x86)\WinFsp\bin\fsreg.bat" virtiofs "C:\Program Files\Virtio-Win\VioFS\virtiofs.exe" "-t %%1 -m %%2"
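For anyone trying to reproduce this multi-mount setup, the mount commands above follow one pattern per share, so they can be generated from a small table. The sketch below only rebuilds the same command lines as text; the helper name and the reading of the two trailing arguments as tag and drive letter (matching the "-t %%1 -m %%2" registration) are assumptions, not something confirmed by the driver documentation in this thread.

```python
LAUNCHCTL = r"C:\Program Files (x86)\WinFsp\bin\launchctl-x64.exe"

# (instance name, virtiofs tag, drive letter), copied from the batch script above.
MOUNTS = [
    ("viofsJ", "Tag1", "J:"),
    ("viofsl", "Tag2", "l:"),
    ("viofsM", "Tag3", "m:"),
    ("viofsS", "Tag4", "s:"),
    ("viofsT", "Tag5", "T:"),
]


def mount_command(instance: str, tag: str, drive: str) -> str:
    """Build one launchctl line in the same shape as the batch script (hypothetical helper)."""
    return f'"{LAUNCHCTL}" start virtiofs {instance} {tag} {drive}'


commands = [mount_command(*mount) for mount in MOUNTS]
print("\n".join(commands))
```

Trimming MOUNTS to a single entry gives the single-share case, which may help when comparing the one-mount and multi-mount behaviour.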

mackid1993 avatar Dec 06 '23 01:12 mackid1993

Has anyone been able to repro this?

mackid1993 avatar Dec 09 '23 21:12 mackid1993

I reproduced this issue with multiple source mappings from the host to a Win11 guest, using IOmeter to create a lot of disk read/write activity on the virtiofs shares. Here are some screenshots showing that the nonpaged pool Mmdi grows continuously after the I/O test starts, and that the memory isn't released after the I/O test stops. mmdi mmdi2 mmdi4 mmdi5

xiagao avatar Dec 10 '23 06:12 xiagao

@xiagao I'm glad it's not just us! Thank you for your effort. So hopefully this can eventually be fixed!

mackid1993 avatar Dec 10 '23 18:12 mackid1993

@xiagao I'm glad it's not just us! Thank you for your effort. So hopefully this can eventually be fixed!

No problem. Thanks for reporting this issue.

xiagao avatar Dec 11 '23 01:12 xiagao

Thank you! Can't wait to finally use Virtiofs.

mackid1993 avatar Dec 11 '23 15:12 mackid1993

Is there a fix for the issue or has the root cause been found?

SimonFair avatar Dec 20 '23 07:12 SimonFair

@SimonFair not yet. Because of the holidays, we have not yet had a chance to debug it.

YanVugenfirer avatar Dec 20 '23 13:12 YanVugenfirer

Just came to report my issue with the memory leaks too.

Running a win11 VM for security cameras writing about 50mbps constantly over virtiofs will chew up my 16gb allocated ram in about 24 hours.

Hope your team had a good holiday period and will look back into this in coming weeks / months for any updates.

starlit-rocketship avatar Jan 08 '24 14:01 starlit-rocketship

It would be great to know whether any progress has been made on this bug.

mackid1993 avatar Jan 09 '24 01:01 mackid1993

@mackid1993 No progress due to the holiday season

YanVugenfirer avatar Jan 09 '24 06:01 YanVugenfirer

@kostyanf14 and I found an issue that caused the memory leak (hopefully the only one). Soon CI will build the driver that can be tested if anyone is interested.

YanVugenfirer avatar Jan 10 '24 11:01 YanVugenfirer

@YanVugenfirer Can you please provide a link to the driver once it's been built? Thank you!

mackid1993 avatar Jan 10 '24 16:01 mackid1993