kvm-guest-drivers-windows
[vioscsi] - Windows 10 "Optimize drive"/Trim/Discard causes all data to be rewritten
I'm using the virtio-win-0.1.208.iso version of vioscsi on Windows 10 x64 21H1 in a libvirt+KVM+QEMU combo.
Running the trim command causes high CPU load, runs for an absurdly long time (~5 seconds on a bare-metal installation vs. 10-15 minutes with vioscsi), and rewrites all the data instead of just trimming the drive (visible in iotop, iostat, and the S.M.A.R.T. Lifetime Writes attribute), which is less than ideal for an SSD. Linux guests do not suffer from this problem.
Passing the device.scsi0-0-0-0.rotation_rate=1 (i.e. SSD emulation) argument to QEMU doesn't have any impact.
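For reference, this is the raw QEMU property override being referred to, passed via -set (full libvirt passthrough examples appear later in this thread):
-set device.scsi0-0-0-0.rotation_rate=1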
A similar problem can be seen here https://forum.level1techs.com/t/win10-optimize-drives-makes-underlying-sparse-storage-grow-not-shrink/172803
Is this perhaps a Windows defrag.exe issue?
Forgot to say that when defrag.exe is running it uses one core and 2,170 MB of memory, and Task Manager shows a lot of disk write activity.
Resource Monitor shows a lot of write activity to C:\$LogFile by defragsvc and Registry.
@milsav92
Thanks a lot for reporting this issue. Can you please post the qemu command line along with the qemu version? Hopefully this information will help us to reproduce the problem.
Just for the record, the virtio-scsi driver for Windows relies fully on QEMU code and doesn't intercept the SCSIOP_UNMAP request (unlike the virtio-blk viostor driver). In that respect it would be quite interesting to know whether virtio-blk has the same problem.
Best, Vadim.
PC:
QEMU emulator version 6.1.0 (openSUSE Tumbleweed) Copyright (c) 2003-2021 Fabrice Bellard and the QEMU Project developers
/usr/bin/qemu-system-x86_64 -name guest=win10,debug-threads=on -S -object {"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain-12-win10/master-key.aes"} -machine pc-q35-6.1,accel=kvm,usb=off,vmport=off,dump-guest-core=off,memory-backend=pc.ram -cpu EPYC-Milan,x2apic=on,tsc-deadline=on,hypervisor=on,tsc-adjust=on,vaes=on,vpclmulqdq=on,spec-ctrl=on,stibp=on,arch-capabilities=on,ssbd=on,cmp-legacy=on,virt-ssbd=on,rdctl-no=on,skip-l1dfl-vmentry=on,mds-no=on,pschange-mc-no=on,pcid=off,hv-time,hv-relaxed,hv-vapic,hv-spinlocks=0x1fff -m 16384 -object {"qom-type":"memory-backend-ram","id":"pc.ram","size":17179869184} -overcommit mem-lock=off -smp 8,sockets=8,cores=1,threads=1 -uuid 3f5b5ce9-4fab-40b9-bee9-84bb2f8c26c0 -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=31,server=on,wait=off -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global ICH9-LPC.disable_s3=1 -global ICH9-LPC.disable_s4=1 -boot strict=on -device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 -device pcie-root-port,port=0x11,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1 -device pcie-root-port,port=0x12,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x2 -device pcie-root-port,port=0x13,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x3 -device pcie-root-port,port=0x14,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x4 -device pcie-root-port,port=0x15,chassis=6,id=pci.6,bus=pcie.0,addr=0x2.0x5 -device qemu-xhci,p2=15,p3=15,id=usb,bus=pci.2,addr=0x0 -device virtio-scsi-pci,id=scsi0,bus=pci.3,addr=0x0 -device virtio-serial-pci,id=virtio-serial0,bus=pci.4,addr=0x0 -blockdev {"driver":"file","filename":"/mnt/vms/images/win10.img","node-name":"libvirt-3-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-3-format","read-only":false,"discard":"unmap","detect-zeroes":"unmap","cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-3-storage"} -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,device_id=drive-scsi0-0-0-0,drive=libvirt-3-format,id=scsi0-0-0-0,bootindex=1,write-cache=on -blockdev {"driver":"file","filename":"/mnt/vms/images/win10-1.img","node-name":"libvirt-2-storage","cache":{"direct":false,"no-flush":false},"auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-2-format","read-only":false,"discard":"unmap","detect-zeroes":"unmap","cache":{"direct":false,"no-flush":false},"driver":"raw","file":"libvirt-2-storage"} -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=2,device_id=drive-scsi0-0-0-2,drive=libvirt-2-format,id=scsi0-0-0-2,write-cache=on -device ide-cd,bus=ide.1,id=sata0-0-1 -netdev tap,fd=33,id=hostnet0 -device e1000e,netdev=hostnet0,id=net0,mac=52:54:00:70:cc:72,bus=pci.1,addr=0x0 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -device usb-tablet,id=input0,bus=usb.0,port=1 -audiodev id=audio1,driver=spice -spice port=5900,addr=127.0.0.1,disable-ticketing=on,image-compression=off,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pcie.0,addr=0x1 -device ich9-intel-hda,id=sound0,bus=pcie.0,addr=0x1b -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0,audiodev=audio1 -chardev 
spicevmc,id=charredir0,name=usbredir -device usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=2 -chardev spicevmc,id=charredir1,name=usbredir -device usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=3 -device virtio-balloon-pci,id=balloon0,bus=pci.5,addr=0x0 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on
Proxmox:
QEMU emulator version 6.0.0 (pve-qemu-kvm_6.0.0) Copyright (c) 2003-2021 Fabrice Bellard and the QEMU Project developers
/usr/bin/kvm -id 100 -name win10 -no-shutdown -chardev socket,id=qmp,path=/var/run/qemu-server/100.qmp,server=on,wait=off -mon chardev=qmp,mode=control -chardev socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5 -mon chardev=qmp-event,mode=control -pidfile /var/run/qemu-server/100.pid -daemonize -smbios type=1,uuid=f0085f4a-19a2-411a-8934-0452afcb16b0 -smp 2,sockets=1,cores=2,maxcpus=2 -nodefaults -boot menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg -vnc unix:/var/run/qemu-server/100.vnc,password=on -no-hpet -cpu host,hv_ipi,hv_relaxed,hv_reset,hv_runtime,hv_spinlocks=0x1fff,hv_stimer,hv_synic,hv_time,hv_vapic,hv_vpindex,+kvm_pv_eoi,+kvm_pv_unhalt -m 8192 -object iothread,id=iothread-virtioscsi0 -object iothread,id=iothread-virtioscsi1 -readconfig /usr/share/qemu-server/pve-q35-4.0.cfg -device vmgenid,guid=96814a7f-f7ea-470a-894a-500177c6bf00 -device qxl-vga,id=vga,bus=pcie.0,addr=0x1 -chardev socket,path=/var/run/qemu-server/100.qga,server=on,wait=off,id=qga0 -device virtio-serial,id=qga0,bus=pci.0,addr=0x8 -device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0 -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-pci,rng=rng0,max-bytes=1024,period=1000,bus=pci.1,addr=0x1d -device virtio-serial,id=spice,bus=pci.0,addr=0x9 -chardev spicevmc,id=vdagent,name=vdagent -device virtserialport,chardev=vdagent,name=com.redhat.spice.0 -spice tls-port=61000,addr=127.0.0.1,tls-ciphers=HIGH,seamless-migration=on -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 -iscsi initiator-name=iqn.1993-08.org.debian:01:22d334eb194 -drive if=none,id=drive-ide2,media=cdrom,aio=io_uring -device ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=101 -device virtio-scsi-pci,id=virtioscsi0,bus=pci.3,addr=0x1,iothread=iothread-virtioscsi0 -drive file=/images/100/vm-100-disk-0.qcow2,if=none,id=drive-scsi0,cache=none,werror=stop,discard=on,format=qcow2,aio=io_uring,detect-zeroes=unmap -device scsi-hd,bus=virtioscsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100 -device virtio-scsi-pci,id=virtioscsi1,bus=pci.3,addr=0x2,iothread=iothread-virtioscsi1 -drive file=/images/100/vm-100-disk-1.qcow2,if=none,id=drive-scsi1,cache=writeback,werror=stop,discard=on,format=qcow2,aio=io_uring,detect-zeroes=unmap -device scsi-hd,bus=virtioscsi1.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi1,id=scsi1 -netdev type=tap,id=net0,ifname=tap100i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on -device virtio-net-pci,mac=DE:3A:FE:C8:D1:B5,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=102 -rtc driftfix=slew,base=localtime -machine type=pc-q35-6.0+pve0 -global kvm-pit.lost_tick_policy=discard
Configurations:
PC - libvirt:
<domain type="kvm"> <name>win10</name> <uuid>3f5b5ce9-4fab-40b9-bee9-84bb2f8c26c0</uuid> <metadata> <libosinfo:libosinfo xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0"> <libosinfo:os id="http://microsoft.com/win/10"/> </libosinfo:libosinfo> </metadata> <memory unit="KiB">16777216</memory> <currentMemory unit="KiB">16777216</currentMemory> <vcpu placement="static">8</vcpu> <os> <type arch="x86_64" machine="pc-q35-6.1">hvm</type> <boot dev="hd"/> </os> <features> <acpi/> <apic/> <hyperv> <relaxed state="on"/> <vapic state="on"/> <spinlocks state="on" retries="8191"/> </hyperv> <vmport state="off"/> </features> <cpu mode="host-model" check="partial"/> <clock offset="localtime"> <timer name="rtc" tickpolicy="catchup"/> <timer name="pit" tickpolicy="delay"/> <timer name="hpet" present="no"/> <timer name="hypervclock" present="yes"/> </clock> <on_poweroff>destroy</on_poweroff> <on_reboot>restart</on_reboot> <on_crash>destroy</on_crash> <pm> <suspend-to-mem enabled="no"/> <suspend-to-disk enabled="no"/> </pm> <devices> <emulator>/usr/bin/qemu-system-x86_64</emulator> <disk type="file" device="disk"> <driver name="qemu" type="raw" cache="none" discard="unmap" detect_zeroes="unmap"/> <source file="/mnt/vms/images/win10.img"/> <target dev="sda" bus="scsi"/> <address type="drive" controller="0" bus="0" target="0" unit="0"/> </disk> <disk type="file" device="disk"> <driver name="qemu" type="raw" cache="writeback" discard="unmap" detect_zeroes="unmap"/> <source file="/mnt/vms/images/win10-1.img"/> <target dev="sdc" bus="scsi"/> <address type="drive" controller="0" bus="0" target="0" unit="2"/> </disk> <disk type="file" device="cdrom"> <driver name="qemu" type="raw"/> <target dev="sdb" bus="sata"/> <readonly/> <address type="drive" controller="0" bus="0" target="0" unit="1"/> </disk> <controller type="usb" index="0" model="qemu-xhci" ports="15"> <address type="pci" domain="0x0000" bus="0x02" slot="0x00" function="0x0"/> </controller> <controller type="scsi" index="0" model="virtio-scsi"> <address type="pci" domain="0x0000" bus="0x03" slot="0x00" function="0x0"/> </controller> <controller type="sata" index="0"> <address type="pci" domain="0x0000" bus="0x00" slot="0x1f" function="0x2"/> </controller> <controller type="pci" index="0" model="pcie-root"/> <controller type="pci" index="1" model="pcie-root-port"> <model name="pcie-root-port"/> <target chassis="1" port="0x10"/> <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x0" multifunction="on"/> </controller> <controller type="pci" index="2" model="pcie-root-port"> <model name="pcie-root-port"/> <target chassis="2" port="0x11"/> <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x1"/> </controller> <controller type="pci" index="3" model="pcie-root-port"> <model name="pcie-root-port"/> <target chassis="3" port="0x12"/> <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x2"/> </controller> <controller type="pci" index="4" model="pcie-root-port"> <model name="pcie-root-port"/> <target chassis="4" port="0x13"/> <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x3"/> </controller> <controller type="pci" index="5" model="pcie-root-port"> <model name="pcie-root-port"/> <target chassis="5" port="0x14"/> <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x4"/> </controller> <controller type="pci" index="6" model="pcie-root-port"> <model name="pcie-root-port"/> <target chassis="6" port="0x15"/> <address type="pci" domain="0x0000" bus="0x00" slot="0x02" 
function="0x5"/> </controller> <controller type="virtio-serial" index="0"> <address type="pci" domain="0x0000" bus="0x04" slot="0x00" function="0x0"/> </controller> <interface type="network"> <mac address="52:54:00:70:cc:72"/> <source network="default"/> <model type="e1000e"/> <address type="pci" domain="0x0000" bus="0x01" slot="0x00" function="0x0"/> </interface> <serial type="pty"> <target type="isa-serial" port="0"> <model name="isa-serial"/> </target> </serial> <console type="pty"> <target type="serial" port="0"/> </console> <channel type="spicevmc"> <target type="virtio" name="com.redhat.spice.0"/> <address type="virtio-serial" controller="0" bus="0" port="1"/> </channel> <input type="tablet" bus="usb"> <address type="usb" bus="0" port="1"/> </input> <input type="mouse" bus="ps2"/> <input type="keyboard" bus="ps2"/> <graphics type="spice" autoport="yes"> <listen type="address"/> <image compression="off"/> </graphics> <sound model="ich9"> <address type="pci" domain="0x0000" bus="0x00" slot="0x1b" function="0x0"/> </sound> <audio id="1" type="spice"/> <video> <model type="qxl" ram="65536" vram="65536" vgamem="16384" heads="1" primary="yes"/> <address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x0"/> </video> <redirdev bus="usb" type="spicevmc"> <address type="usb" bus="0" port="2"/> </redirdev> <redirdev bus="usb" type="spicevmc"> <address type="usb" bus="0" port="3"/> </redirdev> <memballoon model="virtio"> <address type="pci" domain="0x0000" bus="0x05" slot="0x00" function="0x0"/> </memballoon> </devices> </domain>
Proxmox:
agent: 1
boot: order=scsi0;ide2;net0
cores: 2
cpu: host
ide2: none,media=cdrom
machine: pc-q35-6.0
memory: 8192
name: win10
net0: virtio=DE:3A:FE:C8:D1:B5,bridge=vmbr0,firewall=1
numa: 0
ostype: win10
protection: 1
rng0: source=/dev/urandom
scsi0: vms:100/vm-100-disk-0.qcow2,cache=none,discard=on,iothread=1,replicate=0,size=64G,werror=stop
scsi1: vms:100/vm-100-disk-1.qcow2,cache=writeback,discard=on,iothread=1,replicate=0,size=16G,werror=stop
scsihw: virtio-scsi-single
smbios1: uuid=f0085f4a-19a2-411a-8934-0452afcb16b0
sockets: 1
vga: qxl
vmgenid: 96814a7f-f7ea-470a-894a-500177c6bf00
I will ask QE to reproduce and analyse this issue.
Thanks,
Vadim.
It seems to be a defrag problem; Win 8.1 doesn't show the same symptoms.
Win 8.1
- SMART before defrag
241 Lifetime_Writes_GiB 0x0032 000 000 000 Old_age Always - 7813
242 Lifetime_Reads_GiB 0x0032 000 000 000 Old_age Always - 11058
- Defrag (pay attention to the Retrim: block)
PS C:\Windows\system32> defrag e: /u /v /h /o
Microsoft Drive Optimizer
Copyright (c) 2013 Microsoft Corp.
Invoking slab consolidation on SCSI (E:)...
Slab Analysis: 100% complete.
Retrim: 100% complete.
Slab consolidation was skipped because there were few evictable slabs.
The operation completed successfully.
Post Defragmentation Report:
Volume Information:
Volume size = 19.99 GB
Cluster size = 4 KB
Used space = 5.14 GB
Free space = 14.84 GB
Slab Consolidation:
Space efficiency = 100%
Potential purgable slabs = 0
Slabs pinned unmovable = 0
Successfully purged slabs = 0
Recovered space = 0 bytes
Retrim:
Backed allocations = 19
Allocations trimmed = 15
Total space trimmed = 12.11 GB
- SMART after defrag
241 Lifetime_Writes_GiB 0x0032 000 000 000 Old_age Always - 7813
242 Lifetime_Reads_GiB 0x0032 000 000 000 Old_age Always - 11058
- Defrag runtime
PS C:\Windows\system32> Measure-Command { defrag e: /h /o }
Days : 0
Hours : 0
Minutes : 0
Seconds : 0
Milliseconds : 665
Ticks : 6658888
TotalDays : 7.7070462962963E-06
TotalHours : 0.000184969111111111
TotalMinutes : 0.0110981466666667
TotalSeconds : 0.6658888
TotalMilliseconds : 665.8888
- iostat
root@PC:~> iostat -mh -d 60 sda4
Linux 5.14.11-2-default (PC) 10/25/2021 _x86_64_ (32 CPU)
tps MB_read/s MB_wrtn/s MB_dscd/s MB_read MB_wrtn MB_dscd Device
1.54 4.6k 886.0k 0.0k 70.6M 13.4G 0.0k sda4
tps MB_read/s MB_wrtn/s MB_dscd/s MB_read MB_wrtn MB_dscd Device
0.83 0.0k 127.1k 0.0k 0.0k 7.4M 0.0k sda4
Win 10
- SMART before defrag
241 Lifetime_Writes_GiB 0x0032 000 000 000 Old_age Always - 7813
242 Lifetime_Reads_GiB 0x0032 000 000 000 Old_age Always - 11058
- Defrag (pay attention to the Retrim: block)
PS C:\Windows\system32> defrag e: /u /v /h /o
Invoking slab consolidation on SCSI (E:)...
Slab Analysis: 100% complete.
Performing pass 1:
Retrim: 0% complete...
Slab consolidation was skipped because there were few evictable slabs.
Retrim: 100% complete.
The operation completed successfully.
Post Defragmentation Report:
Volume Information:
Volume size = 19.99 GB
Cluster size = 4 KB
Used space = 5.14 GB
Free space = 14.85 GB
Allocation Units:
Slab count = 5242111
Slab size = 4 KB
Slab alignment = 0 bytes
In-use slabs = 1347760
Slab Consolidation:
Space efficiency = 100%
Potential purgable slabs = 0
Slabs pinned unmovable = 0
Successfully purged slabs = 0
Recovered space = 0 bytes
Retrim:
Backed allocations = 5242111
Allocations trimmed = 3894341
Total space trimmed = 14.85 GB
- SMART after defrag
241 Lifetime_Writes_GiB 0x0032 000 000 000 Old_age Always - 7815
242 Lifetime_Reads_GiB 0x0032 000 000 000 Old_age Always - 11058
- Defrag runtime
PS C:\Windows\system32> Measure-Command { defrag e: /h /o }
Days : 0
Hours : 0
Minutes : 0
Seconds : 48
Milliseconds : 748
Ticks : 487484326
TotalDays : 0.000564217969907407
TotalHours : 0.0135412312777778
TotalMinutes : 0.812473876666667
TotalSeconds : 48.7484326
TotalMilliseconds : 48748.4326
- iostat
root@PC:~> sudo iostat -mh -d 60 sda4
Linux 5.14.11-2-default (PC) 10/25/2021 _x86_64_ (32 CPU)
tps MB_read/s MB_wrtn/s MB_dscd/s MB_read MB_wrtn MB_dscd Device
1.48 4.4k 849.7k 0.0k 70.7M 13.4G 0.0k sda4
tps MB_read/s MB_wrtn/s MB_dscd/s MB_read MB_wrtn MB_dscd Device
41.97 0.0k 24.4M 0.0k 0.0k 1.4G 0.0k sda4
Win 11 and Windows Server 2022 display the same symptoms as Win 10, so it seems that somewhere in the transition from W8.1 to W10 Microsoft changed something in defrag.exe.
If it's a defrag.exe regression, do you have an idea how to report the problem to Microsoft?
@milsav92 Thanks a lot for your update. Our QE should be testing exactly the same platforms - Win10 21H1/Win11 and WS2022 - right now. Honestly, I don't think this is an MS problem. I rather think that our drivers are missing some bits required by recent Windows versions.
We will try to investigate the problem, but it might take us some time.
Best, Vadim.
Hi @vrozenfe @milsav92,
We ran some tests for this issue and we can reproduce it, but it seems the defrag version also affects the trim result; details as follows:
Tested with 3 different types of data disk (blk, scsi, ide):
Commands with blk data disk:
-blockdev node-name=file_stg1,driver=file,cache.direct=on,cache.no-flush=off,filename=/home/stgtest.qcow2,aio=threads,discard=unmap
-blockdev node-name=drive_stg1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_stg1,discard=unmap
-device virtio-blk-pci,id=stg1,drive=drive_stg1,bus=pci.2
-device virtio-net-pci,mac=9a:36:83:b6:3d:05,id=idJVpmsF,netdev=id23ZUK6,bus=pci.3 \
Commands with scsi data disk:
-device virtio-scsi-pci,id=scsi1,bus=pci.4,addr=0x0
-blockdev driver=file,filename=/home/stgtest.qcow2,node-name=libvirt-1-storage,cache.direct=on,cache.no-flush=off,auto-read-only=on,discard=unmap
-blockdev node-name=libvirt-1-format,read-only=off,discard=unmap,detect-zeroes=unmap,cache.direct=on,cache.no-flush=off,driver=qcow2,file=libvirt-1-storage
-device scsi-hd,bus=scsi1.0,channel=0,scsi-id=0,lun=0,device_id=drive-scsi0-0-0-1,drive=libvirt-1-format,id=scsi0-0-0-1,write-cache=on \
Commands with ide data disk:
-blockdev driver=file,cache.direct=on,cache.no-flush=off,filename=/home/stgtest.qcow2,node-name=my_file1,aio=threads,discard=unmap
-blockdev driver=qcow2,node-name=my1,file=my_file1,cache.direct=on,cache.no-flush=off,discard=unmap
-device ide-hd,drive=my1,id=ide0-0-1,bus=ide.1,unit=0 \
Test results:
- ide/blk system disk + blk data disk + defrag d: /u /v /h /o: we can reproduce this issue; CPU usage goes up to 60% and the retrim needs 10-15 minutes to complete.
- ide system disk + scsi data disk + defrag d: /u /v /h /o: we can reproduce this issue; CPU usage goes up to 60% and the retrim needs 10-15 minutes to complete.
- ide system disk + ide data disk + defrag d: /u /v /h /o: we cannot reproduce this issue; the retrim completes quickly, in a few seconds.
We then changed to another win10 image where typing the defrag command in cmd auto-invokes the Defrag.exe program, so we retested with Defrag.exe; the results change somewhat:
- ide system disk + blk data disk + Defrag.exe d: /u /v /h /o: CPU usage goes up to 60%, but the retrim completes in about 2 minutes.
- ide system disk + scsi data disk + Defrag.exe d: /u /v /h /o: CPU usage goes up to 60%, but the retrim completes in about 2 minutes.
- ide system disk + ide data disk + Defrag.exe d: /u /v /h /o: the retrim completes quickly, in a few seconds as well.
We downloaded smartmontools-7.2 but hit some problems installing it in Windows, so we have no smartctl check and only recorded the retrim completion time.
Used versions: kernel-5.14.0-7.el9.x86_64 qemu-kvm-6.1.0-5.el9.x86_64 seabios-bin-1.14.0-6.el9.noarch virtio-win-prewhql-208/214
QEMU command lines (just pasting one configuration):
/usr/libexec/qemu-kvm
-name 'avocado-vt-vm3'
-machine q35
-nodefaults
-vga std
-device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x3
-device pcie-root-port,port=0x11,chassis=2,id=pci.2,bus=pcie.0,addr=0x3.0x1
-device pcie-root-port,port=0x12,chassis=3,id=pci.3,bus=pcie.0,addr=0x3.0x2
-device pcie-root-port,port=0x13,chassis=4,id=pci.4,bus=pcie.0,addr=0x3.0x3
-device pcie-root-port,port=0x14,chassis=5,id=pci.5,bus=pcie.0,addr=0x3.0x4
-device pcie-root-port,port=0x15,chassis=6,id=pci.6,bus=pcie.0,addr=0x3.0x5
-device pcie-root-port,port=0x16,chassis=7,id=pci.7,bus=pcie.0,addr=0x3.0x6
-device pcie-root-port,port=0x17,chassis=8,id=pci.8,bus=pcie.0,addr=0x3.0x7
-blockdev driver=file,cache.direct=on,cache.no-flush=off,filename=/home/kvm_autotest_root/images/win10-64-virtio-scsi.qcow2,node-name=my_file,aio=threads,discard=unmap
-blockdev driver=qcow2,node-name=my,file=my_file,cache.direct=on,cache.no-flush=off,discard=unmap
-device ide-hd,drive=my,id=ide0-0-0,bus=ide.0,unit=0,bootindex=0
-device virtio-scsi-pci,id=scsi1,bus=pci.4,addr=0x0
-blockdev driver=file,filename=/home/stgtest.qcow2,node-name=libvirt-1-storage,cache.direct=on,cache.no-flush=off,auto-read-only=on,discard=unmap
-blockdev node-name=libvirt-1-format,read-only=off,discard=unmap,detect-zeroes=unmap,cache.direct=on,cache.no-flush=off,driver=qcow2,file=libvirt-1-storage
-device scsi-hd,bus=scsi1.0,channel=0,scsi-id=0,lun=0,device_id=drive-scsi0-0-0-1,drive=libvirt-1-format,id=scsi0-0-0-1,write-cache=on
-device virtio-net-pci,mac=9a:36:83:b6:3d:05,id=idJVpmsF,netdev=id23ZUK6,bus=pci.3
-netdev tap,id=id23ZUK6,vhost=on
-m 8192
-smp 2,maxcpus=4
-cpu 'Skylake-Server',hv_stimer,hv_synic,hv_vpindex,hv_reset,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv-tlbflush,+kvm_pv_unhalt
-device piix3-usb-uhci,id=usb -device usb-tablet,id=input0
-vnc :10
-rtc base=localtime,clock=host,driftfix=slew
-boot order=cdn,once=c,menu=off,strict=off
-enable-kvm
-qmp tcp:0:1231,server,nowait
-monitor stdio \
Thanks~ Peixiu
We tried more tests for this case; the results are as follows:
We can reproduce this issue with both viostor and vioscsi when testing with the virtio-win-prewhql-208 build: high CPU usage and a trim that needs 10-15 minutes to complete. After updating the drivers to virtio-win-prewhql-214 we still reproduce the same behaviour with both viostor and vioscsi.
The win10 image used is Edition: Windows 10 Enterprise. We re-installed a new win10 image with Edition: Windows 10 Enterprise N, created a new data image, and tested again; the issue cannot be reproduced and the trim completes in 2 minutes, both for viostor and vioscsi.
We also tested on win8.1-64, where the trim completes very quickly, in just a few seconds. We also tried a previous Windows release, win10-1909, but installed as Windows 10 Enterprise N; the issue cannot be reproduced there either and the trim completes in a few seconds.
Used versions: kernel-5.14.0-9.el9.x86_64 qemu-kvm-6.1.0-5.el9.x86_64 seabios-bin-1.14.0-7.el9.noarch virtio-win-prewhql-208/214
Thanks~ Peixiu
As a workaround, "disabling" the Thin Volume works.
To force SSD emulation (avoiding the Thin Volume) I also need to set discard_granularity=0, apart from the rotation_rate=1 already mentioned.
This way Windows sees the volume as an SSD instead of as a Thin Volume and the optimization (trim) finishes in seconds.
Example configurations:
-set device.scsi0-0-0-0.rotation_rate=1
-set device.scsi0-0-0-0.discard_granularity=0
or for libvirt < 8.2:
<domain xmlns:qemu="http://libvirt.org/schemas/domain/qemu/1.0" type="kvm">
<!-- [...] -->
<qemu:commandline>
<qemu:arg value="-set"/>
<qemu:arg value="device.scsi0-0-0-0.rotation_rate=1"/>
<qemu:arg value="-set"/>
<qemu:arg value="device.scsi0-0-0-0.discard_granularity=0"/>
</qemu:commandline>
</domain>
or for libvirt 8.2 or newer:
<domain xmlns:qemu="http://libvirt.org/schemas/domain/qemu/1.0" type="kvm">
<!-- [...] -->
<qemu:override>
<qemu:device alias="scsi0-0-0-0">
<qemu:frontend>
<qemu:property name="rotation_rate" type="unsigned" value="1"/>
<qemu:property name="discard_granularity" type="unsigned" value="0"/>
</qemu:frontend>
</qemu:device>
</qemu:override>
</domain>
I can confirm that this does solve the problem. If you don't mind me asking, how did you get the idea to use discard_granularity=0?
In order to force SSD, I looked at the QEMU source code and the SCSI documentation (page 499, section 5.4.13), and found this code in the QEMU SCSI disk source where it generates the documented response:
case 0xb2: /* thin provisioning */
{
buflen = 8;
outbuf[4] = 0;
outbuf[5] = 0xe0; /* unmap & write_same 10/16 all supported */
outbuf[6] = s->qdev.conf.discard_granularity ? 2 : 1; /* provisioning type: 2 = thin provisioned, 1 = resource provisioned */
outbuf[7] = 0;
break;
}
In any case this is a workaround to prevent the Windows VM from suffering lockups while optimising disks; I'm not sure whether this will actually perform the SSD TRIM or just ignore the operation.
My understanding is that this issue still needs to be addressed.
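One way to sanity-check from the host whether discards actually reach the backing storage (a sketch, not from the thread, assuming a sparse image file such as the /mnt/vms/images/win10.img used earlier) is to compare the image's allocated size before and after the optimization, and to confirm inside the guest that TRIM notifications are enabled:
# Host: allocated size should shrink after a successful trim
du -h /mnt/vms/images/win10.img                    # blocks actually allocated
du -h --apparent-size /mnt/vms/images/win10.img    # virtual size, stays constant
qemu-img info /mnt/vms/images/win10.img            # "disk size" is the allocated size
# Windows guest (elevated prompt): 0 means delete notifications (TRIM/unmap) are enabled
fsutil behavior query DisableDeleteNotify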
Slight update.
It seems that even with an empty volume, trim writes a lot of data. For example, for a 64 GB empty volume, 6.1 GB are written.
- SMART before defrag
241 Lifetime_Writes_GiB 0x0032 000 000 000 Old_age Always - 7969
242 Lifetime_Reads_GiB 0x0032 000 000 000 Old_age Always - 11070
- Defrag (pay attention to the Used space and Retrim: blocks)
PS C:\WINDOWS\system32> defrag f: /u /v /h /o
Invoking slab consolidation on Empty_Volume (F:)...
Slab Analysis: 0% complete...
Slab Analysis: 100% complete.
Performing pass 1:
Retrim: 0% complete...
Slab consolidation was skipped because there were few evictable slabs.
Retrim: 100% complete.
The operation completed successfully.
Post Defragmentation Report:
Volume Information:
Volume size = 63.99 GB
Cluster size = 4 KB
Used space = 94.71 MB
Free space = 63.90 GB
Allocation Units:
Slab count = 16776447
Slab size = 4 KB
Slab alignment = 0 bytes
In-use slabs = 23224
Slab Consolidation:
Space efficiency = 100%
Potential purgable slabs = 0
Slabs pinned unmovable = 0
Successfully purged slabs = 0
Recovered space = 0 bytes
Retrim:
Backed allocations = 16776447
Allocations trimmed = 16753222
Total space trimmed = 63.90 GB
- Defrag runtime
PS C:\WINDOWS\system32> Measure-Command { defrag f: /h /o }
Days : 0
Hours : 0
Minutes : 3
Seconds : 5
Milliseconds : 785
Ticks : 1857858074
TotalDays : 0.00215029869675926
TotalHours : 0.0516071687222222
TotalMinutes : 3.09643012333333
TotalSeconds : 185.7858074
TotalMilliseconds : 185785.8074
- SMART after defrag
241 Lifetime_Writes_GiB 0x0032 000 000 000 Old_age Always - 7975
242 Lifetime_Reads_GiB 0x0032 000 000 000 Old_age Always - 11070
- iostat
root@PC:~> iostat -mh -d 200 sdb4
Linux 5.15.8-1-default (PC) 12/26/2021 _x86_64_ (32 CPU)
tps MB_read/s MB_wrtn/s MB_dscd/s MB_read MB_wrtn MB_dscd Device
3.18 0.8k 1.7M 0.0k 6.1M 12.3G 0.0k sdb4
tps MB_read/s MB_wrtn/s MB_dscd/s MB_read MB_wrtn MB_dscd Device
59.72 0.0k 31.2M 0.0k 0.0k 6.1G 0.0k sdb4
Sorry for the bump, but is there any progress? The workaround seems to do the job, but it's not really easy to apply to multiple VMs with multiple drives.
@milsav92 The work is still in progress. There is a bug in the Red Hat Bugzilla tracking this issue: https://bugzilla.redhat.com/show_bug.cgi?id=2020998
Sorry for the inconvenience. Vadim.
Same issue on the latest Proxmox + Win11: very slow (about 2 hours) optimization of an NVMe PCIe 4.0 disk... How can I add -set device.scsi0-0-0-0.rotation_rate=1 -set device.scsi0-0-0-0.discard_granularity=0 on Proxmox systems?
@GektorUA
You can set rotation_rate=1 by enabling the "SSD Emulation" checkbox on the disk.
As for -set device.scsi0-0-0-0.discard_granularity=0, you have to edit the VM config file at /etc/pve/qemu-server/${VMID}.conf and add the following line:
args: -set device.scsi0.discard_granularity=0
If you have more drives you can append them to the line:
args: -set device.scsi0.discard_granularity=0 -set device.scsi1.discard_granularity=0
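Putting the two together, the relevant lines in /etc/pve/qemu-server/${VMID}.conf would look roughly like this (a sketch; the storage and disk names are placeholders, and ssd=1 is what the "SSD Emulation" checkbox sets):
args: -set device.scsi0.discard_granularity=0
scsi0: local-lvm:vm-100-disk-0,discard=on,ssd=1,iothread=1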
I've just pushed the viostor-related fix: https://github.com/virtio-win/kvm-guest-drivers-windows/pull/824
In both cases (viostor and vioscsi), setting discard_granularity to 16/32M (Hyper-V uses 32M) makes Windows work with large slabs (clusters), which reduces the defragmentation time significantly.
Below please see the execution time of "defrag.exe e: /u /v /h /o" for a 10G volume on a Win10 21H2 system:
discard_granularity               4K       32K      256K     2M       16M      32M
Optimal unmap granularity         8        64       512      4096     32768    65536
virtio-blk defrag time in sec     615.61   78.77    15.48    4.29     1.43     1.22
virtio-scsi defrag time in sec    575.77   149      15.50    3.25     1.44     1.72
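(The "Optimal unmap granularity" row appears to be the same setting expressed in 512-byte logical blocks, e.g. 32M / 512 bytes = 65536 blocks.)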
qemu command line
for virtio-blk
-drive file=$DSK0,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,discard=unmap,aio=native
-device virtio-blk-pci,scsi=off,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=-1,serial=xru001i,discard_granularity=32M \
and virtio-scsi
-drive file=$DSK0,if=none,media=disk,format=qcow2,rerror=stop,werror=stop,cache=none,aio=native,id=drive-vioscsi0
-device virtio-scsi-pci,id=scsi-vioscsi0
-device scsi-hd,drive=drive-vioscsi0,id=vioscsi0,bus=scsi-vioscsi0.0,lun=0,scsi-id=0,bootindex=-1,discard_granularity=32M \
I tried with discard_granularity=33554432 on QEMU version 7.0.0 and it was still super slow, slowly consuming a lot of memory on the first try, but when I retried it worked flawlessly fast.
Not sure what could be happening, but at least it works much better now, thanks.
For reference, here is the config I used:
<domain xmlns:qemu="http://libvirt.org/schemas/domain/qemu/1.0" type="kvm">
<!-- [...] -->
<qemu:override>
<qemu:device alias="scsi0-0-0-0">
<qemu:frontend>
<qemu:property name="discard_granularity" type="unsigned" value="33554432"/>
</qemu:frontend>
</qemu:device>
</qemu:override>
</domain>
Or for old libvirt < 8.2:
<domain xmlns:qemu="http://libvirt.org/schemas/domain/qemu/1.0" type="kvm">
<!-- [...] -->
<qemu:commandline>
<qemu:arg value="-set"/>
<qemu:arg value="device.scsi0-0-0-0.discard_granularity=32M"/>
</qemu:commandline>
</domain>
Great work, thanks! On libvirtd 6.0.0 I had to use:
<qemu:commandline>
<qemu:arg value='-set'/>
<qemu:arg value='device.scsi0-0-0-0.discard_granularity=33554432'/>
</qemu:commandline>
Besides the huge improvement when doing a retrim in the guest, it might give some overall performance improvement too (does that make sense?)
BEFORE discard_granularity=32M [benchmark screenshot]
AFTER discard_granularity=32M [benchmark screenshot]
(qcow2 on NVMe 3.0, virtio-scsi, no pre-allocation, iomode=native, cache=none)
@MrM40 I've seen a couple of times that a failure or malfunction in the viostor trim support handler can lead to some performance degradation with a real SSD disk attached. I never tried to trace this issue down, but I will ask QE to run some performance tests on it.
Best, Vadim.
I tried both approaches (SSD emulation and 32M discard_granularity), and they seem to perform similarly in terms of optimization. Here's a screenshot of "defrag" output for the same drive (unchanged content), first as an emulated SSD (virtio-scsi) and second as a "Thin Provisioned LUN" (virtio-blk).
Couple things I noticed:
- The difference in "Backed allocations" is pretty substantial, which I guess means SSD emulation uses an even bigger discard granularity than 32MB.
- As noted in Optimize-Volume docs, SSDs are only re-trimmed, whereas Thin LUNs have slabs consolidated and are re-trimmed.
I have some questions around this if anyone can shed some light, and please do correct me if I misunderstand.
First, from what I've read about thin provisioning, it seems like it's better suited to dynamic disks where the backing file size can be shrunk (which I tried and it doesn't seem to work). I'm using non-sparse qcow2 and raw LVMs, so it seems like SSD emulation is actually more appropriate (as a default), hence the no-slab-consolidation? Is there a technical reason why virtio drives are all Thin Provisioned LUNs in Windows by default? Should I be leaving my drives as Thin LUNs instead of SSD emulation, which is what I've decided on? If NTFS max fragmentation were a concern, I would think optimization would always consolidate slabs.
Second, can someone suggest a way to confirm that trim is actually being passed to the drive in the SSD emulation case? Since it's a non-default configuration, I just want to make sure it's actually working as I have a lot of space (in LVMs) allocated to disks in the Windows VM.
Also, if you want to read about the ugly Windows bug you'll encounter if you don't use one of these workarounds, or disable the optimize schedule in Windows entirely, I've summarized below.
Thanks!
The Windows Bug
I found this problem by accident because I was monitoring memory performance counters and noticed that the memory usage on the machine kept spiking for no apparent reason. What I found was that the Disk Defragmenter service was running amok, continually retrying previously canceled Optimize/Trim operations that were taking longer than the computer's idle cycles. You could watch it happen in the "Defragment and Optimize Disks" GUI.
This is a screenshot of what I saw.
By default, Windows optimizes disks on a weekly schedule. What I saw in the GUI was that optimization would start for a disk, then:
- Analyzing Allocations would take many minutes
- Trimming would take many more minutes, depending on the size of the drive
- It would often say "Canceling...", presumably because the machine wasn't actually idle
- It would start up again a few minutes later
- There were associated error events in Event Viewer logs, hence the retries
I did some Googling and found this bug.
The link provides a hotfix for Windows 8 and Server 2012, but since the behaviour is identical, I'm going to guess the fix never made it into the later OSes proper. I will report this to MS.
The bug suggests that this problem only happens if the "slab size" is < 8MB and also suggests you don't need to optimize drives with a small slab size because space is managed more efficiently and optimization has diminishing returns.
I'm wondering if this was written before UNMAP support, but regardless I'd rather not have to mess with the default optimize settings/schedule.
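For anyone who does decide to turn the weekly schedule off instead, it is driven by the ScheduledDefrag task; a minimal sketch (not from the thread) of disabling it from an elevated prompt in the guest:
schtasks /Change /TN "\Microsoft\Windows\Defrag\ScheduledDefrag" /Disable
(or, in PowerShell)
Disable-ScheduledTask -TaskPath "\Microsoft\Windows\Defrag\" -TaskName "ScheduledDefrag"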
One year has passed, and the QEMU version is 7.2.4 now. I tested two drives with discard_granularity=33554432, and both completed quickly on the first try. The difference might be that this value is being set for the first time.
The xmlns:qemu="http://libvirt.org/schemas/domain/qemu/1.0" namespace in the <domain> tag is a must.
Are you saying it works out of the box (default settings) with 7.2.4, or do you still have to set this manually?
In the case I quoted (QEMU 7.0.0), you need to run it twice to get it to work as expected, but in my case (QEMU 7.2.4) it works on the first run. In both cases, you still have to set the value manually first.