I/O performance in a virtual machine

Open jingshanccc opened this issue 2 months ago • 7 comments

Describe the bug

I am testing disk I/O performance in a virtual machine and found that both read and write performance are several times worse than on the host machine. Is this expected, and are there any ways to optimize it? Thanks!

To Reproduce

  1. Use vz.NewVirtualMachine to create a virtual machine with the specified configuration, and use vz.NewMacOSInstaller to install macOS into it from an official IPSW file.
  2. For the disk, use vz.NewVirtioBlockDeviceConfiguration, backed by an IMG file created with vz.CreateDiskImage.

Screenshots

Disk I/O performance of the local machine (MacBook Pro M4)

Image

Virtual machine disk I/O performance

Image

Environment that you use to compile (please complete the following information):

  • Xcode version: Xcode 16.4
  • macOS Version: macOS 15.6
  • mac architecture: arm
  • Go Version: 1.24.4

jingshanccc, Oct 28 '25 02:10

You could experiment with the caching/syncing options:

attachment, err := vz.NewDiskImageStorageDeviceAttachmentWithCacheAndSync(diskPath, false, vz.DiskImageCachingModeAutomatic, vz.DiskImageSynchronizationModeFsync)

cfergeau, Oct 28 '25 10:10

After adjusting the cache and sync parameters, I observed no significant improvement. In particular, concurrent 4K random read/write (RND4KQD64) is more than 50% slower than on the host machine.
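To compare host and guest with the same code path, a small random-read loop can be run on both sides. The following is a minimal sketch using only the Go standard library, not the benchmark tool used in the screenshots. The names `randomReadIOPS` and `demo` are hypothetical, 64 goroutines only approximate a queue depth of 64, and reads against a small file will mostly be served from the page cache, so a realistic run would need a file much larger than RAM.

```go
package main

import (
	"fmt"
	"math/rand"
	"os"
	"sync"
	"sync/atomic"
	"time"
)

const blockSize = 4096 // 4 KiB, matching the RND4K test size

// randomReadIOPS issues 4 KiB reads at random block-aligned offsets from
// `workers` goroutines for roughly `d` and returns completed reads per second.
func randomReadIOPS(path string, workers int, d time.Duration) (float64, error) {
	if workers < 1 || d <= 0 {
		return 0, fmt.Errorf("need at least one worker and a positive duration")
	}
	f, err := os.Open(path)
	if err != nil {
		return 0, err
	}
	defer f.Close()

	info, err := f.Stat()
	if err != nil {
		return 0, err
	}
	blocks := info.Size() / blockSize
	if blocks == 0 {
		return 0, fmt.Errorf("%s is smaller than one %d-byte block", path, blockSize)
	}

	var ops, failed atomic.Int64
	start := time.Now()
	deadline := start.Add(d)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(seed int64) {
			defer wg.Done()
			rng := rand.New(rand.NewSource(seed)) // one generator per goroutine
			buf := make([]byte, blockSize)
			for time.Now().Before(deadline) {
				off := rng.Int63n(blocks) * blockSize
				// ReadAt is safe for concurrent use on the same *os.File.
				if _, err := f.ReadAt(buf, off); err != nil {
					failed.Add(1)
					return
				}
				ops.Add(1)
			}
		}(int64(w) + 1)
	}
	wg.Wait()
	if n := failed.Load(); n > 0 {
		return 0, fmt.Errorf("%d workers stopped on read errors", n)
	}
	return float64(ops.Load()) / time.Since(start).Seconds(), nil
}

// demo runs the benchmark briefly against a 4 MiB temporary file.
func demo() (float64, error) {
	tmp, err := os.CreateTemp("", "rnd4k-*.bin")
	if err != nil {
		return 0, err
	}
	defer os.Remove(tmp.Name())
	if _, err := tmp.Write(make([]byte, 1024*blockSize)); err != nil {
		tmp.Close()
		return 0, err
	}
	if err := tmp.Close(); err != nil {
		return 0, err
	}
	return randomReadIOPS(tmp.Name(), 64, 200*time.Millisecond)
}

func main() {
	iops, err := demo()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("~%.0f random 4 KiB reads/s\n", iops)
}
```

Running the same binary on the host and inside the guest, against a file on the disk under test, gives a like-for-like ratio even if the absolute numbers differ from a dedicated benchmark tool.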

I used the following two configurations:

  • vz.DiskImageCachingModeAutomatic, vz.DiskImageSynchronizationModeFsync: results essentially unchanged. Is this the default configuration, i.e., what vz.NewDiskImageStorageDeviceAttachment uses?

  • vz.DiskImageCachingModeUncached, vz.DiskImageSynchronizationModeNone: results essentially unchanged.

Do you have any other ideas? Thanks!

jingshanccc, Oct 30 '25 08:10

I have never looked at I/O performance; I only know these options exist. While you are at it, you could also test vz.DiskImageCachingModeCached, which is the only caching mode you did not test. I expect it would perform better than Uncached.
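To avoid missing a pairing, the full matrix of caching and synchronization modes can be enumerated up front. This is a minimal sketch under stated assumptions: the strings mirror the vz constant names used in this thread, `DiskImageSynchronizationModeFull` is not mentioned in the thread and is assumed from Apple's VZDiskImageSynchronizationMode, and `attachmentOptions` and `allCombinations` are hypothetical helper names, not part of vz.

```go
package main

import "fmt"

// attachmentOptions names one caching/synchronization pairing to benchmark.
// The fields hold vz constant names as strings so this sketch needs no cgo.
type attachmentOptions struct {
	Caching string
	Sync    string
}

var cachingModes = []string{
	"DiskImageCachingModeAutomatic",
	"DiskImageCachingModeUncached",
	"DiskImageCachingModeCached",
}

var syncModes = []string{
	"DiskImageSynchronizationModeFull", // assumption: not named in the thread
	"DiskImageSynchronizationModeFsync",
	"DiskImageSynchronizationModeNone",
}

// allCombinations returns every caching mode paired with every sync mode,
// in a stable order (caching mode is the outer loop).
func allCombinations() []attachmentOptions {
	out := make([]attachmentOptions, 0, len(cachingModes)*len(syncModes))
	for _, c := range cachingModes {
		for _, s := range syncModes {
			out = append(out, attachmentOptions{Caching: c, Sync: s})
		}
	}
	return out
}

func main() {
	for i, o := range allCombinations() {
		fmt.Printf("%d: vz.%s + vz.%s\n", i+1, o.Caching, o.Sync)
	}
}
```

Each entry would then be passed to vz.NewDiskImageStorageDeviceAttachmentWithCacheAndSync in a fresh VM run, with the same benchmark repeated for every row.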

cfergeau, Oct 30 '25 09:10

After trying various combinations of cache and sync settings, random read/write performance in Cached mode came closest to the host's. More notably, the results changed markedly when I adjusted the number of CPU cores allocated to the virtual machine. My host has a 28-core CPU and 96 GB of RAM. With the VM configured with 4 cores, its random read/write performance was on par with the host. With 12 to 24 cores allocated, random read/write performance dropped to half of the host's.

  • Host: 28 cores, 96 GB
Image
  • VM: 4 cores, 4 GB, Automatic
Image
  • VM: 4 cores, 4 GB, Cached
Image
  • VM: 12 cores, 4 GB, Cached
Image
  • VM: 24 cores, 90 GB, Cached
Image
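Given that the 4-core VM matched the host while the 12 to 24 core VMs did not, one option is to cap the vCPU count when building the configuration. The sketch below is an illustration only: `pickCPUCount` is a hypothetical helper, the cap of 4 reflects nothing more than the single data point above on this particular host, and the lower and upper bounds are plain parameters rather than values read from vz.

```go
package main

import (
	"fmt"
	"runtime"
)

// pickCPUCount returns the vCPU count to request for a VM: the wanted value,
// limited by softCap (an experimentally chosen ceiling, 0 disables it) and
// then kept within the allowed range [lo, hi].
func pickCPUCount(want, softCap, lo, hi uint) uint {
	n := want
	if softCap > 0 && n > softCap {
		n = softCap
	}
	if n > hi {
		n = hi
	}
	if n < lo {
		n = lo
	}
	return n
}

func main() {
	hostCPUs := uint(runtime.NumCPU())
	// Ask for 24 vCPUs, but cap at 4 because that was the setting that
	// matched host I/O performance in the measurements above.
	fmt.Println("vCPUs to request:", pickCPUCount(24, 4, 1, hostCPUs))
}
```

Whether a cap of 4 is right for other workloads or other chips would need the same kind of sweep; the helper only makes the chosen ceiling explicit in one place.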

jingshanccc, Oct 30 '25 13:10

@jingshanccc It is a fairly "known" thing within the macOS (Apple Silicon) virtualization scene that (probably due to the NUMA memory layout) the Ultra line doesn't really perform well when it comes to VM performance. I see you are measuring on Apple M3 Ultra. I have the same experience: you can squeeze out the best performance by keeping the number of assigned cores low.

In fact, none of the macOS CI providers I know of use Ultra Macs for running VMs. I suggest you stick with the Pro machines instead.

nagypeterjob, Dec 06 '25 23:12

Anecdotally, you will get the best performance with VZDiskImageCachingMode:cached + VZDiskImageSynchronizationMode:none.

Also, with Tahoe (macOS 26), Apple introduced the ASIF image format which promises better Disk IO than the good old Disk Image format. VZ support is already merged to master, but not released yet. You can find the commit here: https://github.com/Code-Hex/vz/commit/e669237d0ee813976a729143769c1ed35d05ae05

Your host also needs to be Tahoe to be able to use ASIF. I hope it was useful, let me know.

nagypeterjob, Dec 06 '25 23:12

Thank you so much! We have also come to suspect that the Ultra's chip architecture is causing the performance issues. Unfortunately, the Pro models do not offer the high-end configuration we need. In the end I also chose the Cached mode, which helps to some degree. I made preliminary attempts with Tahoe and ASIF on the Ultra as well; they did not perform well, but I did not test them thoroughly because I was busy with other work. I will find time to verify them later.

jingshanccc, Dec 09 '25 03:12