Converting qcow2 images to raw is too slow
Description
Based on the logs, converting the Ubuntu server cloud image (xxx MiB) to raw format takes 17 seconds. The same operation using qemu-img convert takes 1.8 seconds.
Example log:
time="2024-09-02T01:43:25+03:00" level=info msg="Converting \"/Users/nsoffer/.lima/cluster/basedisk\" (qcow2) to a raw disk \"/Users/nsoffer/.lima/cluster/diffdisk\""
...
time="2024-09-02T01:43:42+03:00" level=info msg="Expanding to 20GiB"
Same with qemu-img
% time qemu-img convert -f qcow2 -O raw ~/.lima/cluster/basedisk diffdisk
qemu-img convert -f qcow2 -O raw ~/.lima/cluster/basedisk diffdisk 2.37s user 1.90s system 241% cpu 1.768 total
Lima shows a nice progress bar during the slow conversion, but qemu-img is fast enough that no progress bar is needed. It also has a progress option that can be used to report progress if needed.
Fix:
- use qemu-img convert if available
- use -p to show progress
Fix:
- use qemu-img convert if available
It would be better to fix the speed of the builtin conversion so it will be fast even when QEMU is not installed. Given that the default emulation in Lima 1.0 will be VZ, qemu will be an optional dependency.
I don't think that reinventing qemu-img is a good direction. The time spent on it could be spent on features that add value for users. qemu-img is efficient, supports all image formats, is well maintained, and is available everywhere.
Could you default to qemu-img (where available) and fall back to the library as a slower option?
We have used this trick elsewhere, like with SFTP or with XZ. The downside is having two code paths to test...
qemu-img is efficient, supports all image formats, is well maintained, and is available everywhere.
On macOS, it is hard to install qemu-img when Homebrew/MacPorts/nix is disallowed by an employer's policy.
There may be room for optimization here: https://github.com/lima-vm/go-qcow2reader/blob/v0.1.2/image/qcow2/qcow2.go#L795-L800
I just found out that the built-in conversion needs more disk space than qemu-img convert. While the end result is still a sparse disk, it seems to require the full 100GB of disk space temporarily, so you cannot convert from QCOW2 to RAW on a device with limited free space.
$ df -h ~/.lima3
Filesystem Size Used Avail Capacity iused ifree %iused Mounted on
/dev/disk5 50Gi 692Mi 49Gi 2% 11 4.3G 0% /Users/jan/.lima3
$ l start --vm-type vz
? Creating an instance "default" Proceed with the current configuration
INFO[0001] Starting the instance "default" with VM driver "vz"
…
INFO[0002] Converting "/Users/jan/.lima3/default/basedisk" (qcow2) to a raw disk "/Users/jan/.lima3/default/diffdisk"
3.50 GiB / 3.50 GiB [-------------------------------------] 100.00% 206.87 MiB/s
INFO[0019] Expanding to 100GiB
FATA[0020] failed to convert "/Users/jan/.lima3/default/basedisk" to a raw disk "/Users/jan/.lima3/default/diffdisk": no space left on device
Using qemu-img convert seems to require little extra space beyond what the new sparse file actually occupies.
While the end-result is still a sparse disk
Actually, it is not, with the builtin conversion. It turns into a fully allocated disk. So this is even worse. That also might explain why it takes so long: it possibly writes the full 100GB to disk.
The non-sparse issue is being fixed in:
- https://github.com/lima-vm/lima/pull/2715
I think the simplest way to fix it is to convert the image to raw after the download. There is no reason to keep qcow2 files in the cache when we use the file as a base disk, even when using qemu.
We can try to optimize qcow2 convert later to make the initial download faster.
New flow:
- download the image in whatever format (raw, qcow2, raw compressed)
- verify the checksum
- convert to uncompressed raw file
When creating a vm we can always do fast copy on the raw image from the cache.
Questions:
- do we use the stored checksum of the qcow2 image after the download?
- do we need a checksum of the raw file?
Issues:
- will not help when users create qcow2 disks and try to attach them to a vz-based instance
Testing shows that this makes limactl start almost 3 times faster:
Starting from qcow2 image
% cat test-qcow2.yaml
images:
- location: "https://cloud-images.ubuntu.com/releases/24.04/release/ubuntu-24.04-server-cloudimg-arm64.img"
arch: "aarch64"
vmType: vz
plain: true
% time limactl start --tty=false test-qcow2.yaml
INFO[0000] Terminal is not available, proceeding without opening an editor
INFO[0000] Starting the instance "test-qcow2" with VM driver "vz"
INFO[0000] Attempting to download the image arch=aarch64 digest= location="https://cloud-images.ubuntu.com/releases/24.04/release/ubuntu-24.04-server-cloudimg-arm64.img"
INFO[0000] Using cache "/Users/nsoffer/Library/Caches/lima/download/by-url-sha256/002fbe468673695a2206b26723b1a077a71629001a5b94efd8ea1580e1c3dd06/data"
INFO[0000] Converting "/Users/nsoffer/.lima/test-qcow2/basedisk" (qcow2) to a raw disk "/Users/nsoffer/.lima/test-qcow2/diffdisk"
3.50 GiB / 3.50 GiB [-------------------------------------] 100.00% 201.56 MiB/s
INFO[0018] Expanding to 100GiB
WARN[0018] [hostagent] GRPC port forwarding is experimental
INFO[0018] [hostagent] hostagent socket created at /Users/nsoffer/.lima/test-qcow2/ha.sock
INFO[0018] [hostagent] Starting VZ (hint: to watch the boot progress, see "/Users/nsoffer/.lima/test-qcow2/serial*.log")
INFO[0018] [hostagent] new connection from to
INFO[0019] SSH Local Port: 59529
INFO[0018] [hostagent] [VZ] - vm state change: running
INFO[0018] [hostagent] Running in plain mode. Mounts, port forwarding, containerd, etc. will be ignored. Guest agent will not be running.
INFO[0018] [hostagent] Waiting for the essential requirement 1 of 1: "ssh"
INFO[0028] [hostagent] Waiting for the essential requirement 1 of 1: "ssh"
INFO[0028] [hostagent] The essential requirement 1 of 1 is satisfied
INFO[0028] [hostagent] Waiting for the final requirement 1 of 1: "boot scripts must have finished"
INFO[0028] [hostagent] The final requirement 1 of 1 is satisfied
INFO[0029] READY. Run `ssh -F "/Users/nsoffer/.lima/test-qcow2/ssh.config" lima-test-qcow2` to open the shell.
limactl start --tty=false test-qcow2.yaml 19.99s user 1.53s system 71% cpu 29.911 total
Starting from raw image
% cat test-raw.yaml
images:
- location: "/Users/nsoffer/vms/ubuntu-24.04-server-cloudimg-arm64.img"
arch: "aarch64"
vmType: vz
plain: true
% time limactl start --tty=false test-raw.yaml
INFO[0000] Terminal is not available, proceeding without opening an editor
INFO[0000] Starting the instance "test-raw" with VM driver "vz"
INFO[0000] Attempting to download the image arch=aarch64 digest= location=/Users/nsoffer/vms/ubuntu-24.04-server-cloudimg-arm64.img
INFO[0000] Downloaded the image from "/Users/nsoffer/vms/ubuntu-24.04-server-cloudimg-arm64.img"
INFO[0000] Converting "/Users/nsoffer/.lima/test-raw/basedisk" (raw) to a raw disk "/Users/nsoffer/.lima/test-raw/diffdisk"
INFO[0000] Expanding to 100GiB
WARN[0000] [hostagent] GRPC port forwarding is experimental
INFO[0000] [hostagent] hostagent socket created at /Users/nsoffer/.lima/test-raw/ha.sock
INFO[0000] [hostagent] Starting VZ (hint: to watch the boot progress, see "/Users/nsoffer/.lima/test-raw/serial*.log")
INFO[0000] [hostagent] new connection from to
INFO[0000] SSH Local Port: 59539
INFO[0000] [hostagent] [VZ] - vm state change: running
INFO[0000] [hostagent] Running in plain mode. Mounts, port forwarding, containerd, etc. will be ignored. Guest agent will not be running.
INFO[0000] [hostagent] Waiting for the essential requirement 1 of 1: "ssh"
INFO[0010] [hostagent] Waiting for the essential requirement 1 of 1: "ssh"
INFO[0010] [hostagent] The essential requirement 1 of 1 is satisfied
INFO[0010] [hostagent] Waiting for the final requirement 1 of 1: "boot scripts must have finished"
INFO[0010] [hostagent] The final requirement 1 of 1 is satisfied
INFO[0011] READY. Run `ssh -F "/Users/nsoffer/.lima/test-raw/ssh.config" lima-test-raw` to open the shell.
limactl start --tty=false test-raw.yaml 0.03s user 0.08s system 0% cpu 11.371 total
Converting the compressed qcow2 image is 1.6 times faster with https://github.com/lima-vm/go-qcow2reader/pull/31, but matching qemu-img requires much more work.
Converting once at the end of the download is better, but with the improved go-qcow2reader this saves only 2 seconds for the default image, so it is lower priority. I'll open a new issue to consider this for a future version.