
Converting qcow2 images to raw is too slow

Open nirs opened this issue 1 year ago • 10 comments

Description

Based on the logs, converting the Ubuntu server cloud image (xxx MiB) to raw format takes 17 seconds. The same operation with qemu-img convert takes 1.8 seconds.

Example log:

time="2024-09-02T01:43:25+03:00" level=info msg="Converting \"/Users/nsoffer/.lima/cluster/basedisk\" (qcow2) to a raw disk \"/Users/nsoffer/.lima/cluster/diffdisk\""
...
time="2024-09-02T01:43:42+03:00" level=info msg="Expanding to 20GiB"

The same conversion with qemu-img:

% time qemu-img convert -f qcow2 -O raw ~/.lima/cluster/basedisk diffdisk
qemu-img convert -f qcow2 -O raw ~/.lima/cluster/basedisk diffdisk  2.37s user 1.90s system 241% cpu 1.768 total

Lima shows a nice progress bar during the slow conversion, but qemu-img is fast enough that no progress bar is needed. It also has a progress option (-p) that can be used to report progress if needed.

Fix:

  • use qemu-img convert if available
  • use -p to show progress

nirs avatar Sep 01 '24 23:09 nirs

Fix:

  • use qemu-img convert if available

It would be better to fix the speed of the builtin conversion so it will be fast even when QEMU is not installed. Given that the default emulation in Lima 1.0 will be VZ, qemu will be an optional dependency.

jandubois avatar Sep 02 '24 00:09 jandubois

I don't think that reinventing qemu-img is a good direction. The time spent on it could be spent on features that add value for users. qemu-img is efficient, supports all image formats, is well maintained, and is available everywhere.

nirs avatar Sep 02 '24 00:09 nirs

You could default to qemu-img (where available), and then fall back to the library as a slower option?

We have used this trick elsewhere, like with SFTP or with XZ. The downside is having two code paths to test...
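
A minimal sketch of that "prefer the external tool, fall back to the library" pattern, written in Python for brevity. This is not Lima's code: builtin_convert is a hypothetical stand-in for the built-in converter, and the only qemu-img options used are the ones already shown in this thread (convert, -f, -O, -p).

```python
import shutil
import subprocess


def builtin_convert(src: str, dst: str) -> None:
    """Hypothetical stand-in for the slower built-in converter."""
    raise NotImplementedError("built-in qcow2 conversion is not part of this sketch")


def qemu_img_command(src: str, dst: str, progress: bool = True) -> list[str]:
    """Build the qemu-img command line used in the timing test above."""
    cmd = ["qemu-img", "convert", "-f", "qcow2", "-O", "raw"]
    if progress:
        cmd.append("-p")  # ask qemu-img to print conversion progress
    return cmd + [src, dst]


def convert_to_raw(src: str, dst: str, fallback=builtin_convert) -> str:
    """Prefer qemu-img when it is on PATH; otherwise use the fallback.

    Returns the name of the code path that was taken.
    """
    if shutil.which("qemu-img") is None:
        fallback(src, dst)
        return "builtin"
    subprocess.run(qemu_img_command(src, dst), check=True)
    return "qemu-img"
```

Returning the name of the path taken makes it easier to assert in tests which of the two code paths ran, which is the maintenance cost mentioned above.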

afbjorklund avatar Sep 02 '24 05:09 afbjorklund

qemu-img is efficient, supports all image formats, is well maintained, and is available everywhere.

On macOS, it is hard to install qemu-img when Homebrew/MacPorts/nix are disallowed by an employer's policy.

AkihiroSuda avatar Sep 02 '24 06:09 AkihiroSuda

There may be room for optimization here: https://github.com/lima-vm/go-qcow2reader/blob/v0.1.2/image/qcow2/qcow2.go#L795-L800

AkihiroSuda avatar Sep 02 '24 06:09 AkihiroSuda

I just found out that the built-in conversion needs more disk space than qemu-img convert. While the end-result is still a sparse disk, it seems to require the full 100GB of disk space temporarily, so you cannot convert from QCOW2 to RAW on a device with limited free space.

$ df -h ~/.lima3
Filesystem    Size    Used   Avail Capacity iused ifree %iused  Mounted on
/dev/disk5    50Gi   692Mi    49Gi     2%      11  4.3G    0%   /Users/jan/.lima3

$ l start --vm-type vz
? Creating an instance "default" Proceed with the current configuration
INFO[0001] Starting the instance "default" with VM driver "vz"
…
INFO[0002] Converting "/Users/jan/.lima3/default/basedisk" (qcow2) to a raw disk "/Users/jan/.lima3/default/diffdisk"
3.50 GiB / 3.50 GiB [-------------------------------------] 100.00% 206.87 MiB/s
INFO[0019] Expanding to 100GiB
FATA[0020] failed to convert "/Users/jan/.lima3/default/basedisk" to a raw disk "/Users/jan/.lima3/default/diffdisk": no space left on device

Using qemu-img convert seems to require little extra space beyond what the new sparse file actually occupies.

jandubois avatar Oct 09 '24 22:10 jandubois

While the end-result is still a sparse disk

Actually, it is not, with the builtin conversion. It turns into a fully allocated disk. So this is even worse. That also might explain why it takes so long: it possibly writes the full 100GB to disk.
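
To illustrate the difference between a sparse and a fully allocated result, here is a minimal Python sketch of a copy that skips all-zero chunks instead of writing them. It is a simplification and not how Lima or qemu-img work: a real converter can read the qcow2 cluster metadata to find unallocated regions rather than comparing data against zeros. The function names and the chunk size are assumptions made for this example.

```python
import os

CHUNK = 1024 * 1024  # copy granularity; an arbitrary choice for this sketch


def sparse_copy(src_path: str, dst_path: str, chunk: int = CHUNK) -> int:
    """Copy src to dst, skipping all-zero chunks. Returns bytes actually written."""
    written = 0
    zeros = bytes(chunk)
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            buf = src.read(chunk)
            if not buf:
                break
            if buf == zeros[: len(buf)]:
                # Seek past the zeros instead of writing them, leaving a hole.
                dst.seek(len(buf), os.SEEK_CUR)
            else:
                dst.write(buf)
                written += len(buf)
        # Set the final size explicitly so a trailing hole is preserved.
        dst.truncate(src.tell())
    return written


def allocated_bytes(path: str) -> int:
    """Disk space used by the file (POSIX reports st_blocks in 512-byte units)."""
    return os.stat(path).st_blocks * 512
```

On a filesystem that supports sparse files, comparing allocated_bytes(path) with os.path.getsize(path) shows whether the converted disk is sparse: a fully allocated 100GB image reports roughly the same value for both.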

jandubois avatar Oct 09 '24 23:10 jandubois

The non-sparse issue is being fixed in:

  • https://github.com/lima-vm/lima/pull/2715

AkihiroSuda avatar Oct 10 '24 00:10 AkihiroSuda

I think the simplest way to fix it is to convert the image to raw after the download. There is no reason to keep qcow2 files in the cache when we use the file as a base disk, even when using qemu.

We can try to optimize qcow2 convert later to make the initial download faster.

New flow:

  1. download the image in whatever format (raw, qcow2, raw compressed)
  2. verify the checksum
  3. convert to uncompressed raw file

When creating a VM, we can always do a fast copy of the raw image from the cache.
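
The flow above can be sketched as follows. This is a hedged Python illustration, not Lima's cache implementation: the cache file name data.raw, the convert callback, and the use of shutil.copyfile as the "fast copy" are all assumptions made for this example. The checksum step is skipped when no digest is configured, as in the logs below where digest= is empty.

```python
import hashlib
import os
import shutil


def sha256_of(path: str) -> str:
    """Hash the file in 1 MiB blocks to avoid loading it into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1024 * 1024), b""):
            h.update(block)
    return h.hexdigest()


def cache_as_raw(downloaded: str, expected_sha256: str, cache_dir: str, convert) -> str:
    """Verify the download, then convert it once to a raw file in the cache."""
    if expected_sha256 and sha256_of(downloaded) != expected_sha256:
        raise ValueError("checksum mismatch for " + downloaded)
    raw_path = os.path.join(cache_dir, "data.raw")  # hypothetical cache file name
    if not os.path.exists(raw_path):
        tmp = raw_path + ".tmp"
        convert(downloaded, tmp)   # e.g. a qcow2-to-raw converter
        os.replace(tmp, raw_path)  # publish only a complete file
    return raw_path


def create_disk(raw_path: str, disk_path: str) -> None:
    """Stand-in for the fast copy from the cache when creating a VM."""
    shutil.copyfile(raw_path, disk_path)
```

In this sketch the checksum is verified against the downloaded file only, so the raw file in the cache has no checksum of its own; that is exactly the open question listed below.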

Questions:

  • do we use the stored checksum of the qcow2 image after the download?
  • do we need a checksum of the raw file?

Issues:

  • this will not help when a user creates qcow2 disks and tries to attach them to a vz-based instance

Testing shows that this makes limactl start almost 3 times faster:

Starting from qcow2 image

% cat test-qcow2.yaml 
images:
- location: "https://cloud-images.ubuntu.com/releases/24.04/release/ubuntu-24.04-server-cloudimg-arm64.img"
  arch: "aarch64"
vmType: vz
plain: true

% time limactl start --tty=false test-qcow2.yaml
INFO[0000] Terminal is not available, proceeding without opening an editor 
INFO[0000] Starting the instance "test-qcow2" with VM driver "vz" 
INFO[0000] Attempting to download the image              arch=aarch64 digest= location="https://cloud-images.ubuntu.com/releases/24.04/release/ubuntu-24.04-server-cloudimg-arm64.img"
INFO[0000] Using cache "/Users/nsoffer/Library/Caches/lima/download/by-url-sha256/002fbe468673695a2206b26723b1a077a71629001a5b94efd8ea1580e1c3dd06/data" 
INFO[0000] Converting "/Users/nsoffer/.lima/test-qcow2/basedisk" (qcow2) to a raw disk "/Users/nsoffer/.lima/test-qcow2/diffdisk" 
3.50 GiB / 3.50 GiB [-------------------------------------] 100.00% 201.56 MiB/s
INFO[0018] Expanding to 100GiB                          
WARN[0018] [hostagent] GRPC port forwarding is experimental 
INFO[0018] [hostagent] hostagent socket created at /Users/nsoffer/.lima/test-qcow2/ha.sock 
INFO[0018] [hostagent] Starting VZ (hint: to watch the boot progress, see "/Users/nsoffer/.lima/test-qcow2/serial*.log") 
INFO[0018] [hostagent] new connection from  to          
INFO[0019] SSH Local Port: 59529                        
INFO[0018] [hostagent] [VZ] - vm state change: running  
INFO[0018] [hostagent] Running in plain mode. Mounts, port forwarding, containerd, etc. will be ignored. Guest agent will not be running. 
INFO[0018] [hostagent] Waiting for the essential requirement 1 of 1: "ssh" 
INFO[0028] [hostagent] Waiting for the essential requirement 1 of 1: "ssh" 
INFO[0028] [hostagent] The essential requirement 1 of 1 is satisfied 
INFO[0028] [hostagent] Waiting for the final requirement 1 of 1: "boot scripts must have finished" 
INFO[0028] [hostagent] The final requirement 1 of 1 is satisfied 
INFO[0029] READY. Run `ssh -F "/Users/nsoffer/.lima/test-qcow2/ssh.config" lima-test-qcow2` to open the shell.
limactl start --tty=false test-qcow2.yaml  19.99s user 1.53s system 71% cpu 29.911 total

Starting from raw image

% cat test-raw.yaml  
images:
- location: "/Users/nsoffer/vms/ubuntu-24.04-server-cloudimg-arm64.img"
  arch: "aarch64"
vmType: vz
plain: true

% time limactl start --tty=false test-raw.yaml  
INFO[0000] Terminal is not available, proceeding without opening an editor 
INFO[0000] Starting the instance "test-raw" with VM driver "vz" 
INFO[0000] Attempting to download the image              arch=aarch64 digest= location=/Users/nsoffer/vms/ubuntu-24.04-server-cloudimg-arm64.img
INFO[0000] Downloaded the image from "/Users/nsoffer/vms/ubuntu-24.04-server-cloudimg-arm64.img" 
INFO[0000] Converting "/Users/nsoffer/.lima/test-raw/basedisk" (raw) to a raw disk "/Users/nsoffer/.lima/test-raw/diffdisk" 
INFO[0000] Expanding to 100GiB                          
WARN[0000] [hostagent] GRPC port forwarding is experimental 
INFO[0000] [hostagent] hostagent socket created at /Users/nsoffer/.lima/test-raw/ha.sock 
INFO[0000] [hostagent] Starting VZ (hint: to watch the boot progress, see "/Users/nsoffer/.lima/test-raw/serial*.log") 
INFO[0000] [hostagent] new connection from  to          
INFO[0000] SSH Local Port: 59539                        
INFO[0000] [hostagent] [VZ] - vm state change: running  
INFO[0000] [hostagent] Running in plain mode. Mounts, port forwarding, containerd, etc. will be ignored. Guest agent will not be running. 
INFO[0000] [hostagent] Waiting for the essential requirement 1 of 1: "ssh" 
INFO[0010] [hostagent] Waiting for the essential requirement 1 of 1: "ssh" 
INFO[0010] [hostagent] The essential requirement 1 of 1 is satisfied 
INFO[0010] [hostagent] Waiting for the final requirement 1 of 1: "boot scripts must have finished" 
INFO[0010] [hostagent] The final requirement 1 of 1 is satisfied 
INFO[0011] READY. Run `ssh -F "/Users/nsoffer/.lima/test-raw/ssh.config" lima-test-raw` to open the shell. 
limactl start --tty=false test-raw.yaml  0.03s user 0.08s system 0% cpu 11.371 total
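
As a quick check of the "almost 3 times faster" figure, the ratio can be computed from the two "total" times reported by time above. The variable names are only for this example.

```python
qcow2_total = 29.911  # seconds, "total" from the qcow2 run
raw_total = 11.371    # seconds, "total" from the raw run

speedup = qcow2_total / raw_total
saved = qcow2_total - raw_total
print(f"speedup: {speedup:.2f}x, saved: {saved:.2f}s")
```

The saved time is close to the 18 seconds between the "Converting" and "Expanding" log lines in the qcow2 run, so the difference is explained almost entirely by the conversion step.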

nirs avatar Oct 13 '24 17:10 nirs

Converting the compressed qcow2 is 1.6 times faster with https://github.com/lima-vm/go-qcow2reader/pull/31 but matching qemu-img requires much more work.

nirs avatar Oct 15 '24 18:10 nirs

Converting once at the end of the download is better, but with the improved go-qcow2reader this saves only 2 seconds for the default image, so it is lower priority. I'll open a new issue so this can be considered for a future version.

nirs avatar Oct 25 '24 08:10 nirs