containerized-data-importer
Provisioned Windows 10 image fails to boot after import
What happened:
We are deploying a VM whose disk is a raw image imported from a container image. With CDI v1.59.0 this works and the Windows 10 VM boots. With v1.60.3 the import completes without error, but the resulting Windows 10 VM disk is not bootable. Importer log from v1.60.3:
I0404 13:20:25.728035 1 importer.go:107] Starting importer
I0404 13:20:25.728096 1 importer.go:182] begin import process
I0404 13:20:25.730691 1 data-processor.go:348] Calculating available size
I0404 13:20:25.730772 1 data-processor.go:360] Checking out file system volume size.
I0404 13:20:25.730821 1 data-processor.go:368] Request image size not empty.
I0404 13:20:25.730860 1 data-processor.go:373] Target size 116299313152.
I0404 13:20:25.731625 1 nbdkit.go:348] Waiting for nbdkit PID.
I0404 13:20:26.231853 1 nbdkit.go:369] nbdkit ready.
I0404 13:20:26.231906 1 data-processor.go:247] New phase: Convert
I0404 13:20:26.231933 1 data-processor.go:253] Validating image
I0404 13:20:26.240330 1 nbdkit.go:332] Log line from nbdkit: nbdkit: curl[1]: error: readahead: warning: underlying plugin does not support NBD_CMD_CACHE or PARALLEL thread model, so the filter won't do anything
E0404 13:20:26.241531 1 prlimit.go:156] failed to kill the process; os: process already finished
I0404 13:20:26.241704 1 qemu.go:115] Running qemu-img with args: [convert -t writeback -p -O raw nbd+unix:///?socket=/tmp/nbdkit.sock /data/disk.img]
I0404 13:20:26.247076 1 qemu.go:273] 0.00
I0404 13:20:26.248650 1 nbdkit.go:332] Log line from nbdkit: nbdkit: curl[2]: error: readahead: warning: underlying plugin does not support NBD_CMD_CACHE or PARALLEL thread model, so the filter won't do anything
I0404 13:20:26.929299 1 qemu.go:273] 1.01
...
I0404 13:21:49.718982 1 qemu.go:273] 99.66
E0404 13:21:50.108254 1 prlimit.go:156] failed to kill the process; os: process already finished
I0404 13:21:50.108303 1 data-processor.go:247] New phase: Resize
E0404 13:21:50.116014 1 prlimit.go:156] failed to kill the process; os: process already finished
W0404 13:21:50.116104 1 data-processor.go:330] Available space less than requested size, resizing image to available space 109902299136.
I0404 13:21:50.116116 1 data-processor.go:341] Expanding image size to: 109902299136
E0404 13:21:50.124445 1 prlimit.go:156] failed to kill the process; os: process already finished
I0404 13:21:50.124481 1 data-processor.go:253] Validating image
E0404 13:21:50.131163 1 prlimit.go:156] failed to kill the process; os: process already finished
I0404 13:21:50.131274 1 data-processor.go:247] New phase: Complete
I0404 13:21:50.132904 1 importer.go:231] {"scratchSpaceRequired":false,"preallocationApplied":false,"message":"Import Complete"}
What you expected to happen:
Windows 10 VM should be bootable.
How to reproduce it (as minimally and precisely as possible):
Build a container image from a provisioned Windows 10 raw disk image (.raw) following https://github.com/kubevirt/containerized-data-importer/blob/main/doc/image-from-registry.md, then import it with CDI and start a VM from the imported disk.
Additional context:
If we bypass CDI and populate the PVC manually with the following commands, the Windows 10 VM starts up perfectly fine.
rm /mnt/k8s-local/pvc-74533623-3c78-4c43-ba56-d0224affab11/disk.img
find /data | grep w10
/data/bm/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/827/fs/disk/w10.img.gz
sudo gzip -dc /data/bm/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/827/fs/disk/w10.img.gz > /mnt/k8s-local/pvc-74533623-3c78-4c43-ba56-d0224affab11/disk.img
chmod 660 /mnt/k8s-local/pvc-74533623-3c78-4c43-ba56-d0224affab11/disk.img
chown 107 /mnt/k8s-local/pvc-74533623-3c78-4c43-ba56-d0224affab11/disk.img
Environment:
- CDI version (use kubectl get deployments cdi-deployment -o yaml): 1.60.3
- Kubernetes version (use kubectl version): v1.31.6
- DV specification: N/A
- Cloud provider or hardware configuration: N/A
- OS (e.g. from /etc/os-release): Ubuntu 24.04.2 LTS
- Kernel (e.g. uname -a): 6.8.0-55-generic
- Install tools: N/A
- Others: N/A
Could https://github.com/kubevirt/containerized-data-importer/issues/3457 be related?
I've just converted the same image to qcow2 and imported it again, and it's booting up fine.
So essentially the disk.img that CDI puts in your PVC is empty or corrupted? Could you also post the VM (virt-launcher-* pod) logs? The gzip scenario you're describing should be identical to what we're doing internally.
@bc185174 One way we could try to identify the exact change that broke this flow for you is to do a git bisect between the good and bad versions that you identified. Since you are the only one who has the offending image, would you be willing to try that and let us know what you find?
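For reference, a bisect between the two versions could be started roughly as sketched here. This is only an illustration: `start_bisect` and `record_result` are hypothetical helper names, the refs passed in are assumed to be tags matching the releases mentioned in this issue, and building and deploying CDI at each bisect step is not shown.

```shell
#!/bin/sh
# Hypothetical helpers wrapping the standard git bisect commands.

# start_bisect REPO_DIR GOOD_REF BAD_REF
# Note: git bisect start takes the bad ref first, then the good ref.
start_bisect() {
    git -C "$1" bisect start "$3" "$2"
}

# record_result REPO_DIR good|bad
# Marks the currently checked-out commit and moves to the next candidate.
record_result() {
    case "$2" in
        good|bad) git -C "$1" bisect "$2" ;;
        *) echo "usage: record_result REPO_DIR good|bad" >&2; return 2 ;;
    esac
}
```

For example, `start_bisect ./containerized-data-importer v1.59.0 v1.60.3` checks out a commit between the two refs. After testing the import with that build, `record_result ./containerized-data-importer good` or `record_result ./containerized-data-importer bad` narrows the range until git reports the first bad commit.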
Hi, sorry for the late reply.
> So essentially the disk.img that CDI puts in your PVC is empty or corrupted?
Correct, the image is not bootable. If we manually decompress it using gzip and copy it to the PV, then it boots fine.
> One way we could try to identify the exact change that broke this flow for you is to do a git bisect between the good and bad versions that you identified. Since you are the only one who has the offending image, would you be willing to try that and let us know what you find?
Yeah, sure thing. So v1.59.0 works fine but v1.60.3 breaks. What would you like me to try?
> What would you like me to try?
Maybe just a simple md5 check of the resulting disk.img to start things off (I expect 1.60.3 != 1.59.0).
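A minimal sketch of such a check, assuming GNU coreutils (`stat -c`, `head -c`, `md5sum`) as found on Ubuntu. The function names are hypothetical. Because the importer log shows a Resize phase that expands the image, the two files may differ in length even when the imported data is identical, so this sketch compares only the first N bytes, where N is the size of the smaller file.

```shell
#!/bin/sh
# Hypothetical helpers for comparing two raw disk images.

# image_size FILE -> size in bytes (GNU stat)
image_size() {
    stat -c %s "$1"
}

# md5_prefix FILE NBYTES -> MD5 of the first NBYTES bytes of FILE
md5_prefix() {
    head -c "$2" "$1" | md5sum | cut -d' ' -f1
}

# compare_images A B
# Prints sizes and checksums; returns 0 if the common prefix matches.
compare_images() {
    size_a=$(image_size "$1")
    size_b=$(image_size "$2")
    if [ "$size_a" -le "$size_b" ]; then n=$size_a; else n=$size_b; fi
    sum_a=$(md5_prefix "$1" "$n")
    sum_b=$(md5_prefix "$2" "$n")
    echo "sizes: $size_a $size_b (comparing first $n bytes)"
    echo "md5:   $sum_a $sum_b"
    [ "$sum_a" = "$sum_b" ]
}
```

Running `compare_images` on the disk.img written by each CDI version, or on the manually decompressed image and the one written by v1.60.3, would show whether the data itself differs or only the trailing padding.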
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
/lifecycle rotten
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
/close
@kubevirt-bot: Closing this issue.