VM is not initializing on ARM64
What happened:
I am unable to access a newly deployed VM. The output of kubectl get vmi shows that the VM is running and ready, but I believe it is not fully initializing, as I am unable to access the VM via virtctl console / virtctl ssh and there are no guest console logs from the virt-launcher pod. As a note, I deployed the Kubernetes cluster using K0s.
All nodes in the cluster are passing qemu validation:
node3:~$ virt-host-validate qemu
QEMU: Checking if device /dev/kvm exists : PASS
QEMU: Checking if device /dev/kvm is accessible : PASS
QEMU: Checking if device /dev/vhost-net exists : PASS
QEMU: Checking if device /dev/net/tun exists : PASS
QEMU: Checking for cgroup 'cpu' controller support : PASS
QEMU: Checking for cgroup 'cpuacct' controller support : PASS
QEMU: Checking for cgroup 'cpuset' controller support : PASS
QEMU: Checking for cgroup 'memory' controller support : PASS
QEMU: Checking for cgroup 'devices' controller support : PASS
QEMU: Checking for cgroup 'blkio' controller support : PASS
QEMU: Checking for device assignment IOMMU support : WARN (Unknown if this platform has IOMMU support)
QEMU: Checking for secure guest support : WARN (Unknown if this platform has Secure Guest support)
Kubevirt components:
$ kubectl get all -n kubevirt
Warning: kubevirt.io/v1 VirtualMachineInstancePresets is now deprecated and will be removed in v2.
NAME                                   READY   STATUS    RESTARTS   AGE
pod/virt-api-64d75d4f5-66vxg           1/1     Running   0          22h
pod/virt-api-64d75d4f5-rl6cn           1/1     Running   0          22h
pod/virt-controller-64d65c6684-ggwlc   1/1     Running   0          22h
pod/virt-controller-64d65c6684-xqx7m   1/1     Running   0          22h
pod/virt-handler-82vdv                 1/1     Running   0          22h
pod/virt-handler-fsvz8                 1/1     Running   0          22h
pod/virt-handler-l664w                 1/1     Running   0          22h
pod/virt-operator-6c89df8955-jrjf9     1/1     Running   0          22h
pod/virt-operator-6c89df8955-r9wkj     1/1     Running   0          22h

NAME                                  TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
service/kubevirt-operator-webhook     ClusterIP   10.101.225.75   <none>        443/TCP   22h
service/kubevirt-prometheus-metrics   ClusterIP   None            <none>        443/TCP   22h
service/virt-api                      ClusterIP   10.96.236.192   <none>        443/TCP   22h
service/virt-exportproxy              ClusterIP   10.110.33.182   <none>        443/TCP   22h

NAME                          DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
daemonset.apps/virt-handler   3         3         3       3            3           kubernetes.io/os=linux   22h

NAME                              READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/virt-api          2/2     2            2           22h
deployment.apps/virt-controller   2/2     2            2           22h
deployment.apps/virt-operator     2/2     2            2           22h

NAME                                         DESIRED   CURRENT   READY   AGE
replicaset.apps/virt-api-64d75d4f5           2         2         2       22h
replicaset.apps/virt-controller-64d65c6684   2         2         2       22h
replicaset.apps/virt-operator-6c89df8955     2         2         2       22h

NAME                            AGE   PHASE
kubevirt.kubevirt.io/kubevirt   22h   Deployed
############################
$ kubectl get pod,vm,vmi
NAME                             READY   STATUS    RESTARTS   AGE
pod/virt-launcher-testvm-fhrc2   3/3     Running   0          11m

NAME                                AGE   STATUS    READY
virtualmachine.kubevirt.io/testvm   11m   Running   True

NAME                                        AGE   PHASE     IP             NODENAME   READY
virtualmachineinstance.kubevirt.io/testvm   11m   Running   10.244.135.7   node3      True
What you expected to happen: Deploy a working VM using KubeVirt.
How to reproduce it (as minimally and precisely as possible):
- Deploy a K0s Kubernetes cluster using k0sctl (https://docs.k0sproject.io/v1.30.0+k0s.0/k0sctl-install/) on a Turing RK1 compute module. Note: I am using Calico with VXLAN as my CNI, but this also failed with the same issue using kube-router (the default CNI with K0s).
- Install KubeVirt
- Deploy a test VM following https://kubevirt.io/labs/kubernetes/lab1
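For reference, the lab boils down to roughly the following steps (a rough summary, assuming virtctl is installed; see the lab for the exact steps):

wget https://kubevirt.io/labs/manifests/vm.yaml
kubectl apply -f vm.yaml
virtctl start testvm
kubectl get vmis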
Additional context:
My server is using ARM64 architecture and the hardware is Turing RK1 compute modules (https://turingpi.com/product/turing-rk1/). I have been able to successfully deploy a Cirros VM using virsh with cirros-0.5.2-aarch64 image. I have attempted to use an aarch64 image for my kubevirt VM but that also failed to initialize (I used image quay.io/kubevirt/cirros-container-disk-demo:v1.2.2-arm64).
I have been interested in using Kubevirt but I have been running into this same issue when using different Kubernetes deployments (KIND and Minikube). All tests have been done on a Turing Pi RK1 cluster (single node and multi-node).
I have attached the logs from the virt-launcher pod (all containers) and my kubevirt CR object.
Environment:
- KubeVirt version (use virtctl version): v1.2.1
- Kubernetes version (use kubectl version): v1.30.0+k0s
- VM or VMI specifications:
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: testvm
spec:
  running: false
  template:
    metadata:
      labels:
        kubevirt.io/size: small
        kubevirt.io/domain: testvm
    spec:
      domain:
        devices:
          disks:
            - name: containerdisk
              disk:
                bus: virtio
            - name: cloudinitdisk
              disk:
                bus: virtio
          interfaces:
            - name: default
              masquerade: {}
        resources:
          requests:
            memory: 64M
      networks:
        - name: default
          pod: {}
      volumes:
        - name: containerdisk
          containerDisk:
            image: quay.io/kubevirt/cirros-container-disk-demo
        - name: cloudinitdisk
          cloudInitNoCloud:
            userDataBase64: SGkuXG4=
- Cloud provider or hardware configuration: Hardware is baremetal Turing RK1 compute modules (https://turingpi.com/product/turing-rk1/). The cluster is 4 nodes (1 controller and 3 workers), but I had this issue using one RK1 node as a single-node cluster.
- OS (e.g. from /etc/os-release): Ubuntu 22.04.3
- Kernel (e.g. uname -a): 5.10.160-rockchip aarch64 GNU/Linux
- Install tools: K0s with Calico VXLAN was used to deploy the Kubernetes cluster
/cc @zhlhahaha Is this something you are able to help with?
resources:
  requests:
    memory: 64M
Hi @jaredcash Can you try increasing the memory from 64M to 256M?
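For example, something like this (untested; it assumes the VM is named testvm, and the VM needs a restart for the new request to take effect):

kubectl patch vm testvm --type merge -p '{"spec":{"template":{"spec":{"domain":{"resources":{"requests":{"memory":"256M"}}}}}}}'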
@zhlhahaha unfortunately, increasing the memory did not help. It still appears that the VM will not initialize as the guest-console-log container logs are blank and I am unable to access it:
resources:
  requests:
    memory: 256M
guest-console-log:
$ kubectl logs virt-launcher-testvm2-6rkq6 -c guest-console-log | wc -l
0
Note: I also tried creating the VM with 1G of memory and got the same results.
I attached the describe of both the virt-launcher pod and the new VM object with 256M in case that is helpful. Please let me know if any additional items are needed.
It is interesting that there are no failure logs in virt-launcher.log or the console log. This usually indicates that the VM may be encountering a boot failure during the bootloader stage. This could be caused by incorrect UEFI firmware, a corrupted VM disk, or a mismatch in the CPU architecture of the VM disk. I will investigate this further in my local environment.
The quay.io/kubevirt/cirros-container-disk-demo:latest image is only for x86_64.
Can you use the following image and make sure the allocated memory is equal to or larger than 256M?
quay.io/kubevirt/cirros-container-disk-demo:20240729_74e137bba-arm64
I can successfully boot the VM based on the image in my local env.
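That is, in the VM spec, the containerdisk volume would look something like:

volumes:
  - name: containerdisk
    containerDisk:
      image: quay.io/kubevirt/cirros-container-disk-demo:20240729_74e137bba-arm64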
@aburdenthehand If @jaredcash verified the image issue, we can update the document https://kubevirt.io/labs/kubernetes/lab1.
I don't see any of the labs specifying the cirros disk image. Am I missing something? That said, I would be happy to have any callouts or alternative steps to support additional architectures in the labs, and really any and all improvements to the labs. Please feel free to raise an issue specifying the improvement to make or a PR to add the required info. If the former, we can add a 'good-first-issue' label.
The quay.io/kubevirt/cirros-container-disk-demo:latest image is only for x86_64. Can you use the following image and make sure the allocated memory is equal to or larger than 256M?
quay.io/kubevirt/cirros-container-disk-demo:20240729_74e137bba-arm64
I can successfully boot the VM based on the image in my local env.
@zhlhahaha unfortunately, the VM is still not initializing with the ARM-specific image and higher allocated memory. The VM still shows as running, but again I'm unable to access it and there are no console logs:
$ kubectl get vm
NAME      AGE   STATUS    READY
testvm1   43m   Running   True

$ kubectl get vm testvm1 -o custom-columns=MEMORY:.spec.template.spec.domain.resources.requests.memory,IMAGE:.spec.template.spec.volumes[0].containerDisk.image
MEMORY   IMAGE
1G       quay.io/kubevirt/cirros-container-disk-demo:20240729_74e137bba-arm64

$ kubectl logs virt-launcher-testvm1-vwwr8 -c guest-console-log | wc -l
0
I'm unsure if this is somehow related to my hardware, even though virt-host-validate is seemingly passing on each node and I am able to deploy a VM using virt-install on the worker nodes.
I don't see any of the labs specifying the cirros disk image. Am I missing something?
The doc does not specify the cirros disk image directly. However, the VM configuration file vm.yaml fetched by the command below contains the disk image information, and it uses the x86-only cirros image.
wget https://kubevirt.io/labs/manifests/vm.yaml
Please feel free to raise an issue specifying the improvement to make or a PR to add the required info. If the former, we can add a 'good-first-issue' label.
Ok, I will raise an issue after I solve @jaredcash's problem.
@zhlhahaha unfortunately, the VM is still not initializing with the ARM-specific image and higher allocated memory. The VM still shows as running, but again I'm unable to access it and there are no console logs:
Would you mind collecting the following information?
- Show whether the qemu process is running: ps aux | grep qemu
- Edit the KubeVirt config to get more information, like the following, then start the VMI and get the virt-launcher.log (a non-interactive one-liner is shown after this list):
$ kubectl edit kubevirt -n kubevirt

apiVersion: kubevirt.io/v1
kind: KubeVirt
...
spec:
  ...
  configuration:
    developerConfiguration:
      logVerbosity:
        virtLauncher: 8
...
status:
- Are you using virtctl console testvm to visit the virtual machine? Can you access the VM console, or do you get an error message when running this command?
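As a note, the non-interactive one-liner for the verbosity change would be something like (untested):

kubectl patch kubevirt kubevirt -n kubevirt --type merge -p '{"spec":{"configuration":{"developerConfiguration":{"logVerbosity":{"virtLauncher":8}}}}}'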
Hello @zhlhahaha, here is the requested information:

Show whether the qemu process is running: ps aux | grep qemu

Here are the qemu processes on the worker node:
root@node3:~# ps aux | grep qemu
uuidd 1045139 0.0 0.1 1686836 11756 ? Ssl 20:40 0:00 /usr/bin/virt-launcher-monitor --qemu-timeout 269s --name testvm1 --uid 72b0120f-b4ad-4a1b-a612-3b37466eeebc --namespace default --kubevirt-share-dir /var/run/kubevirt --ephemeral-disk-dir /var/run/kubevirt-ephemeral-disks --container-disk-dir /var/run/kubevirt/container-disks --grace-period-seconds 45 --hook-sidecars 0 --ovmf-path /usr/share/AAVMF --run-as-nonroot
uuidd 1045157 0.1 0.6 2561068 52644 ? Sl 20:40 0:01 /usr/bin/virt-launcher --qemu-timeout 269s --name testvm1 --uid 72b0120f-b4ad-4a1b-a612-3b37466eeebc --namespace default --kubevirt-share-dir /var/run/kubevirt --ephemeral-disk-dir /var/run/kubevirt-ephemeral-disks --container-disk-dir /var/run/kubevirt/container-disks --grace-period-seconds 45 --hook-sidecars 0 --ovmf-path /usr/share/AAVMF --run-as-nonroot
uuidd 1045172 0.0 0.2 1290448 21328 ? Sl 20:40 0:00 /usr/sbin/virtqemud -f /var/run/libvirt/virtqemud.conf
uuidd 1045391 100 2.1 1718748 170312 ? Sl 20:40 23:57 /usr/libexec/qemu-kvm -name guest=default_testvm1,debug-threads=on -S -object {"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/run/kubevirt-private/libvirt/qemu/lib/domain-1-default_testvm1/master-key.aes"} -blockdev {"driver":"file","filename":"/usr/share/AAVMF/AAVMF_CODE.fd","node-name":"libvirt-pflash0-storage","auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-pflash0-format","read-only":true,"driver":"raw","file":"libvirt-pflash0-storage"} -blockdev {"driver":"file","filename":"/var/run/kubevirt-private/libvirt/qemu/nvram/testvm1_VARS.fd","node-name":"libvirt-pflash1-storage","auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-pflash1-format","read-only":false,"driver":"raw","file":"libvirt-pflash1-storage"} -machine virt-rhel9.2.0,usb=off,gic-version=3,dump-guest-core=off,memory-backend=mach-virt.ram,pflash0=libvirt-pflash0-format,pflash1=libvirt-pflash1-format,acpi=on -accel kvm -cpu host -m size=976896k -object {"qom-type":"memory-backend-ram","id":"mach-virt.ram","size":1000341504} -overcommit mem-lock=off -smp 1,sockets=1,dies=1,cores=1,threads=1 -object {"qom-type":"iothread","id":"iothread1"} -uuid cfb867c9-fa3a-51f5-b0f5-485fd556fd68 -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=20,server=on,wait=off -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device {"driver":"pcie-root-port","port":8,"chassis":1,"id":"pci.1","bus":"pcie.0","multifunction":true,"addr":"0x1"} -device {"driver":"pcie-root-port","port":9,"chassis":2,"id":"pci.2","bus":"pcie.0","addr":"0x1.0x1"} -device {"driver":"pcie-root-port","port":10,"chassis":3,"id":"pci.3","bus":"pcie.0","addr":"0x1.0x2"} -device {"driver":"pcie-root-port","port":11,"chassis":4,"id":"pci.4","bus":"pcie.0","addr":"0x1.0x3"} -device {"driver":"pcie-root-port","port":12,"chassis":5,"id":"pci.5","bus":"pcie.0","addr":"0x1.0x4"} -device {"driver":"pcie-root-port","port":13,"chassis":6,"id":"pci.6","bus":"pcie.0","addr":"0x1.0x5"} -device {"driver":"pcie-root-port","port":14,"chassis":7,"id":"pci.7","bus":"pcie.0","addr":"0x1.0x6"} -device {"driver":"pcie-root-port","port":15,"chassis":8,"id":"pci.8","bus":"pcie.0","addr":"0x1.0x7"} -device {"driver":"pcie-root-port","port":16,"chassis":9,"id":"pci.9","bus":"pcie.0","multifunction":true,"addr":"0x2"} -device {"driver":"pcie-root-port","port":17,"chassis":10,"id":"pci.10","bus":"pcie.0","addr":"0x2.0x1"} -device {"driver":"pcie-root-port","port":18,"chassis":11,"id":"pci.11","bus":"pcie.0","addr":"0x2.0x2"} -device {"driver":"pcie-root-port","port":19,"chassis":12,"id":"pci.12","bus":"pcie.0","addr":"0x2.0x3"} -device {"driver":"qemu-xhci","id":"usb","bus":"pci.5","addr":"0x0"} -device {"driver":"virtio-scsi-pci-non-transitional","id":"scsi0","bus":"pci.6","addr":"0x0"} -device {"driver":"virtio-serial-pci-non-transitional","id":"virtio-serial0","bus":"pci.7","addr":"0x0"} -blockdev {"driver":"file","filename":"/var/run/kubevirt/container-disks/disk_0.img","node-name":"libvirt-3-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-3-format","read-only":true,"discard":"unmap","cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-3-storage"} -blockdev 
{"driver":"file","filename":"/var/run/kubevirt-ephemeral-disks/disk-data/containerdisk/disk.qcow2","node-name":"libvirt-2-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-2-format","read-only":false,"discard":"unmap","cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-2-storage","backing":"libvirt-3-format"} -device {"driver":"virtio-blk-pci-non-transitional","bus":"pci.8","addr":"0x0","drive":"libvirt-2-format","id":"ua-containerdisk","bootindex":1,"write-cache":"on","werror":"stop","rerror":"stop"} -blockdev {"driver":"file","filename":"/var/run/kubevirt-ephemeral-disks/cloud-init-data/default/testvm1/noCloud.iso","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-1-format","read-only":false,"discard":"unmap","cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-1-storage"} -device {"driver":"virtio-blk-pci-non-transitional","bus":"pci.9","addr":"0x0","drive":"libvirt-1-format","id":"ua-cloudinitdisk","write-cache":"on","werror":"stop","rerror":"stop"} -netdev {"type":"tap","fd":"21","vhost":true,"vhostfd":"23","id":"hostua-default"} -device {"driver":"virtio-net-pci-non-transitional","host_mtu":1450,"netdev":"hostua-default","id":"ua-default","mac":"6e:7e:49:88:36:2d","bus":"pci.1","addr":"0x0","romfile":""} -add-fd set=0,fd=19,opaque=serial0-log -chardev socket,id=charserial0,fd=17,server=on,wait=off,logfile=/dev/fdset/0,logappend=on -serial chardev:charserial0 -chardev socket,id=charchannel0,fd=18,server=on,wait=off -device {"driver":"virtserialport","bus":"virtio-serial0.0","nr":1,"chardev":"charchannel0","id":"channel0","name":"org.qemu.guest_agent.0"} -device {"driver":"usb-tablet","id":"input0","bus":"usb.0","port":"1"} -device {"driver":"usb-kbd","id":"input1","bus":"usb.0","port":"2"} -audiodev {"id":"audio1","driver":"none"} -vnc vnc=unix:/var/run/kubevirt-private/72b0120f-b4ad-4a1b-a612-3b37466eeebc/virt-vnc,audiodev=audio1 -device {"driver":"virtio-gpu-pci","id":"video0","max_outputs":1,"bus":"pci.2","addr":"0x0"} -device {"driver":"virtio-balloon-pci-non-transitional","id":"balloon0","free-page-reporting":true,"bus":"pci.10","addr":"0x0"} -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on
root 1054126 0.0 0.0 6020 1984 pts/1 S+ 21:04 0:00 grep --color=auto qemu
Edit the KubeVirt config to get more information, like the following, then start the VMI and get the virt-launcher.log
I have attached all the container logs of the virt-launcher pod after adding more verbose logging.
Are you using virtctl console testvm to visit the virtual machine? Can you access the VM console, or do you get an error message when running this command?
I do have virtctl installed on the manager node and I have attempted to console into the VM. I do not get an error message, but I just get a blank display. Even if I press a key, it will not progress any further until I press Ctrl+] to escape. For reference:
$ virtctl console testvm1
Successfully connected to testvm1 console. The escape sequence is ^]
$
Please let me know if anything additional is needed.
There is no error in virt-launcher.log, and the qemu process can start successfully. Let's try a Fedora image. Would you mind using the following config to start a Fedora VM?
---
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  labels:
    special: vmi-fedora
  name: vmi-fedora
spec:
  domain:
    devices:
      disks:
        - disk:
            bus: virtio
          name: containerdisk
        - disk:
            bus: virtio
          name: cloudinitdisk
      interfaces:
        - masquerade: {}
          name: default
      rng: {}
    resources:
      requests:
        memory: 1024M
  networks:
    - name: default
      pod: {}
  terminationGracePeriodSeconds: 0
  volumes:
    - containerDisk:
        image: quay.io/containerdisks/fedora:40
      name: containerdisk
    - cloudInitNoCloud:
        userData: |-
          #cloud-config
          password: fedora
          chpasswd: { expire: False }
      name: cloudinitdisk
With the Fedora image, the VM is still not initializing. Even though the VMI shows the VM as running, virtctl console is still blank. I am personally not noticing any outliers in the logs, but I have attached them for further inspection. Outputs for reference:
$ kubectl get pod,vmi
NAME                                 READY   STATUS    RESTARTS   AGE
pod/virt-launcher-testvm1-xxf76      3/3     Running   0          24h
pod/virt-launcher-vmi-fedora-rjtdr   3/3     Running   0          20m

NAME                                            AGE   PHASE     IP              NODENAME   READY
virtualmachineinstance.kubevirt.io/testvm1      24h   Running   10.244.135.15   node3      True
virtualmachineinstance.kubevirt.io/vmi-fedora   20m   Running   10.244.104.13   node2      True
As a note, I also tested the fedora VM with 2048M and got the same results. The logs I provided are from the 1024M VM.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
/lifecycle stale
Unfortunately, this issue persists. @zhlhahaha and/or Kubevirt team, have you had some time to review my previous message? Please let me know if anything is needed to continue troubleshooting this issue.
I have been interested in using Kubevirt but I have been running into this same issue when using different Kubernetes deployments (KIND and Minikube). All tests have been done on a Turing Pi RK1 cluster (single node and multi-node).
Sorry, I missed your message. I suspect the UEFI boot failed. Would you mind providing more information?
- CPU and memory information: sudo dmidecode -t processor && free -h. As you have three nodes, check whether all nodes have the same CPU.
- The configuration of the cirros VM started via virsh dumpxml vmname
@zhlhahaha it seems there is an issue with dmidecode on aarch64 systems as I am getting the following error on my baremetal servers:
$ sudo dmidecode -t processor
# dmidecode 3.3
# No SMBIOS nor DMI entry point found, sorry.
I gathered the CPU information for all my nodes via the lscpu command. Please let me know if that information is fine or if another command is suggested (e.g. lshw).
Additionally, I deployed a new Cirros VM using the image suggested here (https://github.com/kubevirt/kubevirt.github.io/issues/956#issuecomment-2255482142) and gathered the dumpxml of this VM.
Note that the VM is experiencing our issue and failing to initialize.
nodes-cpu-mem-info.txt
cirros-dumpxml.txt
Hi @jaredcash The CPU info is OK. I see it uses A72 and A55 Arm CPUs; I need to check their specs. I used to run KubeVirt on a Raspberry Pi 4, which has a Cortex-A72 CPU. Regarding the cirros VM configuration, I meant the configuration of the cirros VM that successfully booted via pure virsh, as you said:
I have been able to successfully deploy a Cirros VM using virsh with cirros-0.5.2-aarch64 image.
Apologies for my misunderstanding @zhlhahaha. I have attached the dumpxml of the cirros VM I created with pure virsh. virsh-cirros-dumpxml.txt
Apologies for my misunderstanding @zhlhahaha. I have attached the dumpxml of the cirros VM I created with pure virsh. virsh-cirros-dumpxml.txt
Thanks! I didn’t notice any differences between the successfully booted Cirros VM and the KubeVirt one. Would you mind double-checking if the successfully booted Cirros VM is starting on the server with the Cortex-A55 CPU? Initially, I suspected a difference in the Generic Interrupt Controller (GIC) versions between the Cortex-A55 and Cortex-A72 CPUs, but they appear to use the same GIC version. Now, it seems the UEFI firmware may be the only possible cause. Would you be able to replace /usr/share/AAVMF/AAVMF_CODE.fd in the virt-launcher with the one from the host?
@zhlhahaha I redeployed the cirros test VM I created with KubeVirt to the same node (node4) to ensure it is using the same CPU (Cortex-A55).
Regarding replacing AAVMF_CODE.fd, it seems that the virt-launcher pod does not allow edits to this file, as sudo is not available:
$ kubectl exec -it virt-launcher-testvm1-hc2j8 -- ls -l /usr/share/AAVMF/AAVMF_CODE.fd
lrwxrwxrwx 1 root root 42 Jan 1 1970 /usr/share/AAVMF/AAVMF_CODE.fd -> ../edk2/aarch64/QEMU_EFI-silent-pflash.raw
$ kubectl exec -it virt-launcher-testvm1-hc2j8 -- mv /usr/share/AAVMF/AAVMF_CODE.fd /usr/share/AAVMF/AAVMF_CODE.fd.bak
mv: cannot move '/usr/share/AAVMF/AAVMF_CODE.fd' to '/usr/share/AAVMF/AAVMF_CODE.fd.bak': Permission denied
command terminated with exit code 1
$ kubectl exec -it virt-launcher-testvm1-hc2j8 -- sudo rm -f /usr/share/AAVMF/AAVMF_CODE.fd
error: Internal error occurred: Internal error occurred: error executing command in container: failed to exec in container: failed to start exec "8189abc40bee955c058cc308065c5bbff4ba28ec0438f0c08fe369f6cc0aebb3": OCI runtime exec failed: exec failed: unable to start container process: exec: "sudo": executable file not found in $PATH: unknown
Note: I did attempt to become the root user, but it asks for a password:
$ kubectl exec -it virt-launcher-testvm1-hc2j8 -- /bin/bash
bash-5.1$ su - root
Password:
su: Authentication failure
bash-5.1$
As a workaround, I used nsenter from the host node to copy the file from the host node to the virt-launcher pod which worked:
[root@testvm1 /]# ls -l /usr/share/AAVMF/
total 65536
-rw-r--r-- 1 qemu qemu 67108864 Nov 2 03:33 AAVMF_CODE.fd
lrwxrwxrwx 1 root root 35 Jan 1 1970 AAVMF_CODE.verbose.fd -> ../edk2/aarch64/QEMU_EFI-pflash.raw
lrwxrwxrwx 1 root root 40 Jan 1 1970 AAVMF_VARS.fd -> ../edk2/aarch64/vars-template-pflash.ra
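For reference, the rough shape of the workaround, run on the worker node hosting the virt-launcher pod (illustrative; the exact PID lookup and copy I used may have differed slightly):

# pick any process inside the virt-launcher pod, then write through its mount namespace
PID=$(pgrep -f virt-launcher-monitor | head -n1)
cp --remove-destination /usr/share/AAVMF/AAVMF_CODE.fd /proc/$PID/root/usr/share/AAVMF/AAVMF_CODE.fd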
After giving it some time, the VM was still not initializing. In an attempt to get it to work, I changed the ownership of AAVMF_CODE.fd to root:root
[root@testvm1 /]# ls -l /usr/share/AAVMF/
total 65536
-rw-r--r-- 1 root root 67108864 Nov 2 03:33 AAVMF_CODE.fd
lrwxrwxrwx 1 root root 35 Jan 1 1970 AAVMF_CODE.verbose.fd -> ../edk2/aarch64/QEMU_EFI-pflash.raw
lrwxrwxrwx 1 root root 40 Jan 1 1970 AAVMF_VARS.fd -> ../edk2/aarch64/vars-template-pflash.raw
Unfortunately, the VM still did not initialize. I restarted the pod to see if that would help, but it did not.
I have re-copied the AAVMF_CODE.fd from the host node to the pod after the pod restart, so the current state is the following:
$ kubectl exec -it virt-launcher-testvm1-4hvwn -- ls -l /usr/share/AAVMF/
total 65536
-rw-r--r-- 1 root root 67108864 Nov 2 03:33 AAVMF_CODE.fd
lrwxrwxrwx 1 root root 35 Jan 1 1970 AAVMF_CODE.verbose.fd -> ../edk2/aarch64/QEMU_EFI-pflash.raw
lrwxrwxrwx 1 root root 40 Jan 1 1970 AAVMF_VARS.fd -> ../edk2/aarch64/vars-template-pflash.raw
I have also attached fresh logs of the virt-launcher pod for reference.
Please let me know if there are other steps I need to perform after replacing AAVMF_CODE.fd
Unfortunately, the VM still did not initialize. I restarted the pod to see if that would help, but it did not.
The AAVMF_CODE file is the UEFI boot firmware used during VM startup. After replacing this file, a VM reboot is necessary for the changes to take effect. Additionally, if you restart the pod, it will revert to the original virt-launcher image where the AAVMF_CODE file hasn't been replaced.
To make this change effective, you’ll need to replace the AAVMF_CODE.fd file in the virt-launcher image itself rather than in individual pods, then use this updated virt-launcher image to start the VM.
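A minimal, untested sketch of what that could look like (the base image tag should match the deployed KubeVirt version, and AAVMF_CODE.fd here is the firmware file copied from the host into the build context):

# Containerfile sketch: bake the host firmware into a custom virt-launcher image
FROM quay.io/kubevirt/virt-launcher:v1.2.1
COPY AAVMF_CODE.fd /usr/share/AAVMF/AAVMF_CODE.fd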
@andreabolognani Do you have any suggestion?
There might be a way to inject files into the pod before the VM starts, for example using the sidecar hook. I'm not too familiar with these facilities, so I might be wrong about it. Rebuilding the virt-launcher image is obviously always going to be possible, but the process would be quite involved so I'd really leave it as a last ditch effort.
My suggestion would be to try and figure out a way to change
<loader readonly='yes' secure='no' type='pflash'>/usr/share/AAVMF/AAVMF_CODE.fd</loader>
in the domain XML to
<loader readonly='yes' secure='no' type='pflash'>/usr/share/AAVMF/AAVMF_CODE.verbose.fd</loader>
The verbose build of AAVMF would hopefully produce at least some output pointing us in the right direction. Again, I'm not sure what facilities, if any, KubeVirt provides to inject this kind of change. Sidecar hook might be the one.
There might be a way to inject files into the pod before the VM starts, for example using the sidecar hook.
Yes, the sidecar is a good suggestion! It can run a custom script before VM initialization. Here is a guide: https://kubevirt.io/user-guide/user_workloads/hook-sidecar/
Based on this example, something like
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-config-map
data:
  my_script.sh: |
    #!/bin/sh
    tempFile=`mktemp --dry-run`
    echo $4 > $tempFile
    sed -i "s|AAVMF_CODE.fd|AAVMF_CODE.verbose.fd|" $tempFile
    cat $tempFile
(completely untested) should do the trick.
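The VMI would then reference the ConfigMap via the hook-sidecar annotation, roughly as in the example linked below (the args and hookPath follow the upstream sample; adjust as needed):

metadata:
  annotations:
    hooks.kubevirt.io/hookSidecars: '[{"args": ["--version", "v1alpha2"], "configMap": {"name": "my-config-map", "key": "my_script.sh", "hookPath": "/usr/bin/onDefineDomain"}}]'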
Hello @zhlhahaha @andreabolognani, following the suggestions above, I was able to create a VM with the AAVMF_CODE.verbose.fd UEFI boot firmware. I followed the example here https://github.com/kubevirt/kubevirt/blob/main/examples/vmi-with-sidecar-hook-configmap.yaml but I changed the Fedora image to the one previously mentioned here https://github.com/kubevirt/kubevirt.github.io/issues/956#issuecomment-2259542037
I have attached all container logs of the virt-launcher and the dumpxml of my VM.
I could not use the sidecar hook functionality to get the host's local UEFI boot firmware into the VM (if that is even possible) as a test. I am still troubleshooting (and of course I welcome any suggestions if we want to go down that route), but I wanted to provide you both with the current data in the meantime.
@jaredcash the XML configuration looks good, it's clearly pointing at the verbose AAVMF build now.
I don't see any guest output in the log, though admittedly I'm not entirely sure it's supposed to be there in the first place. Do you still get absolutely zero output on the VM's serial console?
@andreabolognani yes, unfortunately, I am still getting zero output from the VM's serial console. For reference:
$ virtctl console vmi-with-sidecar-hook-configmap
Successfully connected to vmi-with-sidecar-hook-configmap console. The escape sequence is ^]
$
I assume you're making sure to connect to the console the moment it is available, so no output is lost because of a delay.
Well, I'm truly out of ideas at this point. The VM configuration looks good, and even if the guest image was completely busted you should still get some output out of the verbose AAVMF build.
Since the pod at least remains up, maybe you can play inside it to try and get a better understanding. Maybe run virt-host-validate there, then try something like
$ /usr/libexec/qemu-kvm -accel kvm -M virt,gic-version=3 -cpu host -m 1024 -drive if=pflash,format=raw,file=/usr/share/AAVMF/AAVMF_CODE.verbose.fd,readonly=on -display none -serial stdio
That should produce a lot of output.
@andreabolognani from within the pod, virt-host-validate is passing. For reference:
$ kubectl exec -it virt-launcher-vmi-with-sidecar-hook-configmap-8hnl2 -- /bin/bash
bash-5.1$
bash-5.1$ virt-host-validate qemu
QEMU: Checking if device /dev/kvm exists : PASS
QEMU: Checking if device /dev/kvm is accessible : PASS
QEMU: Checking if device /dev/vhost-net exists : PASS
QEMU: Checking if device /dev/net/tun exists : PASS
QEMU: Checking for cgroup 'cpu' controller support : PASS
QEMU: Checking for cgroup 'cpuacct' controller support : PASS
QEMU: Checking for cgroup 'cpuset' controller support : PASS
QEMU: Checking for cgroup 'memory' controller support : PASS
QEMU: Checking for cgroup 'devices' controller support : PASS
QEMU: Checking for cgroup 'blkio' controller support : PASS
QEMU: Checking for device assignment IOMMU support : WARN (No ACPI IORT table found, IOMMU not supported by this hardware platform)
QEMU: Checking for secure guest support : WARN (Unknown if this platform has Secure Guest support)
bash-5.1$
Interestingly, I am getting no output from the qemu-kvm command. I left the command running for an hour and still saw no output, until I killed it. For reference:
bash-5.1$ /usr/libexec/qemu-kvm -accel kvm -M virt,gic-version=3 -cpu host -m 1024 -drive if=pflash,format=raw,file=/usr/share/AAVMF/AAVMF_CODE.verbose.fd,readonly=on -display none -serial stdio
qemu-kvm: terminating on signal 2
bash-5.1$
I was playing around with the command but I am not seeing an option for a more verbose output.