VM is not initializing on ARM64
What happened:
I am unable to access a newly deployed VM. The output of kubectl get vmi shows that the VM is running and ready, but I believe it is not fully initializing, as I am unable to access the VM via virtctl console / virtctl ssh and there are no guest console logs from the virt-launcher pod. As a note, I deployed the Kubernetes cluster using K0s.
All nodes in the cluster are passing qemu validation:
node3:~$ virt-host-validate qemu
QEMU: Checking if device /dev/kvm exists : PASS
QEMU: Checking if device /dev/kvm is accessible : PASS
QEMU: Checking if device /dev/vhost-net exists : PASS
QEMU: Checking if device /dev/net/tun exists : PASS
QEMU: Checking for cgroup 'cpu' controller support : PASS
QEMU: Checking for cgroup 'cpuacct' controller support : PASS
QEMU: Checking for cgroup 'cpuset' controller support : PASS
QEMU: Checking for cgroup 'memory' controller support : PASS
QEMU: Checking for cgroup 'devices' controller support : PASS
QEMU: Checking for cgroup 'blkio' controller support : PASS
QEMU: Checking for device assignment IOMMU support : WARN (Unknown if this platform has IOMMU support)
QEMU: Checking for secure guest support : WARN (Unknown if this platform has Secure Guest support)
Kubevirt components:
$ kubectl get all -n kubevirt
Warning: kubevirt.io/v1 VirtualMachineInstancePresets is now deprecated and will be removed in v2.
NAME                                   READY   STATUS    RESTARTS   AGE
pod/virt-api-64d75d4f5-66vxg           1/1     Running   0          22h
pod/virt-api-64d75d4f5-rl6cn           1/1     Running   0          22h
pod/virt-controller-64d65c6684-ggwlc   1/1     Running   0          22h
pod/virt-controller-64d65c6684-xqx7m   1/1     Running   0          22h
pod/virt-handler-82vdv                 1/1     Running   0          22h
pod/virt-handler-fsvz8                 1/1     Running   0          22h
pod/virt-handler-l664w                 1/1     Running   0          22h
pod/virt-operator-6c89df8955-jrjf9     1/1     Running   0          22h
pod/virt-operator-6c89df8955-r9wkj     1/1     Running   0          22h

NAME                                  TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
service/kubevirt-operator-webhook     ClusterIP   10.101.225.75   <none>        443/TCP   22h
service/kubevirt-prometheus-metrics   ClusterIP   None            <none>        443/TCP   22h
service/virt-api                      ClusterIP   10.96.236.192   <none>        443/TCP   22h
service/virt-exportproxy              ClusterIP   10.110.33.182   <none>        443/TCP   22h

NAME                          DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
daemonset.apps/virt-handler   3         3         3       3            3           kubernetes.io/os=linux   22h

NAME                              READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/virt-api          2/2     2            2           22h
deployment.apps/virt-controller   2/2     2            2           22h
deployment.apps/virt-operator     2/2     2            2           22h

NAME                                         DESIRED   CURRENT   READY   AGE
replicaset.apps/virt-api-64d75d4f5           2         2         2       22h
replicaset.apps/virt-controller-64d65c6684   2         2         2       22h
replicaset.apps/virt-operator-6c89df8955     2         2         2       22h

NAME                            AGE   PHASE
kubevirt.kubevirt.io/kubevirt   22h   Deployed
############################
$ kubectl get pod,vm,vmi
NAME                             READY   STATUS    RESTARTS   AGE
pod/virt-launcher-testvm-fhrc2   3/3     Running   0          11m

NAME                                AGE   STATUS    READY
virtualmachine.kubevirt.io/testvm   11m   Running   True

NAME                                        AGE   PHASE     IP             NODENAME   READY
virtualmachineinstance.kubevirt.io/testvm   11m   Running   10.244.135.7   node3      True
What you expected to happen: Deploy a working VM using KubeVirt.
How to reproduce it (as minimally and precisely as possible):
- Deploy a K0s Kubernetes cluster using k0sctl (https://docs.k0sproject.io/v1.30.0+k0s.0/k0sctl-install/) on a Turing RK1 compute module. Note: I am using Calico with VXLAN as my CNI, but this also failed with the same issue using kube-router (the default CNI with K0s).
- Install KubeVirt
- Deploy a test VM following https://kubevirt.io/labs/kubernetes/lab1
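For reference, the lab boils down to roughly the following steps (a rough summary, assuming virtctl is installed; see the lab for the exact steps):

wget https://kubevirt.io/labs/manifests/vm.yaml
kubectl apply -f vm.yaml
virtctl start testvm
kubectl get vmis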
Additional context:
My server is using ARM64 architecture and the hardware is Turing RK1 compute modules (https://turingpi.com/product/turing-rk1/). I have been able to successfully deploy a Cirros VM using virsh with cirros-0.5.2-aarch64 image. I have attempted to use an aarch64 image for my kubevirt VM but that also failed to initialize (I used image quay.io/kubevirt/cirros-container-disk-demo:v1.2.2-arm64).
I have been interested in using Kubevirt but I have been running into this same issue when using different Kubernetes deployments (KIND and Minikube). All tests have been done on a Turing Pi RK1 cluster (single node and multi-node).
I have attached the logs from the virt-launcher pod (all containers) and my kubevirt CR object.
Environment:
- KubeVirt version (use virtctl version): v1.2.1
- Kubernetes version (use kubectl version): v1.30.0+k0s
- VM or VMI specifications:
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: testvm
spec:
  running: false
  template:
    metadata:
      labels:
        kubevirt.io/size: small
        kubevirt.io/domain: testvm
    spec:
      domain:
        devices:
          disks:
            - name: containerdisk
              disk:
                bus: virtio
            - name: cloudinitdisk
              disk:
                bus: virtio
          interfaces:
            - name: default
              masquerade: {}
        resources:
          requests:
            memory: 64M
      networks:
        - name: default
          pod: {}
      volumes:
        - name: containerdisk
          containerDisk:
            image: quay.io/kubevirt/cirros-container-disk-demo
        - name: cloudinitdisk
          cloudInitNoCloud:
            userDataBase64: SGkuXG4=
- Cloud provider or hardware configuration: Hardware is baremetal Turing RK1 compute modules (https://turingpi.com/product/turing-rk1/). The cluster is 4 nodes (1 controller and 3 workers), but I had this issue using one RK1 node as a single-node cluster.
- OS (e.g. from /etc/os-release): Ubuntu 22.04.3
- Kernel (e.g. uname -a): 5.10.160-rockchip aarch64 GNU/Linux
- Install tools: K0s with Calico VXLAN was used to deploy the Kubernetes cluster
/cc @zhlhahaha Is this something you are able to help with?
resources:
  requests:
    memory: 64M
Hi @jaredcash Can you try increasing the memory from 64M to 256M?
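For example, something like this (untested; it assumes the VM is named testvm, and the VM needs a restart for the new request to take effect):

kubectl patch vm testvm --type merge -p '{"spec":{"template":{"spec":{"domain":{"resources":{"requests":{"memory":"256M"}}}}}}}'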
@zhlhahaha unfortunately, increasing the memory did not help. It still appears that the VM will not initialize as the guest-console-log container logs are blank and I am unable to access it:
resources:
  requests:
    memory: 256M
guest-console-log:
$ kubectl logs virt-launcher-testvm2-6rkq6 -c guest-console-log | wc -l
0
Note: I also tried creating the VM with 1G of memory and got the same results.
I attached the describe of both the virt-launcher pod and the new VM object with 256M in case that is helpful. Please let me know if any additional items are needed.
It is interesting that there are no failure logs in virt-launcher.log or the console log. This usually indicates that the VM may be encountering a boot failure during the bootloader stage. This could be caused by incorrect UEFI firmware, a corrupted VM disk, or a mismatch in the CPU architecture of the VM disk. I will investigate this further in my local environment.
The quay.io/kubevirt/cirros-container-disk-demo:latest image is only for x86_64.
Can you use the following image and make sure the allocated memory is equal to or larger than 256M?
quay.io/kubevirt/cirros-container-disk-demo:20240729_74e137bba-arm64
I can successfully boot the VM based on the image in my local env.
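That is, in the VM spec, the containerdisk volume would look something like:

volumes:
  - name: containerdisk
    containerDisk:
      image: quay.io/kubevirt/cirros-container-disk-demo:20240729_74e137bba-arm64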
@aburdenthehand If @jaredcash verified the image issue, we can update the document https://kubevirt.io/labs/kubernetes/lab1.
I don't see any of the labs specifying the cirros disk image. Am I missing something? That said, I would be happy to have any callouts or alternative steps to support additional architectures in the labs, and really any and all improvements to the labs. Please feel free to raise an issue specifying the improvement to make or a PR to add the required info. If the former, we can add a 'good-first-issue' label.
The quay.io/kubevirt/cirros-container-disk-demo:latest image is only for x86_64. Can you use the following image and make sure the allocated memory is equal to or larger than 256M?
quay.io/kubevirt/cirros-container-disk-demo:20240729_74e137bba-arm64
I can successfully boot the VM based on the image in my local env.
@zhlhahaha unfortunately, the VM is still not initializing with the ARM-specific image and higher allocated memory. The VM still shows as running, but again I'm unable to access it and there are no console logs:
$ kubectl get vm
NAME      AGE   STATUS    READY
testvm1   43m   Running   True

$ kubectl get vm testvm1 -o custom-columns=MEMORY:.spec.template.spec.domain.resources.requests.memory,IMAGE:.spec.template.spec.volumes[0].containerDisk.image
MEMORY   IMAGE
1G       quay.io/kubevirt/cirros-container-disk-demo:20240729_74e137bba-arm64

$ kubectl logs virt-launcher-testvm1-vwwr8 -c guest-console-log | wc -l
0
I'm unsure if this is somehow related to my hardware, even though virt-host-validate is seemingly passing on each node and I am able to deploy a VM using virt-install on the worker nodes.
I don't see any of the labs specifying the cirros disk image. Am I missing something?
The doc does not specify the cirros disk image directly. However, the VM configuration file vm.yaml fetched by the command below contains the disk image information, and it uses the x86-only cirros image.
wget https://kubevirt.io/labs/manifests/vm.yaml
Please feel free to raise an issue specifying the improvement to make or a PR to add the required info. If the former, we can add a 'good-first-issue' label.
Ok, I will raise an issue after I solve @jaredcash's problem.
@zhlhahaha unfortunately, the VM is still not initializing with the ARM-specific image and higher allocated memory. The VM still shows as running, but again I'm unable to access it and there are no console logs:
Would you mind collecting the following information?
- Show whether the qemu process is running: ps aux | grep qemu
- Edit the KubeVirt config to get more information, like the following, then start the VMI and get the virt-launcher.log (a non-interactive one-liner is shown after this list):
$ kubectl edit kubevirt -n kubevirt

apiVersion: kubevirt.io/v1
kind: KubeVirt
...
spec:
  ...
  configuration:
    developerConfiguration:
      logVerbosity:
        virtLauncher: 8
...
status:
- Are you using virtctl console testvm to visit the virtual machine? Can you access the VM console, or do you get an error message when running this command?
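As a note, the non-interactive one-liner for the verbosity change would be something like (untested):

kubectl patch kubevirt kubevirt -n kubevirt --type merge -p '{"spec":{"configuration":{"developerConfiguration":{"logVerbosity":{"virtLauncher":8}}}}}'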
Hello @zhlhahaha, here is the requested information:

Show whether the qemu process is running: ps aux | grep qemu

Here are the qemu processes on the worker node:
root@node3:~# ps aux | grep qemu
uuidd 1045139 0.0 0.1 1686836 11756 ? Ssl 20:40 0:00 /usr/bin/virt-launcher-monitor --qemu-timeout 269s --name testvm1 --uid 72b0120f-b4ad-4a1b-a612-3b37466eeebc --namespace default --kubevirt-share-dir /var/run/kubevirt --ephemeral-disk-dir /var/run/kubevirt-ephemeral-disks --container-disk-dir /var/run/kubevirt/container-disks --grace-period-seconds 45 --hook-sidecars 0 --ovmf-path /usr/share/AAVMF --run-as-nonroot
uuidd 1045157 0.1 0.6 2561068 52644 ? Sl 20:40 0:01 /usr/bin/virt-launcher --qemu-timeout 269s --name testvm1 --uid 72b0120f-b4ad-4a1b-a612-3b37466eeebc --namespace default --kubevirt-share-dir /var/run/kubevirt --ephemeral-disk-dir /var/run/kubevirt-ephemeral-disks --container-disk-dir /var/run/kubevirt/container-disks --grace-period-seconds 45 --hook-sidecars 0 --ovmf-path /usr/share/AAVMF --run-as-nonroot
uuidd 1045172 0.0 0.2 1290448 21328 ? Sl 20:40 0:00 /usr/sbin/virtqemud -f /var/run/libvirt/virtqemud.conf
uuidd 1045391 100 2.1 1718748 170312 ? Sl 20:40 23:57 /usr/libexec/qemu-kvm -name guest=default_testvm1,debug-threads=on -S -object {"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/run/kubevirt-private/libvirt/qemu/lib/domain-1-default_testvm1/master-key.aes"} -blockdev {"driver":"file","filename":"/usr/share/AAVMF/AAVMF_CODE.fd","node-name":"libvirt-pflash0-storage","auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-pflash0-format","read-only":true,"driver":"raw","file":"libvirt-pflash0-storage"} -blockdev {"driver":"file","filename":"/var/run/kubevirt-private/libvirt/qemu/nvram/testvm1_VARS.fd","node-name":"libvirt-pflash1-storage","auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-pflash1-format","read-only":false,"driver":"raw","file":"libvirt-pflash1-storage"} -machine virt-rhel9.2.0,usb=off,gic-version=3,dump-guest-core=off,memory-backend=mach-virt.ram,pflash0=libvirt-pflash0-format,pflash1=libvirt-pflash1-format,acpi=on -accel kvm -cpu host -m size=976896k -object {"qom-type":"memory-backend-ram","id":"mach-virt.ram","size":1000341504} -overcommit mem-lock=off -smp 1,sockets=1,dies=1,cores=1,threads=1 -object {"qom-type":"iothread","id":"iothread1"} -uuid cfb867c9-fa3a-51f5-b0f5-485fd556fd68 -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=20,server=on,wait=off -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device {"driver":"pcie-root-port","port":8,"chassis":1,"id":"pci.1","bus":"pcie.0","multifunction":true,"addr":"0x1"} -device {"driver":"pcie-root-port","port":9,"chassis":2,"id":"pci.2","bus":"pcie.0","addr":"0x1.0x1"} -device {"driver":"pcie-root-port","port":10,"chassis":3,"id":"pci.3","bus":"pcie.0","addr":"0x1.0x2"} -device {"driver":"pcie-root-port","port":11,"chassis":4,"id":"pci.4","bus":"pcie.0","addr":"0x1.0x3"} -device {"driver":"pcie-root-port","port":12,"chassis":5,"id":"pci.5","bus":"pcie.0","addr":"0x1.0x4"} -device {"driver":"pcie-root-port","port":13,"chassis":6,"id":"pci.6","bus":"pcie.0","addr":"0x1.0x5"} -device {"driver":"pcie-root-port","port":14,"chassis":7,"id":"pci.7","bus":"pcie.0","addr":"0x1.0x6"} -device {"driver":"pcie-root-port","port":15,"chassis":8,"id":"pci.8","bus":"pcie.0","addr":"0x1.0x7"} -device {"driver":"pcie-root-port","port":16,"chassis":9,"id":"pci.9","bus":"pcie.0","multifunction":true,"addr":"0x2"} -device {"driver":"pcie-root-port","port":17,"chassis":10,"id":"pci.10","bus":"pcie.0","addr":"0x2.0x1"} -device {"driver":"pcie-root-port","port":18,"chassis":11,"id":"pci.11","bus":"pcie.0","addr":"0x2.0x2"} -device {"driver":"pcie-root-port","port":19,"chassis":12,"id":"pci.12","bus":"pcie.0","addr":"0x2.0x3"} -device {"driver":"qemu-xhci","id":"usb","bus":"pci.5","addr":"0x0"} -device {"driver":"virtio-scsi-pci-non-transitional","id":"scsi0","bus":"pci.6","addr":"0x0"} -device {"driver":"virtio-serial-pci-non-transitional","id":"virtio-serial0","bus":"pci.7","addr":"0x0"} -blockdev {"driver":"file","filename":"/var/run/kubevirt/container-disks/disk_0.img","node-name":"libvirt-3-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-3-format","read-only":true,"discard":"unmap","cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-3-storage"} -blockdev 
{"driver":"file","filename":"/var/run/kubevirt-ephemeral-disks/disk-data/containerdisk/disk.qcow2","node-name":"libvirt-2-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-2-format","read-only":false,"discard":"unmap","cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-2-storage","backing":"libvirt-3-format"} -device {"driver":"virtio-blk-pci-non-transitional","bus":"pci.8","addr":"0x0","drive":"libvirt-2-format","id":"ua-containerdisk","bootindex":1,"write-cache":"on","werror":"stop","rerror":"stop"} -blockdev {"driver":"file","filename":"/var/run/kubevirt-ephemeral-disks/cloud-init-data/default/testvm1/noCloud.iso","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-1-format","read-only":false,"discard":"unmap","cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-1-storage"} -device {"driver":"virtio-blk-pci-non-transitional","bus":"pci.9","addr":"0x0","drive":"libvirt-1-format","id":"ua-cloudinitdisk","write-cache":"on","werror":"stop","rerror":"stop"} -netdev {"type":"tap","fd":"21","vhost":true,"vhostfd":"23","id":"hostua-default"} -device {"driver":"virtio-net-pci-non-transitional","host_mtu":1450,"netdev":"hostua-default","id":"ua-default","mac":"6e:7e:49:88:36:2d","bus":"pci.1","addr":"0x0","romfile":""} -add-fd set=0,fd=19,opaque=serial0-log -chardev socket,id=charserial0,fd=17,server=on,wait=off,logfile=/dev/fdset/0,logappend=on -serial chardev:charserial0 -chardev socket,id=charchannel0,fd=18,server=on,wait=off -device {"driver":"virtserialport","bus":"virtio-serial0.0","nr":1,"chardev":"charchannel0","id":"channel0","name":"org.qemu.guest_agent.0"} -device {"driver":"usb-tablet","id":"input0","bus":"usb.0","port":"1"} -device {"driver":"usb-kbd","id":"input1","bus":"usb.0","port":"2"} -audiodev {"id":"audio1","driver":"none"} -vnc vnc=unix:/var/run/kubevirt-private/72b0120f-b4ad-4a1b-a612-3b37466eeebc/virt-vnc,audiodev=audio1 -device {"driver":"virtio-gpu-pci","id":"video0","max_outputs":1,"bus":"pci.2","addr":"0x0"} -device {"driver":"virtio-balloon-pci-non-transitional","id":"balloon0","free-page-reporting":true,"bus":"pci.10","addr":"0x0"} -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on
root 1054126 0.0 0.0 6020 1984 pts/1 S+ 21:04 0:00 grep --color=auto qemu
Edit the KubeVirt config to get more information, like the following, then start the VMI and get the virt-launcher.log
I have attached all the container logs of the virt-launcher pod after adding more verbose logging.
Are you using virtctl console testvm to visit the virtual machine? Can you access the VM console, or do you get an error message when running this command?
I do have virtctl installed on the manager node and I have attempted to console into the VM. I do not get an error message, but I just get a blank display. Even if I press a key, it will not progress any further until I press Ctrl+] to escape. For reference:
$ virtctl console testvm1
Successfully connected to testvm1 console. The escape sequence is ^]
$
Please let me know if anything additional is needed.
There is no error in virt-launcher.log, and the qemu process can start successfully. Let's try a Fedora image. Would you mind using the following config to start a Fedora VM?
---
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  labels:
    special: vmi-fedora
  name: vmi-fedora
spec:
  domain:
    devices:
      disks:
        - disk:
            bus: virtio
          name: containerdisk
        - disk:
            bus: virtio
          name: cloudinitdisk
      interfaces:
        - masquerade: {}
          name: default
      rng: {}
    resources:
      requests:
        memory: 1024M
  networks:
    - name: default
      pod: {}
  terminationGracePeriodSeconds: 0
  volumes:
    - containerDisk:
        image: quay.io/containerdisks/fedora:40
      name: containerdisk
    - cloudInitNoCloud:
        userData: |-
          #cloud-config
          password: fedora
          chpasswd: { expire: False }
      name: cloudinitdisk
With the Fedora image, the VM is still not initializing. Even though the VMI shows the VM as running, virtctl console is still blank. I am personally not noticing any outliers in the logs, but I have attached them for further inspection. Outputs for reference:
$ kubectl get pod,vmi
NAME                                 READY   STATUS    RESTARTS   AGE
pod/virt-launcher-testvm1-xxf76      3/3     Running   0          24h
pod/virt-launcher-vmi-fedora-rjtdr   3/3     Running   0          20m

NAME                                            AGE   PHASE     IP              NODENAME   READY
virtualmachineinstance.kubevirt.io/testvm1      24h   Running   10.244.135.15   node3      True
virtualmachineinstance.kubevirt.io/vmi-fedora   20m   Running   10.244.104.13   node2      True
As a note, I also tested the fedora VM with 2048M and got the same results. The logs I provided are from the 1024M VM.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
/lifecycle stale
Unfortunately, this issue persists. @zhlhahaha and/or Kubevirt team, have you had some time to review my previous message? Please let me know if anything is needed to continue troubleshooting this issue.
I have been interested in using Kubevirt but I have been running into this same issue when using different Kubernetes deployments (KIND and Minikube). All tests have been done on a Turing Pi RK1 cluster (single node and multi-node).
Sorry, I missed your message. I suspect the UEFI boot failed. Would you mind providing more information?
- CPU and memory information: sudo dmidecode -t processor && free -h. As you have three nodes, check whether all nodes have the same CPU.
- The configuration of the cirros VM started via virsh dumpxml vmname
@zhlhahaha it seems there is an issue with dmidecode on aarch64 systems as I am getting the following error on my baremetal servers:
$ sudo dmidecode -t processor
# dmidecode 3.3
# No SMBIOS nor DMI entry point found, sorry.
I gathered the CPU information for all my nodes via the lscpu command. Please let me know if that information is fine or if another command is suggested (e.g. lshw).
Additionally, I deployed a new Cirros VM using the image suggested here (https://github.com/kubevirt/kubevirt.github.io/issues/956#issuecomment-2255482142) and gathered the dumpxml of this VM.
Note that the VM is experiencing our issue and failing to initialize.
nodes-cpu-mem-info.txt
cirros-dumpxml.txt
Hi @jaredcash The CPU info is OK. I see it uses A72 and A55 Arm CPUs; I need to check their specs. I used to run KubeVirt on a Raspberry Pi 4, which has a Cortex-A72 CPU. Regarding the cirros VM configuration, I meant the configuration of the cirros VM that successfully booted via pure virsh, as you said:
I have been able to successfully deploy a Cirros VM using virsh with cirros-0.5.2-aarch64 image.
Apologies for my misunderstanding @zhlhahaha. I have attached the dumpxml of the cirros VM I created with pure virsh. virsh-cirros-dumpxml.txt
Apologies for my misunderstanding @zhlhahaha. I have attached the dumpxml of the cirros VM I created with pure virsh. virsh-cirros-dumpxml.txt
Thanks! I didn’t notice any differences between the successfully booted Cirros VM and the KubeVirt one. Would you mind double-checking if the successfully booted Cirros VM is starting on the server with the Cortex-A55 CPU? Initially, I suspected a difference in the Generic Interrupt Controller (GIC) versions between the Cortex-A55 and Cortex-A72 CPUs, but they appear to use the same GIC version. Now, it seems the UEFI firmware may be the only possible cause. Would you be able to replace /usr/share/AAVMF/AAVMF_CODE.fd in the virt-launcher with the one from the host?
@zhlhahaha I redeployed the cirros test VM I created with KubeVirt to the same node (node4) to ensure it is using the same CPU (Cortex-A55).
Regarding replacing AAVMF_CODE.fd, it seems that the virt-launcher pod does not allow edits to this file, as sudo is not available:
$ kubectl exec -it virt-launcher-testvm1-hc2j8 -- ls -l /usr/share/AAVMF/AAVMF_CODE.fd
lrwxrwxrwx 1 root root 42 Jan 1 1970 /usr/share/AAVMF/AAVMF_CODE.fd -> ../edk2/aarch64/QEMU_EFI-silent-pflash.raw
$ kubectl exec -it virt-launcher-testvm1-hc2j8 -- mv /usr/share/AAVMF/AAVMF_CODE.fd /usr/share/AAVMF/AAVMF_CODE.fd.bak
mv: cannot move '/usr/share/AAVMF/AAVMF_CODE.fd' to '/usr/share/AAVMF/AAVMF_CODE.fd.bak': Permission denied
command terminated with exit code 1
$ kubectl exec -it virt-launcher-testvm1-hc2j8 -- sudo rm -f /usr/share/AAVMF/AAVMF_CODE.fd
error: Internal error occurred: Internal error occurred: error executing command in container: failed to exec in container: failed to start exec "8189abc40bee955c058cc308065c5bbff4ba28ec0438f0c08fe369f6cc0aebb3": OCI runtime exec failed: exec failed: unable to start container process: exec: "sudo": executable file not found in $PATH: unknown
Note: I did attempt to become the root user, but it asks for a password:
$ kubectl exec -it virt-launcher-testvm1-hc2j8 -- /bin/bash
bash-5.1$ su - root
Password:
su: Authentication failure
bash-5.1$
As a workaround, I used nsenter from the host node to copy the file from the host node to the virt-launcher pod which worked:
[root@testvm1 /]# ls -l /usr/share/AAVMF/
total 65536
-rw-r--r-- 1 qemu qemu 67108864 Nov 2 03:33 AAVMF_CODE.fd
lrwxrwxrwx 1 root root 35 Jan 1 1970 AAVMF_CODE.verbose.fd -> ../edk2/aarch64/QEMU_EFI-pflash.raw
lrwxrwxrwx 1 root root 40 Jan 1 1970 AAVMF_VARS.fd -> ../edk2/aarch64/vars-template-pflash.ra
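For reference, the rough shape of the workaround, run on the worker node hosting the virt-launcher pod (illustrative; the exact PID lookup and copy I used may have differed slightly):

# pick any process inside the virt-launcher pod, then write through its mount namespace
PID=$(pgrep -f virt-launcher-monitor | head -n1)
cp --remove-destination /usr/share/AAVMF/AAVMF_CODE.fd /proc/$PID/root/usr/share/AAVMF/AAVMF_CODE.fd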
After giving it some time, the VM was still not initializing. In an attempt to get it to work, I changed the ownership of AAVMF_CODE.fd to root:root
[root@testvm1 /]# ls -l /usr/share/AAVMF/
total 65536
-rw-r--r-- 1 root root 67108864 Nov 2 03:33 AAVMF_CODE.fd
lrwxrwxrwx 1 root root 35 Jan 1 1970 AAVMF_CODE.verbose.fd -> ../edk2/aarch64/QEMU_EFI-pflash.raw
lrwxrwxrwx 1 root root 40 Jan 1 1970 AAVMF_VARS.fd -> ../edk2/aarch64/vars-template-pflash.raw
Unfortunately, the VM still did not initialize. I restarted the pod to see if that would help, but it did not.
I have re-copied the AAVMF_CODE.fd from the host node to the pod after the pod restart, so the current state is the following:
$ kubectl exec -it virt-launcher-testvm1-4hvwn -- ls -l /usr/share/AAVMF/
total 65536
-rw-r--r-- 1 root root 67108864 Nov 2 03:33 AAVMF_CODE.fd
lrwxrwxrwx 1 root root 35 Jan 1 1970 AAVMF_CODE.verbose.fd -> ../edk2/aarch64/QEMU_EFI-pflash.raw
lrwxrwxrwx 1 root root 40 Jan 1 1970 AAVMF_VARS.fd -> ../edk2/aarch64/vars-template-pflash.raw
I have also attached fresh logs of the virt-launcher pod for reference.
Please let me know if there are other steps I need to perform after replacing AAVMF_CODE.fd
Unfortunately, the VM still did not initialize. I restarted the pod to see if that would help, but it did not.
The AAVMF_CODE file is the UEFI boot firmware used during VM startup. After replacing this file, a VM reboot is necessary for the changes to take effect. Additionally, if you restart the pod, it will revert to the original virt-launcher image where the AAVMF_CODE file hasn't been replaced.
To make this change effective, you’ll need to replace the AAVMF_CODE.fd file in the virt-launcher image itself rather than in individual pods, then use this updated virt-launcher image to start the VM.
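A minimal, untested sketch of what that could look like (the base image tag should match the deployed KubeVirt version, and AAVMF_CODE.fd here is the firmware file copied from the host into the build context):

# Containerfile sketch: bake the host firmware into a custom virt-launcher image
FROM quay.io/kubevirt/virt-launcher:v1.2.1
COPY AAVMF_CODE.fd /usr/share/AAVMF/AAVMF_CODE.fd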
@andreabolognani Do you have any suggestion?
There might be a way to inject files into the pod before the VM starts, for example using the sidecar hook. I'm not too familiar with these facilities, so I might be wrong about it. Rebuilding the virt-launcher image is obviously always going to be possible, but the process would be quite involved so I'd really leave it as a last ditch effort.
My suggestion would be to try and figure out a way to change
<loader readonly='yes' secure='no' type='pflash'>/usr/share/AAVMF/AAVMF_CODE.fd</loader>
in the domain XML to
<loader readonly='yes' secure='no' type='pflash'>/usr/share/AAVMF/AAVMF_CODE.verbose.fd</loader>
The verbose build of AAVMF would hopefully produce at least some output pointing us in the right direction. Again, I'm not sure what facilities, if any, KubeVirt provides to inject this kind of change. Sidecar hook might be the one.
There might be a way to inject files into the pod before the VM starts, for example using the sidecar hook.
Yes, the sidecar is a good suggestion! It can run a custom script before VM initialization. Here is a guide: https://kubevirt.io/user-guide/user_workloads/hook-sidecar/
Based on this example, something like
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-config-map
data:
  my_script.sh: |
    #!/bin/sh
    tempFile=`mktemp --dry-run`
    echo $4 > $tempFile
    sed -i "s|AAVMF_CODE.fd|AAVMF_CODE.verbose.fd|" $tempFile
    cat $tempFile
(completely untested) should do the trick.
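The VMI would then reference the ConfigMap via the hook-sidecar annotation, roughly as in the example linked below (the args and hookPath follow the upstream sample; adjust as needed):

metadata:
  annotations:
    hooks.kubevirt.io/hookSidecars: '[{"args": ["--version", "v1alpha2"], "configMap": {"name": "my-config-map", "key": "my_script.sh", "hookPath": "/usr/bin/onDefineDomain"}}]'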
Hello @zhlhahaha @andreabolognani, following the suggestions above, I was able to create a VM with the AAVMF_CODE.verbose.fd UEFI boot firmware. I followed the example here https://github.com/kubevirt/kubevirt/blob/main/examples/vmi-with-sidecar-hook-configmap.yaml but I changed the Fedora image to the one previously mentioned here https://github.com/kubevirt/kubevirt.github.io/issues/956#issuecomment-2259542037
I have attached all container logs of the virt-launcher and the dumpxml of my VM.
I could not use the sidecar hook functionality to get the host's local UEFI boot firmware into the VM (if that is even possible) as a test. I am still troubleshooting (and of course I welcome any suggestions if we want to go down that route), but I wanted to provide you both with the current data in the meantime.
@jaredcash the XML configuration looks good, it's clearly pointing at the verbose AAVMF build now.
I don't see any guest output in the log, though admittedly I'm not entirely sure it's supposed to be there in the first place. Do you still get absolutely zero output on the VM's serial console?
@andreabolognani yes, unfortunately, I am still getting zero output from the VM's serial console. For reference:
$ virtctl console vmi-with-sidecar-hook-configmap
Successfully connected to vmi-with-sidecar-hook-configmap console. The escape sequence is ^]
$
I assume you're making sure to connect to the console the moment it is available, so no output is lost because of a delay.
Well, I'm truly out of ideas at this point. The VM configuration looks good, and even if the guest image was completely busted you should still get some output out of the verbose AAVMF build.
Since the pod at least remains up, maybe you can play inside it to try and get a better understanding. Maybe run virt-host-validate there, then try something like
$ /usr/libexec/qemu-kvm -accel kvm -M virt,gic-version=3 -cpu host -m 1024 -drive if=pflash,format=raw,file=/usr/share/AAVMF/AAVMF_CODE.verbose.fd,readonly=on -display none -serial stdio
That should produce a lot of output.
@andreabolognani from within the pod, virt-host-validate is passing. For reference:
$ kubectl exec -it virt-launcher-vmi-with-sidecar-hook-configmap-8hnl2 -- /bin/bash
bash-5.1$
bash-5.1$ virt-host-validate qemu
QEMU: Checking if device /dev/kvm exists : PASS
QEMU: Checking if device /dev/kvm is accessible : PASS
QEMU: Checking if device /dev/vhost-net exists : PASS
QEMU: Checking if device /dev/net/tun exists : PASS
QEMU: Checking for cgroup 'cpu' controller support : PASS
QEMU: Checking for cgroup 'cpuacct' controller support : PASS
QEMU: Checking for cgroup 'cpuset' controller support : PASS
QEMU: Checking for cgroup 'memory' controller support : PASS
QEMU: Checking for cgroup 'devices' controller support : PASS
QEMU: Checking for cgroup 'blkio' controller support : PASS
QEMU: Checking for device assignment IOMMU support : WARN (No ACPI IORT table found, IOMMU not supported by this hardware platform)
QEMU: Checking for secure guest support : WARN (Unknown if this platform has Secure Guest support)
bash-5.1$
Interestingly, I am getting no output from the qemu-kvm command. I left the command running for an hour and still saw no output, until I killed it. For reference:
bash-5.1$ /usr/libexec/qemu-kvm -accel kvm -M virt,gic-version=3 -cpu host -m 1024 -drive if=pflash,format=raw,file=/usr/share/AAVMF/AAVMF_CODE.verbose.fd,readonly=on -display none -serial stdio
qemu-kvm: terminating on signal 2
bash-5.1$
I was playing around with the command but I am not seeing an option for a more verbose output.