
KubeVirt doesn't work in vcluster, works on host cluster


Hey there,

I'm running into an issue where I'm trying to run KubeVirt on the vcluster side; however, the pod dies shortly after being propagated down to the host cluster.

I have 3 master nodes with embedded etcd HA on k3s v1.22.1-rc1+k3s1, plus another agent node. The nested cluster is running v1.21.4-k3s1, since v1.22.1 crash-loops. I'm running vcluster v0.4.0.

The syncer container is being run with the following args (and service accounts created/tested with relevant clusterrole):

      - args:
        - --service-name=demo-test-head
        - --suffix=demo-test
        - --owning-statefulset=demo-test
        - --out-kube-config-secret=demo-test
        - --fake-persistent-volumes=false
        - --enable-storage-classes
        - --fake-nodes=false
        - --sync-all-nodes

I've tested the commands in both the host and nested clusters: https://kubevirt.io/user-guide/operations/installation/

export RELEASE=v0.44.1
kubectl apply -f https://github.com/kubevirt/kubevirt/releases/download/${RELEASE}/kubevirt-operator.yaml
kubectl apply -f https://github.com/kubevirt/kubevirt/releases/download/${RELEASE}/kubevirt-cr.yaml
kubectl -n kubevirt wait kv kubevirt --for condition=Available

The KubeVirt api, controller, handler, and operator pods all start and seem to function fine (i.e. they don't crash loop, or appear to be broken in any way).

Then create a VM from the kubevirt/demo repository:

kubectl apply -f https://raw.githubusercontent.com/kubevirt/demo/master/manifests/vm.yaml
kubectl describe vm testvm
kubectl patch virtualmachine testvm --type merge -p \
    '{"spec":{"running":true}}'

This starts the process of creating a pod, handler, and anything else needed. Inspecting the logs from the compute container in the virt-launcher-testvm-randid pod shows the following line:

panic: timed out waiting for domain to be defined
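
(For reference, that line comes from the compute container of the launcher pod; something along these lines should reproduce it, where the pod name suffix is a placeholder for the random ID kubectl shows:)

kubectl logs virt-launcher-testvm-<random-id> -c compute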

Now I have no idea why it's throwing this, nor what it means, but this doesn't occur when following the same commands on the host cluster. I've also started testing KubeVirt v0.45.0-rc.0, but to no avail.

Once the pod has died enough times, it stops being rescheduled. This then causes the resource to hang forever on deletion of the VMI (kubectl delete vmi/testvm). I suspect that fixing the initial failure will keep the pods around, and then deletion will succeed.
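
(A common, if heavy-handed, workaround for a VMI stuck in deletion is to clear its finalizers; only a sketch here, since it bypasses KubeVirt's normal cleanup:)

kubectl patch vmi testvm --type merge -p '{"metadata":{"finalizers":null}}'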

Any help would be greatly appreciated!

Edit: I found this Red Hat issue which may be of some assistance; it talks about labelling, but I'm not sure if it's applicable: https://bugzilla.redhat.com/show_bug.cgi?id=1843456

DanWard avatar Sep 03 '21 03:09 DanWard

@DanWard thanks for creating this issue! Are there any logs in the kubevirt handler or in another component that indicate a problem?

FabianKramm avatar Sep 03 '21 07:09 FabianKramm

@FabianKramm @DanWard

Is there any update on this issue? I am facing exactly the same error when installing KubeVirt on vcluster-k8s.

KubeVirt version: v0.55, vcluster version: 0.11.0

It seems like the KubeVirt controllers come up fine on vcluster; after that, if we try to create a kind: VirtualMachineInstance or kind: VirtualMachine, we get the following error.

The virt-launcher pod moves to an Error state:

root@setup-a-edge-1:~# kubectl get po
NAME                                  READY   STATUS   RESTARTS   AGE
virt-launcher-testvmi-nocloud-kwm4n   0/2     Error    0          14m

The VMI is in a Failed state:

root@setup-a-edge-1:~# kubectl get vmi
NAME              AGE   PHASE    IP   NODENAME         READY
testvmi-nocloud   13m   Failed        setup-a-edge-1   False
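
(For more detail on why it failed, the VMI's conditions and recent events can be inspected with standard kubectl commands, e.g.:)

kubectl describe vmi testvmi-nocloud
kubectl get events --sort-by=.lastTimestamp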

Logs from the virt-launcher pod:

root@setup-a-edge-1:~# kubectl logs virt-launcher-testvmi-nocloud-kwm4n

`{"component":"virt-launcher","level":"info","msg":"Collected all requested hook sidecar sockets","pos":"manager .go:76","timestamp":"2022-08-06T19:11:08.723903Z"} {"component":"virt-launcher","level":"info","msg":"Sorted all collected sidecar sockets per hook point based on their priority and name: map[]","pos":"manager.go:79","timestamp":"2022-08-06T19:11:08.723945Z"} {"component":"virt-launcher","level":"info","msg":"Connecting to libvirt daemon: qemu:///system","pos":"libvirt .go:496","timestamp":"2022-08-06T19:11:08.726186Z"} {"component":"virt-launcher","level":"info","msg":"Connecting to libvirt daemon failed: virError(Code=38, Domai n=7, Message='Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory')","pos":" libvirt.go:504","timestamp":"2022-08-06T19:11:08.726639Z"} {"component":"virt-launcher","level":"info","msg":"libvirt version: 8.0.0, package: 2.module_el8.6.0+1087+b42c8 331 (CentOS Buildsys \[email protected]\u003e, 2022-02-08-22:20:52, )","subcomponent":"libvirt","thread":"42 ","timestamp":"2022-08-06T19:11:08.741000Z"} {"component":"virt-launcher","level":"info","msg":"hostname: testvmi-nocloud","subcomponent":"libvirt","thread" :"42","timestamp":"2022-08-06T19:11:08.741000Z"} {"component":"virt-launcher","level":"error","msg":"internal error: Child process (dmidecode -q -t 0,1,2,3,4,11 ,17) unexpected exit status 1: /dev/mem: No such file or directory","pos":"virCommandWait:2752","subcomponent": "libvirt","thread":"42","timestamp":"2022-08-06T19:11:08.741000Z"} {"component":"virt-launcher","level":"info","msg":"Connected to libvirt daemon","pos":"libvirt.go:512","timesta mp":"2022-08-06T19:11:09.227552Z"} {"component":"virt-launcher","level":"info","msg":"Registered libvirt event notify callback","pos":"client.go:5 09","timestamp":"2022-08-06T19:11:09.229688Z"} {"component":"virt-launcher","level":"info","msg":"Marked as ready","pos":"virt-launcher.go:73","timestamp":"20 22-08-06T19:11:09.229808Z"} panic: timed out waiting for domain to be defined

goroutine 1 [running]:
main.waitForDomainUUID(0xc0003a68a0, 0xc0003a6720, 0xc000390420, {0x1ce7fe0, 0xc000142270})
        cmd/virt-launcher/virt-launcher.go:243 +0x43a
main.main()
        cmd/virt-launcher/virt-launcher.go:474 +0x107a
{"component":"virt-launcher-monitor","level":"info","msg":"Reaped pid 12 with status 512","pos":"virt-launcher-monitor.go:125","timestamp":"2022-08-06T19:15:39.234424Z"}
{"component":"virt-launcher-monitor","level":"error","msg":"dirty virt-launcher shutdown: exit-code 2","pos":"virt-launcher-monitor.go:143","timestamp":"2022-08-06T19:15:39.234529Z"}

Attaching the virt-handler logs, the values.yaml used on vcluster-k8s, and the VMI manifest.

test-vm.txt values.txt virt-handler-logs.txt

A similar issue is reported against KubeVirt (on k0s/k3s): https://github.com/kubevirt/kubevirt/issues/5069

Any suggestions or leads would really help us.

vysr2939 avatar Aug 06 '22 19:08 vysr2939

@DanWard @FabianKramm I'm facing the same issue when trying to bring up the KubeVirt VMI pod on a vcluster setup.

aravindgpd avatar Aug 11 '22 14:08 aravindgpd

virt-handler logs during the creation of the VM in the vcluster:

{"component":"virt-handler","level":"info","msg":"Generic Allocate: resourceName: tun","pos":"generic_device.go:244","timestamp":"2022-08-16T11:04:16.479450Z"} {"component":"virt-handler","level":"info","msg":"Generic Allocate: request: [\u0026ContainerAllocateRequest{DevicesIDs:[tun894],}]","pos":"generic_device.go:245","timestamp":"2022-08-16T11:04:16.479544Z"} {"component":"virt-handler","level":"info","msg":"resyncing virt-launcher domains","pos":"cache.go:384","timestamp":"2022-08-16T11:04:25.879729Z"} {"component":"virt-handler","kind":"","level":"info","msg":"VMI is in phase: Scheduled | Domain does not exist","name":"testvmi-nocloud","namespace":"default","pos":"vm.go:1553","timestamp":"2022-08-16T11:04:52.118652Z","uid":"65727d69-73c7-45ef-a8d8-7bde24883087"} {"component":"virt-handler","kind":"","level":"info","msg":"migration is block migration because of containerdisk volume","name":"testvmi-nocloud","namespace":"default","pos":"vm.go:2215","timestamp":"2022-08-16T11:04:52.118877Z","uid":"65727d69-73c7-45ef-a8d8-7bde24883087"} {"component":"virt-handler","kind":"","level":"info","msg":"migration is block migration because of cloudinitdisk volume","name":"testvmi-nocloud","namespace":"default","pos":"vm.go:2215","timestamp":"2022-08-16T11:04:52.118927Z","uid":"65727d69-73c7-45ef-a8d8-7bde24883087"} {"component":"virt-handler","kind":"","level":"info","msg":"VMI is in phase: Failed | Domain does not exist","name":"testvmi-nocloud","namespace":"default","pos":"vm.go:1553","timestamp":"2022-08-16T11:04:52.196383Z","uid":"65727d69-73c7-45ef-a8d8-7bde24883087"} {"component":"virt-handler","kind":"","level":"info","msg":"Performing final local cleanup for vmi with uid 65727d69-73c7-45ef-a8d8-7bde24883087","name":"testvmi-nocloud","namespace":"default","pos":"vm.go:1809","timestamp":"2022-08-16T11:04:52.196461Z","uid":"65727d69-73c7-45ef-a8d8-7bde24883087"} {"component":"virt-handler","kind":"","level":"info","msg":"No container disk mount entries found to unmount","name":"testvmi-nocloud","namespace":"default","pos":"mount.go:357","timestamp":"2022-08-16T11:04:52.196529Z","uid":"65727d69-73c7-45ef-a8d8-7bde24883087"} {"component":"virt-handler","kind":"","level":"info","msg":"Cleaning up remaining hotplug volumes","name":"testvmi-nocloud","namespace":"default","pos":"mount.go:708","timestamp":"2022-08-16T11:04:52.196556Z","uid":"65727d69-73c7-45ef-a8d8-7bde24883087"} {"component":"virt-handler","kind":"Domain","level":"info","msg":"Removing domain from cache during final cleanup","name":"testvmi-nocloud","namespace":"default","pos":"vm.go:1843","timestamp":"2022-08-16T11:04:52.196644Z","uid":""} {"component":"virt-handler","level":"info","msg":"resyncing virt-launcher domains","pos":"cache.go:384","timestamp":"2022-08-16T11:09:25.878850Z"} {"component":"virt-handler","kind":"","level":"info","msg":"VMI is in phase: Failed | Domain does not exist","name":"testvmi-nocloud","namespace":"default","pos":"vm.go:1553","timestamp":"2022-08-16T11:09:53.485536Z","uid":"65727d69-73c7-45ef-a8d8-7bde24883087"} {"component":"virt-handler","kind":"","level":"info","msg":"Performing final local cleanup for vmi with uid 65727d69-73c7-45ef-a8d8-7bde24883087","name":"testvmi-nocloud","namespace":"default","pos":"vm.go:1809","timestamp":"2022-08-16T11:09:53.485668Z","uid":"65727d69-73c7-45ef-a8d8-7bde24883087"} {"component":"virt-handler","kind":"","level":"info","msg":"No container disk mount entries found to 
unmount","name":"testvmi-nocloud","namespace":"default","pos":"mount.go:357","timestamp":"2022-08-16T11:09:53.485777Z","uid":"65727d69-73c7-45ef-a8d8-7bde24883087"} {"component":"virt-handler","kind":"","level":"info","msg":"Cleaning up remaining hotplug volumes","name":"testvmi-nocloud","namespace":"default","pos":"mount.go:708","timestamp":"2022-08-16T11:09:53.485822Z","uid":"65727d69-73c7-45ef-a8d8-7bde24883087"} {"component":"virt-handler","kind":"Domain","level":"info","msg":"Removing domain from cache during final cleanup","name":"testvmi-nocloud","namespace":"default","pos":"vm.go:1843","timestamp":"2022-08-16T11:09:53.485943Z","uid":""}

aravindgpd avatar Aug 16 '22 11:08 aravindgpd

Hi @DanWard, @vysr2939 and @aravindgpd, we recently released v0.13.0-alpha.0 of vcluster with the HostpathMapper feature. Would you like to give it a spin and see if it fixes your problem? You can enable the feature with:

hostpathMapper:
  enabled: true
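
For example, assuming the snippet above is saved as values.yaml, a new virtual cluster with the feature enabled could be created with something like the following (flag names are worth double-checking against your CLI version):

vcluster create my-vcluster -f values.yaml

An existing vcluster should also be upgradable in place by re-running the command with --upgrade.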

ishankhare07 avatar Nov 02 '22 12:11 ishankhare07

@ishankhare07 We can close this issue by using this plugin: https://github.com/loft-sh/vcluster-generic-crd-sync-plugin

wizpresso-steve-cy-fan avatar Dec 08 '22 08:12 wizpresso-steve-cy-fan

I assume that enabling the hostpathMapper feature, mentioned by Ishan above, fixes the KubeVirt issues. I will thus close this issue.

matskiv avatar Jan 10 '23 10:01 matskiv

@wizpresso-steve-cy-fan I just encountered this same issue. Can someone shed some light on how it can be resolved with https://github.com/loft-sh/vcluster-generic-crd-sync-plugin?

Comparing the virt-handler logs for the demo VM, I noticed that the working instance on the host cluster has the following lines in its logs, which are missing from the failed instance in the vcluster:

{"component":"virt-launcher","kind":"","level":"info","msg":"Executing PreStartHook on VMI pod environment","name":"testvm","namespace":"vcvm","pos":"manager.go:531","timestamp":"2023-04-15T21:15:22.760831Z","uid":"e3a9fb70-273f-4498-8f16-30e1660a6fbe"} {"component":"virt-launcher","kind":"","level":"info","msg":"Starting PreCloudInitIso hook","name":"testvm","namespace":"vcvm","pos":"manager.go:552","timestamp":"2023-04-15T21:15:22.760942Z","uid":"e3a9fb70-273f-4498-8f16-30e1660a6fbe"}

GOVYANSONG avatar Apr 15 '23 18:04 GOVYANSONG

On the working instance (on the host), this path exists: /var/run/kubevirt-ephemeral-disks/disk-data/rootfs/disk.qcow2
On the non-working vcluster, this path is empty: /var/run/kubevirt-ephemeral-disks/container-disk-data
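
(In case anyone wants to reproduce the comparison: the path can be listed from inside the launcher pod while it is still running; the pod name below is a placeholder:)

kubectl -n vcvm exec virt-launcher-testvm-<pod-id> -c compute -- ls -la /var/run/kubevirt-ephemeral-disks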

GOVYANSONG avatar Apr 16 '23 20:04 GOVYANSONG

@GOVYANSONG IMHO the generic sync plugin has nothing to do with this 🤔 If you don't have the hostpath mapper feature enabled - please try it. Docs are here - https://www.vcluster.com/docs/operator/monitoring-logging#enabling-hostpath-mapper And if it still doesn't work for you, please create a new issue.

matskiv avatar Apr 25 '23 15:04 matskiv

> @GOVYANSONG IMHO the generic sync plugin has nothing to do with this 🤔 If you don't have the hostpath mapper feature enabled - please try it. Docs are here - https://www.vcluster.com/docs/operator/monitoring-logging#enabling-hostpath-mapper And if it still doesn't work for you, please create a new issue.

@matskiv It seems the link provided earlier is no longer valid or accessible. Could you please provide an updated link? Thanks.

yeahdongcn avatar Feb 04 '24 01:02 yeahdongcn

Docs have a search function ;) https://www.vcluster.com/docs/o11y/logging/hpm

matskiv avatar Feb 05 '24 21:02 matskiv