
General Protection error on boot: !!!! X64 Exception Type - 0D(#GP - General Protection) CPU Apic ID - 00000000 !!!!

maburlik opened this issue · 4 comments

Describe the bug

Booting a VM with the configuration below frequently gets blocked at boot by a General Protection error. This might be related to the fact that the VM is a nested VM under the KVM hypervisor; the parent VM is also a cloud-hypervisor VM.

I'm hoping to figure out whether I'm missing validation at any layer. In a non-nested environment, VMs boot successfully and consistently.

I'm also looking for advice on what next steps I can take to dig deeper into the issue.

Error:

!!!! X64 Exception Type - 0D(#GP - General Protection)  CPU Apic ID - 00000000 !!!!
ExceptionData - 0000000000000000
RIP  - 000000006EF454CD, CS  - 0000000000000038, RFLAGS - 0000000000010246
RAX  - 0086000000000086, RCX - 000000006F6E9A18, RDX - 000000006F6E99E4
RBX  - 800000000000000E, RSP - 000000007FD11970, RBP - 000000007FD119F0
RSI  - 000000006EFBF320, RDI - 000000007FD38640
R8   - 000000006F6E98E8, R9  - 0000000000000001, R10 - 0000000000000002
R11  - 0000000000000000, R12 - 000000007FD3A360, R13 - 0000000000000008
R14  - 000000006EF8A598, R15 - 000000007FD3A2E0
DS   - 0000000000000030, ES  - 0000000000000030, FS  - 0000000000000030
GS   - 0000000000000030, SS  - 0000000000000030
CR0  - 0000000080010033, CR2 - 0000000000000000, CR3 - 000000006FA01000
CR4  - 0000000000000668, CR8 - 0000000000000000
DR0  - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000
DR3  - 0000000000000000, DR6 - 00000000FFFF0FF0, DR7 - 0000000000000400
GDTR - 000000006F7DE000 0000000000000047, LDTR - 0000000000000000
IDTR - 000000006F3E5018 0000000000000FFF,   TR - 0000000000000018
FXSAVE_STATE - 000000007FD115D0
!!!! Find image based on IP(0x6EF454CD) /home/maxb/myagent/_work/3561/s/edk2/edk2/Build/CloudHvX64/DEBUG_GCC5/X64/OvmfPkg/AcpiPlatformDxe/AcpiPlatformDxe/DEBUG/QemuFwCfgAcpiPlatform.dll (ImageBase=000000006EF44000, EntryPoint=000000006EF45696) !!!!
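
One possible next step for digging deeper (a sketch, assuming the DEBUG_GCC5 build tree referenced above is still available and the intermediate .dll retains debug info): translate the faulting RIP into an offset inside QemuFwCfgAcpiPlatform.dll and resolve it with binutils. The offset is simply RIP minus ImageBase from the dump above (0x6EF454CD - 0x6EF44000 = 0x14CD); it may need a small adjustment depending on how the intermediate image is based relative to the converted PE image, so cross-check against the EntryPoint offset (0x1696).

# Offset of the faulting instruction inside the image: RIP - ImageBase
printf '0x%X\n' $((0x6EF454CD - 0x6EF44000))    # prints 0x14CD
# Resolve the offset against the debug image named in the exception dump
DLL=Build/CloudHvX64/DEBUG_GCC5/X64/OvmfPkg/AcpiPlatformDxe/AcpiPlatformDxe/DEBUG/QemuFwCfgAcpiPlatform.dll
addr2line -f -e "$DLL" 0x14CD
# Or disassemble and inspect the instructions around offset 0x14CD
objdump -d "$DLL" | less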

Configuration:

cloud-hypervisor: 35.059012ms: <vmm> INFO:vmm/src/vm.rs:502 -- Booting VM from config: Mutex { data: VmConfig { cpus: CpusConfig { boot_vcpus: 2, max_vcpus: 2, topology: None, kvm_hyperv: false, max_phys_bits: 46, affinity: None, features: CpuFeatures { amx: false } }, memory: MemoryConfig { size: 2147483648, mergeable: false, hotplug_method: Acpi, hotplug_size: None, hotplugged_size: None, shared: true, hugepages: false, hugepage_size: None, prefault: false, zones: None, thp: true }, payload: Some(PayloadConfig { firmware: None, kernel: Some("/opt/microsoft/maxb/bin/rdct.code/OVMF-CLOUDHV-DEBUG.fd"), cmdline: None, initramfs: None }), rate_limit_groups: None, disks: Some([DiskConfig { path: Some("/mnt/maxbdataroot/data/_App/image_plugin_Inc0_actx_/work/target/7380e581-0670-40ae-b5eb-7ab6a30df09b_10/10"), readonly: false, direct: false, iommu: false, num_queues: 1, queue_size: 128, vhost_user: false, vhost_socket: None, rate_limit_group: None, rate_limiter_config: None, id: None, disable_io_uring: false, disable_aio: false, pci_segment: 0, serial: None, queue_affinity: None }, DiskConfig { path: Some("/mnt/maxbdataroot/data/_App/local_storage_plugin_Inc0_actx_/work/local_storage/ls_plugin-33cfc922-004f-4d61-a9fe-c1f886a85924-mp/core_vm-1_work/cloud-init/nocloud_ds.iso"), readonly: false, direct: false, iommu: false, num_queues: 1, queue_size: 128, vhost_user: false, vhost_socket: None, rate_limit_group: None, rate_limiter_config: None, id: None, disable_io_uring: false, disable_aio: false, pci_segment: 0, serial: None, queue_affinity: None }]), net: Some([NetConfig { tap: Some("maxb_tap_0"), ip: 192.168.249.1, mask: 255.255.255.0, mac: MacAddr { bytes: [82, 63, 126, 236, 115, 226] }, host_mac: None, mtu: None, iommu: false, num_queues: 2, queue_size: 256, vhost_user: false, vhost_socket: None, vhost_mode: Client, id: None, fds: None, rate_limiter_config: None, pci_segment: 0, offload_tso: true, offload_ufo: true, offload_csum: true }]), rng: RngConfig { src: "/dev/urandom", iommu: false }, balloon: None, fs: Some([FsConfig { tag: "/secrets", socket: "/tmp/rdct_sockets/virtiofsd-socket-2", num_queues: 1, queue_size: 1024, id: None, pci_segment: 0 }, FsConfig { tag: "maxb_vm_diagnostics", socket: "/tmp/rdct_sockets/virtiofsd-socket-3", num_queues: 1, queue_size: 1024, id: None, pci_segment: 0 }]), pmem: None, serial: ConsoleConfig { file: None, mode: Tty, iommu: false, socket: None }, console: ConsoleConfig { file: None, mode: Off, iommu: false, socket: None }, debug_console: DebugConsoleConfig { file: None, mode: Off, iobase: Some(233) }, devices: None, user_devices: None, vdpa: None, vsock: Some(VsockConfig { cid: 4, socket: "/tmp/rdct_sockets/maxb-vm-guest-socket-4.vsock", iommu: false, id: None, pci_segment: 0 }), pvpanic: false, iommu: false, sgx_epc: None, numa: None, watchdog: true, platform: None, tpm: None, preserved_fds: None }, poisoned: false, .. }
cloud-hypervisor: 36.228120ms: <vmm> INFO:arch/src/x86_64/mod.rs:586 -- Running under nested virtualisation. Hypervisor string: KVMKVMKVM   
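
Since the problem only shows up when running nested, one thing worth double-checking (a hedged sketch, not part of the original report) is that nested virtualization is enabled on the bare-metal host and that the parent cloud-hypervisor VM actually exposes the virtualization extensions to its guest:

# On the bare-metal host (L0), depending on the CPU vendor:
cat /sys/module/kvm_intel/parameters/nested    # expect Y or 1
cat /sys/module/kvm_amd/parameters/nested
# Inside the parent cloud-hypervisor VM (L1):
grep -cE 'vmx|svm' /proc/cpuinfo               # non-zero if virt extensions are exposed
ls -l /dev/kvm                                 # must exist and be accessible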

Please see this file for the full cloud-hypervisor stdout log (attached under Logs below): bad_chv.stdout.1398203133572_0000_2024-10-08T22-22-09.824.log

After the failure above, another attempt to create and boot the VM succeeds. Log (attached under Logs below): good_chv.stdout.1845902438158_0000_2024-10-08T22-29-42.580.log

To Reproduce

Create and boot the VM. Please see the logs below for the configuration.

Version

cloud-hypervisor v38.0

Did you build from source, if so build command line (e.g. features):

rustup default stable && cargo build --release --all --manifest-path ${CMAKE_CURRENT_SOURCE_DIR}/cloud-hypervisor/Cargo.toml --target-dir ${CMAKE_CURRENT_BINARY_DIR}/cloud-hypervisor

VM configuration

Used REST endpoint requests for create and boot (a sketch of the calls involved is included below). Please see the log above for the full configuration.
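
For context, a minimal sketch of the REST sequence used (the API socket path and JSON file name here are placeholders, not the actual values from this setup; the real configuration is the VmConfig dump shown above):

# Assuming the VMM was started with: cloud-hypervisor --api-socket /tmp/ch-api.sock
curl --unix-socket /tmp/ch-api.sock -X PUT http://localhost/api/v1/vm.create \
     -H 'Content-Type: application/json' -d @vm-config.json
curl --unix-socket /tmp/ch-api.sock -X PUT http://localhost/api/v1/vm.boot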

Guest OS version details: Ubuntu 22.04

Host OS version details: Ubuntu 22.04

Logs

Attaching logs from the first failed run, followed by a second set from the run immediately afterwards, which executed the same configuration on the same host VM.

Failed run: bad_chv.stdout.1398203133572_0000_2024-10-08T22-22-09.824.log bad_virtiofsd.stdout.1398203133572.virtiofsd-socket-2_0000_2024-10-08T22-21-48.757.log bad_virtiofsd.stdout.1398203133572.virtiofsd-socket-3_0000_2024-10-08T22-21-48.703.log

Successful run: good_chv.stdout.1845902438158_0000_2024-10-08T22-29-42.580.log good_virtiofsd.stdout.1845902438158.virtiofsd-socket-4_0000_2024-10-08T22-29-16.476.log good_virtiofsd.stdout.1845902438158.virtiofsd-socket-5_0000_2024-10-08T22-29-16.420.log

maburlik · Oct 10 '24 01:10