gokvm
Panic: unexpected kvm exit reason 8
Hi, I was trying out your project but haven't been able to get it to boot Linux. I ran make bzImage and make initrd (after installing libmnl static libraries from source), and then ./gokvm. This appears to start booting, but then gokvm crashes:
early console in extract_kernel
input_data: 0x00000000025f82bf
input_len: 0x00000000001fb058
output: 0x0000000001000000
output_len: 0x0000000000fe2740
kernel_total_size: 0x0000000001818000
needed_size: 0x0000000001a00000
trampoline_32bit: 0x000000000009d000
...
[ 0.186781][ T0] ---------------------
[ 0.187026][ T0] local_lock inversion 2: ok |
[ 0.187532][ T0] local_lock inversion 3A: ok |
[ 0.188051][ T0] local_lock inversion 3B: ok |
[ 0.188569][ T0] hardirq_unsafe_softirq_safe: ok |
[ 0.189121][ T0] -------------------------------------------------------
[ 0.189522][ T0] Good, all 349 testcases passed! |
[ 0.189816][ T0] ---------------------------------
[ 0.190162][ T0] APIC: Switch to symmetric I/O mode setup
[ 0.190491][ T0] Not enabling interrupt remapping due to skipped IO-APIC setup
[ 0.190922][ T0] Switched APIC routing to physical flat.
[ 0.191258][ T0] enabled ExtINT on CPU#0
[ 0.191515][ T0] Calibrating delay loop (skipped), value calculated using timer frequency.. 5606.27 BogoMIPS (lpj=11212544)
[ 0.192173][ T0] pid_max: default: 32768 minimum: 512
[ 0.192521][ T0] Mount-cache hash table entries: 2048 (order: 2, 16384 bytes, linear)
[ 0.192997][ T0] Mountpoint-cache hash table entries: 2048 (order: 2, 16384 bytes, linear)
[ 0.193919][ T0] Disabled fast string operations
[ 0.195568][ T0] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
[ 0.195922][ T0] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0
[ 0.196305][ T0] Speculative Store Bypass: Vulnerable
panic: unexpected kvm exit reason: 8
goroutine 10 [running]:
main.main.func1(0x0?)
/home/zyedidia/programming/gokvm/main.go:30 +0x6e
created by main.main
/home/zyedidia/programming/gokvm/main.go:28 +0x185
I think this exit reason corresponds to KVM_EXIT_SHUTDOWN, but I'm not sure why KVM would return a shutdown code at this point in execution. Thanks for any insight you might have!
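For reference, the numeric exit reason can be mapped back to its symbolic name. The following is a minimal sketch, not gokvm's own code: the values are those defined in the Linux UAPI header linux/kvm.h, while the map and the helper name exitReasonName are made up for illustration.

```go
package main

import "fmt"

// kvmExitNames lists a subset of the exit reasons defined in <linux/kvm.h>.
// The map itself is illustrative and is not part of gokvm.
var kvmExitNames = map[uint32]string{
	0:  "KVM_EXIT_UNKNOWN",
	1:  "KVM_EXIT_EXCEPTION",
	2:  "KVM_EXIT_IO",
	3:  "KVM_EXIT_HYPERCALL",
	4:  "KVM_EXIT_DEBUG",
	5:  "KVM_EXIT_HLT",
	6:  "KVM_EXIT_MMIO",
	7:  "KVM_EXIT_IRQ_WINDOW_OPEN",
	8:  "KVM_EXIT_SHUTDOWN",
	9:  "KVM_EXIT_FAIL_ENTRY",
	10: "KVM_EXIT_INTR",
	17: "KVM_EXIT_INTERNAL_ERROR",
}

// exitReasonName (hypothetical helper) returns the symbolic name for a
// KVM exit reason, or a fallback string for values not in the table.
func exitReasonName(reason uint32) string {
	if name, ok := kvmExitNames[reason]; ok {
		return name
	}
	return fmt.Sprintf("unknown exit reason %d", reason)
}

func main() {
	fmt.Println(exitReasonName(8))
}
```

On x86, KVM_EXIT_SHUTDOWN is commonly reported when the guest triple-faults, so a faulting instruction in the guest kernel is one plausible cause to look for.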
@zyedidia Thanks for reporting the issue. As you said, KVM appears to be exiting with KVM_EXIT_SHUTDOWN for some reason, but I was not able to reproduce it on my laptop (Intel i7-5500U, Ubuntu 20.04 LTS). I would like to reproduce it if possible, so could you provide some additional information?
- CPU
- Host Linux distribution
- Host Linux kernel
Here is the additional information:
- CPU: Intel i7-1165G7
- Host linux: Ubuntu 21.10
- Host kernel: Linux 5.13.0-28-generic
Thanks for your help! I will also try on a different machine when I get the chance.
Is it possible KVM shut down because of the message Speculative Store Bypass: Vulnerable? Is there an internal KVM log, or something that would help understand why it decided to send a shutdown signal?
Hmm, I'm not sure whether the guest would shut down because of a CPU vulnerability or its mitigation. As you may know, the code under arch/x86/kvm emits debugging information via pr_debug, which you can view by enabling dynamic debug [1].
If you're concerned about CPU vulnerabilities, how about temporarily disabling all mitigations by adding mitigations=off to the command-line parameters?
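As a rough sketch of what adding such a parameter could look like when the command line is assembled in Go: the helper name withParam is hypothetical and is not gokvm's API, and the example string console=ttyS0 is only a placeholder.

```go
package main

import (
	"fmt"
	"strings"
)

// withParam (hypothetical helper) returns cmdline with param appended,
// unless an identical parameter is already present.
func withParam(cmdline, param string) string {
	for _, f := range strings.Fields(cmdline) {
		if f == param {
			return cmdline
		}
	}
	return strings.TrimSpace(cmdline + " " + param)
}

func main() {
	// The starting command line here is an assumption for illustration.
	fmt.Println(withParam("console=ttyS0", "mitigations=off"))
}
```

The helper is idempotent, so calling it twice with the same parameter does not duplicate it.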
For your information, my command-line parameters and CPU information are as follows.
$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-5.11.0-41-generic root=UUID=9c174809-4f37-4add-b76f-8f2ea977649a ro
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 39 bits physical, 48 bits virtual
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 61
Model name: Intel(R) Core(TM) i7-5500U CPU @ 2.40GHz
Stepping: 4
CPU MHz: 600.000
CPU max MHz: 3000.0000
CPU min MHz: 500.0000
BogoMIPS: 4788.73
Virtualization: VT-x
L1d cache: 64 KiB
L1i cache: 64 KiB
L2 cache: 512 KiB
L3 cache: 4 MiB
NUMA node0 CPU(s): 0-3
Vulnerability Itlb multihit: KVM: Mitigation: VMX disabled
Vulnerability L1tf: Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
Vulnerability Mds: Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Meltdown: Mitigation; PTI
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Full generic retpoline, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling
Vulnerability Srbds: Mitigation; Microcode
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap intel_pt xsaveopt dtherm ida arat pln pts md_clear flush_l1d
[1] https://www.kernel.org/doc/html/latest/admin-guide/dynamic-debug-howto.html
I encountered a similar issue on a physical computer and had a chance to trace through the code. alternative_instructions() is causing the issue. In my case, it was because my CPU has the X86_FEATURE_FSRM (Fast Short Rep Mov) feature.
I compared the boot output on the physical computer with the output on a VM. Here is the VM's output:
Note: gokvm runs properly on the VM, which itself runs on top of Hyper-V.
[ 1.068089][ T0] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
[ 1.068985][ T0] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0
[ 1.069475][ T0] Spectre V1 : Mitigation: usercopy/swapgs barriers and __user pointer sanitization
[ 1.070155][ T0] Spectre V2 : Spectre mitigation: kernel not compiled with retpoline; no mitigation available!
[ 1.070156][ T0] Speculative Store Bypass: Vulnerable
[ 1.071315][ T0] MDS: Vulnerable: Clear CPU buffers attempted, no microcode
[ 1.335020][ T0] Freeing SMP alternatives memory: 12K
[ 1.552316][ T1] smpboot: CPU0: Intel QEMU Virtual CPU version 2.5+ (family: 0xf, model: 0x6b, stepping: 0x1)
[ 1.552982][ T1] Performance Events: unsupported Netburst CPU model 107 no PMU driver, software events only.
[ 1.552982][ T1] rcu: Hierarchical SRCU implementation.
[ 1.552982][ T1] NMI watchdog: Perf NMI watchdog permanently disabled
[ 1.552982][ T1] smp: Bringing up secondary CPUs ...
After tracing through the code, I found that on my physical computer gokvm exited at the "Freeing SMP alternatives memory: 12K" line.
I am still digging for more information. I just wanted to share what I have so far, and I will keep you posted once I have more findings.
Here is how I am temporarily bypassing it for now:
In the apply_alternatives() function in _linux/arch/x86/kernel/alternative.c, I added the following lines inside the for loop (for (a = start; a < end; a++)):
    if (feature == X86_FEATURE_FSRM) {
        goto next;
    }
Then, rebuild bzImage and run gokvm:
rm -f bzImage && make bzImage
sudo ./gokvm boot -k ./bzImage -i ./initrd
Note: You need to comment out the curl, tar, and cp lines in scripts/get_kernel.bash so that your modified kernel code is not overwritten.
Another tip is to add debug-alternative to the kernel command-line parameters in flag/flag.go so that you can see a bit more information. For my CPU feature (X86_FEATURE_FSRM), the relevant log output looks like this (captured with the temporary bypass in place, after which the boot went through and showed the u-root prompt):
[ 250.731535][ T0] SMP alternatives: feat: 18*32+4, old: (__memmove+0x17/0x1a0 (ffffffff81454f47) len: 10), repl: (ffffffff81e3aeb1, len: 0)
[ 250.731535][ T0] SMP alternatives: ffffffff81454f47: old_insn: 48 83 fa 20 0f 82 f5 00 00 00
[ 250.731535][ T0] SMP alternatives: ffffffff81454f47: final_insn: 90 90 90 90 90 90 90 90 90 90
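The feat: 18*32+4 value in this log follows the kernel's feature numbering, where a feature number is word*32 + bit; X86_FEATURE_FSRM is defined as 18*32+4 in arch/x86/include/asm/cpufeatures.h. A small sketch of decoding it (the helper name featureWordBit is made up for illustration):

```go
package main

import "fmt"

// featureWordBit (hypothetical helper) splits a Linux x86 feature number
// into its capability word and the bit within that word.
func featureWordBit(feature int) (word, bit int) {
	return feature / 32, feature % 32
}

func main() {
	// Same value as X86_FEATURE_FSRM in the kernel headers.
	const x86FeatureFSRM = 18*32 + 4

	word, bit := featureWordBit(x86FeatureFSRM)
	fmt.Printf("feature %d = word %d, bit %d\n", x86FeatureFSRM, word, bit)
}
```

Word 18 holds the bits reported in EDX by CPUID leaf 7, subleaf 0, so FSRM corresponds to bit 4 of that register.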
My physical computer's cpu info:
$ lscpu | grep --color -E 'fsrm|Model name'
Model name: 11th Gen Intel(R) Core(TM) i7-11700 @ 2.50GHz
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap avx512ifma clflushopt intel_pt avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp hwp_pkg_req avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid fsrm md_clear flush_l1d arch_capabilities
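The same check can be done programmatically. This is a minimal sketch that looks for a flag on a "flags" line (as in /proc/cpuinfo) or a "Flags" line (as in lscpu output); the function name hasCPUFlag and the sample text are assumptions for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// hasCPUFlag (hypothetical helper) reports whether flag appears on any
// "flags:" / "Flags:" line of the given text.
func hasCPUFlag(text, flag string) bool {
	sc := bufio.NewScanner(strings.NewReader(text))
	// Flag lines can be long, so allow lines of up to 1 MiB.
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024)
	for sc.Scan() {
		key, val, ok := strings.Cut(sc.Text(), ":")
		if !ok || !strings.EqualFold(strings.TrimSpace(key), "flags") {
			continue
		}
		for _, f := range strings.Fields(val) {
			if f == flag {
				return true
			}
		}
	}
	return false
}

func main() {
	// Shortened sample input; on a real host the text would come from
	// /proc/cpuinfo or from lscpu.
	sample := "Model name: example CPU\nFlags: fpu vme rdpid fsrm md_clear\n"
	fmt.Println(hasCPUFlag(sample, "fsrm"))
}
```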
One more piece of information: this issue is also reproducible when I run qemu with -cpu host on my physical computer, so I think it is more of a Linux kernel issue than a gokvm issue:
$ qemu-system-x86_64 \
-enable-kvm \
-cpu host \
-smp 2 \
-m 1024 \
-kernel ./bzImage \
-initrd ./initrd \
-nographic \
-append 'console=ttyS0' \
-serial mon:stdio
Hi @junftnt ,
Thanks for the detailed investigation! Sorry for the delay in replying.
I reproduced it on my machine (12th Gen Intel(R) Core(TM) i3-1220P).
And changing apply_alternatives() as you described solved it! Amazing.
Removing X86_FEATURE_FSRM from the CPUID might solve the issue.
I think it is related to the KVM_SET_CPUID2 KVM API.
https://github.com/bobuhiro11/gokvm/blob/409a9c6ff7e97456c61c500557cf5111d3aa96b4/kvm/cpuid.go#L105
I will investigate this as well.
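To illustrate the idea of hiding FSRM from the guest, here is a sketch that clears the bit in a list of CPUID entries before they would be passed to KVM_SET_CPUID2. The cpuidEntry type and maskFSRM function are hypothetical; they only mirror the fields of struct kvm_cpuid_entry2 and are not gokvm's actual types. This is not necessarily what any eventual fix does.

```go
package main

import "fmt"

// cpuidEntry (hypothetical type) mirrors the relevant fields of
// struct kvm_cpuid_entry2 from <linux/kvm.h>.
type cpuidEntry struct {
	Function, Index, Flags uint32
	Eax, Ebx, Ecx, Edx     uint32
}

// FSRM is reported in CPUID.(EAX=7,ECX=0):EDX bit 4.
const fsrmBit = 1 << 4

// maskFSRM (hypothetical helper) clears the FSRM bit in place so that a
// guest given these entries would not see the feature.
func maskFSRM(entries []cpuidEntry) {
	for i := range entries {
		if entries[i].Function == 7 && entries[i].Index == 0 {
			entries[i].Edx &^= fsrmBit
		}
	}
}

func main() {
	// Made-up sample values: EDX has bit 4 (FSRM) and bit 10 set.
	entries := []cpuidEntry{
		{Function: 7, Index: 0, Edx: 1<<4 | 1<<10},
		{Function: 1, Index: 0, Edx: 1 << 4}, // other leaves are left alone
	}
	maskFSRM(entries)
	fmt.Printf("leaf 7 EDX = %#x, leaf 1 EDX = %#x\n", entries[0].Edx, entries[1].Edx)
}
```

Only leaf 7, subleaf 0 is touched, since bit 4 of EDX has a different meaning in other leaves.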
Hi @zyedidia @junftnt ,
It has been a while. I did some digging today and found that apparently PR #170 may be the fix. Could you please take a look?