What happened:
Uploading a system image or starting a virtual machine causes the host to reboot. The kernel panic log is as follows:
[ 760.046684] BUG: kernel NULL pointer dereference, address: 0000000000000178
[ 778.433195] NMI watchdog: Watchdog detected hard LOCKUP on cpu 3
[ 778.433196] Modules linked in: act_police cls_basic sch_ingress vfio_pci vfio_virqfd vfio_iommu_type1 vfio xt_multiport ipt_rpfilter iptable_raw ip_set_hash_ip ip_set_hash_net ipip tunnel4 ip_tunnel openvswitch nf_conncount vhost_net vhost vhost_iotlb tap tun xt_addrtype xt_set ip_set_hash_ipportnet ip_set_bitmap_port ip_set_hash_ipportip ip_set_hash_ipport dummy nf_tables ip_set ip6table_mangle iptable_mangle ip6table_filter ip6table_nat ip6_tables xt_MASQUERADE xt_conntrack xt_comment xt_mark xt_nat iptable_filter iptable_nat ip_tables veth nf_conntrack_netlink nfnetlink rfkill overlay ip_vs_ftp nf_nat ip_vs_sed ip_vs_nq ip_vs_fo ip_vs_sh ip_vs_dh ip_vs_lblcr ip_vs_lblc ip_vs_wrr ip_vs_rr ip_vs_wlc ip_vs_lc ip_vs nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c sunrpc dm_snapshot dm_bufio intel_powerclamp mgag200 coretemp joydev kvm_intel i2c_algo_bit drm_kms_helper kvm syscopyarea sysfillrect sysimgblt irqbypass ses fb_sys_fops cec enclosure ipmi_ssif dcdbas
[ 778.433237] scsi_transport_sas pcspkr sg iTCO_wdt iTCO_vendor_support acpi_power_meter ipmi_si intel_cstate gpio_ich intel_uncore ipmi_devintf i7core_edac ipmi_msghandler lpc_ich acpi_cpufreq br_netfilter bridge stp llc drm fuse ext4 mbcache jbd2 sr_mod cdrom sd_mod t10_pi ata_generic crct10dif_pclmul crc32_pclmul ata_piix crc32c_intel libata megaraid_sas ghash_clmulni_intel serio_raw bnx2 wmi dm_mirror dm_region_hash dm_log dm_mod
[ 778.433257] CPU: 3 PID: 32884 Comm: etcd Kdump: loaded Tainted: G S I 5.10.0-182.0.0.95.oe2203sp3.x86_64 #1
[ 778.433258] Hardware name: Dell Inc. PowerEdge R610/08GXHX, BIOS 6.3.0 07/24/2012
[ 778.433259] RIP: 0010:native_queued_spin_lock_slowpath+0x179/0x1c0
[ 778.433261] Code: eb eb c1 ee 12 83 e0 03 83 ee 01 48 c1 e0 05 48 63 f6 48 05 00 6c 03 00 48 03 04 f5 20 5b a1 94 48 89 10 8b 42 08 85 c0 75 09 90 8b 42 08 85 c0 74 f7 48 8b 32 48 85 f6 74 97 0f 18 0e eb 92
[ 778.433261] RSP: 0018:ffffa51ae1d13a10 EFLAGS: 00000046
[ 778.433262] RAX: 0000000000000000 RBX: ffff979e0fab5d40 RCX: 0000000000100000
[ 778.433263] RDX: ffff979e0f8b6c00 RSI: 0000000000000015 RDI: ffff979e0fab5d40
[ 778.433264] RBP: ffff979e0fab5d40 R08: 0000000000000003 R09: 000000000000000b
[ 778.433265] R10: 00000000ffffffff R11: 0000000000000000 R12: 0000000000000046
[ 778.433265] R13: ffffa51ae1d13c48 R14: 000000000000000b R15: 0000000000000003
[ 778.433266] FS: 000000c00008c090(0000) GS:ffff979e0f880000(0000) knlGS:0000000000000000
[ 778.433267] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 778.433268] CR2: 000000c00190b120 CR3: 0000000d92ab0004 CR4: 00000000000226e0
[ 778.433268] Call Trace:
[ 778.433269] <NMI>
[ 778.433269] ? watchdog_hardlockup_check.part.0.cold+0x21/0x73
[ 778.433270] ? __perf_event_overflow+0x52/0x100
[ 778.433270] ? handle_pmi_common+0x218/0x2d0
[ 778.433271] ? set_pte_vaddr_p4d+0x3f/0x50
[ 778.433272] ? flush_tlb_one_kernel+0xa/0x20
[ 778.433272] ? native_set_fixmap+0x4f/0x70
[ 778.433273] ? ghes_copy_tofrom_phys+0x74/0x120
[ 778.433274] ? __ghes_peek_estatus.isra.0+0x49/0xb0
[ 778.433274] ? intel_pmu_handle_irq+0xcb/0x1c0
[ 778.433275] ? perf_event_nmi_handler+0x28/0x50
[ 778.433275] ? nmi_handle+0x58/0x100
[ 778.433276] ? default_do_nmi+0x42/0x140
[ 778.433277] ? exc_nmi+0x122/0x160
[ 778.433277] ? end_repeat_nmi+0x16/0x67
[ 778.433278] ? native_queued_spin_lock_slowpath+0x179/0x1c0
[ 778.433279] ? native_queued_spin_lock_slowpath+0x179/0x1c0
[ 778.433280] ? native_queued_spin_lock_slowpath+0x179/0x1c0
[ 778.433281] </NMI>
[ 778.433281] _raw_spin_lock+0x1e/0x30
[ 778.433282] raw_spin_rq_lock_nested+0xa/0x10
[ 778.433283] update_blocked_averages+0x44/0x120
[ 778.433283] update_nohz_stats+0x40/0x60
[ 778.433284] find_busiest_group+0x287/0xa70
[ 778.433285] load_balance+0x15b/0x6f0
[ 778.433285] newidle_balance+0x154/0x2f0
[ 778.433286] pick_next_task_fair+0x351/0xb10
[ 778.433286] pick_next_task+0x34/0x120
[ 778.433287] __schedule+0x1a1/0x670
[ 778.433287] schedule+0x46/0xb0
[ 778.433288] do_nanosleep+0x71/0x190
[ 778.433288] hrtimer_nanosleep+0x9b/0x140
[ 778.433289] ? hrtimer_init_sleeper+0x80/0x80
[ 778.433290] __se_sys_nanosleep+0xab/0xe0
[ 778.433290] do_syscall_64+0x40/0x80
[ 778.433291] entry_SYSCALL_64_after_hwframe+0x62/0xc7
[ 778.433291] RIP: 0033:0x45e42d
[ 778.433293] Code: 8b 44 24 20 b9 40 42 0f 00 f7 f1 48 89 04 24 b8 e8 03 00 00 f7 e2 48 89 44 24 08 48 89 e7 be 00 00 00 00 b8 23 00 00 00 0f 05 <48> 8b 6c 24 10 48 83 c4 18 c3 cc cc cc cc cc cc cc cc cc b8 ba 00
[ 778.433293] RSP: 002b:000000c00009bf00 EFLAGS: 00000206 ORIG_RAX: 0000000000000023
[ 778.433295] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 000000000045e42d
[ 778.433296] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000c00009bf00
[ 778.433296] RBP: 000000c00009bf10 R08: 0000000000000000 R09: 0000000000000000
[ 778.433297] R10: 00007fff129bd080 R11: 0000000000000206 R12: 0000000000431e10
[ 778.433298] R13: 0000000000000011 R14: 000000000123bfc8 R15: 0000000000000000
[ 778.433299] Kernel panic - not syncing: Hard LOCKUP
How can this problem be solved?
Environment:
[ 760.046684] BUG: kernel NULL pointer dereference, address: 0000000000000178
https://github.com/yunionio/cloudpods/issues/21306#issuecomment-2401513236
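For a kernel issue like this, you can enable kdump to debug it and check whether a matching bugfix already exists; otherwise, try a different kernel version.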
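When searching for a matching bugfix, it can help to reduce the panic log to a few searchable facts: the headline events, the kernel release, and the call-trace frames that are not marked with `?` (the kernel prints `?` in front of stack entries it could not verify). The following is only a minimal Python sketch, assuming the log is in the dmesg format shown above; `summarize_panic` and `SAMPLE` are hypothetical names for illustration and are not part of cloudpods or any kernel tool.

```python
import re

# Hypothetical helper for illustration; not part of cloudpods or any kernel tool.
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")
FRAME = re.compile(r"^(\? )?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+$")
KERNEL = re.compile(r"(\d+\.\d+\.\d+\S*) #\d+")


def summarize_panic(log_text):
    """Collect headline events, the kernel release, and reliable trace frames."""
    summary = {"events": [], "kernel": None, "frames": []}
    in_trace = False
    for raw in log_text.splitlines():
        line = TIMESTAMP.sub("", raw.strip())
        if line.startswith(("BUG:", "NMI watchdog:", "Kernel panic")):
            summary["events"].append(line)
        elif line.startswith("CPU:"):
            match = KERNEL.search(line)
            if match:
                summary["kernel"] = match.group(1)
        elif line.startswith("Call Trace:"):
            in_trace = True
        elif in_trace:
            if line.startswith("RIP:"):
                in_trace = False  # register dump follows; the trace is over
                continue
            match = FRAME.match(line)
            # Frames prefixed with "?" are unverified stack entries; skip them.
            if match and not match.group(1):
                summary["frames"].append(match.group(2))
    return summary


# A shortened excerpt of the log from this thread.
SAMPLE = """\
[ 760.046684] BUG: kernel NULL pointer dereference, address: 0000000000000178
[ 778.433195] NMI watchdog: Watchdog detected hard LOCKUP on cpu 3
[ 778.433257] CPU: 3 PID: 32884 Comm: etcd Kdump: loaded Tainted: G S I 5.10.0-182.0.0.95.oe2203sp3.x86_64 #1
[ 778.433268] Call Trace:
[ 778.433269] <NMI>
[ 778.433277] ? exc_nmi+0x122/0x160
[ 778.433281] </NMI>
[ 778.433281] _raw_spin_lock+0x1e/0x30
[ 778.433282] raw_spin_rq_lock_nested+0xa/0x10
[ 778.433283] update_blocked_averages+0x44/0x120
[ 778.433291] RIP: 0033:0x45e42d
[ 778.433299] Kernel panic - not syncing: Hard LOCKUP
"""

if __name__ == "__main__":
    result = summarize_panic(SAMPLE)
    print(result["kernel"])
    print(" <- ".join(result["frames"]))
```

The kernel release plus the innermost reliable frames (here the scheduler's run-queue lock path) make a reasonable starting query against the distribution's kernel changelog or bug tracker. A vmcore captured by kdump would still be needed for a real root-cause analysis.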
@wanyaoqi Which kernel version do you officially use on openEuler 22.03 SP3?
5.10.0-182.0.0.95.oe2203sp3.x86_64 #1 SMP Sat Dec 30 13:10:36 CST 2023 x86_64 x86_64 x86_64 GNU/Linux
zexi, Oct 18 '24 10:10
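This issue is resolved. It appears that openEuler 22.03 SP3 has hardware compatibility requirements and our Dell server is too old, so this is probably a hardware incompatibility. Deploying on a local virtual machine instead works normally. Thanks to everyone for the help.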