Some CPUs may be stale, kdump will be unreliable

Open codernavi18 opened this issue 1 year ago • 0 comments

Describe the bug

I built the latest linux-rpi-6.6.y with the bcmrpi3_defconfig kernel config and am booting it on my Raspberry Pi 3B+. The rootfs is Buildroot with the default arm64 configuration. When I trigger a crash via sysrq, the system hangs and the crash kernel never starts.

# echo c > /proc/sysrq-trigger
[   31.277197] sysrq: Trigger a crash
[   31.280708] Kernel panic - not syncing: sysrq triggered crash
[   31.286540] CPU: 2 PID: 118 Comm: sh Kdump: loaded Not tainted 6.6.57-v8+ #1
[   31.293697] Hardware name: Raspberry Pi 3 Model B Rev 1.2 (DT)
[   31.299612] Call trace:
[   31.302091]  dump_backtrace+0x9c/0x100
[   31.305907]  show_stack+0x20/0x38
[   31.309274]  dump_stack_lvl+0x48/0x60
[   31.312997]  dump_stack+0x18/0x28
[   31.316365]  panic+0x320/0x388
[   31.319470]  sysrq_handle_crash+0x24/0x30
[   31.323546]  __handle_sysrq+0xec/0x1e8
[   31.327355]  write_sysrq_trigger+0x7c/0xc0
[   31.331518]  proc_reg_write+0xa4/0x100
[   31.335333]  vfs_write+0xd0/0x330
[   31.338700]  ksys_write+0x7c/0x120
[   31.342153]  __arm64_sys_write+0x24/0x38
[   31.346137]  invoke_syscall+0x50/0x120
[   31.349950]  el0_svc_common.constprop.0+0x48/0xf0
[   31.354733]  do_el0_svc+0x24/0x38
[   31.358105]  el0_svc+0x40/0xe8
[   31.361210]  el0t_64_sync_handler+0x120/0x130
[   31.365638]  el0t_64_sync+0x190/0x198
[   31.369368] SMP: stopping secondary CPUs
[   31.373693] Starting crashdump kernel...
[   31.377670] ------------[ cut here ]------------
[   31.382349] Some CPUs may be stale, kdump will be unreliable.
[   31.388181] WARNING: CPU: 2 PID: 118 at arch/arm64/kernel/machine_kexec.c:188 machine_kexec+0x44/0x18
[   31.397638] Modules linked in:
[   31.400739] CPU: 2 PID: 118 Comm: sh Kdump: loaded Not tainted 6.6.57-v8+ #1
[   31.407894] Hardware name: Raspberry Pi 3 Model B Rev 1.2 (DT)
[   31.413808] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   31.420875] pc : machine_kexec+0x44/0x1d8
[   31.424950] lr : machine_kexec+0x44/0x1d8
[   31.429025] sp : ffffffc08062b9e0
[   31.432384] x29: ffffffc08062b9e0 x28: ffffff80025c1e80 x27: 0000000000000000
[   31.439641] x26: 0000000000000000 x25: ffffffd728927f08 x24: 0000000000000063
[   31.446895] x23: ffffffd7292c5920 x22: ffffffc08062bbc8 x21: ffffffd7292ed000
[   31.454150] x20: ffffff8002cb7400 x19: ffffff8002cb7400 x18: 0000000000000006
[   31.461404] x17: 0000000000000000 x16: 0000000000000000 x15: ffffffc08062b57f
[   31.468657] x14: 0000000000000000 x13: 2e656c6261696c65 x12: 726e75206562206c
[   31.475911] x11: ffffffd7290e62c0 x10: 0000000000000000 x9 : ffffffd727d4a094
[   31.483165] x8 : 0000000000017fe8 x7 : 00000000fffff000 x6 : ffffffd72913e2c0
[   31.490419] x5 : ffffff80371babc8 x4 : 0000000000000000 x3 : 0000000000000027
[   31.497673] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffffff80025c1e80
[   31.504927] Call trace:
[   31.507404]  machine_kexec+0x44/0x1d8
[   31.511128]  __crash_kexec+0x98/0x180
[   31.514850]  panic+0x290/0x388
[   31.517954]  sysrq_handle_crash+0x24/0x30
[   31.522028]  __handle_sysrq+0xec/0x1e8
[   31.525837]  write_sysrq_trigger+0x7c/0xc0
[   31.529999]  proc_reg_write+0xa4/0x100
[   31.533812]  vfs_write+0xd0/0x330
[   31.537178]  ksys_write+0x7c/0x120
[   31.540632]  __arm64_sys_write+0x24/0x38
[   31.544614]  invoke_syscall+0x50/0x120
[   31.548427]  el0_svc_common.constprop.0+0x48/0xf0
[   31.553210]  do_el0_svc+0x24/0x38
[   31.556581]  el0_svc+0x40/0xe8
[   31.559686]  el0t_64_sync_handler+0x120/0x130
[   31.564114]  el0t_64_sync+0x190/0x198
[   31.567833] ---[ end trace 0000000000000000 ]---
[   31.572516] Bye!

Nothing happens after this: no further logs, and the board remains stuck here. Why is this happening? It is 100% reproducible on every attempt, so it seems like a basic issue.

Steps to reproduce the behaviour

Just trigger a kernel crash with echo c > /proc/sysrq-trigger. The crash kernel is never launched; the system gets stuck beforehand, after printing a warning message.
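To rule out a setup problem before triggering the crash, a pre-flight check along these lines may help. This is only a minimal sketch: the helper names `kdump_image_loaded` and `crashkernel_param_present` are hypothetical, and it assumes the kernel exposes the standard `/sys/kernel/kexec_crash_loaded` file and that the rootfs provides a POSIX `sh` with `grep` (as a BusyBox-based Buildroot image normally does). It does not trigger a crash itself.

```shell
#!/bin/sh
# Pre-flight check to run before a test crash (sketch, hypothetical helper names).
# Each helper takes an optional file path so it can also be exercised off-target.

# Returns 0 if the kernel reports that a crash (kdump) image is loaded.
kdump_image_loaded() {
    loaded_file="${1:-/sys/kernel/kexec_crash_loaded}"
    [ -r "$loaded_file" ] && [ "$(cat "$loaded_file")" = "1" ]
}

# Returns 0 if the kernel command line carries a crashkernel= parameter.
crashkernel_param_present() {
    cmdline_file="${1:-/proc/cmdline}"
    [ -r "$cmdline_file" ] && grep -q 'crashkernel=' "$cmdline_file"
}

# Report only; nothing here panics the system.
if kdump_image_loaded; then
    echo "crash image: loaded"
else
    echo "crash image: NOT loaded"
fi

if crashkernel_param_present; then
    echo "crashkernel= parameter: present"
else
    echo "crashkernel= parameter: missing"
fi
```

In the log above the panic banner already shows "Kdump: loaded", so on this board the first check would be expected to pass; the point of the sketch is only to confirm the reproduction starts from a known state before running echo c > /proc/sysrq-trigger.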

Device (s)

Raspberry Pi 3 Mod. B+

System

linux-rpi-6.6.y + buildroot

Logs

No response

Additional context

No response

codernavi18, Oct 21 '24 20:10