Kernel panic occurs when increasing the value of lkrg.kint_validate

Open Ssspade opened this issue 5 months ago • 3 comments

hardware:arm64 kernel version:5.10.209 and 6.6.63

Case 1: Load the module with insmod, setting lkrg.kint_validate to 3, then run sysctl lkrg.kint_validate=2. The system triggers a kernel panic:

[  221.421395][  T821] LKRG: STATE: Changing 'kint_validate' from 3 (PERIODICALLY + RANDOM EVENTS) to 2 (PERIODICALLY)
[  221.431841][  T821] BUG: scheduling while atomic: sysctl/821/0x00000002
 ...
[  221.487800][  T821] Call trace:
[  221.490928][  T821]  dump_backtrace+0x98/0x120
[  221.495365][  T821]  show_stack+0x1c/0x30
[  221.499363][  T821]  dump_stack_lvl+0x44/0x58
[  221.503709][  T821]  dump_stack+0x14/0x20
[  221.507706][  T821]  __schedule_bug+0x58/0x78
[  221.512051][  T821]  __schedule+0x834/0xb40
[  221.516221][  T821]  schedule+0x60/0xc8
[  221.520044][  T821]  schedule_timeout+0x1b0/0x1d0
[  221.524737][  T821]  wait_for_completion+0x7c/0x148
[  221.529601][  T821]  __synchronize_srcu+0x94/0xd8
[  221.534293][  T821]  synchronize_srcu+0xec/0x108
[  221.538898][  T821]  srcu_notifier_chain_unregister+0x4c/0x80
[  221.544631][  T821]  cpufreq_unregister_notifier+0x88/0xd0
[  221.550105][  T821]  p_deregister_notifiers+0x28/0x78 [lkrg]
[  221.555767][  T821]  p_sysctl_kint_validate+0x154/0x188 [lkrg]
[  221.561599][  T821]  proc_sys_call_handler+0x1b8/0x2a0
[  221.566725][  T821]  proc_sys_write+0x18/0x28
[  221.571069][  T821]  vfs_write+0x1c8/0x328
[  221.575155][  T821]  ksys_write+0x78/0x118
[  221.579239][  T821]  __arm64_sys_write+0x20/0x30
[  221.583845][  T821]  invoke_syscall+0x4c/0x110
[  221.588277][  T821]  el0_svc_common.constprop.0+0x44/0xe8
[  221.593664][  T821]  do_el0_svc+0x20/0x30
[  221.597662][  T821]  el0_svc+0x28/0x98
[  221.601399][  T821]  el0t_64_sync_handler+0x118/0x128
[  221.606438][  T821]  el0t_64_sync+0x14c/0x150
[  221.668177][    C6] Unable to handle kernel paging request at virtual address 0000aaaae6abf220
[  221.668183][    C6] Mem abort info:
[  221.668185][    C6]   ESR = 0x0000000092000047
[  221.668187][    C6]   EC = 0x24: DABT (lower EL), IL = 32 bits
[  221.668191][    C6]   SET = 0, FnV = 0
[  221.668194][    C6]   EA = 0, S1PTW = 0
[  221.668196][    C6]   FSC = 0x07: level 3 translation fault
[  221.668199][    C6] Data abort info:
[  221.668200][    C6]   ISV = 0, ISS = 0x00000047, ISS2 = 0x00000000
[  221.668203][    C6]   CM = 0, WnR = 1, TnD = 0, TagAccess = 0
[  221.668206][    C6]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[  221.668210][    C6] user pgtable: 4k pages, 48-bit VAs, pgdp=0000002020809000
[  221.668215][    C6] [0000aaaae6abf220] pgd=080000200f3f4003, p4d=080000200f3f4003, pud=0800002004e42003, pmd=0800002005714003, pte=0000000000000000
[  221.668228][  T821] Internal error: Oops: 0000000092000047 [#1] PREEMPT SMP
...
[  221.801770][  T821] pstate: 60001000 (nZCv daif -PAN -UAO -TCO -DIT +SSBS BTYPE=--)
[  221.809421][  T821] pc : 0000ffffab275790
[  221.813420][  T821] lr : 0000ffffab275774
[  221.817419][  T821] sp : 0000ffffe82ee650
[  221.821417][  T821] x29: 0000ffffe82ee650 x28: 0000ffffab390a50 x27: 0000aaaae6abe200
[  221.829244][  T821] x26: 0000000000000001 x25: 0000000000000058 x24: 0000000000001000
[  221.837069][  T821] x23: 0000ffffab390ab0 x22: 0000ffffab396000 x21: 0000000000001000
[  221.844893][  T821] x20: 0000000000002010 x19: 0000000000001010 x18: 0000000000000003
[  221.852717][  T821] x17: 0000ffffab2765b0 x16: 0000ffffab38fd60 x15: 0000000000000001
[  221.860542][  T821] x14: 0000000000000000 x13: 0000000000000001 x12: 0000ffffe82eeae0
[  221.868365][  T821] x11: 00000000ffffffd0 x10: 000000000000000a x9 : 0000000000000800
[  221.876190][  T821] x8 : 0000000000000000 x7 : 0000000000000003 x6 : fffffffffffffff0
[  221.884014][  T821] x5 : 0000000000000000 x4 : 0000ffffab391160 x3 : 0000aaaae6abe200
[  221.891837][  T821] x2 : 0000ffffab390ab0 x1 : 0000ffffab390ab0 x0 : 0000aaaae6abf210
[  221.899662][  T821] ---[ end trace 0000000000000000 ]---
[  221.943824][  T821] pstore: backend (efi_pstore) writing error (-5)
[  221.950146][  T821] Kernel panic - not syncing: Aiee, killing interrupt handler!
[  221.957528][  T821] SMP: stopping secondary CPUs
[  221.962134][  T821] Kernel Offset: disabled
[  221.966303][  T821] CPU features: 0x0,00000000,50028143,1000700b
[  221.972295][  T821] Memory Limit: none
[  221.985802][  T821] ---[ end Kernel panic - not syncing: Aiee, killing interrupt handler! ]---

Case 2: Load the module with insmod, setting lkrg.kint_validate to 2, then run sysctl lkrg.kint_validate=3. The kernel reports a warning with a call trace (but does not panic):

[  587.317307][  T864] LKRG: STATE: Changing 'kint_validate' from 2 (PERIODICALLY) to 3 (PERIODICALLY + RANDOM EVENTS)
[  587.327781][  T864] ------------[ cut here ]------------
[  587.333099][  T864] notifier callback p_freq_transition_notifier [lkrg] already registered
[  587.333156][  T864] WARNING: CPU: 3 PID: 864 at kernel/notifier.c:31 notifier_chain_register+0x60/0x140
...
[  587.403068][  T864] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  587.410717][  T864] pc : notifier_chain_register+0x60/0x140
[  587.416287][  T864] lr : notifier_chain_register+0x60/0x140
[  587.421856][  T864] sp : ffff800083fc3bb0
[  587.425855][  T864] x29: ffff800083fc3bb0 x28: ffff001f819ee740 x27: 0000000000000000
[  587.433682][  T864] x26: ffff800083fc3d40 x25: ffff800083fc3c98 x24: ffff001f80a19a08
[  587.441507][  T864] x23: 0000000000000002 x22: ffff80007ae0c000 x21: ffff800082ab0020
[  587.449331][  T864] x20: ffff80007ae0c8d8 x19: ffff800082922d90 x18: ffffffffffffffff
[  587.457156][  T864] x17: 33206f742029594c x16: 4c414349444f4952 x15: ffff800083fc365d
[  587.464979][  T864] x14: 0000000000000001 x13: 6465726574736967 x12: 6572207964616572
[  587.472803][  T864] x11: ffff800082723648 x10: 0000000000000000 x9 : ffff8000801164b4
[  587.480628][  T864] x8 : 0000000000017fe8 x7 : 00000000fffff2da x6 : ffff80008277b648
[  587.488453][  T864] x5 : ffff0022fdf5ea48 x4 : 40000000fffff2da x3 : ffff80227b93a000
[  587.496277][  T864] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff001f819ee740
[  587.504101][  T864] Call trace:
[  587.507232][  T864]  notifier_chain_register+0x60/0x140
[  587.512455][  T864]  srcu_notifier_chain_register+0x3c/0x80
[  587.518020][  T864]  cpufreq_register_notifier+0xa0/0xf0
[  587.523327][  T864]  p_register_notifiers+0x28/0x78 [lkrg]
[  587.528841][  T864]  p_sysctl_kint_validate+0x138/0x188 [lkrg]
[  587.534701][  T864]  proc_sys_call_handler+0x1b8/0x2a0
[  587.539832][  T864]  proc_sys_write+0x18/0x28
[  587.544182][  T864]  vfs_write+0x1c8/0x328
[  587.548276][  T864]  ksys_write+0x78/0x118
[  587.552368][  T864]  __arm64_sys_write+0x20/0x30
[  587.556982][  T864]  invoke_syscall+0x4c/0x110
[  587.561422][  T864]  el0_svc_common.constprop.0+0x44/0xe8
[  587.566817][  T864]  do_el0_svc+0x20/0x30
[  587.570821][  T864]  el0_svc+0x28/0x98
[  587.574566][  T864]  el0t_64_sync_handler+0x118/0x128
[  587.579612][  T864]  el0t_64_sync+0x14c/0x150
[  587.583961][  T864] ---[ end trace 0000000000000000 ]---
[  587.589283][  T864] ------------[ cut here ]------------

Ssspade, Nov 11 '25 08:11

I couldn't reproduce this, not even by running:

while :; do sysctl lkrg.kint_validate=3; sysctl lkrg.kint_validate=2; done

In dmesg, I got merely this:

[  165.001964] LKRG: STATE: Changing 'kint_validate' from 2 (PERIODICALLY) to 3 (PERIODICALLY + RANDOM EVENTS)
[  165.003049] LKRG: STATE: Changing 'kint_validate' from 3 (PERIODICALLY + RANDOM EVENTS) to 2 (PERIODICALLY)
[  165.021000] LKRG: STATE: Changing 'kint_validate' from 2 (PERIODICALLY) to 3 (PERIODICALLY + RANDOM EVENTS)
[  165.022047] LKRG: STATE: Changing 'kint_validate' from 3 (PERIODICALLY + RANDOM EVENTS) to 2 (PERIODICALLY)

and so on. So it did rapidly change this setting many times with no ill effects. But this isn't on the exact kernels you have. I'll try on more systems.
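For anyone trying to reproduce this on other kernels, a bounded version of the loop above that stops at the first symptom may be handier than an endless one. This is only a rough sketch: the function names `classify_log` and `toggle_until_symptom` are made up for illustration, the two patterns are taken from the log lines quoted in this issue, and it assumes `dmesg` is readable and that the system survives long enough to report (it will not help if the kernel panics outright).

```shell
#!/bin/sh
# Hypothetical helper: read a kernel log on stdin and name the symptoms
# from this issue that appear in it. Prints "clean" if neither is found.
classify_log() {
    awk '
        /BUG: scheduling while atomic/            { atomic = 1 }
        /notifier callback .* already registered/ { dup = 1 }
        END {
            if (atomic) print "scheduling-while-atomic"
            if (dup)    print "notifier-already-registered"
            if (!atomic && !dup) print "clean"
        }
    '
}

# Hypothetical driver: toggle the setting up to $1 times (default 100) and
# stop at the first symptom. Needs root and a loaded LKRG module.
# Note: symptoms already present in dmesg before the run are also reported.
toggle_until_symptom() {
    n=${1:-100}
    i=0
    while [ "$i" -lt "$n" ]; do
        sysctl lkrg.kint_validate=3 >/dev/null
        sysctl lkrg.kint_validate=2 >/dev/null
        result=$(dmesg | classify_log)
        if [ "$result" != "clean" ]; then
            echo "iteration $i: $result"
            return 1
        fi
        i=$((i + 1))
    done
    echo "no symptom after $n iterations"
}
```

After sourcing the file as root, `toggle_until_symptom 1000` runs up to 1000 toggles; `classify_log < saved-dmesg.txt` checks a saved log without touching the running system.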

I also didn't get this line:

[  221.431841][  T821] BUG: scheduling while atomic: sysctl/821/0x00000002

but it is a known issue, described in the comments on #204. We should fix it. I don't know whether your triggering this issue (which I apparently do not) contributed to the further issues you saw.

solardiz, Nov 11 '25 19:11

hardware:arm64 kernel version:5.10.209 and 6.6.63

What distro are you on? Are these distro kernels?

solardiz, Nov 11 '25 19:11

hardware:arm64 kernel version:5.10.209 and 6.6.63

What distro are you on? Are these distro kernels?

Thanks for your reply! The kernel I'm using may not be a standard distro kernel, as it was compiled by me.

Ssspade, Nov 12 '25 06:11