skiboot
skiboot copied to clipboard
fast-reset: NMI watchdog: BUG: soft lockup - CPU#113 stuck for 22s! [kopald:854]
(bringing over from internal bugzilla)
Pridhiviraj Paidipeddi 2016-12-24 00:15:00 AEDT:
Install a 8284-22A PowerNV system with FW860.20 latest firmware build, and install a Ubuntu 16.10. Then below test steps caused the NMI Watchdog: Bug soft lockup issue in petitboot kernel.
Test-steps:
-
Boot the system to runtime(i,e into petitboot)
-
Enable the fast-reboot by using below command / # nvram -p ibm,skiboot --update-config experimental-fast-reset=feeling-lucky / # nvram --print-config=experimental-fast-reset -p ibm,skiboot feeling-lucky
-
Then do a FSP ResetReload test smgr resetReload
-
Once FSP comes UP, do reboot in petitboot.
-
After system reaching to peitiboot, check dmesg for softlockup's
[ 32.107804] NMI watchdog: BUG: soft lockup - CPU#113 stuck for 22s! [kopald:854]
[ 32.107894] Modules linked in: sr_mod sd_mod cdrom lpfc ipr crc_t10dif crct10dif_generic crct10dif_common
[ 32.107903] CPU: 113 PID: 854 Comm: kopald Not tainted 4.4.27-openpower1 #1
[ 32.107905] task: c000002fc61a5ef0 ti: c000002fc621c000 task.ti: c000002fc621c000
[ 32.107907] NIP: c00000000000d8dc LR: c00000000000d8dc CTR: c000000000015d00
[ 32.107908] REGS: c000002fc621f630 TRAP: 0901 Not tainted (4.4.27-openpower1)
[ 32.107909] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 42000024 XER: 00000000
[ 32.107915] CFAR: c0000000000a5448 SOFTE: 1
[ 32.107915] GPR00: c000000000059534 c000002fc621f8b0 c00000000089a500 0000000000000900
[ 32.107915] GPR04: 0000000000000000 0000000022b2980d 0000000004898d98 0000000000000001
[ 32.107915] GPR08: 0000000000000000 0000000000000002 000000e371050eb6 0000000000000003
[ 32.107915] GPR12: c000000000038d10 c00000000fe95300
[ 32.107925] NIP [c00000000000d8dc] arch_local_irq_restore+0x5c/0x80
[ 32.107927] LR [c00000000000d8dc] arch_local_irq_restore+0x5c/0x80
[ 32.107928] Call Trace:
[ 32.107929] [c000002fc621f8b0] [c00000000007e7e0] vtime_common_account_irq_enter+0x48/0x5c (unreliable)
[ 32.107932] [c000002fc621f8d0] [c000000000059534] __do_softirq+0xe0/0x2b4
[ 32.107934] [c000002fc621f9c0] [c0000000000599a0] irq_exit+0x74/0xcc
[ 32.107935] [c000002fc621f9e0] [c00000000001646c] timer_interrupt+0xb0/0xc4
[ 32.107937] [c000002fc621fa10] [c00000000000995c] restore_check_irq_replay+0x54/0x70
[ 32.107941] --- interrupt: 901 at opal_handle_events+0xb0/0xb8
[ 32.107941] LR = opal_handle_events+0x5c/0xb8
[ 32.107943] [c000002fc621fd50] [c00000000003b6ac] kopald+0x2c/0x64
[ 32.107945] [c000002fc621fd80] [c0000000000717a0] kthread+0xfc/0x108
[ 32.107947] [c000002fc621fe30] [c0000000000094b4] ret_from_kernel_thread+0x5c/0xa8
[ 32.107948] Instruction dump:
[ 32.107949] f821ffe1 419e000c e92d0020 7d210164 39200000 992d0252 4bffff11 39400001
[ 32.107952] 994d0252 2fa30000 419e0010 4bff4591 <60000000> 48000010 e92d0020 61298000
[ 34.138212] device-mapper: ioctl: 4.34.0-ioctl (2015-10-28) initialised: [email protected]
[ 34.187971] tg3 0001:03:00.3 enP1p3s0f3: renamed from eth3
[ 34.220791] EXT4-fs (dm-2): mounted filesystem with ordered data mode. Opts: (null)
[ 34.260346] EXT4-fs (dm-5): INFO: recovery required on readonly filesystem
[ 34.260351] EXT4-fs (dm-5): write access will be enabled during recovery
[ 34.272708] EXT4-fs (dm-5): recovery complete
[ 34.272743] EXT4-fs (dm-5): mounted filesystem with ordered data mode. Opts: (null)
[ 36.479095] rport-3:0-4: blocked FC remote port time out: removing rport
[ 55.007967] tg3 0003:05:00.1 enP3p5s0f1: renamed from eth5
[ 78.017790] INFO: rcu_sched self-detected stall on CPU
[ 78.017795] 100-...: (1 GPs behind) idle=167/140000000000002/0 softirq=176/176 fqs=2082
[ 78.017796] (t=2100 jiffies g=-220 c=-221 q=97)
[ 78.017817] Task dump for CPU 100:
[ 78.017818] kopald R running task 0 854 2 0x00000804
[ 78.017821] Call Trace:
[ 78.017826] [c000002fc621f1e0] [c00000000007c6f0] sched_show_task+0x10c/0x118 (unreliable)
[ 78.017830] [c000002fc621f250] [c00000000009eda8] rcu_dump_cpu_stacks+0xa0/0xd0
[ 78.017832] [c000002fc621f2a0] [c0000000000a2428] rcu_check_callbacks+0x294/0x734
[ 78.017834] [c000002fc621f3c0] [c0000000000a44e0] update_process_times+0x3c/0x74
[ 78.017837] [c000002fc621f3f0] [c0000000000b2c74] tick_sched_handle.isra.14+0x60/0x6c
[ 78.017840] [c000002fc621f420] [c0000000000b2ccc] tick_sched_timer+0x4c/0x84
[ 78.017841] [c000002fc621f460] [c0000000000a4d84] __hrtimer_run_queues+0xd0/0x188
[ 78.017843] [c000002fc621f4e0] [c0000000000a53f0] hrtimer_interrupt+0xb0/0x1fc
[ 78.017845] [c000002fc621f5a0] [c0000000000162c0] __timer_interrupt+0x7c/0xe0
[ 78.017847] [c000002fc621f5e0] [c000000000016468] timer_interrupt+0xac/0xc4
[ 78.017849] [c000002fc621f610] [c00000000000995c] restore_check_irq_replay+0x54/0x70
[ 78.017852] --- interrupt: 901 at arch_local_irq_restore+0x5c/0x80
[ 78.017852] LR = arch_local_irq_restore+0x5c/0x80
[ 78.017854] [c000002fc621f900] [c00000000007e7e0] vtime_common_account_irq_enter+0x48/0x5c (unreliable)
[ 78.017857] [c000002fc621f920] [c000000000059534] __do_softirq+0xe0/0x2b4
[ 78.017859] [c000002fc621fa10] [c0000000000599a0] irq_exit+0x74/0xcc
[ 78.017860] [c000002fc621fa30] [c00000000001646c] timer_interrupt+0xb0/0xc4
[ 78.017862] [c000002fc621fa60] [c00000000000995c] restore_check_irq_replay+0x54/0x70
[ 78.017866] --- interrupt: 901 at opal_handle_events+0xb4/0xb8
[ 78.017866] LR = opal_handle_events+0x5c/0xb8
[ 78.017868] [c000002fc621fd50] [c00000000003b6ac] kopald+0x2c/0x64 (unreliable)
[ 78.017871] [c000002fc621fd80] [c0000000000717a0] kthread+0xfc/0x108
[ 78.017873] [c000002fc621fe30] [c0000000000094b4] ret_from_kernel_thread+0x5c/0xa8
[ 80.047889] tg3 0003:05:00.2 enP3p5s0f2: renamed from eth6
[ 103.057786] INFO: rcu_sched self-detected stall on CPU
[ 103.057792] 101-...: (1 GPs behind) idle=a0b/140000000000002/0 softirq=288/289 fqs=2099
[ 103.057795] (t=2100 jiffies g=-218 c=-219 q=72)
[ 103.057823] Task dump for CPU 101:
[ 103.057825] kopald R running task 0 854 2 0x00000804
[ 103.057829] Call Trace:
[ 103.057834] [c000002fc621f1e0] [c00000000007c6f0] sched_show_task+0x10c/0x118 (unreliable)
[ 103.057839] [c000002fc621f250] [c00000000009eda8] rcu_dump_cpu_stacks+0xa0/0xd0
[ 103.057843] [c000002fc621f2a0] [c0000000000a2428] rcu_check_callbacks+0x294/0x734
[ 103.057847] [c000002fc621f3c0] [c0000000000a44e0] update_process_times+0x3c/0x74
[ 103.057851] [c000002fc621f3f0] [c0000000000b2c74] tick_sched_handle.isra.14+0x60/0x6c
[ 103.057855] [c000002fc621f420] [c0000000000b2ccc] tick_sched_timer+0x4c/0x84
[ 103.057858] [c000002fc621f460] [c0000000000a4d84] __hrtimer_run_queues+0xd0/0x188
[ 103.057861] [c000002fc621f4e0] [c0000000000a53f0] hrtimer_interrupt+0xb0/0x1fc
[ 103.057865] [c000002fc621f5a0] [c0000000000162c0] __timer_interrupt+0x7c/0xe0
[ 103.057868] [c000002fc621f5e0] [c000000000016468] timer_interrupt+0xac/0xc4
[ 103.057872] [c000002fc621f610] [c00000000000995c] restore_check_irq_replay+0x54/0x70
[ 103.057879] --- interrupt: 901 at arch_local_irq_restore+0x5c/0x80
[ 103.057879] LR = arch_local_irq_restore+0x5c/0x80
[ 103.057883] [c000002fc621f900] [c00000000007e7e0] vtime_common_account_irq_enter+0x48/0x5c (unreliable)
[ 103.057888] [c000002fc621f920] [c000000000059534] __do_softirq+0xe0/0x2b4
[ 103.057891] [c000002fc621fa10] [c0000000000599a0] irq_exit+0x74/0xcc
[ 103.057895] [c000002fc621fa30] [c00000000001646c] timer_interrupt+0xb0/0xc4
[ 103.057898] [c000002fc621fa60] [c00000000000995c] restore_check_irq_replay+0x54/0terrupt+0xac/0xc4
[ 203.267824] [c000002fc621f520] [c00000000000995c] restore_check_irq_replay+0x54/0x70
[ 203.267827] --- interrupt: 901 at arch_local_irq_restore+0x5c/0x80
[ 203.267827] LR = arch_local_irq_restore+0x5c/0x80
[ 203.267830] [c000002fc621f810] [c00000000007e7e0] vtime_common_account_irq_enter+0x48/0x5c (unreliable)
[ 203.267832] [c000002fc621f830] [c000000000059534] __do_softirq+0xe0/0x2b4
[ 203.267834] [c000002fc621f920] [c0000000000599a0] irq_exit+0x74/0xcc
[ 203.267835] [c000002fc621f940] [c00000000001646c] timer_interrupt+0xb0/0xc4
[ 203.267837] [c000002fc621f970] [c00000000000995c] restore_check_irq_replay+0x54/0x70
[ 203.267841] --- interrupt: 901 at schedule_timeout+0x84/0x1bc
[ 203.267841] LR = schedule_timeout+0x1c/0x1bc
[ 203.267844] [c000002fc621fc60] [c0000000005a58bc] schedule_timeout+0x1a0/0x1bc (unreliable)
[ 203.267846] [c000002fc621fd30] [c0000000000a3fdc] msleep_interruptible+0x40/0x5c
[ 203.267848] [c000002fc621fd50] [c00000000003b6bc] kopald+0x3c/0x64
[ 203.267850] [c000002fc621fd80] [c0000000000717a0] kthread+0xfc/0x108
[ 203.267851] [c000002fc621fe30] [c0000000000094b4] ret_from_kernel_thread+0x5c/0xa8
[ 226.327769] INFO: rcu_sched self-detected stall on CPU
[ 226.327773] 49-...: (2098 ticks this GP) idle=741/140000000000002/0 softirq=3144/3144 fqs=2082
[ 226.327774] (t=2100 jiffies g=-200 c=-201 q=281)
[ 226.327796] Task dump for CPU 49:
[ 226.327798] kopald S 0000000000000000 0 854 2 0x00000804
[ 226.327800] Call Trace:
[ 226.327806] [c000002fc621f0a0] [c00000000007c6f0] sched_show_task+0x10c/0x118 (unreliable)
[ 226.327810] [c000002fc621f110] [c00000000009eda8] rcu_dump_cpu_stacks+0xa0/0xd0
[ 226.327812] [c000002fc621f160] [c0000000000a2428] rcu_check_callbacks+0x294/0x734
[ 226.327814] [c000002fc621f280] [c0000000000a44e0] update_process_times+0x3c/0x74
[ 226.327816] [c000002fc621f2b0] [c0000000000b2c74] tick_sched_handle.isra.14+0x60/0x6c
[ 226.327818] [c000002fc621f2e0] [c0000000000b2ccc] tick_sched_timer+0x4c/0x84
[ 226.327820] [c000002fc621f320] [c0000000000a4d84] __hrtimer_run_queues+0xd0/0x188
[ 226.327822] [c000002fc621f3a0] [c0000000000a53f0] hrtimer_interrupt+0xb0/0x1fc
[ 226.327824] [c000002fc621f460] [c0000000000162c0] __timer_interrupt+0x7c/0xe0
[ 226.327826] [c000002fc621f4a0] [c000000000016468] timer_interrupt+0xac/0xc4
[ 226.327828] [c000002fc621f4d0] [c00000000000995c] restore_check_irq_replay+0x54/0x70
[ 226.327832] --- interrupt: 901 at arch_local_irq_restore+0x5c/0x80
[ 226.327832] LR = arch_local_irq_restore+0x5c/0x80
[ 226.327834] [c000002fc621f7c0] [c00000000007e7e0] vtime_common_account_irq_enter+0x48/0x5c (unreliable)
[ 226.327837] [c000002fc621f7e0] [c000000000059534] __do_softirq+0xe0/0x2b4
[ 226.327839] [c000002fc621f8d0] [c0000000000599a0] irq_exit+0x74/0xcc
[ 226.327840] [c000002fc621f8f0] [c00000000001646c] timer_interrupt+0xb0/0xc4
[ 226.327842] [c000002fc621f920] [c00000000000995c] restore_check_irq_replay+0x54/0x70
[ 226.327845] --- interrupt: 901 at lock_timer_base.isra.22+0x30/0x84
[ 226.327845] LR = lock_timer_base.isra.22+0x10/0x84
[ 226.327848] [c000002fc621fc10] [00000000ffffd2cc] 0xffffd2cc (unreliable)
[ 226.327851] [c000002fc621fc60] [c0000000005a57d0] schedule_timeout+0xb4/0x1bc
[ 226.327853] [c000002fc621fd30] [c0000000000a3fdc] msleep_interruptible+0x40/0x5c
[ 226.327856] [c000002fc621fd50] [c00000000003b6bc] kopald+0x3c/0x64
[ 226.327858] [c000002fc621fd80] [c0000000000717a0] kthread+0xfc/0x108
[ 226.327860] [c000002fc621fe30] [c0000000000094b4] ret_from_kernel_thread+0x5c/0xa8
[ 253.357774] INFO: rcu_sched self-detected stall on CPU
[ 253.357780] 52-...: (2 GPs behind) idle=893/140000000000002/0 softirq=3070/3070 fqs=2085
[ 253.357782] (t=2100 jiffies g=-196 c=-197 q=168)
[ 253.357808] Task dump for CPU 52:
[ 253.357810] kopald S 0000000000000000 0 854 2 0x00000804
[ 253.357814] Call Trace:
[ 253.357819] [c000002fc621f0a0] [c00000000007c6f0] sched_show_task+0x10c/0x118 (unreliable)
[ 253.357823] [c000002fc621f110] [c00000000009eda8] rcu_dump_cpu_stacks+0xa0/0xd0
[ 253.357828] [c000002fc621f160] [c0000000000a2428] rcu_check_callbacks+0x294/0x734
[ 253.357832] [c000002fc621f280] [c0000000000a44e0] update_process_times+0x3c/0x74
[ 253.357836] [c000002fc621f2b0] [c0000000000b2c74] tick_sched_handle.isra.14+0x60/0x6c
[ 253.357839] [c000002fc621f2e0] [c0000000000b2ccc] tick_sched_timer+0x4c/0x84
[ 253.357842] [c000002fc621f320] [c0000000000a4d84] __hrtimer_run_queues+0xd0/0x188
[ 253.357846] [c000002fc621f3a0] [c0000000000a53f0] hrtimer_interrupt+0xb0/0x1fc
[ 253.357849] [c000002fc621f460] [c0000000000162c0] __timer_interrupt+0x7c/0xe0
[ 253.357852] [c000002fc621f4a0] [c000000000016468] timer_interrupt+0xac/0xc4
[ 253.357856] [c000002fc621f4d0] [c00000000000995c] restore_check_irq_replay+0x54/0x70
[ 253.357863] --- interrupt: 901 at arch_local_irq_restore+0x5c/0x80
[ 253.357863] LR = arch_local_irq_restore+0x5c/0x80
[ 253.357867] [c000002fc621f7c0] [c00000000007e7e0] vtime_common_account_irq_enter+0x48/0x5c (unreliable)
[ 253.357872] [c000002fc621f7e0] [c000000000059534] __do_softirq+0xe0/0x2b4
[ 253.357875] [c000002fc621f8d0] [c0000000000599a0] irq_exit+0x74/0xcc
[ 253.357879] [c000002fc621f8f0] [c00000000001646c] timer_interrupt+0xb0/0xc4
[ 253.357883] [c000002fc621f920] [c00000000000995c] restore_check_irq_replay+0x54/0x70
[ 253.357888] --- interrupt: 901 at _savegpr0_29+0x0/0x4
[ 253.357888] LR = _raw_spin_lock_irqsave+0x10/0x90
[ 253.357892] [c000002fc621fc10] [00000000ffffe593] 0xffffe593 (unreliable)
[ 253.357896] [c000002fc621fc60] [c0000000005a57d0] schedule_timeout+0xb4/0x1bc
[ 253.357901] [c000002fc621fd30] [c0000000000a3fdc] msleep_interruptible+0x40/0x5c
[ 253.357904] [c000002fc621fd50] [c00000000003b6bc] kopald+0x3c/0x64
[ 253.357908] [c000002fc621fd80] [c0000000000717a0] kthread+0xfc/0x108
[ 253.357912] [c000002fc621fe30] [c0000000000094b4] ret_from_kernel_thread+0x5c/0xa8
[ 277.167770] INFO: rcu_sched self-detected stall on CPU
[ 277.167775] 53-...: (1 GPs behind) idle=7af/140000000000002/0 softirq=4500/4501 fqs=1832
[ 277.167776] (t=2100 jiffies g=-194 c=-195 q=452)
[ 277.167802] Task dump for CPU 53:
[ 277.167804] kopald R running task 0 854 2 0x00000804
[ 277.167807] Call Trace:
[ 277.167810] [c000002fc621f190] [c00000000007c6f0] sched_show_task+0x10c/0x118 (unreliable)
[ 277.167814] [c000002fc621f200] [c00000000009eda8] rcu_dump_cpu_stacks+0xa0/0xd0
[ 277.167817] [c000002fc621f250] [c0000000000a2428] rcu_check_callbacks+0x294/0x734
[ 277.167820] [c000002fc621f370] [c0000000000a44e0] update_process_times+0x3c/0x74
[ 277.167822] [c000002fc621f3a0] [c0000000000b2c74] tick_sched_handle.isra.14+0x60/0x6c
[ 277.167825] [c000002fc621f3d0] [c0000000000b2ccc] tick_sched_timer+0x4c/0x84
[ 277.167827] [c000002fc621f410] [c0000000000a4d84] __hrtimer_run_queues+0xd0/0x188
[ 277.167830] [c000002fc621f490] [c0000000000a53f0] hrtimer_interrupt+0xb0/0x1fc
[ 277.167833] [c000002fc621f550] [c0000000000162c0] __timer_interrupt+0x7c/0xe0
[ 277.167835] [c000002fc621f590] [c000000000016468] timer_interrupt+0xac/0xc4
[ 277.167838] [c000002fc621f5c0] [c00000000000995c] restore_check_irq_replay+0x54/0x70
[ 277.167842] --- interrupt: 901 at arch_local_irq_restore+0x5c/0x80
[ 277.167842] LR = arch_local_irq_restore+0x5c/0x80
[ 277.167846] [c000002fc621f8b0] [c00000000007e7e0] vtime_common_account_irq_enter+0x48/0x5c (unreliable)
[ 277.167849] [c000002fc621f8d0] [c000000000059534] __do_softirq+0xe0/0x2b4
[ 277.167852] [c000002fc621f9c0] [c0000000000599a0] irq_exit+0x74/0xcc
[ 277.167854] [c000002fc621f9e0] [c00000000001646c] timer_interrupt+0xb0/0xc4
[ 277.167857] [c000002fc621fa10] [c00000000000995c] restore_check_irq_replay+0x54/0x70
[ 277.167860] --- interrupt: 901 at irq_work_queue+0xa0/0xa4
[ 277.167860] LR = irq_work_queue+0x78/0xa4
[ 277.167864] [c000002fc621fd00] [c00000000003f528] opal_handle_events+0x5c/0xb8 (unreliable)
[ 277.167867] [c000002fc621fd50] [c00000000003b6ac] kopald+0x2c/0x64
[ 277.167870] [c000002fc621fd80] [c0000000000717a0] kthread+0xfc/0x108
[ 277.167873] [c000002fc621fe30] [c0000000000094b4] ret_from_kernel_thread+0x5c/0xa8
/ #
/ # update_flash -d
Current firwmare version :
P side : FW860.12 (SV860_066)
T side : FW860.20 (SV860_067)
Boot side : FW860.20 (SV860_067)
/ #
/ # cat /sys/firmware/opal/msglog | grep -i skiboot
[ 81.163066898,5] SkiBoot f38ed26 starting...
/ #81
Interesting. Something similar to other bug we saw internally. kopald makes opal call to get outstanding events and then calls our handler in kernel.
-Vasant
Can you please retry with upstream kernel?
-Vasant
I proposed a patch to process opal events directly by kopald, and to enable and disable interrupts and also cond_resched between processing each event. It might improve this situation.
https://marc.info/?l=linuxppc-embedded&m=150902327030882&w=2
@hegdevasant Reproduced with both official build and upstream kernel.
<4>[ 0.000000] Could not find start_pfn for node 17
<4>[ 1.918993] This architecture does not have kernel memory protection.
<4>[ 1.997483] Emulex LightPulse Fibre Channel SCSI driver 11.4.0.1
<4>[ 1.997487] Copyright (C) 2017 Broadcom. All Rights Reserved. The term "Broadcom" refers to Broadcom Limited and/or its subsidiaries.
<3>[ 2.001923] EEH: Frozen PE#fd on PHB#1 detected
<3>[ 2.001929] EEH: PE location: U78CB.001.WZS0030-P1-C14, PHB location: N/A
<4>[ 2.002752] EEH: This PCI device has failed 1 times in the last hour
<4>[ 2.002793] EEH: of node=0001:04:00.0
<4>[ 2.002796] EEH: PCI device/vendor: 034a1014
<4>[ 2.002799] EEH: PCI cmd/status register: 00100142
<4>[ 2.002800] EEH: PCI-E capabilities and status follow:
<4>[ 2.002853] EEH: PCI-E 00: 0002a410 10008fc2 0000283e 0003f503
<4>[ 2.002862] EEH: PCI-E 10: 10830000 00000000 00000000 00000000
<4>[ 2.002863] EEH: PCI-E 20: 00000000
<4>[ 2.002864] EEH: PCI-E AER capability register set follows:
<4>[ 2.002875] EEH: PCI-E AER 00: 14820001 00000000 00400000 00462030
<4>[ 2.002885] EEH: PCI-E AER 10: 00000000 0000e000 000001e0 00000000
<4>[ 2.002894] EEH: PCI-E AER 20: 00000000 00000000 00000000 00000000
<4>[ 2.002897] EEH: PCI-E AER 30: 00000000 00000000
<3>[ 2.211328] ELOG: OPAL log info read failed
<3>[ 2.348038] lpfc 0004:01:00.0: 0:6101 Disabling NVME support: Not supported by firmware: 0 0
<3>[ 2.348043] lpfc 0004:01:00.0: 0:2574 IO channels: irqs 4 fcp 4 nvme 0 MRQ: 0
<3>[ 4.578036] lpfc 0004:01:00.1: 1:6101 Disabling NVME support: Not supported by firmware: 0 0
<3>[ 4.578041] lpfc 0004:01:00.1: 1:2574 IO channels: irqs 4 fcp 4 nvme 0 MRQ: 0
<3>[ 10.254238] lpfc 0005:0b:00.0: 2:1303 Link Up Event x1 received Data: x1 x1 x10 x0 x0 x0 0
<3>[ 13.159540] lpfc 0005:0b:00.1: 3:1303 Link Up Event x1 received Data: x1 x1 x10 x0 x0 x0 0
<3>[ 26.918024] INFO: rcu_sched self-detected stall on CPU
<3>[ 26.918033] 35-...: (2 GPs behind) idle=1a2/140000000000002/0 softirq=56/58 fqs=900
<3>[ 26.918033] (t=2100 jiffies g=-265 c=-266 q=730)
<1>[ 26.918048] Call Trace:
<1>[ 26.918053] [c000000ff4a0b1c0] [c0000000000926cc] sched_show_task+0xe8/0xf4 (unreliable)
<1>[ 26.918057] [c000000ff4a0b230] [c0000000000c0500] rcu_dump_cpu_stacks+0xd0/0xe0
<1>[ 26.918060] [c000000ff4a0b280] [c0000000000bfa80] rcu_check_callbacks+0x304/0x788
<1>[ 26.918062] [c000000ff4a0b3b0] [c0000000000c3054] update_process_times+0x3c/0x74
<1>[ 26.918065] [c000000ff4a0b3e0] [c0000000000d2330] tick_sched_handle.isra.3+0x50/0x64
<1>[ 26.918068] [c000000ff4a0b400] [c0000000000d2390] tick_sched_timer+0x4c/0x8c
<1>[ 26.918069] [c000000ff4a0b440] [c0000000000c3b1c] __hrtimer_run_queues+0xe0/0x16c
<1>[ 26.918071] [c000000ff4a0b4c0] [c0000000000c3dbc] hrtimer_interrupt+0xac/0x1f0
<1>[ 26.918074] [c000000ff4a0b580] [c000000000019228] __timer_interrupt+0x7c/0xdc
<1>[ 26.918076] [c000000ff4a0b5c0] [c000000000019330] timer_interrupt+0xa8/0xc0
<1>[ 26.918079] [c000000ff4a0b5f0] [c00000000000b864] fast_exception_return+0x128/0x150
<1>[ 26.918082] --- interrupt: 901 at .L144+0x0/0x4
<1>[ 26.918082] LR = arch_local_irq_restore+0x5c/0x80
<1>[ 26.918084] [c000000ff4a0b8e0] [c000000000093588] vtime_account_irq_enter+0x54/0x5c (unreliable)
<1>[ 26.918086] [c000000ff4a0b900] [c00000000006eb80] __do_softirq+0xd8/0x290
<1>[ 26.918089] [c000000ff4a0b9f0] [c00000000006efe0] irq_exit+0x88/0xe0
<1>[ 26.918091] [c000000ff4a0ba10] [c000000000019334] timer_interrupt+0xac/0xc0
<1>[ 26.918093] [c000000ff4a0ba40] [c00000000000b864] fast_exception_return+0x128/0x150
<1>[ 26.918097] --- interrupt: 901 at _restgpr0_31+0x0/0xba38
<1>[ 26.918097] LR = irq_work_queue+0x74/0xa4
<1>[ 26.918099] [c000000ff4a0bd30] [c00000000004b204] opal_handle_events+0x54/0xa8 (unreliable)
<1>[ 26.918101] [c000000ff4a0bd80] [c00000000004755c] kopald+0x44/0x70
<1>[ 26.918103] [c000000ff4a0bdc0] [c0000000000880b4] kthread+0x150/0x158
<1>[ 26.918106] [c000000ff4a0be30] [c00000000000b464] ret_from_kernel_thread+0x5c/0x78
<1>[ 90.728029] .
<1>[ 90.728073] ready
<1>[ 90.808024] .
<1>[ 90.808025] .
<1>[ 90.808027] .
<1>[ 90.808029] .
<1>[ 90.808105] ready
<1>[ 90.808110] ready
<1>[ 90.808117] ready
<1>[ 90.808128] ready
<1>[ 90.888025] .
<1>[ 90.888025] .
<1>[ 90.888026] .
<1>[ 90.888034] .
<1>[ 90.888102] ready
<1>[ 90.888108] ready
<1>[ 90.888119] ready
<1>[ 90.888189] ready
<3>[ 90.928141] sd 4:2:0:0: [sdg] Assuming drive cache: write through
<1>[ 90.968025] .
<1>[ 90.968039] .
<1>[ 90.968099] ready
<1>[ 90.968149] ready
<3>[ 91.008129] sd 4:2:1:0: [sdh] Assuming drive cache: write through
<3>[ 91.008138] sd 1:2:0:0: [sda] Assuming drive cache: write through
<3>[ 91.008145] sd 4:2:2:0: [sdi] Assuming drive cache: write through
<3>[ 91.008219] sd 1:2:1:0: [sdb] Assuming drive cache: write through
<3>[ 91.088145] sd 4:2:4:0: [sdk] Assuming drive cache: write through
<3>[ 91.088152] sd 4:2:3:0: [sdj] Assuming drive cache: write through
<3>[ 91.088156] sd 1:2:2:0: [sdc] Assuming drive cache: write through
<3>[ 91.088160] sd 1:2:3:0: [sdd] Assuming drive cache: write through
<3>[ 91.178159] sd 1:2:4:0: [sde] Assuming drive cache: write through
<3>[ 91.178162] sd 1:2:5:0: [sdf] Assuming drive cache: write through
<3>[ 93.802071] be2net 0044:01:00.0: opcode 36-1 failed:status 68-0
<3>[ 98.362057] be2net 0044:01:00.1: opcode 36-1 failed:status 68-0
<3>[ 98.828028] lpfc 0044:01:00.4: 4:6101 Disabling NVME support: Not supported by firmware: 0 0
<3>[ 98.828033] lpfc 0044:01:00.4: 4:2574 IO channels: irqs 4 fcp 4 nvme 0 MRQ: 0
<3>[ 101.698029] lpfc 0044:01:00.5: 5:6101 Disabling NVME support: Not supported by firmware: 0 0
<3>[ 101.698034] lpfc 0044:01:00.5: 5:2574 IO channels: irqs 4 fcp 4 nvme 0 MRQ: 0
<3>[ 103.838062] print_req_error: I/O error, dev sdl, sector 0
<3>[ 103.838065] Buffer I/O error on dev sdl, logical block 0, async page read
<3>[ 116.318552] print_req_error: I/O error, dev sdl, sector 0
<3>[ 116.318555] Buffer I/O error on dev sdl, logical block 0, async page read
<3>[ 118.398071] print_req_error: I/O error, dev sdl, sector 0
<3>[ 118.914736] print_req_error: I/O error, dev sdl, sector 41942912
<3>[ 119.431405] print_req_error: I/O error, dev sdl, sector 41942912
<3>[ 119.948084] print_req_error: I/O error, dev sdl, sector 0
<3>[ 120.464772] print_req_error: I/O error, dev sdl, sector 0
<3>[ 120.981446] print_req_error: I/O error, dev sdl, sector 0
<3>[ 121.533444] print_req_error: I/O error, dev sdl, sector 0
<3>[ 122.114796] print_req_error: I/O error, dev sdl, sector 0
<3>[ 122.631448] print_req_error: I/O error, dev sdl, sector 41942912
<3>[ 123.138841] print_req_error: I/O error, dev sdl, sector 41942912
<3>[ 123.648150] print_req_error: I/O error, dev sdl, sector 0
<3>[ 124.164808] print_req_error: I/O error, dev sdl, sector 0
<3>[ 124.681491] print_req_error: I/O error, dev sdl, sector 0
<3>[ 125.231501] print_req_error: I/O error, dev sdl, sector 0
<3>[ 125.814890] print_req_error: I/O error, dev sdl, sector 0
<3>[ 126.331732] print_req_error: I/O error, dev sdl, sector 41942912
<3>[ 126.848208] print_req_error: I/O error, dev sdl, sector 41942912
<3>[ 127.364870] print_req_error: I/O error, dev sdl, sector 0
<3>[ 127.881553] print_req_error: I/O error, dev sdl, sector 0
<3>[ 128.398233] print_req_error: I/O error, dev sdl, sector 0
<3>[ 128.948240] print_req_error: I/O error, dev sdl, sector 0
<3>[ 129.531646] print_req_error: I/O error, dev sdl, sector 0
<3>[ 130.048262] print_req_error: I/O error, dev sdl, sector 41942912
<3>[ 130.555625] print_req_error: I/O error, dev sdl, sector 41942912
<3>[ 131.064950] print_req_error: I/O error, dev sdl, sector 0
<3>[ 131.581625] print_req_error: I/O error, dev sdl, sector 0
<3>[ 132.098328] print_req_error: I/O error, dev sdl, sector 0
<3>[ 132.648309] print_req_error: I/O error, dev sdl, sector 0
<3>[ 133.231650] print_req_error: I/O error, dev sdl, sector 0
<3>[ 133.748325] print_req_error: I/O error, dev sdl, sector 41942912
<3>[ 134.265002] print_req_error: I/O error, dev sdl, sector 41942912
<3>[ 134.781690] print_req_error: I/O error, dev sdl, sector 0
<3>[ 135.298348] print_req_error: I/O error, dev sdl, sector 0
<3>[ 135.815028] print_req_error: I/O error, dev sdl, sector 0
<3>[ 136.365049] print_req_error: I/O error, dev sdl, sector 0
<3>[ 137.015082] print_req_error: I/O error, dev sdl, sector 0
<3>[ 137.531789] print_req_error: I/O error, dev sdl, sector 41942912
<3>[ 138.048408] print_req_error: I/O error, dev sdl, sector 41942912
<3>[ 138.565080] print_req_error: I/O error, dev sdl, sector 0
<3>[ 139.081755] print_req_error: I/O error, dev sdl, sector 0
<3>[ 139.418023] INFO: rcu_sched self-detected stall on CPU
<3>[ 139.418034] 36-...: (1 GPs behind) idle=db6/140000000000002/0 softirq=92/92 fqs=1049
<3>[ 139.418036] (t=2100 jiffies g=-238 c=-239 q=8292)
<1>[ 139.418056] Call Trace:
<1>[ 139.418061] [c000000ff4a0b210] [c0000000000926cc] sched_show_task+0xe8/0xf4 (unreliable)
<1>[ 139.418066] [c000000ff4a0b280] [c0000000000c0500] rcu_dump_cpu_stacks+0xd0/0xe0
<1>[ 139.418069] [c000000ff4a0b2d0] [c0000000000bfa80] rcu_check_callbacks+0x304/0x788
<1>[ 139.418071] [c000000ff4a0b400] [c0000000000c3054] update_process_times+0x3c/0x74
<1>[ 139.418075] [c000000ff4a0b430] [c0000000000d2330] tick_sched_handle.isra.3+0x50/0x64
<1>[ 139.418077] [c000000ff4a0b450] [c0000000000d2390] tick_sched_timer+0x4c/0x8c
<1>[ 139.418080] [c000000ff4a0b490] [c0000000000c3b1c] __hrtimer_run_queues+0xe0/0x16c
<1>[ 139.418081] [c000000ff4a0b510] [c0000000000c3dbc] hrtimer_interrupt+0xac/0x1f0
<1>[ 139.418084] [c000000ff4a0b5d0] [c000000000019228] __timer_interrupt+0x7c/0xdc
<1>[ 139.418086] [c000000ff4a0b610] [c000000000019330] timer_interrupt+0xa8/0xc0
<1>[ 139.418089] [c000000ff4a0b640] [c00000000000b864] fast_exception_return+0x128/0x150
<1>[ 139.418092] --- interrupt: 901 at .L144+0x0/0x4
<1>[ 139.418092] LR = arch_local_irq_restore+0x5c/0x80
<1>[ 139.418094] [c000000ff4a0b930] [c000000000093588] vtime_account_irq_enter+0x54/0x5c (unreliable)
<1>[ 139.418097] [c000000ff4a0b950] [c00000000006eb80] __do_softirq+0xd8/0x290
<1>[ 139.418099] [c000000ff4a0ba40] [c00000000006efe0] irq_exit+0x88/0xe0
<1>[ 139.418101] [c000000ff4a0ba60] [c000000000019334] timer_interrupt+0xac/0xc0
<1>[ 139.418103] [c000000ff4a0ba90] [c00000000000b864] fast_exception_return+0x128/0x150
<1>[ 139.418107] --- interrupt: 901 at schedule_timeout_interruptible+0x10/0x18
<1>[ 139.418107] LR = kopald+0x50/0x70
<1>[ 139.418109] [c000000ff4a0bd80] [c00000000004755c] kopald+0x44/0x70 (unreliable)
<1>[ 139.418111] [c000000ff4a0bdc0] [c0000000000880b4] kthread+0x150/0x158
<1>[ 139.418113] [c000000ff4a0be30] [c00000000000b464] ret_from_kernel_thread+0x5c/0x78
<3>[ 139.598447] print_req_error: I/O error, dev sdl, sector 0
<3>[ 140.148456] print_req_error: I/O error, dev sdl, sector 0
<3>[ 149.365309] print_req_error: I/O error, dev sdl, sector 41942912
<3>[ 149.881955] print_req_error: I/O error, dev sdl, sector 41942912
<3>[ 149.881957] Buffer I/O error on dev sdl, logical block 327679, async page read
<3>[ 174.032430] print_req_error: I/O error, dev sdn, sector 0
<3>[ 174.032433] Buffer I/O error on dev sdn, logical block 0, async page read
<3>[ 178.558233] print_req_error: I/O error, dev sdn, sector 0
<3>[ 178.558235] Buffer I/O error on dev sdn, logical block 0, async page read
<3>[ 183.038356] print_req_error: I/O error, dev sdn, sector 41942912
<3>[ 191.358282] print_req_error: I/O error, dev sdn, sector 41942912
<3>[ 191.358284] Buffer I/O error on dev sdn, logical block 327679, async page read
<3>[ 212.368021] INFO: rcu_sched self-detected stall on CPU
<3>[ 212.368026] 38-...: (1 GPs behind) idle=79a/140000000000002/0 softirq=797/797 fqs=989
<3>[ 212.368026] (t=2100 jiffies g=-220 c=-221 q=949)
<1>[ 212.368038] Call Trace:
<1>[ 212.368042] [c000000ff4a0b0e0] [c0000000000926cc] sched_show_task+0xe8/0xf4 (unreliable)
<1>[ 212.368046] [c000000ff4a0b150] [c0000000000c0500] rcu_dump_cpu_stacks+0xd0/0xe0
<1>[ 212.368048] [c000000ff4a0b1a0] [c0000000000bfa80] rcu_check_callbacks+0x304/0x788
<1>[ 212.368050] [c000000ff4a0b2d0] [c0000000000c3054] update_process_times+0x3c/0x74
<1>[ 212.368052] [c000000ff4a0b300] [c0000000000d2330] tick_sched_handle.isra.3+0x50/0x64
<1>[ 212.368054] [c000000ff4a0b320] [c0000000000d2390] tick_sched_timer+0x4c/0x8c
<1>[ 212.368056] [c000000ff4a0b360] [c0000000000c3b1c] __hrtimer_run_queues+0xe0/0x16c
<1>[ 212.368058] [c000000ff4a0b3e0] [c0000000000c3dbc] hrtimer_interrupt+0xac/0x1f0
<1>[ 212.368061] [c000000ff4a0b4a0] [c000000000019228] __timer_interrupt+0x7c/0xdc
<1>[ 212.368063] [c000000ff4a0b4e0] [c000000000019330] timer_interrupt+0xa8/0xc0
<1>[ 212.368065] [c000000ff4a0b510] [c00000000000b864] fast_exception_return+0x128/0x150
<1>[ 212.368068] --- interrupt: 901 at .L144+0x0/0x4
<1>[ 212.368068] LR = arch_local_irq_restore+0x5c/0x80
<1>[ 212.368070] [c000000ff4a0b800] [c000000000093588] vtime_account_irq_enter+0x54/0x5c (unreliable)
<1>[ 212.368073] [c000000ff4a0b820] [c00000000006eb80] __do_softirq+0xd8/0x290
<1>[ 212.368075] [c000000ff4a0b910] [c00000000006efe0] irq_exit+0x88/0xe0
<1>[ 212.368077] [c000000ff4a0b930] [c000000000019334] timer_interrupt+0xac/0xc0
<1>[ 212.368079] [c000000ff4a0b960] [c00000000000b864] fast_exception_return+0x128/0x150
<1>[ 212.368082] --- interrupt: 901 at lock_timer_base+0x44/0xa4
<1>[ 212.368082] LR = lock_timer_base+0x10/0xa4
<1>[ 212.368083] [c000000ff4a0bc50] [c000000000879a80] 0xc000000000879a80 (unreliable)
<1>[ 212.368086] [c000000ff4a0bca0] [c0000000005f33b8] schedule_timeout+0xc0/0x278
<1>[ 212.368088] [c000000ff4a0bd80] [c000000000047568] kopald+0x50/0x70
<1>[ 212.368090] [c000000ff4a0bdc0] [c0000000000880b4] kthread+0x150/0x158
<1>[ 212.368092] [c000000ff4a0be30] [c00000000000b464] ret_from_kernel_thread+0x5c/0x78
<3>[ 233.608027] INFO: rcu_sched self-detected stall on CPU
<3>[ 233.608031] 38-...: (1 GPs behind) idle=7a6/140000000000002/0 softirq=798/799 fqs=840
<3>[ 233.608032] (t=2100 jiffies g=-219 c=-220 q=998)
<1>[ 233.608041] Call Trace:
<1>[ 233.608044] [c000000ff4a0b210] [c0000000000926cc] sched_show_task+0xe8/0xf4 (unreliable)
<1>[ 233.608047] [c000000ff4a0b280] [c0000000000c0500] rcu_dump_cpu_stacks+0xd0/0xe0
<1>[ 233.608049] [c000000ff4a0b2d0] [c0000000000bfa80] rcu_check_callbacks+0x304/0x788
<1>[ 233.608051] [c000000ff4a0b400] [c0000000000c3054] update_process_times+0x3c/0x74
<1>[ 233.608053] [c000000ff4a0b430] [c0000000000d2330] tick_sched_handle.isra.3+0x50/0x64
<1>[ 233.608055] [c000000ff4a0b450] [c0000000000d2390] tick_sched_timer+0x4c/0x8c
<1>[ 233.608057] [c000000ff4a0b490] [c0000000000c3b1c] __hrtimer_run_queues+0xe0/0x16c
<1>[ 233.608059] [c000000ff4a0b510] [c0000000000c3dbc] hrtimer_interrupt+0xac/0x1f0
<1>[ 233.608061] [c000000ff4a0b5d0] [c000000000019228] __timer_interrupt+0x7c/0xdc
<1>[ 233.608063] [c000000ff4a0b610] [c000000000019330] timer_interrupt+0xa8/0xc0
<1>[ 233.608065] [c000000ff4a0b640] [c00000000000b864] fast_exception_return+0x128/0x150
<1>[ 233.608068] --- interrupt: 901 at .L144+0x0/0x4
<1>[ 233.608068] LR = arch_local_irq_restore+0x5c/0x80
<1>[ 233.608070] [c000000ff4a0b930] [c000000000093588] vtime_account_irq_enter+0x54/0x5c (unreliable)
<1>[ 233.608073] [c000000ff4a0b950] [c00000000006eb80] __do_softirq+0xd8/0x290
<1>[ 233.608075] [c000000ff4a0ba40] [c00000000006efe0] irq_exit+0x88/0xe0
<1>[ 233.608077] [c000000ff4a0ba60] [c000000000019334] timer_interrupt+0xac/0xc0
<1>[ 233.608079] [c000000ff4a0ba90] [c00000000000b864] fast_exception_return+0x128/0x150
<1>[ 233.608082] --- interrupt: 901 at schedule_timeout+0x1c/0x278
<1>[ 233.608082] LR = schedule_timeout+0x1c/0x278
<1>[ 233.608083] [c000000ff4a0bd80] [c000000000047568] kopald+0x50/0x70 (unreliable)
<1>[ 233.608086] [c000000ff4a0bdc0] [c0000000000880b4] kthread+0x150/0x158
<1>[ 233.608088] [c000000ff4a0be30] [c00000000000b464] ret_from_kernel_thread+0x5c/0x78
<3>[ 244.283717] print_req_error: I/O error, dev sdp, sector 0
<3>[ 244.283719] Buffer I/O error on dev sdp, logical block 0, async page read
<3>[ 244.800392] print_req_error: I/O error, dev sdp, sector 0
<3>[ 244.800394] Buffer I/O error on dev sdp, logical block 0, async page read
<3>[ 245.317070] print_req_error: I/O error, dev sdp, sector 41942912
<3>[ 245.833807] print_req_error: I/O error, dev sdp, sector 41942912
<3>[ 245.833808] Buffer I/O error on dev sdp, logical block 327679, async page read
<3>[ 264.798018] INFO: rcu_sched detected stalls on CPUs/tasks:
<3>[ 264.798027] (detected by 11, t=2102 jiffies, g=-215, c=-216, q=969)
<3>[ 264.798040] All QSes seen, last rcu_sched kthread activity 218 (4294963776-4294963558), jiffies_till_next_fqs=1, root ->qsmask 0x0
<1>[ 264.798046] Call Trace:
<1>[ 264.798051] [c000000ffa25f700] [c0000000000926cc] sched_show_task+0xe8/0xf4 (unreliable)
<1>[ 264.798056] [c000000ffa25f770] [c0000000000bfe38] rcu_check_callbacks+0x6bc/0x788
<1>[ 264.798059] [c000000ffa25f8a0] [c0000000000c3054] update_process_times+0x3c/0x74
<1>[ 264.798063] [c000000ffa25f8d0] [c0000000000d2330] tick_sched_handle.isra.3+0x50/0x64
<1>[ 264.798067] [c000000ffa25f8f0] [c0000000000d2390] tick_sched_timer+0x4c/0x8c
<1>[ 264.798070] [c000000ffa25f930] [c0000000000c3b1c] __hrtimer_run_queues+0xe0/0x16c
<1>[ 264.798073] [c000000ffa25f9b0] [c0000000000c3dbc] hrtimer_interrupt+0xac/0x1f0
<1>[ 264.798077] [c000000ffa25fa70] [c000000000019228] __timer_interrupt+0x7c/0xdc
<1>[ 264.798080] [c000000ffa25fab0] [c000000000019330] timer_interrupt+0xa8/0xc0
<1>[ 264.798084] [c000000ffa25fae0] [c000000000009008] decrementer_common+0x128/0x130
<1>[ 264.798089] --- interrupt: 901 at .L144+0x0/0x4
<1>[ 264.798089] LR = arch_local_irq_restore+0x5c/0x80
<1>[ 264.798090] [c000000ffa25fdd0] [0000000000000000] (null) (unreliable)
<1>[ 264.798095] [c000000ffa25fdf0] [c00000000050a53c] cpuidle_enter_state+0x1b0/0x214
<1>[ 264.798098] [c000000ffa25fe40] [c0000000000a2c50] call_cpuidle+0x6c/0x74
<1>[ 264.798101] [c000000ffa25fe60] [c0000000000a2ef8] do_idle+0x1f0/0x204
<1>[ 264.798105] [c000000ffa25fec0] [c0000000000a3098] cpu_startup_entry+0x30/0x34
<1>[ 264.798109] [c000000ffa25fef0] [c00000000002a3e8] start_secondary+0x4cc/0x4e4
<1>[ 264.798112] [c000000ffa25ff90] [c00000000000aa6c] start_secondary_prolog+0x10/0x14
<3>[ 264.798116] rcu_sched kthread starved for 218 jiffies! g18446744073709551401 c18446744073709551400 f0x2 RCU_GP_WAIT_FQS(3) ->state=0x0
<1>[ 264.798121] Call Trace:
<1>[ 264.798123] [c000000ffa2278e0] [0000002400000000] 0x2400000000 (unreliable)
<1>[ 264.798127] [c000000ffa227ab0] [c000000000013a18] __switch_to+0x188/0x210
<1>[ 264.798130] [c000000ffa227b10] [c0000000005efab4] __schedule+0x464/0x840
<1>[ 264.798133] [c000000ffa227be0] [c0000000005eff14] schedule+0x84/0x98
<1>[ 264.798136] [c000000ffa227c10] [c0000000005f3500] schedule_timeout+0x208/0x278
<1>[ 264.798139] [c000000ffa227cf0] [c0000000000be548] rcu_gp_kthread+0x824/0x830
<1>[ 264.798142] [c000000ffa227dc0] [c0000000000880b4] kthread+0x150/0x158
<1>[ 264.798146] [c000000ffa227e30] [c00000000000b464] ret_from_kernel_thread+0x5c/0x78
<3>[ 289.808023] INFO: rcu_sched self-detected stall on CPU
<3>[ 289.808032] 39-...: (1 GPs behind) idle=cb6/140000000000002/0 softirq=3050/3051 fqs=1030
<3>[ 289.808033] (t=2100 jiffies g=-214 c=-215 q=1359)
<1>[ 289.808049] Call Trace:
<1>[ 289.808053] [c000000ff4a0b130] [c0000000000926cc] sched_show_task+0xe8/0xf4 (unreliable)
<1>[ 289.808056] [c000000ff4a0b1a0] [c0000000000c0500] rcu_dump_cpu_stacks+0xd0/0xe0
<1>[ 289.808058] [c000000ff4a0b1f0] [c0000000000bfa80] rcu_check_callbacks+0x304/0x788
<1>[ 289.808060] [c000000ff4a0b320] [c0000000000c3054] update_process_times+0x3c/0x74
<1>[ 289.808063] [c000000ff4a0b350] [c0000000000d2330] tick_sched_handle.isra.3+0x50/0x64
<1>[ 289.808065] [c000000ff4a0b370] [c0000000000d2390] tick_sched_timer+0x4c/0x8c
<1>[ 289.808067] [c000000ff4a0b3b0] [c0000000000c3b1c] __hrtimer_run_queues+0xe0/0x16c
<1>[ 289.808069] [c000000ff4a0b430] [c0000000000c3dbc] hrtimer_interrupt+0xac/0x1f0
<1>[ 289.808071] [c000000ff4a0b4f0] [c000000000019228] __timer_interrupt+0x7c/0xdc
<1>[ 289.808073] [c000000ff4a0b530] [c000000000019330] timer_interrupt+0xa8/0xc0
<1>[ 289.808075] [c000000ff4a0b560] [c00000000000b864] fast_exception_return+0x128/0x150
<1>[ 289.808079] --- interrupt: 901 at .L144+0x0/0x4
<1>[ 289.808079] LR = arch_local_irq_restore+0x5c/0x80
<1>[ 289.808080] [c000000ff4a0b850] [c000000000093588] vtime_account_irq_enter+0x54/0x5c (unreliable)
<1>[ 289.808083] [c000000ff4a0b870] [c00000000006eb80] __do_softirq+0xd8/0x290
<1>[ 289.808085] [c000000ff4a0b960] [c00000000006efe0] irq_exit+0x88/0xe0
<1>[ 289.808087] [c000000ff4a0b980] [c000000000019334] timer_interrupt+0xac/0xc0
<1>[ 289.808089] [c000000ff4a0b9b0] [c00000000000b864] fast_exception_return+0x128/0x150
<1>[ 289.808093] --- interrupt: 901 at schedule_timeout+0x88/0x278
<1>[ 289.808093] LR = schedule_timeout+0x1c/0x278
<1>[ 289.808094] [c000000ff4a0bca0] [c0000000005f350c] schedule_timeout+0x214/0x278 (unreliable)
<1>[ 289.808097] [c000000ff4a0bd80] [c000000000047568] kopald+0x50/0x70
<1>[ 289.808099] [c000000ff4a0bdc0] [c0000000000880b4] kthread+0x150/0x158
<1>[ 289.808101] [c000000ff4a0be30] [c00000000000b464] ret_from_kernel_thread+0x5c/0x78
<3>[ 335.728022] INFO: rcu_sched self-detected stall on CPU
<3>[ 335.728027] 24-...: (1 GPs behind) idle=c0e/140000000000002/0 softirq=461/461 fqs=1049
<3>[ 335.728027] (t=2100 jiffies g=-211 c=-212 q=516)
<1>[ 335.728044] Call Trace:
<1>[ 335.728047] [c000000ff4a0b1c0] [c0000000000926cc] sched_show_task+0xe8/0xf4 (unreliable)
<1>[ 335.728050] [c000000ff4a0b230] [c0000000000c0500] rcu_dump_cpu_stacks+0xd0/0xe0
<1>[ 335.728053] [c000000ff4a0b280] [c0000000000bfa80] rcu_check_callbacks+0x304/0x788
<1>[ 335.728055] [c000000ff4a0b3b0] [c0000000000c3054] update_process_times+0x3c/0x74
<1>[ 335.728057] [c000000ff4a0b3e0] [c0000000000d2330] tick_sched_handle.isra.3+0x50/0x64
<1>[ 335.728059] [c000000ff4a0b400] [c0000000000d2390] tick_sched_timer+0x4c/0x8c
<1>[ 335.728061] [c000000ff4a0b440] [c0000000000c3b1c] __hrtimer_run_queues+0xe0/0x16c
<1>[ 335.728063] [c000000ff4a0b4c0] [c0000000000c3dbc] hrtimer_interrupt+0xac/0x1f0
<1>[ 335.728065] [c000000ff4a0b580] [c000000000019228] __timer_interrupt+0x7c/0xdc
<1>[ 335.728067] [c000000ff4a0b5c0] [c000000000019330] timer_interrupt+0xa8/0xc0
<1>[ 335.728069] [c000000ff4a0b5f0] [c00000000000b864] fast_exception_return+0x128/0x150
<1>[ 335.728073] --- interrupt: 901 at .L144+0x0/0x4
<1>[ 335.728073] LR = arch_local_irq_restore+0x5c/0x80
<1>[ 335.728074] [c000000ff4a0b8e0] [c000000000093588] vtime_account_irq_enter+0x54/0x5c (unreliable)
<1>[ 335.728077] [c000000ff4a0b900] [c00000000006eb80] __do_softirq+0xd8/0x290
<1>[ 335.728079] [c000000ff4a0b9f0] [c00000000006efe0] irq_exit+0x88/0xe0
<1>[ 335.728081] [c000000ff4a0ba10] [c000000000019334] timer_interrupt+0xac/0xc0
<1>[ 335.728083] [c000000ff4a0ba40] [c00000000000b864] fast_exception_return+0x128/0x150
<1>[ 335.728087] --- interrupt: 901 at _restgpr0_31+0x0/0xba38
<1>[ 335.728087] LR = irq_work_queue+0x74/0xa4
<1>[ 335.728089] [c000000ff4a0bd30] [c00000000004b204] opal_handle_events+0x54/0xa8 (unreliable)
<1>[ 335.728091] [c000000ff4a0bd80] [c00000000004755c] kopald+0x44/0x70
<1>[ 335.728093] [c000000ff4a0bdc0] [c0000000000880b4] kthread+0x150/0x158
<1>[ 335.728096] [c000000ff4a0be30] [c00000000000b464] ret_from_kernel_thread+0x5c/0x78
/ #
@npiggin Thanks, Have you done the tests mention in this issue against your patches?
No I haven't had a chance to yet, I've been caught up doing P9 work at the moment.
I brought the patch up to date and resubmitted it, but... I don't think it solves the root cause of the problem, there must be a long running OPAL event that triggers it AFAIKS. Maybe EEH?