
Multiple DRBD processes hang, driving up the load until the server can no longer execute commands.

Open xiahao007 opened this issue 1 year ago • 2 comments

drbd

Kernel log:

[Thu Sep 26 20:15:29 2024] R10: 00007ffe340b788c R11: 0000000000000246 R12: 0000000000000020
[Thu Sep 26 20:15:29 2024] R13: 0000000000000004 R14: 000055ae4f0042a0 R15: 000055ae4e0827a8
[Thu Sep 26 20:17:32 2024] INFO: task drbdsetup:3440344 blocked for more than 491 seconds.
[Thu Sep 26 20:17:32 2024] Tainted: G OE --------- --- 5.14.0-162.6.1.el9_1.x86_64 #1
[Thu Sep 26 20:17:32 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Sep 26 20:17:32 2024] task:drbdsetup state:D stack: 0 pid:3440344 ppid: 1 flags:0x00004006
[Thu Sep 26 20:17:32 2024] Call Trace:
[Thu Sep 26 20:17:32 2024] __schedule+0x206/0x580
[Thu Sep 26 20:17:32 2024] schedule+0x43/0xa0
[Thu Sep 26 20:17:32 2024] schedule_timeout+0x11d/0x160
[Thu Sep 26 20:17:32 2024] ? _raw_spin_unlock_irqrestore+0xa/0x30
[Thu Sep 26 20:17:32 2024] ? __wake_up_common_lock+0x8a/0xc0
[Thu Sep 26 20:17:32 2024] __wait_for_common+0x93/0x1d0
[Thu Sep 26 20:17:32 2024] ? usleep_range_state+0x90/0x90
[Thu Sep 26 20:17:32 2024] __state_change_unlock+0x4e/0x90 [drbd]
[Thu Sep 26 20:17:32 2024] ? may_be_up_to_date+0xe0/0xe0 [drbd]
[Thu Sep 26 20:17:32 2024] __end_state_change+0x62/0xb0 [drbd]
[Thu Sep 26 20:17:32 2024] change_cluster_wide_state+0xb9/0x520 [drbd]
[Thu Sep 26 20:17:32 2024] ? kvm_sched_clock_read+0x14/0x40
[Thu Sep 26 20:17:32 2024] ? raw_spin_rq_lock_nested+0x19/0x80
[Thu Sep 26 20:17:32 2024] ? idr_get_next_ul+0xb6/0xf0
[Thu Sep 26 20:17:32 2024] change_role+0x1da/0x210 [drbd]
[Thu Sep 26 20:17:32 2024] drbd_set_role+0xc4/0x7b0 [drbd]
[Thu Sep 26 20:17:32 2024] ? drbd_find_resource+0x74/0xb0 [drbd]
[Thu Sep 26 20:17:32 2024] drbd_adm_down+0x81/0x330 [drbd]
[Thu Sep 26 20:17:32 2024] ? __nla_validate_parse+0x141/0x190
[Thu Sep 26 20:17:32 2024] genl_family_rcv_msg_doit+0xea/0x150
[Thu Sep 26 20:17:32 2024] genl_rcv_msg+0xdc/0x1e0
[Thu Sep 26 20:17:32 2024] ? drbd_adm_set_role+0x200/0x200 [drbd]
[Thu Sep 26 20:17:32 2024] ? genl_get_cmd+0xe0/0xe0
[Thu Sep 26 20:17:32 2024] netlink_rcv_skb+0x51/0x100
[Thu Sep 26 20:17:32 2024] genl_rcv+0x24/0x40
[Thu Sep 26 20:17:32 2024] netlink_unicast+0x23b/0x350
[Thu Sep 26 20:17:32 2024] netlink_sendmsg+0x23b/0x480
[Thu Sep 26 20:17:32 2024] sock_sendmsg+0x62/0x70
[Thu Sep 26 20:17:32 2024] sock_write_iter+0x97/0x100
[Thu Sep 26 20:17:32 2024] new_sync_write+0x19d/0x1b0
[Thu Sep 26 20:17:32 2024] vfs_write+0x1ef/0x280
[Thu Sep 26 20:17:32 2024] ksys_write+0xab/0xe0
[Thu Sep 26 20:17:32 2024] ? syscall_trace_enter.constprop.0+0x145/0x1d0
[Thu Sep 26 20:17:32 2024] do_syscall_64+0x5c/0x90
[Thu Sep 26 20:17:32 2024] ? exc_page_fault+0x62/0x150
[Thu Sep 26 20:17:32 2024] entry_SYSCALL_64_after_hwframe+0x63/0xcd
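
Since the title mentions multiple hung processes, a quick way to list all of them from a saved kernel log might look like the sketch below. It only assumes the standard hung-task message format seen in the log above; the names `HUNG_TASK` and `find_hung_tasks` are made up for illustration and are not part of DRBD or any tool.

```python
import re

# Matches the kernel's hung-task report line, e.g.
# "INFO: task drbdsetup:3440344 blocked for more than 491 seconds."
# (hypothetical helper, not part of DRBD)
HUNG_TASK = re.compile(
    r"INFO: task (?P<comm>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(log_text):
    """Return (command, pid, seconds) for each hung-task report in log_text."""
    return [
        (m.group("comm"), int(m.group("pid")), int(m.group("secs")))
        for m in HUNG_TASK.finditer(log_text)
    ]


sample = (
    "[Thu Sep 26 20:17:32 2024] INFO: task drbdsetup:3440344 "
    "blocked for more than 491 seconds."
)
print(find_hung_tasks(sample))  # [('drbdsetup', 3440344, 491)]
```

Feeding it the text of a saved `dmesg` dump would give one tuple per report, which makes it easier to see which commands are stuck and for how long.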

xiahao007 avatar Sep 28 '24 08:09 xiahao007

DRBD version: 9.1.14. drbd_code: according to the log, is this method waiting for something to notify it?

xiahao007 avatar Sep 28 '24 08:09 xiahao007

Try upgrading DRBD first? 9.1.22 is the latest 9.1.x version.

rp- avatar Sep 29 '24 18:09 rp-