braft icon indicating copy to clipboard operation
braft copied to clipboard

LogManager::disk_thread run closures in bthread after the log manager shutdown, to avoid deadlock

Open PFZheng opened this issue 3 years ago • 5 comments

In the situation, the closure may hold the last reference of NodeImpl, and done->Run() triggers the the destructor of NodeImpl. In the destructor, NodeImpl joins the disk thread to terminate, but the done->Run() itself is excuted in the disk thread. The disk thread can never be terminated!

PFZheng avatar Nov 04 '20 02:11 PFZheng

能构造ut去复现么?

chenzhangyi avatar Nov 10 '20 04:11 chenzhangyi

能构造ut去复现么?

可以的,我加一下

PFZheng avatar Nov 10 '20 05:11 PFZheng

#0 0x00007fc4614a488b in jump_stack (to=, from=) at /opt/hezhonglei/mraftdevelop/mraft/deps/incubator-brpc/src/bthread/stack_inl.h:132 #1 bthread::TaskGroup::sched_to (pg=pg@entry=0x7fbcdd7ab1c8, next_meta=next_meta@entry=0x7fc35084b008) at /opt/hezhonglei/mraftdevelop/mraft/deps/incubator-brpc/src/bthread/task_group.cpp:604 #2 0x00007fc4614a6023 in sched_to (next_tid=, pg=0x7fbcdd7ab1c8) at /opt/hezhonglei/mraftdevelop/mraft/deps/incubator-brpc/src/bthread/task_group_inl.h:78 #3 bthread::TaskGroup::sched (pg=pg@entry=0x7fbcdd7ab1c8) at /opt/hezhonglei/mraftdevelop/mraft/deps/incubator-brpc/src/bthread/task_group.cpp:562 #4 0x00007fc4614b65eb in bthread::butex_wait (arg=0x7fbd22d46f88, expected_value=expected_value@entry=0, abstime=abstime@entry=0x0) at /opt/hezhonglei/mraftdevelop/mraft/deps/incubator-brpc/src/bthread/butex.cpp:659 #5 0x00007fc4614a030f in bthread::ExecutionQueueBase::join (id=) at /opt/hezhonglei/mraftdevelop/mraft/deps/incubator-brpc/src/bthread/execution_queue.cpp:225 #6 0x00007fc462574d45 in execution_queue_joinbraft::LogManager::StableClosure* (id=...) at /usr/include/bthread/execution_queue_inl.h:381 #7 braft::LogManager::stop_disk_thread (this=this@entry=0x7fc138fdfe08) at /mraft/src/braft/log_manager.cpp:142 #8 0x00007fc4625765d9 in braft::LogManager::~LogManager (this=0x7fc138fdfe08, __in_chrg=) at /mraft/src/braft/log_manager.cpp:124 #9 0x00007fc46259f6b6 in braft::NodeImpl::~NodeImpl (this=0x7fbc24c71008, __in_chrg=) at /mraft/src/braft/node.cpp:218 #10 0x00007fc46259fc71 in braft::NodeImpl::~NodeImpl (this=0x7fbc24c71008, __in_chrg=) at /mraft/src/braft/node.cpp:254 #11 0x00007fc4625ab6ec in DeleteInternal (x=0x7fbc24c71008) at /usr/include/butil/memory/ref_counted.h:191 #12 Destruct (x=0x7fbc24c71008) at /usr/include/butil/memory/ref_counted.h:154 #13 Release (this=0x7fbc24c71010) at /usr/include/butil/memory/ref_counted.h:182 #14 ~FollowerStableClosure (this=0x7fc087f331c8, __in_chrg=) at /mraft/src/braft/node.cpp:2683 #15 braft::FollowerStableClosure::~FollowerStableClosure (this=0x7fc087f331c8, __in_chrg=) at /mraft/src/braft/node.cpp:2685 #16 0x00007fc4625ac784 in braft::FollowerStableClosure::Run (this=0x7fc087f331c8) at /mraft/src/braft/node.cpp:2678 #17 0x00007fc46257f396 in braft::AppendBatcher::flush (this=this@entry=0x7fbcdd7ab620) at /mraft/src/braft/log_manager.cpp:654 #18 0x00007fc4625775e8 in braft::LogManager::disk_thread (meta=0x7fc138fdfe08, iter=...) at /mraft/src/braft/log_manager.cpp:782 #19 0x00007fc4614a066d in bthread::ExecutionQueueBase::_execute (this=this@entry=0x7fbd2abad308, head=0x7fc102cef788, high_priority=high_priority@entry=false, niterated=niterated@entry=0x0) at /opt/hezhonglei/mraftdevelop/mraft/deps/incubator-brpc/src/bthread/execution_queue.cpp:272 #20 0x00007fc4614a2150 in bthread::ExecutionQueueBase::_execute_tasks (arg=) at /opt/hezhonglei/mraftdevelop/mraft/deps/incubator-brpc/src/bthread/execution_queue.cpp:151 #21 0x00007fc4614a630a in bthread::TaskGroup::task_runner (skip_remained=) at /opt/hezhonglei/mraftdevelop/mraft/deps/incubator-brpc/src/bthread/task_group.cpp:296 #22 0x00007fc4614b0981 in bthread_make_fcontext () from /lib64/libbrpc.so 大量并发场景下死锁了

zhongleihe avatar Nov 20 '20 07:11 zhongleihe

这个bug有可能在example的例子里面复现么

zouyonghao avatar Mar 08 '21 13:03 zouyonghao

这个pr还不能合入?

GOGOYAO avatar Feb 03 '22 05:02 GOGOYAO