sofa-jraft
Deadlock on configuration application in NodeImpl when disruptors are full
Describe the bug
There is a deadlock in NodeImpl that involves NodeImpl#writeLock and occurs when both LogManagerImpl#diskQueue and FSMCallerImpl#taskQueue are full.
- NodeImpl#executeApplyingTasks() takes NodeImpl.writeLock and calls LogManager.appendEntries()
- LogManager tries to enqueue a task to diskQueue, which is full, hence it blocks until a task gets consumed from diskQueue
- diskQueue is consumed by StableClosureEventHandler
- StableClosureEventHandler tries to enqueue a task to FSMCallerImpl#taskQueue, which is also full, so this also blocks until a task gets consumed from FSMCallerImpl#taskQueue
- FSMCallerImpl#taskQueue is consumed by ApplyTaskHandler
- ApplyTaskHandler calls NodeImpl#onConfigurationChangeDone(), which tries to take NodeImpl#writeLock
As a result, there is a deadlock: NodeImpl#writeLock -> LogManager#diskQueue -> FSMCallerImpl#taskQueue -> NodeImpl#writeLock (disruptors are used as blocking queues in JRaft, so, when full, they act like locks).
This was caught by com.alipay.sofa.jraft.core.NodeTest#testNodeTaskOverload, which uses extremely short disruptors (2 items max each).
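The cycle described above can be modelled outside JRaft with plain JDK primitives. The following is a minimal sketch, not JRaft code: two ArrayBlockingQueue instances stand in for the disruptors, a ReentrantLock stands in for NodeImpl#writeLock, and the class and method names (DisruptorDeadlockSketch, appendCompletes) are hypothetical. As a simplifying assumption, every task needs the lock when applied, whereas in JRaft only configuration entries reach onConfigurationChangeDone().

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the reported cycle; none of this is JRaft code.
class DisruptorDeadlockSketch {

    interface Step { void run() throws InterruptedException; }

    // Starts a daemon thread that runs the step and exits quietly when interrupted.
    static Thread start(String name, Step step) {
        Thread t = new Thread(() -> {
            try { step.run(); } catch (InterruptedException e) { /* cleanup signal */ }
        }, name);
        t.setDaemon(true);
        t.start();
        return t;
    }

    // Returns true if the producer enqueued all entries and released the lock in time.
    static boolean appendCompletes(int capacity, int entries, long timeoutMillis) {
        ReentrantLock writeLock = new ReentrantLock();                          // NodeImpl#writeLock
        BlockingQueue<Integer> diskQueue = new ArrayBlockingQueue<>(capacity);  // LogManagerImpl#diskQueue
        BlockingQueue<Integer> taskQueue = new ArrayBlockingQueue<>(capacity);  // FSMCallerImpl#taskQueue
        CountDownLatch done = new CountDownLatch(1);

        // Plays StableClosureEventHandler: moves items from diskQueue to taskQueue.
        Thread disk = start("disk-consumer", () -> {
            while (true) taskQueue.put(diskQueue.take());
        });
        // Plays ApplyTaskHandler: assumption, every task needs the write lock.
        Thread apply = start("apply-consumer", () -> {
            while (true) {
                taskQueue.take();
                writeLock.lockInterruptibly();
                writeLock.unlock();
            }
        });
        // Plays executeApplyingTasks(): enqueues while holding the write lock.
        Thread node = start("node", () -> {
            writeLock.lockInterruptibly();
            try {
                for (int i = 0; i < entries; i++) diskQueue.put(i);
            } finally {
                writeLock.unlock();
            }
            done.countDown();
        });

        boolean completed;
        try {
            completed = done.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        // Unblock whatever is still stuck so the threads can exit.
        node.interrupt();
        disk.interrupt();
        apply.interrupt();
        return completed;
    }

    public static void main(String[] args) {
        System.out.println("1 entry, capacity 2:     completes = " + appendCompletes(2, 1, 2000));
        System.out.println("100 entries, capacity 2: completes = " + appendCompletes(2, 100, 500));
    }
}
```

In this model the pipeline can absorb at most 2 * capacity + 2 entries while the lock is held (both queues full, plus one item in each consumer's hands), so a larger batch leaves all three threads waiting on each other until the timeout expires. The timeout is only there so the sketch terminates; the real code paths block indefinitely.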
Steps to reproduce
Run com.alipay.sofa.jraft.core.NodeTest#testNodeTaskOverload in a loop. On my local machine it reproduces within 50-100 runs.
Environment
- SOFAJRaft version: v1.3.14 (latest commit 890033a64d8ed5c8838463f278b940355553e413)
- JVM version (e.g. java -version): openjdk version "11.0.23"
- OS version (e.g. uname -a): macOS 14.5
- Maven version: 3.9.6
- IDE version: IntelliJ IDEA 2024.1 (Community Edition)