[fix][client] Fix race conditions of LedgerHandle in client
Master Issue: https://github.com/apache/pulsar/issues/21860
Motivation
When bookies do a rolling restart, a Pulsar topic may become fenced due to race conditions, and it cannot recover.
After checking the heap dump of the broker, we can see that pendingWriteOps is 161, which is why the topic cannot recover from the fenced state.
The topic only becomes unfenced when pendingWriteOps drops to 0. See the unfenced condition:
private void decrementPendingWriteOpsAndCheck() {
    long pending = pendingWriteOps.decrementAndGet();
    // unfenced condition
    if (pending == 0 && isFenced && !isClosingOrDeleting) {
        synchronized (this) {
            if (isFenced && !isClosingOrDeleting) {
                messageDeduplication.resetHighestSequenceIdPushed();
                log.info("[{}] Un-fencing topic...", topic);
                // signal to managed ledger that we are ready to resume by creating a new ledger
                ledger.readyToCreateNewLedger();
                unfence();
            }
        }
    }
}
After a deep investigation, we found the cause of the error. The root cause is that sendAddSuccessCallbacks may be called multiple times concurrently: one call comes from unsetSuccessAndSendWriteRequest on the BookKeeperClientWorker-OrderedExecutor thread, and the other from writeComplete on a pulsar-io thread. We should prevent sendAddSuccessCallbacks from being called again before it completes.
Changes
Add a boolean value sendingCallbacks to indicate whether sendAddSuccessCallbacks is currently being called. If sendAddSuccessCallbacks is already running when we try to call it, return directly.
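For illustration, a minimal sketch of the guard described above, assuming the flag is a java.util.concurrent.atomic.AtomicBoolean (the discussion below refers to it as the "AtomicBoolean solution"); this is a simplified sketch rather than the actual patch, and the existing callback loop body is omitted:

private final AtomicBoolean sendingCallbacks = new AtomicBoolean(false);

void sendAddSuccessCallbacks() {
    // if another thread is already delivering callbacks, return directly
    if (!sendingCallbacks.compareAndSet(false, true)) {
        return;
    }
    try {
        // ... existing peek/remove/submitCallback loop over pendingAddOps ...
    } finally {
        // always clear the flag so later completions can deliver callbacks again
        sendingCallbacks.set(false);
    }
}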
Not recommended changes
If we make sendAddSuccessCallbacks synchronized, it will impact performance and may lead to a deadlock.
After a deep investigation, we found the cause of the error. The root cause is that sendAddSuccessCallbacks may be called multiple times concurrently: one call comes from unsetSuccessAndSendWriteRequest on the BookKeeperClientWorker-OrderedExecutor thread, and the other from writeComplete on a pulsar-io thread. We should prevent sendAddSuccessCallbacks from being called again before it completes.
It looks like all calls happen under the same object monitor lock (synchronized for PendingAddOp), therefore it seems that concurrent calls are already prevented. @graysonzeng Could you please check if this is the case?
It looks like all calls happen under the same object monitor lock (synchronized for PendingAddOp), therefore it seems that concurrent calls are already prevented. @graysonzeng Could you please check if this is the case?
At first I had the same doubt as you; it didn't seem like this should be happening. But in fact, these two threads can call two different PendingAddOp instances, and the two instances hold a reference to the same LedgerHandle instance. Synchronizing on a PendingAddOp does not prevent the sendAddSuccessCallbacks method of the same LedgerHandle instance from being called concurrently, so concurrent calls are not prevented. @lhotari
At first I had the same doubt as you; it didn't seem like this should be happening. But in fact, these two threads can call two different PendingAddOp instances, and the two instances hold a reference to the same LedgerHandle instance. Synchronizing on a PendingAddOp does not prevent the sendAddSuccessCallbacks method of the same LedgerHandle instance from being called concurrently, so concurrent calls are not prevented.
Thanks for explaining that @graysonzeng. Makes sense. Just wondering if the logic really works correctly without making the sendAddSuccessCallbacks method synchronized. For example, on line 1831, the call to pendingAddOps.remove() seems to assume that it is the same instance that pendingAddOps.peek() returned on line 1814. The extensive usage of synchronized isn't nice, but there doesn't seem to be a way around it?
@graysonzeng Another thread safety problem is the changingEnsemble field. It's modified under the metadataLock object monitor.
https://github.com/apache/bookkeeper/blob/13e7efaa971cd3613b065ac50836c5ee98985d13/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java#L1895-L1908
Should we make the changingEnsemble field volatile to fix this part of the problem?
Thanks for explaining that @graysonzeng. Makes sense. Just wondering if the logic really works correctly without making the sendAddSuccessCallbacks method synchronized. For example, on line 1831, the call to pendingAddOps.remove() seems to assume that it is the same instance that pendingAddOps.peek() returned on line 1814. The extensive usage of synchronized isn't nice, but there doesn't seem to be a way around it?
When I first tried to fix it, I made sendAddSuccessCallbacks synchronized, and a deadlock occurred:
"BookKeeperClientWorker-OrderedExecutor-8-0":
at org.apache.bookkeeper.client.LedgerHandle.sendAddSuccessCallbacks(LedgerHandle.java:1811)
- waiting to lock <0x00000000bfae1120> (a org.apache.bookkeeper.client.LedgerHandle)
at org.apache.bookkeeper.client.PendingAddOp.unsetSuccessAndSendWriteRequest(PendingAddOp.java:204)
- locked <0x00000000c3946f68> (a org.apache.bookkeeper.client.PendingAddOp)
at org.apache.bookkeeper.client.LedgerHandle.unsetSuccessAndSendWriteRequest(LedgerHandle.java:2006)
at org.apache.bookkeeper.client.LedgerHandle.lambda$ensembleChangeLoop$10(LedgerHandle.java:1997)
at org.apache.bookkeeper.client.LedgerHandle$$Lambda$1586/0x00007f8e4c95ac58.accept(Unknown Source)
at java.util.concurrent.CompletableFuture.uniWhenComplete([email protected]/CompletableFuture.java:863)
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire([email protected]/CompletableFuture.java:841)
at java.util.concurrent.CompletableFuture$Completion.run([email protected]/CompletableFuture.java:482)
at org.apache.bookkeeper.common.util.SingleThreadExecutor.safeRunTask(SingleThreadExecutor.java:137)
at org.apache.bookkeeper.common.util.SingleThreadExecutor.run(SingleThreadExecutor.java:107)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run([email protected]/Thread.java:833)
"pulsar-io-11-12":
at org.apache.bookkeeper.client.PendingAddOp.submitCallback(PendingAddOp.java:394)
- waiting to lock <0x00000000c3946f68> (a org.apache.bookkeeper.client.PendingAddOp)
at org.apache.bookkeeper.client.LedgerHandle.sendAddSuccessCallbacks(LedgerHandle.java:1835)
- locked <0x00000000bfae1120> (a org.apache.bookkeeper.client.LedgerHandle)
at org.apache.bookkeeper.client.PendingAddOp.sendAddSuccessCallbacks(PendingAddOp.java:390)
at org.apache.bookkeeper.client.PendingAddOp.writeComplete(PendingAddOp.java:307)
- locked <0x00000000c3945640> (a org.apache.bookkeeper.client.PendingAddOp)
at org.apache.bookkeeper.proto.BookieClientImpl.completeAdd(BookieClientImpl.java:284)
at org.apache.bookkeeper.proto.BookieClientImpl.access$000(BookieClientImpl.java:78)
at org.apache.bookkeeper.proto.BookieClientImpl$ChannelReadyForAddEntryCallback.operationComplete(BookieClientImpl.java:396)
at org.apache.bookkeeper.proto.BookieClientImpl$ChannelReadyForAddEntryCallback.operationComplete(BookieClientImpl.java:356)
at org.apache.bookkeeper.proto.PerChannelBookieClient$ConnectionFutureListener.operationComplete(PerChannelBookieClient.java:2548)
at org.apache.bookkeeper.proto.PerChannelBookieClient$ConnectionFutureListener.operationComplete(PerChannelBookieClient.java:2453)
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:590)
at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:583)
at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:559)
at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:492)
at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:636)
at io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:629)
at io.netty.util.concurrent.DefaultPromise.setFailure(DefaultPromise.java:110)
at io.netty.channel.DefaultChannelPromise.setFailure(DefaultChannelPromise.java:89)
at io.netty.bootstrap.Bootstrap.doResolveAndConnect0(Bootstrap.java:228)
at io.netty.bootstrap.Bootstrap.access$000(Bootstrap.java:46)
at io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:189)
at io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:175)
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:590)
at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:557)
at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:492)
at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:636)
at io.netty.util.concurrent.DefaultPromise.setSuccess0(DefaultPromise.java:625)
at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:105)
at io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:84)
at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetSuccess(AbstractChannel.java:990)
at io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:516)
at io.netty.channel.AbstractChannel$AbstractUnsafe.access$200(AbstractChannel.java:429)
at io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:486)
at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:413)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run([email protected]/Thread.java:833)
Therefore I want to use sendingCallbacks instead of synchronized
For example, on line 1831, the call to pendingAddOps.remove() seems to assume that it is the same instance that pendingAddOps.peek() returned on line 1814.
Thanks for the reminder. I replaced pendingAddOps.remove() with pendingAddOps.remove(pendingAddOp) to ensure that the same element is processed.
Should we make the changingEnsemble field volatile to fix this part of the problem?
Great suggestion, this will ensure the visibility of changingEnsemble. I've added it.
Therefore I want to use sendingCallbacks instead of synchronized
Makes sense. The only concern is about correctness. If the method is already processing while it gets called another time, is it necessary to execute the loop one extra time to ensure that something doesn't get left unprocessed?
Btw. The thread execution model for LedgerHandle isn't consistent. In some cases, the thread is switched to use the ordered executor and sometimes it's not. For example, here the thread is switched: https://github.com/apache/bookkeeper/blob/13e7efaa971cd3613b065ac50836c5ee98985d13/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java#L1876-L1877
This observation isn't about this PR directly. I have many doubts about the correctness of sendAddSuccessCallbacks when it's not executed by the ordered executor.
Talking about the current code for LedgerHandle.sendAddSuccessCallbacks in master branch: https://github.com/apache/bookkeeper/blob/13e7efaa971cd3613b065ac50836c5ee98985d13/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java#L1809-L1841
If the order gets mixed, it seems that no progress could be made?
https://github.com/apache/bookkeeper/blob/13e7efaa971cd3613b065ac50836c5ee98985d13/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java#L1822-L1829
The logic in sendAddSuccessCallbacks in master branch feels wrong. The logic gets stuck if the call to sendAddSuccessCallbacks is missed or out of order. Calling the method multiple times won't help once the head of the queue is out of sync since removing entries requires that the head of the queue has pendingAddOp.entryId == pendingAddsSequenceHead + 1.
@eolivelli Do you have a chance to help with this challenge?
Makes sense. The only concern is about correctness. If the method is already processing while it gets called another time, is it necessary to execute the loop one extra time to ensure that something doesn't get left unprocessed?
The sendAddSuccessCallbacks method has no input parameters, which means that assuming both threads want to call it, the execution result will be the same no matter which thread triggers it. Therefore there is no need to execute an additional loop
The sendAddSuccessCallbacks method has no input parameters, which means that assuming both threads want to call it, the execution result will be the same no matter which thread triggers it. Therefore there is no need to execute an additional loop
@graysonzeng I added another comment https://github.com/apache/bookkeeper/pull/4171#issuecomment-1886440245 where I explain the problem.
I think that this line should be changed to use the ordered executor: https://github.com/apache/bookkeeper/blob/234b817cdb4e054887ffd5e42eaed25dc02daf63/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/PendingAddOp.java#L204
Changing Line 204 to
lh.executeOrdered(lh::sendAddSuccessCallbacks);
And then sendAddSuccessCallbacks could be made a synchronized method. That would prevent the deadlock seen in https://github.com/apache/bookkeeper/pull/4171#issuecomment-1886240623. Makes sense?
And then sendAddSuccessCallbacks could be made a synchronized method. That would prevent the deadlock seen in #4171 (comment). Makes sense?
It's possible that it doesn't solve the problem. Most likely it solves the deadlock when the synchronized method gets called without holding any other locks.
@graysonzeng I have a question about the problem description:
After a deep investigation, we found the cause of the error. The root cause is that sendAddSuccessCallbacks may be called multiple times concurrently: one call comes from unsetSuccessAndSendWriteRequest on the BookKeeperClientWorker-OrderedExecutor thread, and the other from writeComplete on a pulsar-io thread. We should prevent sendAddSuccessCallbacks from being called again before it completes.
Why exactly does it cause a problem when the method is called multiple times at the same time?
I can see that the visibility of the changingEnsemble field is a bug and could cause issues, but what other actual problems are there with concurrent access?
The logic in sendAddSuccessCallbacks in master branch feels wrong. The logic gets stuck if the call to sendAddSuccessCallbacks is missed or out of order. Calling the method multiple times won't help once the head of the queue is out of sync since removing entries requires that the head of the queue has pendingAddOp.entryId == pendingAddsSequenceHead + 1. @eolivelli Do you have a chance to help with this challenge?
I agree with your comment. In fact, it exactly explains the reason for the problem. When the method is called multiple times at the same time, entries in pendingAddOps can be missed or processed out of order, and the pendingAddOp.entryId == pendingAddsSequenceHead + 1 condition then prevents it from continuing to execute. The heap dump of the broker confirms this.
In a LedgerHandle instance, we can see that the peeked pendingAddOp has entryId 224113, which is not equal to pendingAddsSequenceHead + 1 (pendingAddsSequenceHead is 224111).
@lhotari
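For context, a paraphrased sketch of the head-of-queue rule being discussed (not the verbatim LedgerHandle source); with the values from the heap dump above it returns early on every invocation:

// Entries are only removed when the head of the queue carries the next id in
// sequence. With pendingAddsSequenceHead == 224111 the expected next id is
// 224112, but the head's entryId is 224113, so this check returns on every
// call and the callbacks for the remaining pendingAddOps are never delivered.
if (pendingAddOp.entryId != pendingAddsSequenceHead + 1) {
    return;
}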
@graysonzeng After spending more time with this issue, here are my suggestions to fix the issue:
- make the sendAddSuccessCallbacks method synchronized
- make the changingEnsemble field volatile
- change line 204 in PendingAddOp to lh.executeOrdered(lh::sendAddSuccessCallbacks); to prevent the deadlock
- add logic to the drainPendingAddsAndAdjustLength method to update pendingAddsSequenceHead accordingly. The drainPendingAddsAndAdjustLength method is where pendingAddsSequenceHead could currently get out of sync.
@lhotari
make changingEnsemble field volatile
The changingEnsemble is already protected by synchronized, why do we need volatile?
change line 204 in PendingAddOp to lh.executeOrdered(lh::sendAddSuccessCallbacks); to prevent the deadlock
+1 for @eolivelli's suggestion. You can add a safeSendAddSuccessCallbacks method that calls sendAddSuccessCallbacks in the OrderedExecutor.
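A minimal sketch of that suggestion, assuming LedgerHandle exposes the executeOrdered call used in the snippet above (the helper name is hypothetical):

// Hop onto the ledger's ordered executor before invoking sendAddSuccessCallbacks,
// so a synchronized sendAddSuccessCallbacks is never entered while the calling
// thread still holds a PendingAddOp monitor.
void safeSendAddSuccessCallbacks() {
    executeOrdered(this::sendAddSuccessCallbacks);
}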
make sendAddSuccessCallbacks method synchronized.
Once we ensure sendAddSuccessCallbacks is always called from the same thread, we don't need synchronized.
add logic to drainPendingAddsAndAdjustLength method to update pendingAddsSequenceHead accordingly. The drainPendingAddsAndAdjustLength method is where pendingAddsSequenceHead could currently get out of sync.
IMO, the drainPendingAddsAndAdjustLength method also needs to be executed by the OrderedExecutor.
@graysonzeng We'd better add a unit test to verify and protect the logic.
The changingEnsemble is already protected by synchronized, why do we need volatile?
@hangc0276 changingEnsemble is mutated under synchronized (metadataLock) {. A mutation in a synchronized block isn't sufficient alone; it would also have to be read under synchronized (metadataLock) { for it to be thread safe, without considering any other means of achieving a "happens-before" relationship. Making it volatile is one way to fix the current issue.
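A condensed illustration of the visibility point, using simplified hypothetical names rather than the actual LedgerHandle code:

// A flag written under one monitor but read without any synchronization has no
// guaranteed visibility to the reader thread.
class EnsembleState {
    private final Object metadataLock = new Object();
    // declared volatile here, which is the proposed fix; without it the reader
    // below has no guarantee of ever seeing the update
    private volatile boolean changingEnsemble = false;

    void startEnsembleChange() {
        synchronized (metadataLock) { // write happens under metadataLock
            changingEnsemble = true;
        }
    }

    boolean canDeliverCallbacks() {
        // read happens on another thread WITHOUT metadataLock; volatile (or
        // reading under the same lock) is needed for the happens-before edge
        return !changingEnsemble;
    }
}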
Once we ensure sendAddSuccessCallbacks is always called from the same thread, we don't need synchronized.
@hangc0276 True, but that would be a significant change that could at least cause performance regressions. The Bookkeeper client code doesn't seem to use the OrderedExecutor in most cases. The current code base is using synchronized in this area.
After thinking about it, it might not be necessary to make the method itself synchronized. It might not even be necessary to run it with an OrderedExecutor. I think that it's sufficient to ensure that sendAddSuccessCallbacks and drainPendingAddsAndAdjustLength methods don't race and are serially executed. A simple synchronized(pendingAddOps) { could be the simplest possible solution. It's also possible that a race between sendAddSuccessCallbacks and drainPendingAddsAndAdjustLength methods isn't even a problem.
IMO, the drainPendingAddsAndAdjustLength method also need to be executed by OrderedExecutor.
The issue in drainPendingAddsAndAdjustLength isn't an ordering problem. The method simply doesn't update pendingAddsSequenceHead so that sendAddSuccessCallbacks could work after drainPendingAddsAndAdjustLength has been called. The instance will get stuck when pendingAddsSequenceHead gets out of sync in drainPendingAddsAndAdjustLength. My current assumption is that it's the root cause of this issue together with the changingEnsemble thread visibility issue.
I created an alternative fix #4175 to clarify what I'm proposing. Once we have a reproducer for the issue in BK as a unit test, it will be easier to validate.
@lhotari @hangc0276 @eolivelli Thank you very much for your suggestions, I'll be happy to continue improving it.
True, but that would be a significant change that could at least cause performance regressions
- So we should not use the OrderedExecutor, and should use synchronized (pendingAddOps) to serialize drainPendingAddsAndAdjustLength & sendAddSuccessCallbacks. Is that right? @lhotari

make changingEnsemble field volatile.
add logic to drainPendingAddsAndAdjustLength method to update pendingAddsSequenceHead accordingly
- So make lastEntry = lastAddPushed = pendingAddsSequenceHead = LedgerHandle.this.lastAddConfirmed in doAsyncCloseInternal

IMO, the drainPendingAddsAndAdjustLength method also needs to be executed by the OrderedExecutor.
- After that, we don't need to use the OrderedExecutor. Is that right? @hangc0276

remove the too strict rule for pendingAddsSequenceHead which breaks things after failures
- I'm a little confused about this. Can you tell me more about it? @lhotari
- So we should not use the OrderedExecutor, and should use synchronized (pendingAddOps) to serialize drainPendingAddsAndAdjustLength & sendAddSuccessCallbacks. Is that right? @lhotari
@graysonzeng I don't think that using the OrderedExecutor is justified since callback responses have never been executed by the OrderedExecutor. All sendAddSuccessCallbacks & drainPendingAddsAndAdjustLength calls should be serialized for correctness, and that's what synchronized (pendingAddOps) { solves without causing a risk of a deadlock.
remove the too strict rule for pendingAddsSequenceHead which breaks things after failures
I'm a little confused about this. Can you tell me more about it? @lhotari
@graysonzeng This rule doesn't make sense to me: https://github.com/apache/bookkeeper/blob/13e7efaa971cd3613b065ac50836c5ee98985d13/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java#L1822-L1829 My assumption is that this doesn't hold under failure conditions. Why would there even need to be such a rule? When this rule kicks in, the LedgerHandle will stop calling callbacks. That doesn't make any sense to me.
Here's a sign that the entry id isn't always a continuous sequence: In LedgerHandle.updateLastConfirmed: https://github.com/apache/bookkeeper/blob/13e7efaa971cd3613b065ac50836c5ee98985d13/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java#L1372
When the LAC jumps forward and lastAddPushed = Math.max(lastAddPushed, lac) kicks in, the pendingAddsSequenceHead rule would again prevent LedgerHandle from calling callbacks in sendAddSuccessCallbacks.
I wonder if I'm making correct observations about the code. @eolivelli could you please explain how the above condition is handled?
Thanks for your reply @lhotari. I ran some tests for #4175; unfortunately it also causes a deadlock.
"pulsar-io-3-4":
at org.apache.bookkeeper.client.PendingAddOp.submitCallback(PendingAddOp.java:394)
- waiting to lock <0x000000077f2becf0> (a org.apache.bookkeeper.client.PendingAddOp)
at org.apache.bookkeeper.client.LedgerHandle.sendAddSuccessCallbacks(LedgerHandle.java:1838)
- locked <0x000000077e6c0d58> (a java.util.concurrent.ConcurrentLinkedQueue)
at org.apache.bookkeeper.client.PendingAddOp.sendAddSuccessCallbacks(PendingAddOp.java:390)
at org.apache.bookkeeper.client.PendingAddOp.writeComplete(PendingAddOp.java:307)
- locked <0x000000077f2bae98> (a org.apache.bookkeeper.client.PendingAddOp)
at org.apache.bookkeeper.proto.BookieClientImpl.completeAdd(BookieClientImpl.java:284)
at org.apache.bookkeeper.proto.BookieClientImpl.access$000(BookieClientImpl.java:78)
at org.apache.bookkeeper.proto.BookieClientImpl$ChannelReadyForAddEntryCallback.operationComplete(BookieClientImpl.java:396)
at org.apache.bookkeeper.proto.BookieClientImpl$ChannelReadyForAddEntryCallback.operationComplete(BookieClientImpl.java:356)
at org.apache.bookkeeper.proto.PerChannelBookieClient$ConnectionFutureListener.operationComplete(PerChannelBookieClient.java:2548)
at org.apache.bookkeeper.proto.PerChannelBookieClient$ConnectionFutureListener.operationComplete(PerChannelBookieClient.java:2453)
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:590)
at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:583)
at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:559)
at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:492)
at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:636)
at io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:629)
at io.netty.util.concurrent.DefaultPromise.setFailure(DefaultPromise.java:110)
at io.netty.channel.DefaultChannelPromise.setFailure(DefaultChannelPromise.java:89)
at io.netty.bootstrap.Bootstrap.doResolveAndConnect0(Bootstrap.java:228)
at io.netty.bootstrap.Bootstrap.access$000(Bootstrap.java:47)
at io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:189)
at io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:175)
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:590)
at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:557)
at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:492)
at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:636)
at io.netty.util.concurrent.DefaultPromise.setSuccess0(DefaultPromise.java:625)
at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:105)
at io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:84)
at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetSuccess(AbstractChannel.java:990)
at io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:516)
at io.netty.channel.AbstractChannel$AbstractUnsafe.access$200(AbstractChannel.java:429)
at io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:486)
at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:413)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run([email protected]/Thread.java:833)
"BookKeeperClientWorker-OrderedExecutor-7-0":
at org.apache.bookkeeper.client.LedgerHandle.sendAddSuccessCallbacks(LedgerHandle.java:1819)
- waiting to lock <0x000000077e6c0d58> (a java.util.concurrent.ConcurrentLinkedQueue)
at org.apache.bookkeeper.client.PendingAddOp.sendAddSuccessCallbacks(PendingAddOp.java:390)
at org.apache.bookkeeper.client.PendingAddOp.writeComplete(PendingAddOp.java:307)
- locked <0x000000077f2becf0> (a org.apache.bookkeeper.client.PendingAddOp)
at org.apache.bookkeeper.proto.PerChannelBookieClient$AddCompletion.writeComplete(PerChannelBookieClient.java:2183)
at org.apache.bookkeeper.proto.PerChannelBookieClient$AddCompletion.handleResponse(PerChannelBookieClient.java:2240)
at org.apache.bookkeeper.proto.PerChannelBookieClient$AddCompletion.handleV2Response(PerChannelBookieClient.java:2219)
at org.apache.bookkeeper.proto.PerChannelBookieClient$ReadV2ResponseCallback.run(PerChannelBookieClient.java:1397)
at org.apache.bookkeeper.common.util.SingleThreadExecutor.safeRunTask(SingleThreadExecutor.java:137)
at org.apache.bookkeeper.common.util.SingleThreadExecutor.run(SingleThreadExecutor.java:113)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run([email protected]/Thread.java:833)
Found 1 deadlock.
From the stack trace we can see that this happens within sendAddSuccessCallbacks:
void sendAddSuccessCallbacks() {
    // thread A holds the monitor of pendingAddOp instance A and is waiting here
    // to acquire the pendingAddOps lock <----
    synchronized (pendingAddOps) {
        PendingAddOp pendingAddOp;
        while ((pendingAddOp = pendingAddOps.peek()) != null
                && !changingEnsemble) {
            if (!pendingAddOp.completed) {
                if (LOG.isDebugEnabled()) {
                    LOG.debug("pending add not completed: {}", pendingAddOp);
                }
                return;
            }
            pendingAddOps.remove();
            explicitLacFlushPolicy.updatePiggyBackedLac(lastAddConfirmed);
            pendingAddsSequenceHead = pendingAddOp.entryId;
            if (!writeFlags.contains(WriteFlag.DEFERRED_SYNC)) {
                this.lastAddConfirmed = pendingAddsSequenceHead;
            }
            // thread B holds the monitor of pendingAddOp instance B and the
            // pendingAddOps lock of this LedgerHandle instance, and is waiting
            // here to acquire the monitor of pendingAddOp instance A <----
            pendingAddOp.submitCallback(BKException.Code.OK);
        }
    }
}
thread A -> pendingAddOp lock A (locked) -> pendingAddOps lock L (waiting)
thread B -> pendingAddOp lock B (locked) -> pendingAddOps lock L (locked) -> pendingAddOp lock A (waiting)
Here is the entire stack file deadLockThreadDump.txt
@hangc0276 True, but that would be a significant change that could at least cause performance regressions.
Suppose we are not willing to use the OrderedExecutor because of performance regressions, are there any better suggestions? I'm looking forward to them. @lhotari
Thanks for your reply @lhotari. I ran some tests for https://github.com/apache/bookkeeper/pull/4175; unfortunately it also causes a deadlock.
Suppose we are not willing to use the OrderedExecutor because of performance regressions, are there any better suggestions? I'm looking forward to them. @lhotari
Thanks for testing this @graysonzeng.
It seems that one possibility to prevent the deadlock would be to use the solution that you had proposed initially, the AtomicBoolean solution. The synchronized (pendingAddOps) { block could continue to be used to prevent a race caused by drainPendingAddsAndAdjustLength.
I pushed a commit to #4175 .
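For clarity, a rough sketch of how the two mechanisms could fit together (simplified, not the exact diff in #4175): the CAS guard sits outside the queue monitor, so a thread that arrives while callbacks are already being delivered returns immediately instead of blocking on pendingAddOps, which is the wait that formed the cycle in the thread dump above.

void sendAddSuccessCallbacks() {
    // re-entry guard: a concurrent caller (e.g. from writeComplete, still holding
    // a PendingAddOp monitor) bails out here and never waits on pendingAddOps
    if (!sendingCallbacks.compareAndSet(false, true)) {
        return;
    }
    try {
        synchronized (pendingAddOps) {
            // still serializes against drainPendingAddsAndAdjustLength
            // ... existing peek/remove/submitCallback loop ...
        }
    } finally {
        sendingCallbacks.set(false);
    }
}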
Thanks for your help @lhotari. I have updated the PR and tested it again.
Can you help me review it again? @hangc0276 @eolivelli I also think the OrderedExecutor would cause performance regressions. How about we fix it by going with synchronized first? If we agree to use it first, I'd like to try adding a unit test to verify it.
Thanks for your help @lhotari. I have updated the PR and tested it again. Can you help me review it again? @hangc0276 @eolivelli I also think the OrderedExecutor would cause performance regressions. How about we fix it by going with synchronized first? If we agree to use it first, I'd like to try adding a unit test to verify it.
@graysonzeng Does the current solution in this PR pass your tests?
The client's BookieWriteLedgerTest and DeferredSyncTest are failing. It is always possible that those tests are invalid. The logic added for deferred sync doesn't make sense to me.
https://github.com/apache/bookkeeper/blob/13e7efaa971cd3613b065ac50836c5ee98985d13/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java#L1822-L1829
More comments about this in https://github.com/apache/bookkeeper/pull/4171#issuecomment-1888561697 . I hope @eolivelli would have the chance to chime in again.
@graysonzeng Does the current solution in this PR pass your tests?
Yes, it passes. @lhotari
The client's BookieWriteLedgerTest and DeferredSyncTest are failing.
I'll fix them if needed.
@graysonzeng is this related to #4194 / #4097 ?
@graysonzeng Thanks for your contribution, the CI has not passed yet, any updates?
@graysonzeng Thanks for your contribution, the CI has not passed yet, any updates?
@shoothzj Sorry for the late reply. I think the current fix may still be controversial. Maybe it would be better to improve this PR after #4194 is approved? What do you think?