[fix][client] Fix race conditions of LedgerHandle in client

Open graysonzeng opened this issue 1 year ago • 32 comments

Master Issue: https://github.com/apache/pulsar/issues/21860

Motivation

When bookies do a rolling restart, a Pulsar topic may become fenced due to race conditions, and it cannot recover.

After checking the heap dump of the broker, we can see that pendingWriteOps is 161. This is why the topic cannot recover from the fenced state.

(heap dump screenshots showing pendingWriteOps = 161)

The topic will only change back to unfenced when pendingWriteOps drops to 0. See the unfence condition:

    private void decrementPendingWriteOpsAndCheck() {
        long pending = pendingWriteOps.decrementAndGet();
        // unfence condition
        if (pending == 0 && isFenced && !isClosingOrDeleting) {
            synchronized (this) {
                if (isFenced && !isClosingOrDeleting) {
                    messageDeduplication.resetHighestSequenceIdPushed();
                    log.info("[{}] Un-fencing topic...", topic);
                    // signal to managed ledger that we are ready to resume by creating a new ledger
                    ledger.readyToCreateNewLedger();
                    unfence();
                }

            }
        }
    }

After a deep investigation, we found the cause of the error. The root cause is that sendAddSuccessCallbacks may be called multiple times concurrently: once from unsetSuccessAndSendWriteRequest on the BookKeeperClientWorker-OrderedExecutor thread, and once from writeComplete on a pulsar-io thread. We should prevent sendAddSuccessCallbacks from being called again before it completes.

Changes

Add a boolean flag sendingCallbacks to indicate whether sendAddSuccessCallbacks is currently being executed. If we find that sendAddSuccessCallbacks is already running when we try to call it, return directly.
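
A minimal sketch of the idea (assuming a java.util.concurrent.atomic.AtomicBoolean guard; the existing loop body is elided):

    private final AtomicBoolean sendingCallbacks = new AtomicBoolean(false);

    void sendAddSuccessCallbacks() {
        // If another thread is already sending callbacks, return directly instead of blocking.
        if (!sendingCallbacks.compareAndSet(false, true)) {
            return;
        }
        try {
            // ... existing loop: peek completed PendingAddOps from pendingAddOps,
            // advance pendingAddsSequenceHead and submit their callbacks ...
        } finally {
            sendingCallbacks.set(false);
        }
    }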

Not recommended changes

If we simply add synchronized to sendAddSuccessCallbacks, it will hurt performance and may lead to deadlock.

graysonzeng avatar Jan 05 '24 18:01 graysonzeng

After a deep investigation, we found the cause of the error. The root cause is that sendAddSuccessCallbacks may be called multiple times concurrently: once from unsetSuccessAndSendWriteRequest on the BookKeeperClientWorker-OrderedExecutor thread, and once from writeComplete on a pulsar-io thread. We should prevent sendAddSuccessCallbacks from being called again before it completes.

It looks like all calls happen under the same object monitor lock (synchronized for PendingAddOp), therefore it seems that concurrent calls are already prevented. @graysonzeng Could you please check if this is the case?

lhotari avatar Jan 10 '24 08:01 lhotari

After a deep investigation, we found the cause of the error. The root cause is that sendAddSuccessCallbacks may be called multiple times concurrently: once from unsetSuccessAndSendWriteRequest on the BookKeeperClientWorker-OrderedExecutor thread, and once from writeComplete on a pulsar-io thread. We should prevent sendAddSuccessCallbacks from being called again before it completes.

It looks like all calls happen under the same object monitor lock (synchronized for PendingAddOp), therefore it seems that concurrent calls are already prevented. @graysonzeng Could you please check if this is the case?

At first I had the same doubt as you; it didn't seem like this should be happening. But in fact, these two threads can be working on two different PendingAddOp instances, and both instances reference the same LedgerHandle instance. Synchronizing on a PendingAddOp does not prevent sendAddSuccessCallbacks from being called on that same LedgerHandle from another PendingAddOp, so the concurrent calls are not prevented. @lhotari

graysonzeng avatar Jan 10 '24 10:01 graysonzeng

At first I had the same doubt as you; it didn't seem like this should be happening. But in fact, these two threads can be working on two different PendingAddOp instances, and both instances reference the same LedgerHandle instance. Synchronizing on a PendingAddOp does not prevent sendAddSuccessCallbacks from being called on that same LedgerHandle from another PendingAddOp, so the concurrent calls are not prevented.

Thanks for explaining that @graysonzeng. Makes sense. Just wondering if the logic really works correctly without making the sendAddSuccessCallbacks method synchronized. For example, on line 1831, the call to pendingAddOps.remove() seems to assume that it is the same instance that pendingAddOps.peek() returned on line 1814. The extensive usage of synchronized isn't nice, but there doesn't seem to be a way around it?

lhotari avatar Jan 10 '24 11:01 lhotari

@graysonzeng Another thread safety problem is the changingEnsemble field. It's modified under the metadataLock object monitor.

https://github.com/apache/bookkeeper/blob/13e7efaa971cd3613b065ac50836c5ee98985d13/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java#L1895-L1908

Should we make the changingEnsemble field volatile to fix this part of the problem?

lhotari avatar Jan 10 '24 11:01 lhotari

Thanks for explaining that @graysonzeng. Makes sense. Just wondering if the logic really works correctly without making the sendAddSuccessCallbacks method synchronized. For example, on line 1831, the call to pendingAddOps.remove() seems to assume that it is the same instance that pendingAddOps.peek() returned on line 1814. The extensive usage of synchronized isn't nice, but there doesn't seem to be a way around it?

When I first attempted a fix, I tried using synchronized on sendAddSuccessCallbacks, and a deadlock occurred.

"BookKeeperClientWorker-OrderedExecutor-8-0":
	at org.apache.bookkeeper.client.LedgerHandle.sendAddSuccessCallbacks(LedgerHandle.java:1811)
	- waiting to lock <0x00000000bfae1120> (a org.apache.bookkeeper.client.LedgerHandle)
	at org.apache.bookkeeper.client.PendingAddOp.unsetSuccessAndSendWriteRequest(PendingAddOp.java:204)
	- locked <0x00000000c3946f68> (a org.apache.bookkeeper.client.PendingAddOp)
	at org.apache.bookkeeper.client.LedgerHandle.unsetSuccessAndSendWriteRequest(LedgerHandle.java:2006)
	at org.apache.bookkeeper.client.LedgerHandle.lambda$ensembleChangeLoop$10(LedgerHandle.java:1997)
	at org.apache.bookkeeper.client.LedgerHandle$$Lambda$1586/0x00007f8e4c95ac58.accept(Unknown Source)
	at java.util.concurrent.CompletableFuture.uniWhenComplete([email protected]/CompletableFuture.java:863)
	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire([email protected]/CompletableFuture.java:841)
	at java.util.concurrent.CompletableFuture$Completion.run([email protected]/CompletableFuture.java:482)
	at org.apache.bookkeeper.common.util.SingleThreadExecutor.safeRunTask(SingleThreadExecutor.java:137)
	at org.apache.bookkeeper.common.util.SingleThreadExecutor.run(SingleThreadExecutor.java:107)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run([email protected]/Thread.java:833)


"pulsar-io-11-12":
	at org.apache.bookkeeper.client.PendingAddOp.submitCallback(PendingAddOp.java:394)
	- waiting to lock <0x00000000c3946f68> (a org.apache.bookkeeper.client.PendingAddOp)
	at org.apache.bookkeeper.client.LedgerHandle.sendAddSuccessCallbacks(LedgerHandle.java:1835)
	- locked <0x00000000bfae1120> (a org.apache.bookkeeper.client.LedgerHandle)
	at org.apache.bookkeeper.client.PendingAddOp.sendAddSuccessCallbacks(PendingAddOp.java:390)
	at org.apache.bookkeeper.client.PendingAddOp.writeComplete(PendingAddOp.java:307)
	- locked <0x00000000c3945640> (a org.apache.bookkeeper.client.PendingAddOp)
	at org.apache.bookkeeper.proto.BookieClientImpl.completeAdd(BookieClientImpl.java:284)
	at org.apache.bookkeeper.proto.BookieClientImpl.access$000(BookieClientImpl.java:78)
	at org.apache.bookkeeper.proto.BookieClientImpl$ChannelReadyForAddEntryCallback.operationComplete(BookieClientImpl.java:396)
	at org.apache.bookkeeper.proto.BookieClientImpl$ChannelReadyForAddEntryCallback.operationComplete(BookieClientImpl.java:356)
	at org.apache.bookkeeper.proto.PerChannelBookieClient$ConnectionFutureListener.operationComplete(PerChannelBookieClient.java:2548)
	at org.apache.bookkeeper.proto.PerChannelBookieClient$ConnectionFutureListener.operationComplete(PerChannelBookieClient.java:2453)
	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:590)
	at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:583)
	at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:559)
	at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:492)
	at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:636)
	at io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:629)
	at io.netty.util.concurrent.DefaultPromise.setFailure(DefaultPromise.java:110)
	at io.netty.channel.DefaultChannelPromise.setFailure(DefaultChannelPromise.java:89)
	at io.netty.bootstrap.Bootstrap.doResolveAndConnect0(Bootstrap.java:228)
	at io.netty.bootstrap.Bootstrap.access$000(Bootstrap.java:46)
	at io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:189)
	at io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:175)
	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:590)
	at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:557)
	at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:492)
	at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:636)
	at io.netty.util.concurrent.DefaultPromise.setSuccess0(DefaultPromise.java:625)
	at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:105)
	at io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:84)
	at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetSuccess(AbstractChannel.java:990)
	at io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:516)
	at io.netty.channel.AbstractChannel$AbstractUnsafe.access$200(AbstractChannel.java:429)
	at io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:486)
	at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
	at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:413)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run([email protected]/Thread.java:833)

Therefore I want to use sendingCallbacks instead of synchronized

For example, on line 1831, the call to pendingAddOps.remove() seems to assume that it is the same instance that pendingAddOps.peek() returned on line 1814.

Thanks for the reminder. I replaced pendingAddOps.remove() with pendingAddOps.remove(pendingAddOp) to ensure that the same element is processed.

Should we make the changingEnsemble field volatile to fix this part of the problem?

Great suggestion, this will ensure the visibility of changingEnsemble. I've added it.

graysonzeng avatar Jan 11 '24 04:01 graysonzeng

Therefore I want to use sendingCallbacks instead of synchronized

Makes sense. The only concern is about correctness. If the method is already processing while it gets called another time, is it necessary to execute the loop one extra time to ensure that something doesn't get left unprocessed?
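
For illustration, one general pattern that would guarantee the extra pass (this is not what the PR proposes; drainCompletedAddOps is a hypothetical helper standing in for the existing loop):

    private final AtomicBoolean sendingCallbacks = new AtomicBoolean(false);
    private final AtomicBoolean callbacksPending = new AtomicBoolean(false);

    void sendAddSuccessCallbacks() {
        callbacksPending.set(true);                 // record that there may be work to process
        while (callbacksPending.get()) {
            if (!sendingCallbacks.compareAndSet(false, true)) {
                return;                             // another thread is draining and will see callbacksPending
            }
            try {
                callbacksPending.set(false);
                drainCompletedAddOps();             // hypothetical helper holding the existing loop
            } finally {
                sendingCallbacks.set(false);
            }
        }
    }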

lhotari avatar Jan 11 '24 06:01 lhotari

Btw. The thread execution model for LedgerHandle isn't consistent. In some cases, the thread is switched to use the ordered executor and sometimes it's not. For example, here the thread is switched: https://github.com/apache/bookkeeper/blob/13e7efaa971cd3613b065ac50836c5ee98985d13/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java#L1876-L1877

lhotari avatar Jan 11 '24 06:01 lhotari

This observation isn't about this PR directly. I have many doubts about the correctness of sendAddSuccessCallbacks when it's not executed by the ordered executor.

Talking about the current code for LedgerHandle.sendAddSuccessCallbacks in master branch: https://github.com/apache/bookkeeper/blob/13e7efaa971cd3613b065ac50836c5ee98985d13/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java#L1809-L1841

If the order gets mixed, it seems that no progress could be made?

https://github.com/apache/bookkeeper/blob/13e7efaa971cd3613b065ac50836c5ee98985d13/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java#L1822-L1829

lhotari avatar Jan 11 '24 07:01 lhotari

The logic in sendAddSuccessCallbacks in master branch feels wrong. The logic gets stuck if the call to sendAddSuccessCallbacks is missed or out of order. Calling the method multiple times won't help once the head of the queue is out of sync since removing entries requires that the head of the queue has pendingAddOp.entryId == pendingAddsSequenceHead + 1. @eolivelli Do you have a chance to help with this challenge?

lhotari avatar Jan 11 '24 07:01 lhotari

Makes sense. The only concern is about correctness. If the method is already processing while it gets called another time, is it necessary to execute the loop one extra time to ensure that something doesn't get left unprocessed?

The sendAddSuccessCallbacks method has no input parameters, which means that if both threads want to call it, the execution result will be the same no matter which thread triggers it. Therefore there is no need to execute an additional loop.

graysonzeng avatar Jan 11 '24 07:01 graysonzeng

The sendAddSuccessCallbacks method has no input parameters, which means that if both threads want to call it, the execution result will be the same no matter which thread triggers it. Therefore there is no need to execute an additional loop.

@graysonzeng I added another comment https://github.com/apache/bookkeeper/pull/4171#issuecomment-1886440245 where I explain the problem.

I think that this line should be changed to use the ordered executor: https://github.com/apache/bookkeeper/blob/234b817cdb4e054887ffd5e42eaed25dc02daf63/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/PendingAddOp.java#L204

Changing Line 204 to

            lh.executeOrdered(lh::sendAddSuccessCallbacks);

And then sendAddSuccessCallbacks could be made a synchronized method. That would prevent the deadlock seen in https://github.com/apache/bookkeeper/pull/4171#issuecomment-1886240623. Makes sense?

lhotari avatar Jan 11 '24 07:01 lhotari

And then sendAddSuccessCallbacks could be made a synchronized method. That would prevent the deadlock seen in #4171 (comment). Makes sense?

It's possible that it doesn't solve the problem. Most likely it solves the deadlock when the synchronized method gets called without holding any other locks.

lhotari avatar Jan 11 '24 07:01 lhotari

@graysonzeng I have a question about the problem description:

After a deep investigation, we found the cause of the error. The root cause is that sendAddSuccessCallbacks may be called multiple times concurrently: once from unsetSuccessAndSendWriteRequest on the BookKeeperClientWorker-OrderedExecutor thread, and once from writeComplete on a pulsar-io thread. We should prevent sendAddSuccessCallbacks from being called again before it completes.

Why exactly does it cause a problem when the method is called multiple times at the same time? I can see that the visibility of the changingEnsemble field is a bug and could cause issues, but what other actual problems are there with concurrent access?

lhotari avatar Jan 11 '24 07:01 lhotari

The logic in sendAddSuccessCallbacks in master branch feels wrong. The logic gets stuck if the call to sendAddSuccessCallbacks is missed or out of order. Calling the method multiple times won't help once the head of the queue is out of sync since removing entries requires that the head of the queue has pendingAddOp.entryId == pendingAddsSequenceHead + 1. @eolivelli Do you have a chance to help with this challenge?

I agree with your comment. In fact, this comment exactly explains the reason for the problem: when the method is called multiple times at the same time, missed or out-of-order processing happens in pendingAddOps, and the pendingAddOp.entryId == pendingAddsSequenceHead + 1 condition prevents it from continuing to execute. The heap dump of the broker confirms this:

(heap dump screenshot of the LedgerHandle instance)

In a LedgerHandle instance, we can see that the entryId of the PendingAddOp at the head of the queue is 224113, which is not equal to pendingAddsSequenceHead + 1 (pendingAddsSequenceHead is 224111). @lhotari

graysonzeng avatar Jan 11 '24 08:01 graysonzeng

@graysonzeng After spending more time with this issue, here are my suggestions to fix the issue:

  • make sendAddSuccessCallbacks method synchronized.
  • make changingEnsemble field volatile
  • change line 204 in PendingAddOp to lh.executeOrdered(lh::sendAddSuccessCallbacks); to prevent the deadlock
  • add logic to drainPendingAddsAndAdjustLength method to update pendingAddsSequenceHead accordingly. The drainPendingAddsAndAdjustLength method is where pendingAddsSequenceHead could currently get out of sync.

lhotari avatar Jan 11 '24 10:01 lhotari

@lhotari

make changingEnsemble field volatile

The changingEnsemble is already protected by synchronized, why do we need volatile?

change line 204 in PendingAddOp to lh.executeOrdered(lh::sendAddSuccessCallbacks); to prevent the deadlock

+1 for @eolivelli 's suggestion. You can add a “safeSendAddSuccessCallbacks” that calls sendAddSuccessCallbacks in the OrderedExecutor

make sendAddSuccessCallbacks method synchronized.

Once we make sure sendAddSuccessCallbacks is always called from the same thread, we don't need synchronized.

add logic to drainPendingAddsAndAdjustLength method to update pendingAddsSequenceHead accordingly. The drainPendingAddsAndAdjustLength method is where pendingAddsSequenceHead could currently get out of sync.

IMO, the drainPendingAddsAndAdjustLength method also needs to be executed by the OrderedExecutor.

@graysonzeng We'd better add a unit test to verify and protect the logic.

hangc0276 avatar Jan 11 '24 15:01 hangc0276

The changingEnsemble is already protected by synchronized, why do we need volatile?

@hangc0276 changingEnsemble is mutated under synchronized (metadataLock) {. A mutation in a synchronized block isn't sufficient alone; it would also have to be read under synchronized (metadataLock) { for it to be thread safe, without considering any other means of achieving "happens-before". Making it volatile is one way to fix the current issue.
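
A minimal illustration of the visibility concern (field and lock names from LedgerHandle; the surrounding method shapes are simplified assumptions):

    volatile boolean changingEnsemble;           // volatile publishes the write to readers that take no lock

    void startEnsembleChange() {                 // hypothetical method name
        synchronized (metadataLock) {
            changingEnsemble = true;             // the mutation is guarded by metadataLock...
        }
    }

    void sendAddSuccessCallbacks() {
        if (changingEnsemble) {                  // ...but this read takes no lock, so without volatile
            return;                              // (or reading under metadataLock) it may observe a stale value
        }
        // drain completed pendingAddOps ...
    }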

lhotari avatar Jan 11 '24 17:01 lhotari

Once we make sure sendAddSuccessCallbacks is always called from the same thread, we don't need synchronized.

@hangc0276 True, but that would be a significant change that could at least cause performance regressions. The BookKeeper client code doesn't seem to use the OrderedExecutor in most cases. The current code base uses synchronized in this area.

After thinking about it, it might not be necessary to make the method itself synchronized. It might not even be necessary to run it with an OrderedExecutor. I think that it's sufficient to ensure that sendAddSuccessCallbacks and drainPendingAddsAndAdjustLength methods don't race and are serially executed. A simple synchronized(pendingAddOps) { could be the simplest possible solution. It's also possible that a race between sendAddSuccessCallbacks and drainPendingAddsAndAdjustLength methods isn't even a problem.

IMO, the drainPendingAddsAndAdjustLength method also needs to be executed by the OrderedExecutor.

The issue in drainPendingAddsAndAdjustLength isn't an ordering problem. The method simply doesn't update pendingAddsSequenceHead so that sendAddSuccessCallbacks could work after drainPendingAddsAndAdjustLength has been called. The instance will get stuck when pendingAddsSequenceHead gets out of sync in drainPendingAddsAndAdjustLength. My current assumption is that it's the root cause of this issue together with the changingEnsemble thread visibility issue.

lhotari avatar Jan 11 '24 17:01 lhotari

I created an alternative fix #4175 to bring clarity of what I'd be proposing. Once we have a reproducer for the issue in BK as a unit test, it will be easier to validate.

lhotari avatar Jan 11 '24 21:01 lhotari

@lhotari @hangc0276 @eolivelli Thank you very much for your suggestions, I'll be happy to continue improving it.

True, but that would be a significant change that could at least cause performance regressions

  1. So we should not use the OrderedExecutor, and should use synchronized (pendingAddOps) to serialize drainPendingAddsAndAdjustLength & sendAddSuccessCallbacks. Is that right? @lhotari

  2. make changingEnsemble field volatile.

add logic to drainPendingAddsAndAdjustLength method to update pendingAddsSequenceHead accordingly

  1. So make lastEntry = lastAddPushed = pendingAddsSequenceHead = LedgerHandle.this.lastAddConfirmed in doAsyncCloseInternal

IMO, the drainPendingAddsAndAdjustLength method also needs to be executed by the OrderedExecutor.

After that, we don't need to use the OrderedExecutor. Is that right? @hangc0276

remove the too strict rule for pendingAddsSequenceHead which breaks things after failures

I'm a little confused about this. Can you tell me more about it? @lhotari

graysonzeng avatar Jan 12 '24 02:01 graysonzeng

  1. So we should not use the OrderedExecutor, and should use synchronized (pendingAddOps) to serialize drainPendingAddsAndAdjustLength & sendAddSuccessCallbacks. Is that right? @lhotari

@graysonzeng I don't think that using the OrderedExecutor is justified since callback responses have never been executed by the OrderedExecutor. All sendAddSuccessCallbacks & drainPendingAddsAndAdjustLength calls should be serialized for correctness, and that's what synchronized (pendingAddOps) { solves without causing a risk of deadlock.

remove the too strict rule for pendingAddsSequenceHead which breaks things after failures

I'm a little confused about this. Can you tell me more about it? @lhotari

@graysonzeng This rule doesn't make sense to me: https://github.com/apache/bookkeeper/blob/13e7efaa971cd3613b065ac50836c5ee98985d13/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java#L1822-L1829 My assumption is that this doesn't hold under failure conditions. Why would there even need to be such a rule? When this rule kicks in, the LedgerHandle will stop calling callbacks. That doesn't make any sense to me.

Here's a sign that the entry id isn't always a continuous sequence: In LedgerHandle.updateLastConfirmed: https://github.com/apache/bookkeeper/blob/13e7efaa971cd3613b065ac50836c5ee98985d13/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java#L1372

When the LAC jumps forward and lastAddPushed = Math.max(lastAddPushed, lac) kicks in, the pendingAddsSequenceHead rule would again prevent LedgerHandle from calling callbacks in sendAddSuccessCallbacks.

I wonder if I'm making correct observations about the code. @eolivelli could you please explain how the above condition is handled?

lhotari avatar Jan 12 '24 07:01 lhotari

Thanks for your reply @lhotari. I ran some tests with #4175; unfortunately, it also causes a deadlock.


"pulsar-io-3-4":
	at org.apache.bookkeeper.client.PendingAddOp.submitCallback(PendingAddOp.java:394)
	- waiting to lock <0x000000077f2becf0> (a org.apache.bookkeeper.client.PendingAddOp)
	at org.apache.bookkeeper.client.LedgerHandle.sendAddSuccessCallbacks(LedgerHandle.java:1838)
	- locked <0x000000077e6c0d58> (a java.util.concurrent.ConcurrentLinkedQueue)
	at org.apache.bookkeeper.client.PendingAddOp.sendAddSuccessCallbacks(PendingAddOp.java:390)
	at org.apache.bookkeeper.client.PendingAddOp.writeComplete(PendingAddOp.java:307)
	- locked <0x000000077f2bae98> (a org.apache.bookkeeper.client.PendingAddOp)
	at org.apache.bookkeeper.proto.BookieClientImpl.completeAdd(BookieClientImpl.java:284)
	at org.apache.bookkeeper.proto.BookieClientImpl.access$000(BookieClientImpl.java:78)
	at org.apache.bookkeeper.proto.BookieClientImpl$ChannelReadyForAddEntryCallback.operationComplete(BookieClientImpl.java:396)
	at org.apache.bookkeeper.proto.BookieClientImpl$ChannelReadyForAddEntryCallback.operationComplete(BookieClientImpl.java:356)
	at org.apache.bookkeeper.proto.PerChannelBookieClient$ConnectionFutureListener.operationComplete(PerChannelBookieClient.java:2548)
	at org.apache.bookkeeper.proto.PerChannelBookieClient$ConnectionFutureListener.operationComplete(PerChannelBookieClient.java:2453)
	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:590)
	at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:583)
	at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:559)
	at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:492)
	at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:636)
	at io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:629)
	at io.netty.util.concurrent.DefaultPromise.setFailure(DefaultPromise.java:110)
	at io.netty.channel.DefaultChannelPromise.setFailure(DefaultChannelPromise.java:89)
	at io.netty.bootstrap.Bootstrap.doResolveAndConnect0(Bootstrap.java:228)
	at io.netty.bootstrap.Bootstrap.access$000(Bootstrap.java:47)
	at io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:189)
	at io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:175)
	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:590)
	at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:557)
	at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:492)
	at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:636)
	at io.netty.util.concurrent.DefaultPromise.setSuccess0(DefaultPromise.java:625)
	at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:105)
	at io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:84)
	at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetSuccess(AbstractChannel.java:990)
	at io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:516)
	at io.netty.channel.AbstractChannel$AbstractUnsafe.access$200(AbstractChannel.java:429)
	at io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:486)
	at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
	at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:413)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run([email protected]/Thread.java:833)

"BookKeeperClientWorker-OrderedExecutor-7-0":
	at org.apache.bookkeeper.client.LedgerHandle.sendAddSuccessCallbacks(LedgerHandle.java:1819)
	- waiting to lock <0x000000077e6c0d58> (a java.util.concurrent.ConcurrentLinkedQueue)
	at org.apache.bookkeeper.client.PendingAddOp.sendAddSuccessCallbacks(PendingAddOp.java:390)
	at org.apache.bookkeeper.client.PendingAddOp.writeComplete(PendingAddOp.java:307)
	- locked <0x000000077f2becf0> (a org.apache.bookkeeper.client.PendingAddOp)
	at org.apache.bookkeeper.proto.PerChannelBookieClient$AddCompletion.writeComplete(PerChannelBookieClient.java:2183)
	at org.apache.bookkeeper.proto.PerChannelBookieClient$AddCompletion.handleResponse(PerChannelBookieClient.java:2240)
	at org.apache.bookkeeper.proto.PerChannelBookieClient$AddCompletion.handleV2Response(PerChannelBookieClient.java:2219)
	at org.apache.bookkeeper.proto.PerChannelBookieClient$ReadV2ResponseCallback.run(PerChannelBookieClient.java:1397)
	at org.apache.bookkeeper.common.util.SingleThreadExecutor.safeRunTask(SingleThreadExecutor.java:137)
	at org.apache.bookkeeper.common.util.SingleThreadExecutor.run(SingleThreadExecutor.java:113)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run([email protected]/Thread.java:833)

Found 1 deadlock.

From the stack we can see that this happens within sendAddSuccessCallbacks:

    void sendAddSuccessCallbacks() {

        // thread B has acquired the lock of pendingAddOp instance B (in writeComplete)
        // and is waiting here to acquire the pendingAddOps lock L of this LedgerHandle <----
        synchronized (pendingAddOps) {
            PendingAddOp pendingAddOp;
            while ((pendingAddOp = pendingAddOps.peek()) != null
                    && !changingEnsemble) {
                if (!pendingAddOp.completed) {
                    if (LOG.isDebugEnabled()) {
                        LOG.debug("pending add not completed: {}", pendingAddOp);
                    }
                    return;
                }

                pendingAddOps.remove();
                explicitLacFlushPolicy.updatePiggyBackedLac(lastAddConfirmed);
                pendingAddsSequenceHead = pendingAddOp.entryId;
                if (!writeFlags.contains(WriteFlag.DEFERRED_SYNC)) {
                    this.lastAddConfirmed = pendingAddsSequenceHead;
                }
                
                // thread A has acquired the lock of pendingAddOp instance A and
                // the pendingAddOps lock L of this LedgerHandle instance, and is waiting here
                // to acquire the lock of pendingAddOp instance B <----
                pendingAddOp.submitCallback(BKException.Code.OK);
            }
        }

    }

thread A -> pendingAddOp lock A (locked) -> LedgerHandle lock L (locked) -> pendingAddOp lock B (waiting)
thread B -> pendingAddOp lock B (locked) -> LedgerHandle lock L (waiting)

Here is the entire stack file deadLockThreadDump.txt

@hangc0276 True, but that would be a significant change that could at least cause performance regressions.

If we are not willing to use the OrderedExecutor because of performance regressions, are there any other better suggestions? I'm looking forward to them. @lhotari

graysonzeng avatar Jan 15 '24 07:01 graysonzeng

Thanks for your reply @lhotari. I ran some tests with https://github.com/apache/bookkeeper/pull/4175; unfortunately, it also causes a deadlock.

If we are not willing to use the OrderedExecutor because of performance regressions, are there any other better suggestions? I'm looking forward to them. @lhotari

Thanks for testing this @graysonzeng .

It seems that one possibility to prevent the deadlock would be to use the solution that you had proposed initially, the AtomicBoolean solution. The synchronized (pendingAddOps) { block could continue to be used to prevent a race caused by drainPendingAddsAndAdjustLength. I pushed a commit to #4175.
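
Roughly, the combination would look like this (an assumed shape, not the exact #4175 patch; the drain loop body is elided):

    private final AtomicBoolean sendingCallbacks = new AtomicBoolean(false);

    void sendAddSuccessCallbacks() {
        // Non-blocking guard: a thread that already holds a PendingAddOp monitor returns here
        // instead of blocking on pendingAddOps, which breaks the lock-ordering cycle shown above.
        if (!sendingCallbacks.compareAndSet(false, true)) {
            return;
        }
        try {
            synchronized (pendingAddOps) {
                // ... existing drain loop; also serialized against drainPendingAddsAndAdjustLength ...
            }
        } finally {
            sendingCallbacks.set(false);
        }
    }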

lhotari avatar Jan 15 '24 11:01 lhotari

Thanks for your help @lhotari. I have updated the PR and tested it again.
Can you help me review it again? @hangc0276 @eolivelli I also think the OrderedExecutor would cause performance regressions. How about we fix it with synchronized first? If we agree to use that first, I'd like to try adding a unit test to verify it.

graysonzeng avatar Jan 17 '24 10:01 graysonzeng

Thanks for your help @lhotari. I have updated the PR and tested it again. Can you help me review it again? @hangc0276 @eolivelli I also think the OrderedExecutor would cause performance regressions. How about we fix it with synchronized first? If we agree to use that first, I'd like to try adding a unit test to verify it.

@graysonzeng Does the current solution in this PR pass your tests?

lhotari avatar Jan 25 '24 09:01 lhotari

The client's BookieWriteLedgerTest and DeferredSyncTest are failing. It is always possible that those tests are invalid. The logic added for deferred sync doesn't make sense to me.

https://github.com/apache/bookkeeper/blob/13e7efaa971cd3613b065ac50836c5ee98985d13/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java#L1822-L1829

More comments about this in https://github.com/apache/bookkeeper/pull/4171#issuecomment-1888561697 . I hope @eolivelli would have the chance to chime in again.

lhotari avatar Jan 25 '24 09:01 lhotari

@graysonzeng Does the current solution in this PR pass your tests?

It passes. @lhotari

The client's BookieWriteLedgerTest and DeferredSyncTest are failing.

I'll fix it if needed

graysonzeng avatar Jan 30 '24 11:01 graysonzeng

@graysonzeng is this related to #4194 / #4097 ?

lhotari avatar Feb 08 '24 08:02 lhotari

@graysonzeng Thanks for your contribution. CI has not passed yet; any updates?

hezhangjian avatar May 24 '24 13:05 hezhangjian

@graysonzeng Thanks for your contribution. CI has not passed yet; any updates?

@shoothzj Sorry for the late reply. I think the current fix may still be controversial. Maybe it would be better to improve the PR after #4194 is approved? What do you think?

graysonzeng avatar Jun 27 '24 07:06 graysonzeng