OpenSearch
[BUG] Closing an index during an ongoing primary failover can lead to deadlocks
Describe the bug
While a replica is being promoted to primary during primary failover, we create a new engine pointing to the last successful commit and replay the translog up to the global checkpoint. Multiple locks are acquired during this processing.
Around the same time, if a CLOSE is triggered on the index, the close flow and the failover flow end up competing for the same locks, which under certain race conditions can lead to a deadlock.
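In generic terms, the two flows acquire the same pair of locks in opposite order (detailed below). A minimal, self-contained Java sketch of that inversion pattern, with a plain Object monitor standing in for engineMutex and a ReentrantReadWriteLock standing in for the engine's internal lock; the class name and latch are illustrative and only force the problematic interleaving:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch only, not OpenSearch code: "mutex" stands in for IndexShard.engineMutex,
// "rwLock" for the engine's internal read/write lock.
public class LockInversionSketch {
    private static final Object mutex = new Object();
    private static final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch bothFirstLocksHeld = new CountDownLatch(2);

        Thread failover = new Thread(() -> {
            rwLock.readLock().lock();        // holds the engine readLock during translog replay
            try {
                arrive(bothFirstLocksHeld);
                synchronized (mutex) {       // then needs engineMutex -> blocks forever
                    // e.g. getSegmentInfosSnapshot() called from a refresh listener
                }
            } finally {
                rwLock.readLock().unlock();
            }
        }, "failover");

        Thread close = new Thread(() -> {
            synchronized (mutex) {           // holds engineMutex during IndexShard.close
                arrive(bothFirstLocksHeld);
                rwLock.writeLock().lock();   // then needs the engine writeLock -> blocks forever
                rwLock.writeLock().unlock();
            }
        }, "close");

        failover.start();
        close.start();
        failover.join();                     // never returns: the two threads deadlock
    }

    // Wait until both threads hold their first lock, so the inversion always bites.
    private static void arrive(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Run as-is, this hangs; the thread dumps below show the same shape against the real classes.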
Failover Thread
The failover thread gets BLOCKED while acquiring the engineMutex lock at the following point:
https://github.com/opensearch-project/OpenSearch/blob/5c82ab885a876d659c9714c3b080488777506027/server/src/main/java/org/opensearch/index/shard/IndexShard.java#L4752-L4763
It is already holding a readLock on the new engine being created as part of the failover. This lock is acquired here:
https://github.com/opensearch-project/OpenSearch/blob/5c82ab885a876d659c9714c3b080488777506027/server/src/main/java/org/opensearch/index/translog/InternalTranslogManager.java#L114-L145
IndexShard Close Thread
The close thread acquires the engineMutex lock at the following point:
https://github.com/opensearch-project/OpenSearch/blob/5c82ab885a876d659c9714c3b080488777506027/server/src/main/java/org/opensearch/index/shard/IndexShard.java#L1985-L2013
and then tries to acquire the writeLock on the new engine further along in the close flow:
https://github.com/opensearch-project/OpenSearch/blob/5c82ab885a876d659c9714c3b080488777506027/server/src/main/java/org/opensearch/index/engine/Engine.java#L2013-L2025
At this point, both threads are in a deadlock state.
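For reference, this state can be confirmed programmatically with the standard ThreadMXBean API, which reports the same ownership information as the jstack-style dumps below (DeadlockProbe is just an illustrative name):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Sketch only: asks the JVM for deadlocked threads (covering both monitors and
// ownable synchronizers such as ReentrantReadWriteLock) and prints who waits on what.
public final class DeadlockProbe {
    public static void report() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] deadlocked = threads.findDeadlockedThreads(); // null when no cycle exists
        if (deadlocked == null) {
            System.out.println("no deadlock detected");
            return;
        }
        for (ThreadInfo info : threads.getThreadInfo(deadlocked, true, true)) {
            System.out.printf("%s is %s on %s owned by %s%n",
                info.getThreadName(),
                info.getThreadState(),
                info.getLockName(),
                info.getLockOwnerName());
        }
    }
}
```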
Thread Dumps
"opensearch[node_s2][generic][T#3]" ID=4135 BLOCKED on java.lang.Object@3c0fa470 owned by "opensearch[node_s2][indices_shutdown][T#1]" ID=4183
at app//org.opensearch.index.shard.IndexShard$11.getSegmentInfosSnapshot(IndexShard.java:4768)
- blocked on java.lang.Object@3c0fa470
at app//org.opensearch.index.shard.IndexShard.getSegmentInfosSnapshot(IndexShard.java:5113)
at app//org.opensearch.index.shard.IndexShard.getLatestSegmentInfosAndCheckpoint(IndexShard.java:1676)
at app//org.opensearch.index.shard.IndexShard.updateReplicationCheckpoint(IndexShard.java:4690)
at app//org.opensearch.index.shard.IndexShard$ReplicationCheckpointUpdater.afterRefresh(IndexShard.java:4684)
at app//org.apache.lucene.search.ReferenceManager.notifyRefreshListenersRefreshed(ReferenceManager.java:275)
at app//org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:182)
at app//org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:240)
at app//org.opensearch.index.engine.InternalEngine.refresh(InternalEngine.java:1862)
at app//org.opensearch.index.engine.InternalEngine.flush(InternalEngine.java:1977)
at app//org.opensearch.index.engine.InternalEngine.recoverFromTranslogInternal(InternalEngine.java:589)
at app//org.opensearch.index.engine.InternalEngine.recoverFromTranslog(InternalEngine.java:546)
at app//org.opensearch.index.engine.InternalEngine.recoverFromTranslog(InternalEngine.java:146)
at app//org.opensearch.index.shard.IndexShard.resetEngineToGlobalCheckpoint(IndexShard.java:4812)
at app//org.opensearch.index.shard.IndexShard.lambda$updateShardState$4(IndexShard.java:720)
at app//org.opensearch.index.shard.IndexShard$$Lambda/0x00007f45aca36238.run(Unknown Source)
at app//org.opensearch.index.shard.IndexShard$5.onResponse(IndexShard.java:4079)
at app//org.opensearch.index.shard.IndexShard$5.onResponse(IndexShard.java:4049)
at app//org.opensearch.index.shard.IndexShard.lambda$asyncBlockOperations$33(IndexShard.java:4000)
at app//org.opensearch.index.shard.IndexShard$$Lambda/0x00007f45aca11738.accept(Unknown Source)
at app//org.opensearch.core.action.ActionListener$1.onResponse(ActionListener.java:82)
at app//org.opensearch.index.shard.IndexShardOperationPermits$1.doRun(IndexShardOperationPermits.java:157)
at app//org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:911)
at app//org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
at [email protected]/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at [email protected]/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at [email protected]/java.lang.Thread.runWith(Thread.java:1596)
at [email protected]/java.lang.Thread.run(Thread.java:1583)
Locked synchronizers:
- java.util.concurrent.locks.ReentrantLock$NonfairSync@28675992
- java.util.concurrent.ThreadPoolExecutor$Worker@1197c3a3
- java.util.concurrent.locks.ReentrantLock$NonfairSync@57ac517a
20) Thread[id=4183, name=opensearch[node_s2][indices_shutdown][T#1], state=WAITING, group=TGRP-RemoteStoreRestoreIT]
at java.base/jdk.internal.misc.Unsafe.park(Native Method)
at java.base/java.util.concurrent.locks.LockSupport.park(LockSupport.java:221)
at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:754)
at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:990)
at java.base/java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:959)
at org.opensearch.common.util.concurrent.ReleasableLock.acquire(ReleasableLock.java:69)
at org.opensearch.index.engine.Engine.close(Engine.java:2070)
at org.opensearch.common.util.io.IOUtils.close(IOUtils.java:89)
at org.opensearch.common.util.io.IOUtils.close(IOUtils.java:131)
at org.opensearch.common.util.io.IOUtils.close(IOUtils.java:81)
at org.opensearch.index.shard.IndexShard$11.close(IndexShard.java:4785)
at org.opensearch.index.engine.Engine.flushAndClose(Engine.java:2059)
at org.opensearch.index.shard.IndexShard.close(IndexShard.java:2003)
at org.opensearch.index.IndexService.closeShard(IndexService.java:642)
at org.opensearch.index.IndexService.removeShard(IndexService.java:618)
at org.opensearch.index.IndexService.close(IndexService.java:389)
at org.opensearch.indices.IndicesService.removeIndex(IndicesService.java:1047)
at org.opensearch.indices.IndicesService.lambda$doStop$3(IndicesService.java:542)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.base/java.lang.Thread.run(Thread.java:1583)
Found this issue in an integ test during a build: https://build.ci.opensearch.org/job/gradle-check/31461/testReport/junit/org.opensearch.remotestore/RemoteStoreRestoreIT/testRTSRestoreWithNoDataPostCommitPrimaryReplicaDown/
Related component
Other
To Reproduce
Still working on getting repro steps, as this seems to be caused by a race condition. It might be possible to repro by slowing down translog recovery and closing the index while recovery is in progress.
Expected behavior
Closing the index while a primary failover is in progress should never lead to a deadlock.
Additional Details
No response
Found another instance of the deadlock while running RemoteStoreRestoreIT.testRTSRestoreWithRefreshedDataPrimaryReplicaDown:
"opensearch[node_s2][generic][T#1]" ID=377 BLOCKED on java.lang.Object@7caaa3de owned by "opensearch[node_s2][indices_shutdown][T#1]" ID=431
at app//org.opensearch.index.shard.IndexShard$11.getSegmentInfosSnapshot(IndexShard.java:4755)
- blocked on java.lang.Object@7caaa3de
at app//org.opensearch.index.shard.IndexShard.getSegmentInfosSnapshot(IndexShard.java:5102)
at app//org.opensearch.index.shard.IndexShard.getLatestSegmentInfosAndCheckpoint(IndexShard.java:1669)
at app//org.opensearch.index.shard.IndexShard.updateReplicationCheckpoint(IndexShard.java:4677)
at app//org.opensearch.index.shard.IndexShard$ReplicationCheckpointUpdater.afterRefresh(IndexShard.java:4671)
at app//org.apache.lucene.search.ReferenceManager.notifyRefreshListenersRefreshed(ReferenceManager.java:275)
at app//org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:182)
at app//org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:240)
at app//org.opensearch.index.engine.InternalEngine.refresh(InternalEngine.java:1771)
at app//org.opensearch.index.engine.InternalEngine.flush(InternalEngine.java:1886)
at app//org.opensearch.index.engine.InternalEngine$1.onAfterTranslogRecovery(InternalEngine.java:270)
at app//org.opensearch.index.translog.listener.CompositeTranslogEventListener.onAfterTranslogRecovery(CompositeTranslogEventListener.java:63)
at app//org.opensearch.index.translog.InternalTranslogManager.recoverFromTranslogInternal(InternalTranslogManager.java:168)
at app//org.opensearch.index.translog.InternalTranslogManager.recoverFromTranslog(InternalTranslogManager.java:132)
at app//org.opensearch.index.shard.IndexShard.resetEngineToGlobalCheckpoint(IndexShard.java:4801)
at app//org.opensearch.index.shard.IndexShard.lambda$updateShardState$4(IndexShard.java:718)
at app//org.opensearch.index.shard.IndexShard$$Lambda/0x00007f86449e6288.run(Unknown Source)
at app//org.opensearch.index.shard.IndexShard$5.onResponse(IndexShard.java:4066)
at app//org.opensearch.index.shard.IndexShard$5.onResponse(IndexShard.java:4036)
at app//org.opensearch.index.shard.IndexShard.lambda$asyncBlockOperations$32(IndexShard.java:3987)
at app//org.opensearch.index.shard.IndexShard$$Lambda/0x00007f86449e7178.accept(Unknown Source)
at app//org.opensearch.core.action.ActionListener$1.onResponse(ActionListener.java:82)
at app//org.opensearch.index.shard.IndexShardOperationPermits$1.doRun(IndexShardOperationPermits.java:157)
at app//org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:913)
at app//org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
at [email protected]/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at [email protected]/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at [email protected]/java.lang.Thread.runWith(Thread.java:1596)
at [email protected]/java.lang.Thread.run(Thread.java:1583)
Locked synchronizers:
- java.util.concurrent.ThreadPoolExecutor$Worker@6cf48ee0
- java.util.concurrent.locks.ReentrantLock$NonfairSync@538b6f5
- java.util.concurrent.locks.ReentrantLock$NonfairSync@22bfe0aa
"opensearch[node_s2][indices_shutdown][T#1]" ID=431 WAITING on java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@68898ecb
at [email protected]/jdk.internal.misc.Unsafe.park(Native Method)
- waiting on java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@68898ecb
at [email protected]/java.util.concurrent.locks.LockSupport.park(LockSupport.java:221)
at [email protected]/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:754)
at [email protected]/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:990)
at [email protected]/java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:959)
at app//org.opensearch.common.util.concurrent.ReleasableLock.acquire(ReleasableLock.java:69)
at app//org.opensearch.index.engine.Engine.close(Engine.java:2018)
at app//org.opensearch.common.util.io.IOUtils.close(IOUtils.java:89)
at app//org.opensearch.common.util.io.IOUtils.close(IOUtils.java:131)
at app//org.opensearch.common.util.io.IOUtils.close(IOUtils.java:81)
at app//org.opensearch.index.shard.IndexShard$11.close(IndexShard.java:4772)
at app//org.opensearch.index.engine.Engine.flushAndClose(Engine.java:2007)
at app//org.opensearch.index.shard.IndexShard.close(IndexShard.java:1996)
- locked java.lang.Object@7caaa3de
at app//org.opensearch.index.IndexService.closeShard(IndexService.java:642)
at app//org.opensearch.index.IndexService.removeShard(IndexService.java:618)
- locked org.opensearch.index.IndexService@36641ece
at app//org.opensearch.index.IndexService.close(IndexService.java:389)
- locked org.opensearch.index.IndexService@36641ece
at app//org.opensearch.indices.IndicesService.removeIndex(IndicesService.java:1063)
at app//org.opensearch.indices.IndicesService.lambda$doStop$4(IndicesService.java:559)
at app//org.opensearch.indices.IndicesService$$Lambda/0x00007f86449def28.run(Unknown Source)
at [email protected]/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at [email protected]/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at [email protected]/java.lang.Thread.runWith(Thread.java:1596)
at [email protected]/java.lang.Thread.run(Thread.java:1583)
Locked synchronizers:
- java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@10659274
- java.util.concurrent.ThreadPoolExecutor$Worker@2a84dc1b
The test failed with the following error:
java.lang.Exception: Test abandoned because suite timeout was reached.
at __randomizedtesting.SeedInfo.seed([61B84451E1E0BDF2]:0)
[Triage - attendees 1 2] @linuxpi thanks for filing and providing detail on the deadlocking scenario
@linuxpi I have seen this as well but have not been able to get a repeatable test. In any case, I don't think we need the engineMutex while getSegmentInfosSnapshot is invoked on this RO engine. If the engine ref has not yet been updated we can simply throw the AlreadyClosedException and callers can handle it; in this case that means when recomputing the checkpoint on flush and on close, both of which already handle the AlreadyClosedException. We still want to delegate to the newEngineRef if it's set, so that the refresh (newEngineReference.get().refresh("reset_engine");) refers to the reader on the new engine instead of fetching from the RO engine.
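Roughly the shape being described, as a sketch only: it assumes the wrapper's newEngineReference field from the snippet above and the GatedCloseable<SegmentInfos> signature of getSegmentInfosSnapshot, and is not the actual PR.

```java
// Sketch only, not the actual change: the read-only engine wrapper created during
// resetEngineToGlobalCheckpoint could serve the snapshot without taking engineMutex.
@Override
public GatedCloseable<SegmentInfos> getSegmentInfosSnapshot() {
    final Engine newEngine = newEngineReference.get();
    if (newEngine != null) {
        // Delegate so the snapshot reflects the new engine's reader
        // rather than the read-only engine's last commit.
        return newEngine.getSegmentInfosSnapshot();
    }
    // The new engine is not published yet: surface AlreadyClosedException and let
    // the flush/close checkpoint recomputation handle it, as it already does.
    throw new AlreadyClosedException("engine is being reset to the global checkpoint");
}
```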
Will put up a PR to fix.
Thanks @mch2. I was able to repro this in multiple flaky tests I was trying to fix; it usually shows up when running with more than 500 or even 1000 iterations.
For the fix, removing the lock would certainly help, and we should do that if the lock is unnecessary here.
But I was also wondering whether we should revisit the lock acquisition order across the various IndexShard flows to make sure it is consistent.
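As a generic illustration of that kind of ordering discipline (not a concrete proposal for the actual IndexShard code): if every flow that needs both locks takes them in the same global order, the cycle cannot form.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch only: both flows take the shared monitor (standing in for engineMutex)
// before the engine's read/write lock, so neither can hold one lock while waiting
// for the other in reverse order.
public class OrderedLockingSketch {
    private final Object mutex = new Object();
    private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();

    void failoverFlow() {
        synchronized (mutex) {               // 1st: engineMutex
            rwLock.readLock().lock();        // 2nd: engine readLock
            try {
                // replay translog, refresh, recompute the checkpoint ...
            } finally {
                rwLock.readLock().unlock();
            }
        }
    }

    void closeFlow() {
        synchronized (mutex) {               // 1st: engineMutex
            rwLock.writeLock().lock();       // 2nd: engine writeLock
            try {
                // flush and close the engine ...
            } finally {
                rwLock.writeLock().unlock();
            }
        }
    }
}
```

In practice nobody would hold engineMutex for an entire translog replay; the point is only that a single agreed order (or avoiding the second lock entirely, as in the delegation sketch above) removes the cycle.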