[Remote Store] Cluster State Applier thread blocked on remote store operations
Describe the bug
On remote store clusters, we can see the cluster state applier thread blocked on remote store calls. When the calls to the remote store take a long time, the node is unable to apply the cluster state, and the LagDetector on the cluster manager kicks it out of the cluster. On the data node:
```
[2024-01-24T20:56:24,412][WARN ][o.o.i.c.IndicesClusterStateService] [a] [.index][5] marking and sending shard failed due to [failed to create shard]
java.io.IOException: java.io.IOException: Exception when listing blobs by prefix [x/y/z/metadata]
at org.opensearch.index.store.RemoteDirectory.listFilesByPrefixInLexicographicOrder(RemoteDirectory.java:138)
at org.opensearch.index.store.RemoteSegmentStoreDirectory.readLatestMetadataFile(RemoteSegmentStoreDirectory.java:191)
at org.opensearch.index.store.RemoteSegmentStoreDirectory.init(RemoteSegmentStoreDirectory.java:145)
at org.opensearch.index.store.RemoteSegmentStoreDirectory.<init>(RemoteSegmentStoreDirectory.java:132)
at org.opensearch.index.store.RemoteSegmentStoreDirectoryFactory.newDirectory(RemoteSegmentStoreDirectoryFactory.java:74)
at org.opensearch.index.store.RemoteSegmentStoreDirectoryFactory.newDirectory(RemoteSegmentStoreDirectoryFactory.java:49)
at org.opensearch.index.IndexService.createShard(IndexService.java:488)
at org.opensearch.indices.IndicesService.createShard(IndicesService.java:1036)
at org.opensearch.indices.IndicesService.createShard(IndicesService.java:212)
at org.opensearch.indices.cluster.IndicesClusterStateService.createShard(IndicesClusterStateService.java:673)
at org.opensearch.indices.cluster.IndicesClusterStateService.createOrUpdateShards(IndicesClusterStateService.java:650)
at org.opensearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:295)
at org.opensearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:606)
at org.opensearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:593)
at org.opensearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:561)
at org.opensearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:484)
at org.opensearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:186)
at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:858)
at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedOpenSearchThreadPoolExecutor.java:282)
at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedOpenSearchThreadPoolExecutor.java:245)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.io.IOException: Exception when listing blobs by prefix [x/y/z/metadata]
at org.opensearch.repositories.s3.S3BlobContainer.listBlobsByPrefixInSortedOrder(S3BlobContainer.java:455)
at org.opensearch.common.blobstore.BlobContainer.listBlobsByPrefixInSortedOrder(BlobContainer.java:234)
at org.opensearch.common.blobstore.EncryptedBlobContainer.listBlobsByPrefixInSortedOrder(EncryptedBlobContainer.java:207)
at org.opensearch.index.store.RemoteDirectory.listFilesByPrefixInLexicographicOrder(RemoteDirectory.java:127)
... 22 more
Caused by: software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:111)
at software.amazon.awssdk.core.exception.SdkClientException.create(SdkClientException.java:47)
```
On the cluster manager node:
```
[2024-01-24T20:56:24,339][WARN ][o.o.c.c.LagDetector ] [0f2] node [{a}{b}{c}{{dir}] is lagging at cluster state version [29651], although publication of cluster state version [29652] completed [1.5m] ago
[2024-01-24T20:56:25,192][INFO ][o.o.c.s.MasterService ] [0f2] node-left [{a}{b}{c}{{dir}] reason: lagging], term: 14, version: 29656, delta: removed {[{a}{b}{c}{{dir}]}
```
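For illustration, here is a minimal sketch of why the applier stalls, based on the stack trace above. All names are simplified placeholders, not the real OpenSearch classes: shard creation for a remote-store index performs a synchronous blob listing on the single applier thread, so one slow or connection-starved S3 call holds up cluster state application for the whole node.

```java
import java.io.IOException;
import java.util.List;

// Simplified sketch of the call path in the stack trace above.
// All names here are illustrative, not actual OpenSearch code.
class ApplierBlockingSketch {

    interface RemoteBlobStore {
        // Network call backed by a bounded HTTP connection pool; under
        // pressure it can block until "Timeout waiting for connection from pool".
        List<String> listBlobsByPrefix(String prefix) throws IOException;
    }

    private final RemoteBlobStore remoteStore;

    ApplierBlockingSketch(RemoteBlobStore remoteStore) {
        this.remoteStore = remoteStore;
    }

    // Runs on the single cluster state applier thread
    // (ClusterApplierService -> IndicesClusterStateService.createShard).
    void applyClusterState(List<String> shardsToCreate) throws IOException {
        for (String shardPrefix : shardsToCreate) {
            // Synchronous remote call during shard creation: while it waits,
            // no further cluster states can be applied, and the LagDetector
            // on the cluster manager eventually removes the node.
            remoteStore.listBlobsByPrefix(shardPrefix + "/metadata");
        }
    }
}
```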
Related component
Storage:Durability
To Reproduce
We see this when there is a high volume of shard relocations in the cluster.
Expected behavior
The cluster state applier should have a dedicated threadpool, so that it doesn't get blocked on any shared resource, be it another threadpool, connections, etc.
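A rough sketch of what that could look like using OpenSearch's existing plugin thread pool hooks (`Plugin#getExecutorBuilders` and `FixedExecutorBuilder`). The pool name `remote_store_applier` and its sizing are assumptions for illustration, not an agreed design:

```java
import java.util.List;

import org.opensearch.common.settings.Settings;
import org.opensearch.plugins.Plugin;
import org.opensearch.threadpool.ExecutorBuilder;
import org.opensearch.threadpool.FixedExecutorBuilder;

// Sketch only: register a hypothetical dedicated pool for applier-triggered
// remote store calls, so they never compete with other work for threads.
public class RemoteStoreApplierPoolPlugin extends Plugin {

    // Hypothetical pool name; not an existing OpenSearch thread pool.
    public static final String REMOTE_STORE_APPLIER = "remote_store_applier";

    @Override
    public List<ExecutorBuilder<?>> getExecutorBuilders(Settings settings) {
        // Fixed size and queue length are illustrative placeholders.
        return List.of(
            new FixedExecutorBuilder(settings, REMOTE_STORE_APPLIER, 4, 100, "thread_pool." + REMOTE_STORE_APPLIER)
        );
    }

    // Callers would then offload blocking remote store work like:
    // threadPool.executor(RemoteStoreApplierPoolPlugin.REMOTE_STORE_APPLIER).execute(runnable);
}
```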
Additional Details
Plugins: repository-s3
Host/Environment (please complete the following information):
- Amazon Linux 2
Thanks Gaurav. Ideally, the cluster state applier thread should not allow any blocking or networking operation on that thread. But given the flow with remote store, we might need a dedicated and prioritized threadpool in remote store for cluster state applier interactions, and we may still want to keep the blocking behaviour of the cluster state applier thread.
We shouldn't perform expensive operations in the cluster state applier thread. If you can offload this work to a dedicated thread pool, that would be preferred.
Ideally, appliers are expected to finish before any listeners are triggered, but we can evaluate whether appliers can execute tasks that don't depend on the cluster state in parallel, in the background.
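Putting the two comments above together, here is a minimal generic sketch (plain java.util.concurrent, illustrative only) of offloading the remote call to a dedicated pool while keeping bounded blocking behaviour on the applier thread. The pool name and the 30-second bound are placeholders:

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Illustrative pattern, not OpenSearch code: the applier thread still blocks,
// but on a bounded wait against a dedicated pool rather than directly on the
// shared S3 connection pool.
class BoundedRemoteStoreAccess {

    interface RemoteBlobStore {
        List<String> listBlobsByPrefix(String prefix) throws IOException;
    }

    // Hypothetical pool dedicated to applier-triggered remote store calls.
    private final ExecutorService remoteStorePool =
        Executors.newFixedThreadPool(4, r -> new Thread(r, "remote_store_applier"));

    List<String> listWithTimeout(RemoteBlobStore store, String prefix) throws IOException {
        Future<List<String>> pending = remoteStorePool.submit(() -> store.listBlobsByPrefix(prefix));
        try {
            // The 30s bound is a placeholder: fail the shard creation fast
            // instead of stalling cluster state application past the lag limit.
            return pending.get(30, TimeUnit.SECONDS);
        } catch (TimeoutException e) {
            pending.cancel(true);
            throw new IOException("Remote store listing timed out for prefix " + prefix, e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("Interrupted while listing prefix " + prefix, e);
        } catch (ExecutionException e) {
            throw new IOException("Remote store listing failed for prefix " + prefix, e.getCause());
        }
    }
}
```

This keeps the applier's existing blocking contract while capping how long a slow remote call can delay cluster state application, which is what gets the node removed by the LagDetector today.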