
[Remote Store] Cluster State Applier thread blocked on remote store operations

Open gbbafna opened this issue 1 year ago • 2 comments

Describe the bug

On remote store clusters, we can see the cluster state applier thread blocked on remote store calls. When the calls to the remote store take a long time, the node is not able to apply the cluster state, and the LagDetector on the cluster manager kicks it out.

[2024-01-24T20:56:24,412][WARN ][o.o.i.c.IndicesClusterStateService] [a] [.index][5] marking and sending shard failed due to [failed to create shard]
java.io.IOException: java.io.IOException: Exception when listing blobs by prefix [x/y/z/metadata]
    at org.opensearch.index.store.RemoteDirectory.listFilesByPrefixInLexicographicOrder(RemoteDirectory.java:138)
    at org.opensearch.index.store.RemoteSegmentStoreDirectory.readLatestMetadataFile(RemoteSegmentStoreDirectory.java:191)
    at org.opensearch.index.store.RemoteSegmentStoreDirectory.init(RemoteSegmentStoreDirectory.java:145)
    at org.opensearch.index.store.RemoteSegmentStoreDirectory.<init>(RemoteSegmentStoreDirectory.java:132)
    at org.opensearch.index.store.RemoteSegmentStoreDirectoryFactory.newDirectory(RemoteSegmentStoreDirectoryFactory.java:74)
    at org.opensearch.index.store.RemoteSegmentStoreDirectoryFactory.newDirectory(RemoteSegmentStoreDirectoryFactory.java:49)
    at org.opensearch.index.IndexService.createShard(IndexService.java:488)
    at org.opensearch.indices.IndicesService.createShard(IndicesService.java:1036)
    at org.opensearch.indices.IndicesService.createShard(IndicesService.java:212)
    at org.opensearch.indices.cluster.IndicesClusterStateService.createShard(IndicesClusterStateService.java:673)
    at org.opensearch.indices.cluster.IndicesClusterStateService.createOrUpdateShards(IndicesClusterStateService.java:650)
    at org.opensearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:295)
    at org.opensearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:606)
    at org.opensearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:593)
    at org.opensearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:561)
    at org.opensearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:484)
    at org.opensearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:186)
    at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:858)
    at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedOpenSearchThreadPoolExecutor.java:282)
    at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedOpenSearchThreadPoolExecutor.java:245)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.io.IOException: Exception when listing blobs by prefix [x/y/z/metadata]
    at org.opensearch.repositories.s3.S3BlobContainer.listBlobsByPrefixInSortedOrder(S3BlobContainer.java:455)
    at org.opensearch.common.blobstore.BlobContainer.listBlobsByPrefixInSortedOrder(BlobContainer.java:234)
    at org.opensearch.common.blobstore.EncryptedBlobContainer.listBlobsByPrefixInSortedOrder(EncryptedBlobContainer.java:207)
    at org.opensearch.index.store.RemoteDirectory.listFilesByPrefixInLexicographicOrder(RemoteDirectory.java:127)
    ... 22 more
Caused by: software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
    at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:111)
    at software.amazon.awssdk.core.exception.SdkClientException.create(SdkClientException.java:47)

On the cluster manager node:

[2024-01-24T20:56:24,339][WARN ][o.o.c.c.LagDetector      ] [0f2] node [{a}{b}{c}{{dir}] is lagging at cluster state version [29651], although publication of cluster state version [29652] completed [1.5m] ago

[2024-01-24T20:56:25,192][INFO ][o.o.c.s.MasterService    ] [0f2] node-left [{a}{b}{c}{{dir}] reason: lagging], term: 14, version: 29656, delta: removed {[{a}{b}{c}{{dir}]}

Related component

Storage:Durability

To Reproduce

We see this when there is a high number of relocations in the cluster.

Expected behavior

The cluster state applier should have a dedicated thread pool, so that it does not get blocked on any shared resource, be it a thread pool, connections, etc.
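A minimal sketch of the idea, not the actual OpenSearch implementation: the blocking remote store call runs on a dedicated executor and the applier thread waits at most a bounded time for it, instead of hanging for as long as the repository's connection pool is exhausted. Names such as REMOTE_STORE_EXECUTOR, RemoteMetadataReader and listMetadataFiles are hypothetical placeholders.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public final class RemoteStoreOffloadSketch {

    // Hypothetical dedicated pool for remote store interactions triggered
    // from cluster state application.
    private static final ExecutorService REMOTE_STORE_EXECUTOR =
        Executors.newFixedThreadPool(4);

    /** Stand-in for the blocking listBlobsByPrefix call against the repository. */
    interface RemoteMetadataReader {
        List<String> listMetadataFiles() throws IOException;
    }

    static List<String> readLatestMetadataWithTimeout(RemoteMetadataReader reader,
                                                      long timeoutMillis) throws IOException {
        Future<List<String>> future = REMOTE_STORE_EXECUTOR.submit(reader::listMetadataFiles);
        try {
            // The applier thread waits at most timeoutMillis instead of being
            // blocked indefinitely on the remote store call.
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);
            throw new IOException("Timed out listing remote metadata after " + timeoutMillis + " ms", e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("Interrupted while listing remote metadata", e);
        } catch (ExecutionException e) {
            throw new IOException("Failed to list remote metadata", e.getCause());
        }
    }
}
```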

Additional Details

Plugins repository-s3

Host/Environment (please complete the following information):

  • Amazon Linux 2


gbbafna avatar Jan 26 '24 05:01 gbbafna

Thanks Gaurav. Ideally, the cluster state applier thread should disallow any blocking or networking operation on that thread. But given the flow with remote store, we might need a dedicated and prioritized thread pool in remote store for cluster state applier interactions, while still keeping the blocking behaviour of the cluster state applier thread itself.
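A minimal plain-Java sketch of such a dedicated, prioritized pool (OpenSearch's own PrioritizedOpenSearchThreadPoolExecutor, visible in the stack trace above, already implements this pattern for the applier itself); the class, pool sizes and priority values here are hypothetical, not an actual proposal for the codebase:

```java
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public final class PrioritizedRemoteStorePoolSketch {

    /** A runnable that carries a priority; lower value runs first. */
    static final class PrioritizedTask implements Runnable, Comparable<PrioritizedTask> {
        final int priority;
        final Runnable delegate;

        PrioritizedTask(int priority, Runnable delegate) {
            this.priority = priority;
            this.delegate = delegate;
        }

        @Override public void run() { delegate.run(); }

        @Override public int compareTo(PrioritizedTask other) {
            return Integer.compare(this.priority, other.priority);
        }
    }

    // Fixed-size pool: with an unbounded queue, ThreadPoolExecutor never grows
    // past the core size, so core == max here.
    private static final ThreadPoolExecutor REMOTE_STORE_POOL = new ThreadPoolExecutor(
        4, 4, 0L, TimeUnit.MILLISECONDS, new PriorityBlockingQueue<>());

    static void submitApplierWork(Runnable work) {
        // Remote store work originating from cluster state application gets the
        // highest priority, so it is picked before background housekeeping.
        REMOTE_STORE_POOL.execute(new PrioritizedTask(0, work));
    }

    static void submitBackgroundWork(Runnable work) {
        REMOTE_STORE_POOL.execute(new PrioritizedTask(10, work));
    }
}
```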

Bukhtawar avatar Jan 26 '24 16:01 Bukhtawar

We shouldn't perform expensive operations in the cluster state applier thread. If this work can be offloaded to a dedicated thread pool, that would be preferred.

Ideally, appliers are expected to finish before any listeners are triggered, but we can evaluate whether appliers can execute the parts of their work that don't depend on the cluster state in parallel, in the background.
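A minimal sketch of what that could look like, with hypothetical names (createShardAsync, RemoteStoreInit) rather than the OpenSearch API: the applier thread only schedules the remote store initialization and returns, while completion or failure is reported via callbacks off the applier thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

public final class AsyncShardInitSketch {

    private static final ExecutorService BACKGROUND = Executors.newFixedThreadPool(4);

    /** Stand-in for the remote store initialization done during shard creation. */
    interface RemoteStoreInit {
        void initFromRemote() throws Exception;
    }

    static void createShardAsync(String shardId,
                                 RemoteStoreInit init,
                                 Runnable onStarted,
                                 Consumer<Throwable> onFailed) {
        // The cluster state applier thread only schedules the work; it does not
        // wait for the remote store call, so cluster state application finishes
        // quickly and the LagDetector sees the new version applied in time.
        CompletableFuture
            .runAsync(() -> {
                try {
                    init.initFromRemote();
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            }, BACKGROUND)
            .whenComplete((ignored, throwable) -> {
                if (throwable == null) {
                    onStarted.run();
                } else {
                    // Equivalent of "marking and sending shard failed", but
                    // reported off the applier thread.
                    onFailed.accept(throwable);
                }
            });
    }
}
```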

shwetathareja avatar Feb 09 '24 11:02 shwetathareja