[BUG] Lots of 'Metadata for system indices doesn't exist' errors in logs

Open defesteban opened this issue 1 year ago • 2 comments

What is the bug? System indices are skipped during replication, but the OpenSearch logs still contain many errors for them (Metadata for .opendistro_security doesn't exist).

How can one reproduce the bug? Steps to reproduce the behavior:

  1. Run autofollow for all indices:
curl -XPOST -k -H 'Content-Type: application/json' 'https://localhost:9200/_plugins/_replication/_autofollow' -d '
{
  "leader_alias": "leader-cluster",
  "pattern": "*",
  "name": "replication",
  "use_roles": {
    "leader_cluster_role": "all_access",
    "follower_cluster_role": "all_access"
  }
}'
  2. Check OpenSearch logs:
[2023-05-19T14:40:54,107][ERROR][o.o.r.m.ReplicationMetadataManager] [opensearch-0] Encountered exception - 
org.opensearch.ResourceNotFoundException: Metadata for .opendistro_security doesn't exist
 at org.opensearch.replication.metadata.store.ReplicationMetadataStore.getMetadata(ReplicationMetadataStore.kt:146) ~[opensearch-cross-cluster-replication-2.4.1.0.jar:2.4.1.0]
 at org.opensearch.replication.metadata.store.ReplicationMetadataStore$getMetadata$1.invokeSuspend(ReplicationMetadataStore.kt) ~[opensearch-cross-cluster-replication-2.4.1.0.jar:2.4.1.0]
 at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33) [kotlin-stdlib-1.6.0.jar:1.6.0-release-798(1.6.0)]
 at kotlinx.coroutines.UndispatchedCoroutine.afterResume(CoroutineContext.kt:147) [kotlinx-coroutines-core-jvm-1.6.0.jar:?]
 at kotlinx.coroutines.AbstractCoroutine.resumeWith(AbstractCoroutine.kt:102) [kotlinx-coroutines-core-jvm-1.6.0.jar:?]
 at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:46) [kotlin-stdlib-1.6.0.jar:1.6.0-release-798(1.6.0)]
 at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106) [kotlinx-coroutines-core-jvm-1.6.0.jar:?]
 at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:571) [kotlinx-coroutines-core-jvm-1.6.0.jar:?]
 at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:750) [kotlinx-coroutines-core-jvm-1.6.0.jar:?]
 at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:678) [kotlinx-coroutines-core-jvm-1.6.0.jar:?]
 at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:665) [kotlinx-coroutines-core-jvm-1.6.0.jar:?]
[2023-05-19T14:40:54,107][ERROR][o.o.r.a.s.TransportReplicationStatusAction] [opensearch-0] got ResourceNotFoundException while querying for status 
org.opensearch.ResourceNotFoundException: Metadata for .opendistro_security doesn't exist
 at org.opensearch.replication.metadata.store.ReplicationMetadataStore.getMetadata(ReplicationMetadataStore.kt:146) ~[opensearch-cross-cluster-replication-2.4.1.0.jar:2.4.1.0]
 at org.opensearch.replication.metadata.store.ReplicationMetadataStore$getMetadata$1.invokeSuspend(ReplicationMetadataStore.kt) ~[opensearch-cross-cluster-replication-2.4.1.0.jar:2.4.1.0]
 at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33) [kotlin-stdlib-1.6.0.jar:1.6.0-release-798(1.6.0)]
 at kotlinx.coroutines.UndispatchedCoroutine.afterResume(CoroutineContext.kt:147) [kotlinx-coroutines-core-jvm-1.6.0.jar:?]
 at kotlinx.coroutines.AbstractCoroutine.resumeWith(AbstractCoroutine.kt:102) [kotlinx-coroutines-core-jvm-1.6.0.jar:?]
 at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:46) [kotlin-stdlib-1.6.0.jar:1.6.0-release-798(1.6.0)]
 at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106) [kotlinx-coroutines-core-jvm-1.6.0.jar:?]
 at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:571) [kotlinx-coroutines-core-jvm-1.6.0.jar:?]
 at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:750) [kotlinx-coroutines-core-jvm-1.6.0.jar:?]
 at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:678) [kotlinx-coroutines-core-jvm-1.6.0.jar:?]
 at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:665) [kotlinx-coroutines-core-jvm-1.6.0.jar:?]

What is the expected behavior? No ReplicationMetadataStore errors should be logged for indices that are not replicated.

What is your host/environment?

  • OpenSearch Version - 2.4.1

defesteban, May 22 '23 07:05

This error is logged when the replication status API is invoked. Are you running a job that queries the replication status for all indices? If not, can you describe your setup: did you create the leader and follower clusters with or without security?

soosinha, Jun 13 '23 02:06

Hi!

We have queries that retrieve replication status, but not for all indices. We exclude indices whose names start with . from all queries. We configure the leader cluster before running autofollow:

PUT /_cluster/settings
{
  "persistent": {
    "cluster": {
      "remote": {
        "leader-cluster": {
          "seeds": ["url"]
        }
      }
    }
  }
}

Then start autofollow with the following settings:

POST /_plugins/_replication/_autofollow
{
  "leader_alias": "leader-cluster",
  "pattern": "*",
  "name": "replication",
  "use_roles": {
    "leader_cluster_role": "all_access",
    "follower_cluster_role": "all_access"
  }
}
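
For reference, a status poller of the kind described above (one that skips every index whose name starts with .) could look roughly like the following. This is only a sketch under assumptions, not the actual job: the helper names (user_indices, status_url, fetch_json, poll_status) are hypothetical, authentication is omitted, and certificate verification is disabled in the same way as curl -k in the reproduction steps. It lists indices through _cat/indices and calls the replication status endpoint only for the names that remain after filtering.

```python
import json
import ssl
import urllib.request
from typing import Dict, Iterable, List


def user_indices(names: Iterable[str]) -> List[str]:
    """Keep only index names that do not start with '.' (system/hidden indices)."""
    return [name for name in names if not name.startswith(".")]


def status_url(base_url: str, index: str) -> str:
    """Build the replication status URL for a single follower index."""
    return f"{base_url.rstrip('/')}/_plugins/_replication/{index}/_status"


def fetch_json(url: str, context: ssl.SSLContext) -> object:
    """GET a URL and decode the JSON body. Authentication is omitted (assumption)."""
    with urllib.request.urlopen(url, context=context) as resp:
        return json.load(resp)


def poll_status(base_url: str) -> Dict[str, object]:
    """Query replication status for every non-system index on the follower."""
    # Same effect as `curl -k`: skip TLS verification. Test clusters only.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE

    rows = fetch_json(f"{base_url.rstrip('/')}/_cat/indices?format=json&h=index", ctx)
    names = [row["index"] for row in rows]
    return {name: fetch_json(status_url(base_url, name), ctx)
            for name in user_indices(names)}
```

With a filter like this, .opendistro_security never reaches the status endpoint, so a poller of this shape would not by itself explain the errors above.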

defesteban, Jun 19 '23 13:06