elasticsearch
elasticsearch copied to clipboard
[CI] RestartIndexFollowingIT testFollowIndex failing
Build scan: https://gradle-enterprise.elastic.co/s/4ayosvibial2k/tests/:x-pack:plugin:ccr:internalClusterTest/org.elasticsearch.xpack.ccr.RestartIndexFollowingIT/testFollowIndex
Reproduction line:
./gradlew ':x-pack:plugin:ccr:internalClusterTest' --tests "org.elasticsearch.xpack.ccr.RestartIndexFollowingIT.testFollowIndex" -Dtests.seed=A67E2A734CCC578 -Dtests.locale=sk-SK -Dtests.timezone=America/Managua -Druntime.java=17
Applicable branches: main
Reproduces locally?: Didn't try
Failure history:
Failure dashboard for org.elasticsearch.xpack.ccr.RestartIndexFollowingIT#testFollowIndex
Failure excerpt:
java.lang.AssertionError: Count is 143 hits but 152 was expected. Total shards: 8 Successful shards: 8 & 0 shard failures:
at __randomizedtesting.SeedInfo.seed([A67E2A734CCC578:E90B2D950D74FCF2]:0)
at org.junit.Assert.fail(Assert.java:89)
at org.elasticsearch.test.hamcrest.ElasticsearchAssertions.assertHitCount(ElasticsearchAssertions.java:309)
at org.elasticsearch.test.hamcrest.ElasticsearchAssertions.lambda$assertHitCount$5(ElasticsearchAssertions.java:295)
at org.elasticsearch.test.hamcrest.ElasticsearchAssertions.assertResponse(ElasticsearchAssertions.java:375)
at org.elasticsearch.test.hamcrest.ElasticsearchAssertions.assertHitCount(ElasticsearchAssertions.java:295)
at org.elasticsearch.xpack.ccr.RestartIndexFollowingIT.lambda$testFollowIndex$1(RestartIndexFollowingIT.java:117)
at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:1278)
at org.elasticsearch.xpack.ccr.RestartIndexFollowingIT.testFollowIndex(RestartIndexFollowingIT.java:104)
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(NativeMethodAccessorImpl.java:-2)
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:568)
at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
at java.lang.Thread.run(Thread.java:833)
Pinging @elastic/es-distributed (Team:Distributed)
This is still happening: https://gradle-enterprise.elastic.co/s/qii4jqdjvu32u
I was checking the failures and every time we have a mismatch of docs count it is due to something like:
[2024-03-13T13:33:54,641][WARN ][o.e.x.c.a.ShardFollowNodeTask] [follower3] shard follow task encounter non-retryable error
org.elasticsearch.index.IndexNotFoundException: no such index [index1]
at org.elasticsearch.cluster.metadata.Metadata.getIndexSafe(Metadata.java:1238) ~[elasticsearch-8.12.3-SNAPSHOT.jar:8.12.3-SNAPSHOT]
at org.elasticsearch.xpack.ccr.action.ShardFollowTasksExecutor$1.lambda$innerUpdateSettings$6(ShardFollowTasksExecutor.java:204) ~[main/:?]
[2024-03-13T13:33:54,643][WARN ][o.e.x.c.a.ShardFollowNodeTask] [follower3] shard follow task encounter non-retryable error
org.elasticsearch.index.IndexNotFoundException: no such index [index1]
at org.elasticsearch.cluster.metadata.Metadata.getIndexSafe(Metadata.java:1238) ~[elasticsearch-8.12.3-SNAPSHOT.jar:8.12.3-SNAPSHOT]
at org.elasticsearch.xpack.ccr.action.ShardFollowTasksExecutor$1.lambda$innerUpdateAliases$8(ShardFollowTasksExecutor.java:284) ~[main/:?]
[2024-03-13T13:33:54,645][WARN ][o.e.x.c.a.ShardFollowNodeTask] [follower3] shard follow task encounter non-retryable error
org.elasticsearch.index.IndexNotFoundException: no such index [index1]
at org.elasticsearch.cluster.metadata.Metadata.getIndexSafe(Metadata.java:1238) ~[elasticsearch-8.12.3-SNAPSHOT.jar:8.12.3-SNAPSHOT]
at org.elasticsearch.xpack.ccr.action.ShardFollowTasksExecutor$1.lambda$innerUpdateSettings$6(ShardFollowTasksExecutor.java:204) ~[main/:?]
[2024-03-13T13:33:54,654][WARN ][o.e.x.c.a.ShardFollowNodeTask] [follower3] shard follow task encounter non-retryable error
org.elasticsearch.index.IndexNotFoundException: no such index [index1]
at org.elasticsearch.cluster.metadata.Metadata.getIndexSafe(Metadata.java:1238) ~[elasticsearch-8.12.3-SNAPSHOT.jar:8.12.3-SNAPSHOT]
at org.elasticsearch.xpack.ccr.action.ShardFollowTasksExecutor$1.lambda$innerUpdateAliases$8(ShardFollowTasksExecutor.java:284) ~[main/:?]
see: https://gradle-enterprise.elastic.co/s/qzwwh6utni65q/tests/task/:x-pack:plugin:ccr:internalClusterTest/details/org.elasticsearch.xpack.ccr.RestartIndexFollowingIT/testFollowIndex/1/output
There is nothing in leader logs indicating that index was deleted. The leader node was restarting in meanwhile. Is there a way we could return incomplete/empty or a cluster state from a node with no master during this period of time?
The failure is likely happening as we see the unrecovered cluster state that does not contain the index we follow. This is a bug in ShardFollowNodeTask we need to fix.