OpenSearch
[BUG] org.opensearch.index.shard.RemoteIndexShardTests.testNRTReplicaWithRemoteStorePromotedAsPrimaryCommitCommit is flaky
Describe the bug
The org.opensearch.index.shard.RemoteIndexShardTests.testNRTReplicaWithRemoteStorePromotedAsPrimaryCommitCommit test is flaky. It was unmuted as part of https://github.com/opensearch-project/OpenSearch/pull/8931:
org.opensearch.index.shard.RemoteIndexShardTests.testNRTReplicaWithRemoteStorePromotedAsPrimaryCommitRefresh
java.lang.AssertionError: expected:<7> but was:<6>
at __randomizedtesting.SeedInfo.seed([EA36272CF1AD08E7:82B0311AC996231A]:0)
at org.junit.Assert.fail(Assert.java:89)
at org.junit.Assert.failNotEquals(Assert.java:835)
at org.junit.Assert.assertEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:633)
at org.opensearch.index.shard.RemoteIndexShardTests.testNRTReplicaWithRemoteStorePromotedAsPrimary(RemoteIndexShardTests.java:139)
at org.opensearch.index.shard.RemoteIndexShardTests.testNRTReplicaWithRemoteStorePromotedAsPrimaryCommitCommit(RemoteIndexShardTests.java:79)
at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
at java.base/java.lang.reflect.Method.invoke(Method.java:578)
at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938)
at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974)
at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947)
at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832)
at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883)
at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894)
at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
at java.base/java.lang.Thread.run(Thread.java:1623)
To Reproduce
REPRODUCE WITH: ./gradlew ':server:test' --tests "org.opensearch.index.shard.RemoteIndexShardTests.testNRTReplicaWithRemoteStorePromotedAsPrimaryCommitCommit" -Dtests.seed=EA36272CF1AD08E7 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=lt-LT -Dtests.timezone=America/Argentina/San_Luis -Druntime.java=20
Expected behavior
The test should always pass.
Plugins
Standard
Host/Environment
CI
Additional context
https://build.ci.opensearch.org/job/gradle-check/23645/testReport/junit/org.opensearch.index.shard/RemoteIndexShardTests/testNRTReplicaWithRemoteStorePromotedAsPrimaryCommitCommit/
This appears to be related to #9624. I think all five tests are flaky because testNRTReplicaWithRemoteStorePromotedAsPrimary itself is flaky: the other tests call it to perform their validation. Making testNRTReplicaWithRemoteStorePromotedAsPrimary reliable should therefore resolve the others as well.
@Frederic-Chopin, could you pick this up for OSCI? Thanks!
Sure! Thanks!
@Frederic-Chopin, is this being worked on? If it's not fixed, do we know whether it is targeted for 2.13? I'm adding the untriaged label back so we can discuss this in our triage and backlog review meeting.
Another flaky test from this class:
org.opensearch.index.shard.RemoteIndexShardTests.testNRTReplicaWithRemoteStorePromotedAsPrimaryRefreshCommit
java.lang.AssertionError: RecoveryFailedException[[test][0]: Recovery failed from {s1}{s1}{Ks4a3VAsRlK3GrzVJWObdw}{0.0.0.0}{0.0.0.0:119}{dimrs}{} into {s0}{s0}{wSXL-d2ESWad47mYmy8m1A}{0.0.0.0}{0.0.0.0:118}{dimrs}{} ([test][0]: Recovery failed from {s0}{s0}{wSXL-d2ESWad47mYmy8m1A}{0.0.0.0}{0.0.0.0:118}{dimrs}{} into {s1}{s1}{Ks4a3VAsRlK3GrzVJWObdw}{0.0.0.0}{0.0.0.0:119}{dimrs}{})]; nested: RecoveryFailedException[[test][0]: Recovery failed from {s0}{s0}{wSXL-d2ESWad47mYmy8m1A}{0.0.0.0}{0.0.0.0:118}{dimrs}{} into {s1}{s1}{Ks4a3VAsRlK3GrzVJWObdw}{0.0.0.0}{0.0.0.0:119}{dimrs}{}]; nested: RecoveryEngineException[Phase[1] prepare target for translog failed]; nested: CorruptIndexException[misplaced codec footer (file truncated?): length=0 but footerLength==16 (resource=metadata__9223372036854775800__9223372036854775804__9223372036854775804__9223372036854775806__-1039764442__9223370318105717958__1)];
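The innermost CorruptIndexException says the metadata file had length=0 when it was read. A Lucene codec footer is 16 bytes (a 4-byte footer magic, a 4-byte checksum algorithm ID and an 8-byte checksum), so any file shorter than that fails before its checksum is even examined. The sketch below is a simplified, dependency-free re-implementation of that length check for illustration only; it is not Lucene's code. It throws a plain IOException in place of Lucene's CorruptIndexException, and the class and method names are hypothetical.

```java
import java.io.IOException;

/**
 * Hypothetical, simplified sketch of the length check behind
 * "misplaced codec footer (file truncated?)". Not Lucene's implementation.
 */
public class FooterLengthCheck {
    // 4-byte footer magic + 4-byte checksum algorithm ID + 8-byte checksum = 16 bytes.
    static final int FOOTER_LENGTH = Integer.BYTES + Integer.BYTES + Long.BYTES;

    /** Throws if a file of the given length is too short to hold a codec footer. */
    static void checkFooterFits(String resource, long fileLength) throws IOException {
        if (fileLength < FOOTER_LENGTH) {
            throw new IOException("misplaced codec footer (file truncated?): length="
                + fileLength + " but footerLength==" + FOOTER_LENGTH
                + " (resource=" + resource + ")");
        }
    }

    public static void main(String[] args) {
        try {
            // An empty (0-byte) file, as reported for the metadata file in the failure above.
            checkFooterFits("metadata__example", 0);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running main prints `misplaced codec footer (file truncated?): length=0 but footerLength==16 (resource=metadata__example)`. This suggests the recovery read the metadata file while it was still empty, rather than reading a file whose contents were damaged.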
https://build.ci.opensearch.org/job/gradle-check/41277/testReport/junit/org.opensearch.index.shard/RemoteIndexShardTests/testNRTReplicaWithRemoteStorePromotedAsPrimaryRefreshCommit_2/
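The name of the empty metadata file in the exception above contains several large numbers close to Long.MAX_VALUE. A minimal sketch, assuming those fields are written as Long.MAX_VALUE - value (the inversion OpenSearch's remote store presumably applies so that newer files sort first lexicographically), decodes them back to their original values. The class and method names are hypothetical, and which field holds which value is not asserted here. The negative field is skipped because it cannot be an inverted non-negative long, and the trailing 1 is skipped because it is too small to be inverted.

```java
/**
 * Hypothetical helper: decodes the numeric fields of the remote-store metadata
 * file name from the failure, assuming each was written as Long.MAX_VALUE - value.
 */
public class MetadataNameDecoder {
    /** Reverses the assumed inversion: value = Long.MAX_VALUE - stored. */
    static long uninvert(String field) {
        return Long.MAX_VALUE - Long.parseLong(field);
    }

    public static void main(String[] args) {
        String name = "metadata__9223372036854775800__9223372036854775804__9223372036854775804"
            + "__9223372036854775806__-1039764442__9223370318105717958__1";
        String[] parts = name.split("__");
        // parts[0] is the "metadata" prefix; parts[5] is negative and parts[7] is small,
        // so only the fields that look inverted are decoded.
        for (int i : new int[] {1, 2, 3, 4, 6}) {
            System.out.println(parts[i] + " -> " + uninvert(parts[i]));
        }
    }
}
```

Under this assumption, the first four fields decode to 7, 3, 3 and 1, and the sixth decodes to 1718749057849, which is plausibly an epoch-millisecond timestamp. Values like these could help correlate the empty file with the upload that produced it.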