[BUG] org.opensearch.index.shard.RemoteIndexShardTests.testNRTReplicaWithRemoteStorePromotedAsPrimaryCommitCommit is flaky

Open sohami opened this issue 2 years ago • 5 comments

Describe the bug

The test org.opensearch.index.shard.RemoteIndexShardTests.testNRTReplicaWithRemoteStorePromotedAsPrimaryCommitCommit is flaky. It was unmuted as part of https://github.com/opensearch-project/OpenSearch/pull/8931:

org.opensearch.index.shard.RemoteIndexShardTests.testNRTReplicaWithRemoteStorePromotedAsPrimaryCommitRefresh

java.lang.AssertionError: expected:<7> but was:<6>
	at __randomizedtesting.SeedInfo.seed([EA36272CF1AD08E7:82B0311AC996231A]:0)
	at org.junit.Assert.fail(Assert.java:89)
	at org.junit.Assert.failNotEquals(Assert.java:835)
	at org.junit.Assert.assertEquals(Assert.java:647)
	at org.junit.Assert.assertEquals(Assert.java:633)
	at org.opensearch.index.shard.RemoteIndexShardTests.testNRTReplicaWithRemoteStorePromotedAsPrimary(RemoteIndexShardTests.java:139)
	at org.opensearch.index.shard.RemoteIndexShardTests.testNRTReplicaWithRemoteStorePromotedAsPrimaryCommitCommit(RemoteIndexShardTests.java:79)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
	at java.base/java.lang.reflect.Method.invoke(Method.java:578)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
	at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
	at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
	at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
	at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
	at java.base/java.lang.Thread.run(Thread.java:1623)

To Reproduce

REPRODUCE WITH: ./gradlew ':server:test' --tests "org.opensearch.index.shard.RemoteIndexShardTests.testNRTReplicaWithRemoteStorePromotedAsPrimaryCommitCommit" -Dtests.seed=EA36272CF1AD08E7 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=lt-LT -Dtests.timezone=America/Argentina/San_Luis -Druntime.java=20

Expected behavior

The test must always pass.

Plugins

Standard

Host/Environment (please complete the following information):

CI

Additional context

https://build.ci.opensearch.org/job/gradle-check/23645/testReport/junit/org.opensearch.index.shard/RemoteIndexShardTests/testNRTReplicaWithRemoteStorePromotedAsPrimaryCommitCommit/

sohami avatar Aug 28 '23 20:08 sohami

This appears to be related to #9624. I think all 5 tests are flaky because testNRTReplicaWithRemoteStorePromotedAsPrimary itself is flaky (the other tests call it for their validation). Fixing testNRTReplicaWithRemoteStorePromotedAsPrimary so that it is no longer flaky should likely resolve the others.
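
To illustrate the point, here is a minimal sketch of how one shared helper can make every wrapper test fail together. It is not the real code in RemoteIndexShardTests: the class name, the helper signature, the `observedDocs` parameter, and the `RefreshRefresh` variant name are all assumptions made for illustration. Only the 7-versus-6 mismatch is taken from the reported assertion.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: class, method, and variant names are illustrative,
// not the real code in RemoteIndexShardTests.
final class SharedHelperSketch {

    // Stand-in for the shared scenario. The two flags are unused here; they only
    // mirror the idea of "refresh or commit" variants. "observedDocs" simulates
    // the value a timing-dependent run happens to produce.
    static void sharedScenario(boolean commitFirst, boolean commitSecond,
                               int expectedDocs, int observedDocs) {
        if (expectedDocs != observedDocs) {
            throw new AssertionError(
                "expected:<" + expectedDocs + "> but was:<" + observedDocs + ">");
        }
    }

    // Runs each flag combination through the same helper and records pass/fail.
    static Map<String, Boolean> runVariants(int expectedDocs, int observedDocs) {
        boolean[][] flags = {{false, false}, {false, true}, {true, false}, {true, true}};
        String[] names = {"RefreshRefresh", "RefreshCommit", "CommitRefresh", "CommitCommit"};
        Map<String, Boolean> results = new LinkedHashMap<>();
        for (int i = 0; i < flags.length; i++) {
            boolean passed;
            try {
                sharedScenario(flags[i][0], flags[i][1], expectedDocs, observedDocs);
                passed = true;
            } catch (AssertionError e) {
                passed = false;
            }
            results.put(names[i], passed);
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(runVariants(7, 7)); // every variant passes
        System.out.println(runVariants(7, 6)); // every variant fails on the same assertion
    }
}
```

Because the assertion lives in the shared helper, a single nondeterministic result there shows up as a failure in whichever wrapper happened to run.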

BhumikaSaini-Amazon avatar Sep 05 '23 05:09 BhumikaSaini-Amazon

@Frederic-Chopin, could you pick this up for OSCI? Thanks!

sejli avatar Oct 17 '23 02:10 sejli

Sure! Thanks!

Frederic-Chopin avatar Oct 17 '23 02:10 Frederic-Chopin

@Frederic-Chopin is this being worked on? If it's not fixed, do we have clarity on whether this is targeted for 2.13? Adding back the untriaged label so we can discuss this in our triage and backlog review meeting.

rramachand21 avatar Mar 07 '24 04:03 rramachand21

[Triage - attendees 1 2 3 4 5 6 7 8] @sohami Thanks for creating this issue

peternied avatar May 01 '24 15:05 peternied

Another flaky test from this class

org.opensearch.index.shard.RemoteIndexShardTests.testNRTReplicaWithRemoteStorePromotedAsPrimaryRefreshCommit

java.lang.AssertionError: RecoveryFailedException[[test][0]: Recovery failed from {s1}{s1}{Ks4a3VAsRlK3GrzVJWObdw}{0.0.0.0}{0.0.0.0:119}{dimrs}{} into {s0}{s0}{wSXL-d2ESWad47mYmy8m1A}{0.0.0.0}{0.0.0.0:118}{dimrs}{} ([test][0]: Recovery failed from {s0}{s0}{wSXL-d2ESWad47mYmy8m1A}{0.0.0.0}{0.0.0.0:118}{dimrs}{} into {s1}{s1}{Ks4a3VAsRlK3GrzVJWObdw}{0.0.0.0}{0.0.0.0:119}{dimrs}{})]; nested: RecoveryFailedException[[test][0]: Recovery failed from {s0}{s0}{wSXL-d2ESWad47mYmy8m1A}{0.0.0.0}{0.0.0.0:118}{dimrs}{} into {s1}{s1}{Ks4a3VAsRlK3GrzVJWObdw}{0.0.0.0}{0.0.0.0:119}{dimrs}{}]; nested: RecoveryEngineException[Phase[1] prepare target for translog failed]; nested: CorruptIndexException[misplaced codec footer (file truncated?): length=0 but footerLength==16 (resource=metadata__9223372036854775800__9223372036854775804__9223372036854775804__9223372036854775806__-1039764442__9223370318105717958__1)];

https://build.ci.opensearch.org/job/gradle-check/41277/testReport/junit/org.opensearch.index.shard/RemoteIndexShardTests/testNRTReplicaWithRemoteStorePromotedAsPrimaryRefreshCommit_2/
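
One possible reading of the message above (an assumption, not a confirmed root cause) is that the metadata file was opened while it was still empty: the exception reports `length=0` against `footerLength==16`, so there was no room for a footer at all. The sketch below is a simplified stand-in for that kind of length check. It is not Lucene's or OpenSearch's code; the class and method names are hypothetical, and only the message format and the value 16 are copied from the reported exception.

```java
import java.io.IOException;

// Hypothetical sketch of a "does the file have room for a footer?" check.
// Not the actual Lucene implementation; names are illustrative.
final class FooterLengthSketch {

    // Taken from "footerLength==16" in the reported exception message.
    static final int FOOTER_LENGTH = 16;

    // Throws if the file is too short to contain a footer of FOOTER_LENGTH bytes.
    static void requireRoomForFooter(String resource, long fileLength) throws IOException {
        if (fileLength < FOOTER_LENGTH) {
            throw new IOException(
                "misplaced codec footer (file truncated?): length=" + fileLength
                    + " but footerLength==" + FOOTER_LENGTH
                    + " (resource=" + resource + ")");
        }
    }

    public static void main(String[] args) throws IOException {
        requireRoomForFooter("complete-file", 128); // long enough, no exception
        try {
            requireRoomForFooter("empty-metadata-file", 0); // zero-length file is rejected
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If that reading is right, the place to look would be whether the test can read the remote metadata file before its upload has completed.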

bowenlan-amzn avatar Jun 18 '24 23:06 bowenlan-amzn