[BUG] org.opensearch.snapshots.DeleteSnapshotIT.testDeleteShallowCopySnapshot is flaky

Open ashking94 opened this issue 2 years ago • 8 comments

Describe the bug The org.opensearch.snapshots.DeleteSnapshotIT.testDeleteShallowCopySnapshot test is flaky on the main branch. I ran the test in a loop and it failed on the 15th iteration.

To Reproduce The same seed does not always reproduce the failure. To reproduce, run the test in a loop until it fails.

Expected behavior The test should pass.

Additional context Jenkins build failure link - https://build.ci.opensearch.org/job/gradle-check/21871/

ashking94 avatar Aug 04 '23 10:08 ashking94

@kasundra07 @harishbhakuni21 fyi

ashking94 avatar Aug 04 '23 10:08 ashking94

Not able to reproduce the failure locally even after 1000 attempts. Closing.

sachinpkale avatar Sep 06 '23 13:09 sachinpkale

Reopening this, as the test is failing again:

Ref CI: https://build.ci.opensearch.org/job/gradle-check/25984/

REPRODUCE WITH: ./gradlew ':server:internalClusterTest' --tests "org.opensearch.snapshots.DeleteSnapshotIT.testDeleteShallowCopySnapshot" -Dtests.seed=5A77171FC14EEBF7 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=vi -Dtests.timezone=PRC -Druntime.java=20
java.lang.AssertionError: 
Expected: is <7>
     but: was <4>
	at __randomizedtesting.SeedInfo.seed([5A77171FC14EEBF7:5F34FA1617FAB2A9]:0)
	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
	at org.junit.Assert.assertThat(Assert.java:964)
	at org.junit.Assert.assertThat(Assert.java:930)
	at org.opensearch.snapshots.AbstractSnapshotIntegTestCase.createFullSnapshot(AbstractSnapshotIntegTestCase.java:489)
	at org.opensearch.snapshots.DeleteSnapshotIT.testDeleteShallowCopySnapshot(DeleteSnapshotIT.java:85)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
	at java.base/java.lang.reflect.Method.invoke(Method.java:578)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
	at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
	at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
	at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
	at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
	at java.base/java.lang.Thread.run(Thread.java:1623)

sohami avatar Sep 21 '23 16:09 sohami

@harishbhakuni Can you take a look at this?

sohami avatar Sep 21 '23 16:09 sohami

I can get this to fail every time with the following seed:

./gradlew ':server:internalClusterTest' --tests "org.opensearch.snapshots.DeleteSnapshotIT.testDeleteShallowCopySnapshot" -Dtests.seed=4CD3155D4F1C1A9F
java.lang.AssertionError: java.lang.IllegalArgumentException: Provided Lock Name metadata__9223372036854775806__9223372036854775803__9223372036854775790__9223372036854775800___Hf3Dbw2QQagfGLlVBOUrg__9223370340398865071__1___ZxZ4Wh89SXyEPmSYAHrIrQ.lock is not Valid.
	at __randomizedtesting.SeedInfo.seed([4CD3155D4F1C1A9F]:0)
	at org.opensearch.repositories.blobstore.BlobStoreRepository.lambda$executeOneStaleIndexDelete$37(BlobStoreRepository.java:1627)
	at org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:74)
	at org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:89)
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:908)
	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.IllegalArgumentException: Provided Lock Name metadata__9223372036854775806__9223372036854775803__9223372036854775790__9223372036854775800___Hf3Dbw2QQagfGLlVBOUrg__9223370340398865071__1___ZxZ4Wh89SXyEPmSYAHrIrQ.lock is not Valid.
	at org.opensearch.index.store.lockmanager.FileLockInfo$LockFileUtils.getAcquirerIdFromLock(FileLockInfo.java:103)
	at org.opensearch.index.store.lockmanager.FileLockInfo.lambda$getLockForAcquirer$0(FileLockInfo.java:59)
	at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:176)
	at java.base/java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
	at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
	at org.opensearch.index.store.lockmanager.FileLockInfo.getLockForAcquirer(FileLockInfo.java:60)
	at org.opensearch.index.store.lockmanager.RemoteStoreMetadataLockManager.release(RemoteStoreMetadataLockManager.java:65)
	at org.opensearch.repositories.blobstore.BlobStoreRepository.lambda$executeOneStaleIndexDelete$37(BlobStoreRepository.java:1590)
	... 7 more

andrross avatar Oct 04 '23 21:10 andrross

> metadata__9223372036854775806__9223372036854775803__9223372036854775790__9223372036854775800___Hf3Dbw2QQagfGLlVBOUrg__9223370340398865071__1___ZxZ4Wh89SXyEPmSYAHrIrQ.lock

This issue is fixed with this PR: https://github.com/opensearch-project/OpenSearch/issues/10217

> I can get this to fail every time with the following seed:
>
> ./gradlew ':server:internalClusterTest' --tests "org.opensearch.snapshots.DeleteSnapshotIT.testDeleteShallowCopySnapshot" -Dtests.seed=4CD3155D4F1C1A9F

I haven't seen this one before. It looks like a UUID generation issue. Let me check.

harishbhakuni avatar Oct 04 '23 22:10 harishbhakuni
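
A minimal sketch of one plausible reason the lock name above is rejected. It assumes the lock file name has the form `<metadata file name>___<acquirer id>.lock` and that the parser splits on `___` and expects exactly two tokens. `LockNameSketch`, `SEPARATOR`, `tokenCount`, and `acquirerId` are hypothetical names for illustration only, not the OpenSearch implementation. Under those assumptions, the UUID `_Hf3Dbw2QQagfGLlVBOUrg` inside the metadata file name begins with an underscore, so together with the preceding `__` it forms a second `___` and the split produces three tokens instead of two.

```java
import java.util.Arrays;

public class LockNameSketch {
    // Assumption: the metadata file name and the acquirer ID are joined by three underscores.
    static final String SEPARATOR = "___";
    static final String LOCK_EXTENSION = ".lock";

    // Number of tokens produced when the lock name is split on the assumed separator.
    static int tokenCount(String lockName) {
        return lockName.split(SEPARATOR).length;
    }

    // Hypothetical stand-in for the parsing step: accept only names that split into exactly two tokens.
    static String acquirerId(String lockName) {
        String[] tokens = lockName.split(SEPARATOR);
        if (tokens.length != 2) {
            throw new IllegalArgumentException("Provided Lock Name " + lockName + " is not Valid.");
        }
        return tokens[1].replace(LOCK_EXTENSION, "");
    }

    public static void main(String[] args) {
        // Lock name copied from the stack trace in this issue.
        String failing = "metadata__9223372036854775806__9223372036854775803__9223372036854775790"
            + "__9223372036854775800___Hf3Dbw2QQagfGLlVBOUrg__9223370340398865071__1"
            + "___ZxZ4Wh89SXyEPmSYAHrIrQ.lock";

        // The UUID's leading underscore creates an extra separator match, so three tokens appear.
        System.out.println(Arrays.toString(failing.split(SEPARATOR)));

        try {
            acquirerId(failing);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

If this reading is right, the failure depends on whether a randomly generated UUID happens to start with `_`, which would explain why a fixed seed reproduces it every time while random runs hit it only occasionally.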

[Triage - attendees 1 2 3 4 5 6 7] Looks like this might still be an issue; reopening so it can be investigated.

peternied avatar Apr 24 '24 15:04 peternied

From the other issue: https://build.ci.opensearch.org/job/gradle-check/37349/testReport/

java.lang.AssertionError: 
Expected: is <9>
     but: was <8>
	at __randomizedtesting.SeedInfo.seed([30BCA240DC8694B0:35FF4F490A32CDEE]:0)
	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
	at org.junit.Assert.assertThat(Assert.java:964)
	at org.junit.Assert.assertThat(Assert.java:930)
	at org.opensearch.snapshots.AbstractSnapshotIntegTestCase.createFullSnapshot(AbstractSnapshotIntegTestCase.java:497)
	at org.opensearch.snapshots.DeleteSnapshotIT.testDeleteShallowCopySnapshot(DeleteSnapshotIT.java:92)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
REPRODUCE WITH: ./gradlew ':server:internalClusterTest' --tests "org.opensearch.snapshots.DeleteSnapshotIT.testDeleteShallowCopySnapshot" -Dtests.seed=30BCA240DC8694B0 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=fr-CA -Dtests.timezone=Antarctica/Vostok -Druntime.java=21

peternied avatar Apr 24 '24 15:04 peternied