OpenSearch
[CI] o.o.gateway.RecoveryFromGatewayIT.testReuseInFileBasedPeerRecovery
Failed on unrelated PR #1742. Not reproducible locally. Opening to track if this continues to fail.
REPRODUCE WITH: ./gradlew ':server:internalClusterTest' --tests "org.opensearch.gateway.RecoveryFromGatewayIT.testReuseInFileBasedPeerRecovery" -Dtests.seed=1183E5842BAA4635 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m -Djava.security.manager=allow" -Dtests.locale=ja-JP -Dtests.timezone=Pacific/Kwajalein -Druntime.java=17
org.opensearch.gateway.RecoveryFromGatewayIT > testReuseInFileBasedPeerRecovery FAILED
java.lang.AssertionError: shard [test][0] on node [node_t1] has pending operations:
--> RetentionLeaseBackgroundSyncAction.Request{retentionLeases=RetentionLeases{primaryTerm=1, version=1468, leases={peer_recovery/_23_6236SQekFQ4X2S5HWQ=RetentionLease{id='peer_recovery/_23_6236SQekFQ4X2S5HWQ', retainingSequenceNumber=1333, timestamp=1639665151286, source='peer recovery'}, peer_recovery/txItgQoGQoyJSR_SvgGAIQ=RetentionLease{id='peer_recovery/txItgQoGQoyJSR_SvgGAIQ', retainingSequenceNumber=1333, timestamp=1639665151286, source='peer recovery'}}}, shardId=[test][0], timeout=1m, index='test', waitForActiveShards=0}
at org.opensearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:248)
at org.opensearch.index.shard.IndexShard.acquirePrimaryOperationPermit(IndexShard.java:3231)
at org.opensearch.action.support.replication.TransportReplicationAction.acquirePrimaryOperationPermit(TransportReplicationAction.java:1117)
at org.opensearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.doRun(TransportReplicationAction.java:434)
at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:50)
at org.opensearch.action.support.replication.TransportReplicationAction.handlePrimaryRequest(TransportReplicationAction.java:378)
at org.opensearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:91)
at org.opensearch.transport.TransportService$8.doRun(TransportService.java:944)
at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:792)
at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:50)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:833)
at __randomizedtesting.SeedInfo.seed([1183E5842BAA4635:471B0010ABF29B6E]:0)
at org.opensearch.test.InternalTestCluster.lambda$assertNoPendingIndexOperations$12(InternalTestCluster.java:1434)
at org.opensearch.test.OpenSearchTestCase.assertBusy(OpenSearchTestCase.java:1060)
at org.opensearch.test.InternalTestCluster.assertNoPendingIndexOperations(InternalTestCluster.java:1421)
at org.opensearch.test.InternalTestCluster.beforeIndexDeletion(InternalTestCluster.java:1349)
at org.opensearch.test.OpenSearchIntegTestCase.beforeIndexDeletion(OpenSearchIntegTestCase.java:636)
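For readers unfamiliar with the bottom of this trace: the error is raised from `assertNoPendingIndexOperations`, which runs inside `assertBusy` during `beforeIndexDeletion`. The sketch below illustrates the general retry-until-timeout assertion pattern. It is not the OpenSearch implementation: the class name `BusyAssertSketch`, the method `assertEventually`, and the backoff values are hypothetical and chosen only for illustration.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BusyAssertSketch {

    /**
     * Hypothetical helper: re-runs the check with growing sleeps until it passes
     * or the time budget is spent. On timeout the last AssertionError is rethrown,
     * which is how a check that never settles ends up failing the test.
     */
    public static void assertEventually(Runnable check, long maxWait, TimeUnit unit) {
        long deadlineNanos = System.nanoTime() + unit.toNanos(maxWait);
        long sleepMillis = 1;
        while (true) {
            AssertionError last;
            try {
                check.run();
                return; // check passed, nothing pending
            } catch (AssertionError e) {
                last = e; // remember the most recent failure
            }
            if (System.nanoTime() >= deadlineNanos) {
                throw last; // budget exhausted, surface the failure
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                throw new AssertionError("interrupted while waiting", ie);
            }
            sleepMillis = Math.min(sleepMillis * 2, 100); // assumed backoff cap
        }
    }

    public static void main(String[] args) {
        // Simulated "pending operations" counter that drains by one per check.
        AtomicInteger pending = new AtomicInteger(3);
        assertEventually(() -> {
            if (pending.get() > 0) {
                pending.decrementAndGet();
                throw new AssertionError("pending operations remain");
            }
        }, 5, TimeUnit.SECONDS);
        System.out.println("no pending operations");
    }
}
```

Under this pattern, a check that keeps observing a pending operation on every retry fails only once the whole wait budget has elapsed, which fits a timing-dependent failure that is hard to reproduce locally.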
https://github.com/opensearch-project/OpenSearch/pull/2069#issuecomment-1032857800
It doesn't look like this flaky test failure has been referenced since April. That said, I could not find any explicit fix for this test that would suggest the issue has been resolved. Should we close this issue and assume it was fixed along the way?
The real friends are the tests we made along the way ;)
I'm for shooting it and seeing if it reappears, but interested to hear what other folks think.
I ran this test 1000 times in isolation and was not able to reproduce the failure. Closing, as there have been no occurrences since April.
./gradlew ':server:internalClusterTest' --tests "org.opensearch.gateway.RecoveryFromGatewayIT.testReuseInFileBasedPeerRecovery" -Dtests.seed=1183E5842BAA4635 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m -Djava.security.manager=allow" -Dtests.locale=ja-JP -Dtests.timezone=Pacific/Kwajalein -Dtests.iters=1000
> Configure project :qa:os
Cannot add task 'destructiveDistroTest.docker' as a task with that name already exists.
=======================================
OpenSearch Build Hamster says Hello!
Gradle Version : 7.6
OS Info : Linux 5.4.225-139.416.amzn2int.x86_64 (amd64)
JDK Version : 17 (OpenJDK)
JAVA_HOME : /opt/jdk-17
Random Testing Seed : 1183E5842BAA4635
In FIPS 140 mode : false
=======================================
> Task :server:internalClusterTest
WARNING: A terminally deprecated method in java.lang.System has been called
WARNING: System::setSecurityManager has been called by org.opensearch.bootstrap.BootstrapForTesting (file:/local/home/jpalis/repos/flaky-tests/OpenSearch/test/framework/build/distributions/framework-3.0.0-SNAPSHOT.jar)
WARNING: Please consider reporting this to the maintainers of org.opensearch.bootstrap.BootstrapForTesting
WARNING: System::setSecurityManager will be removed in a future release
WARNING: A terminally deprecated method in java.lang.System has been called
WARNING: System::setSecurityManager has been called by org.gradle.api.internal.tasks.testing.worker.TestWorker (file:/local/home/jpalis/.gradle/wrapper/dists/gradle-7.6-all/9f832ih6bniajn45pbmqhk2cw/gradle-7.6/lib/plugins/gradle-testing-base-7.6.jar)
WARNING: Please consider reporting this to the maintainers of org.gradle.api.internal.tasks.testing.worker.TestWorker
WARNING: System::setSecurityManager will be removed in a future release
BUILD SUCCESSFUL in 17m 56s
43 actionable tasks: 1 executed, 42 up-to-date