elasticsearch
[CI] SearchableSnapshotsCanMatchOnCoordinatorIntegTests testSearchableSnapshotShardsThatHaveMatchingDataAreNotSkippedOnTheCoordinatingNode failing
This test doesn't fail often, but there have been 8 failures in the history I checked (90 days), the first on 27 December 2023.
Build scan: https://gradle-enterprise.elastic.co/s/37emgjkj2kabs/tests/:x-pack:plugin:searchable-snapshots:internalClusterTest/org.elasticsearch.xpack.searchablesnapshots.SearchableSnapshotsCanMatchOnCoordinatorIntegTests/testSearchableSnapshotShardsThatHaveMatchingDataAreNotSkippedOnTheCoordinatingNode
Reproduction line:
./gradlew ':x-pack:plugin:searchable-snapshots:internalClusterTest' --tests "org.elasticsearch.xpack.searchablesnapshots.SearchableSnapshotsCanMatchOnCoordinatorIntegTests.testSearchableSnapshotShardsThatHaveMatchingDataAreNotSkippedOnTheCoordinatingNode" -Dtests.seed=DF618ECA1E13C7B8 -Dtests.locale=ko -Dtests.timezone=Canada/Saskatchewan -Druntime.java=21
Applicable branches: main
Reproduces locally?: Didn't try
Failure excerpt:
java.lang.AssertionError: no shard should be marked as skipped
at __randomizedtesting.SeedInfo.seed([DF618ECA1E13C7B8:1C94DFDF97D14973]:0)
at org.junit.Assert.fail(Assert.java:89)
at org.junit.Assert.assertTrue(Assert.java:42)
at org.junit.Assert.assertFalse(Assert.java:65)
at org.elasticsearch.xpack.searchablesnapshots.SearchableSnapshotsCanMatchOnCoordinatorIntegTests.testSearchableSnapshotShardsThatHaveMatchingDataAreNotSkippedOnTheCoordinatingNode(SearchableSnapshotsCanMatchOnCoordinatorIntegTests.java:667)
at jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
at java.lang.reflect.Method.invoke(Method.java:580)
at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
at java.lang.Thread.run(Thread.java:1583)
Pinging @elastic/es-distributed (Team:Distributed)
I've muted this in https://github.com/elastic/elasticsearch/commit/4b5f0a0bb7eac6f807876a96bd5a350c505907b8.
The failing SearchShards API assertion was added in #97212, so I'm going to reassign the test failure to the Search team.
Pinging @elastic/es-search (Team:Search)
A few observations from the time spent debugging this test:
- The repository is blocked before the searchable snapshot is mounted, and the blocking flag is read afterwards. Thread synchronization looks correct to me.
- The search requests (the first and then the second) can run concurrently with the snapshot mount operation, which runs in the background.
- The mount operation cannot complete because the repository is blocked, so both search requests are expected to fail.
- Nevertheless, this condition looks suspicious:
{
    SearchShardsResponse searchShardsResponse = null;
    try {
        searchShardsResponse = client().execute(TransportSearchShardsAction.TYPE, searchShardsRequest).actionGet();
    } catch (SearchPhaseExecutionException e) {
        // ignore as this is expected to happen
    }
    if (searchShardsResponse != null) {
        for (SearchShardsGroup group : searchShardsResponse.getGroups()) {
            assertFalse("no shard should be marked as skipped", group.skipped());
        }
    }
}
Somehow searchShardsResponse was returned, and although it is expected to contain no skipped groups, at least one group was marked as skipped. I'd like the ES-Search team to explain how it can happen that searchShardsResponse contains results even though the underlying shard(s) do not exist at the time of the query.
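One thing that might make the next failure easier to diagnose is to collect every skipped group before failing, rather than stopping at the first one. Below is a minimal, standalone sketch of that idea. It is not code from the test: `Group` is a hypothetical stand-in for `SearchShardsGroup` (only `skipped()` comes from the snippet above; the `shardId` string is an assumption), and `skippedShardIds` is an illustrative helper name.

```java
import java.util.List;
import java.util.stream.Collectors;

public class SkippedGroupsReport {
    // Hypothetical stand-in for SearchShardsGroup; only skipped() appears in the test snippet.
    record Group(String shardId, boolean skipped) {}

    // Returns the ids of all groups marked as skipped.
    // A null list models the "no response, exception was swallowed" case and yields an empty result.
    static List<String> skippedShardIds(List<Group> groups) {
        if (groups == null) {
            return List.of();
        }
        return groups.stream()
            .filter(Group::skipped)
            .map(Group::shardId)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample data, purely for illustration.
        List<Group> groups = List.of(new Group("[index][0]", false), new Group("[index][1]", true));
        List<String> skipped = skippedShardIds(groups);
        if (!skipped.isEmpty()) {
            System.out.println("no shard should be marked as skipped, but found: " + skipped);
        }
    }
}
```

In the real test the same list could be passed to a single assertion message, so the failure output names the affected shards instead of only reporting that one of them was skipped.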
Possibly related failure: https://gradle-enterprise.elastic.co/s/5o5anthxsfi6k/tests/task/:x-pack:plugin:searchable-snapshots:internalClusterTest/details/org.elasticsearch.xpack.searchablesnapshots.SearchableSnapshotsCanMatchOnCoordinatorIntegTests/testSearchableSnapshotShardsThatHaveMatchingDataAreNotSkippedOnTheCoordinatingNode?page=eyJvdXRwdXQiOnsiMCI6MX19&top-execution=1
The test is the same, but the exception is different. Let me know if it's worth creating a separate issue for it.
Pinging @elastic/es-search-foundations (Team:Search Foundations)