elasticsearch
elasticsearch copied to clipboard
[CI] FieldSortIT testSortMixedFieldTypes failing
Build Scans:
- elasticsearch-periodic-platform-support #8758 / oraclelinux-9_platform-support-unix
- elasticsearch-intake #23870 / part1
Reproduction Line:
./gradlew ":server:internalClusterTest" --tests "org.elasticsearch.search.sort.FieldSortIT.testSortMixedFieldTypes" -Dtests.seed=F6995E804508C6C3 -Dtests.locale=dsb -Dtests.timezone=Africa/Asmera -Druntime.java=24
Applicable branches: 8.19
Reproduces locally?: N/A
Failure History: See dashboard
Failure Message:
java.lang.AssertionError: invalid decRef call: already closed
Issue Reasons:
- [8.19] 2 failures in test testSortMixedFieldTypes (0.6% fail rate in 335 executions)
Note: This issue was created using new test triage automation. Please report issues or feedback to es-delivery.
Pinging @elastic/es-search-foundations (Team:Search Foundations)
Test history suggests this test suite has been flaky on 8.19 (8.x) and main since the end of march
All test methods are, symptoms seem to be timeouts, java.lang.AssertionError: java.lang.IllegalStateException: Attempted to close with partial reduce in-flight and "java.lang.AssertionError: invalid decRef call: already closed". Need to dig into those failures a bit more to see if they look related.
One of the very early ones from March 31st already show Leak issues related to "testSortMixedFieldTypes"
https://gradle-enterprise.elastic.co/s/aqlr4gxvmas4o
Will spot-check some others to be sure this is a main cause
All failures I spot checked had either a Leak message with access in "testSortMixedFieldTypes" or a java.lang.IllegalStateException: Attempted to close with partial reduce in-flight in "testSortMixedFieldTypes" in the logs somewhere, e.g. here
This failed in my PR: https://buildkite.com/elastic/elasticsearch-pull-request/builds/78601#0197cb1c-8cc8-48f0-8bca-30fdeeebd656
./gradlew ":server:internalClusterTest" --tests "org.elasticsearch.search.sort.FieldSortIT.testSortMixedFieldTypes" -Dtests.seed=43DA5706726F932E -Dtests.locale=ff-SN -Dtests.timezone=Asia/Manila -Druntime.java=24
Trace:
FieldSortIT > testSortMixedFieldTypes FAILED
| java.lang.AssertionError:
| Expected: an empty collection
| but: <[LEAK: resource was not cleaned up before it was garbage-collected.
| Recent access records:
| #1:
| in [elasticsearch[node_s2][search][T#3]][testSortMixedFieldTypes]
| org.elasticsearch.action.ActionListener.respondAndRelease(ActionListener.java:388)
| org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:79)
| org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:76)
| org.elasticsearch.action.ActionRunnable$4.doRun(ActionRunnable.java:101)
| org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
| org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:35)
| org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1044)
| org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
| java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1095)
| java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:619)
| java.base/java.lang.Thread.run(Thread.java:1447)
| #2:
| in [elasticsearch[node_s2][search][T#3]][testSortMixedFieldTypes]
| org.elasticsearch.action.search.ArraySearchPhaseResults.consumeResult(ArraySearchPhaseResults.java:47)
| org.elasticsearch.action.search.QueryPhaseResultConsumer.consumeResult(QueryPhaseResultConsumer.java:160)
| org.elasticsearch.action.search.AbstractSearchAsyncAction.onShardResult(AbstractSearchAsyncAction.java:507)
| org.elasticsearch.action.search.SearchQueryThenFetchAsyncAction.onShardResult(SearchQueryThenFetchAsyncAction.java:182)
| org.elasticsearch.action.search.AbstractSearchAsyncAction$1.innerOnResponse(AbstractSearchAsyncAction.java:270)
| org.elasticsearch.action.search.SearchActionListener.onResponse(SearchActionListener.java:34)
| org.elasticsearch.action.search.SearchActionListener.onResponse(SearchActionListener.java:19)
| org.elasticsearch.action.search.SearchExecutionStatsCollector.onResponse(SearchExecutionStatsCollector.java:63)
| org.elasticsearch.action.search.SearchExecutionStatsCollector.onResponse(SearchExecutionStatsCollector.java:27)
| org.elasticsearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:49)
| org.elasticsearch.action.search.SearchTransportService$ConnectionCountingHandler.handleResponse(SearchTransportService.java:592)
| org.elasticsearch.transport.TransportService$UnregisterChildTransportResponseHandler.handleResponse(TransportService.java:1718)
| org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1496)
| org.elasticsearch.transport.TransportService$DirectResponseChannel.processResponse(TransportService.java:1594)
| org.elasticsearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:1568)
| org.elasticsearch.transport.TaskTransportChannel.sendResponse(TaskTransportChannel.java:35)
| org.elasticsearch.action.support.ChannelActionListener.onResponse(ChannelActionListener.java:33)
| org.elasticsearch.action.support.ChannelActionListener.onResponse(ChannelActionListener.java:20)
| org.elasticsearch.action.ActionListenerImplementations$DelegatingResponseActionListener.onResponse(ActionListenerImplementations.java:198)
| org.elasticsearch.action.ActionListener.respondAndRelease(ActionListener.java:386)
| org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:79)
| org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:76)
| org.elasticsearch.action.ActionRunnable$4.doRun(ActionRunnable.java:101)
| org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
| org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:35)
| org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1044)
| org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
| java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1095)
| java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:619)
| java.base/java.lang.Thread.run(Thread.java:1447)
| #3:
| in [elasticsearch[node_s2][search][T#3]][testSortMixedFieldTypes]
| org.elasticsearch.core.Releasables.close(Releasables.java:39)
| org.elasticsearch.core.Releasables.closeExpectNoException(Releasables.java:72)
| org.elasticsearch.action.ActionListenerImplementations$2.run(ActionListenerImplementations.java:51)
| org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:218)
| org.elasticsearch.action.ActionListener$3.onResponse(ActionListener.java:413)
| org.elasticsearch.action.support.SubscribableListener$SuccessResult.complete(SubscribableListener.java:403)
| org.elasticsearch.action.support.SubscribableListener.tryComplete(SubscribableListener.java:323)
| org.elasticsearch.action.support.SubscribableListener.setResult(SubscribableListener.java:368)
| org.elasticsearch.action.support.SubscribableListener.onResponse(SubscribableListener.java:259)
| org.elasticsearch.search.internal.SearchContext.close(SearchContext.java:115)
| org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:904)
| org.elasticsearch.search.SearchService.lambda$executeQueryPhase$7(SearchService.java:710)
| org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:79)
| org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:76)
| org.elasticsearch.action.ActionRunnable$4.doRun(ActionRunnable.java:101)
| org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
| org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:35)
| org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1044)
| org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
| java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1095)
| java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:619)
| java.base/java.lang.Thread.run(Thread.java:1447)
| #4:
| in [elasticsearch[node_s2][search][T#3]][testSortMixedFieldTypes]
| org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:901)
| org.elasticsearch.search.SearchService.lambda$executeQueryPhase$7(SearchService.java:710)
| org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:79)
| org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:76)
| org.elasticsearch.action.ActionRunnable$4.doRun(ActionRunnable.java:101)
| org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
| org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:35)
| org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1044)
| org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
| java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1095)
| java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:619)
| java.base/java.lang.Thread.run(Thread.java:1447)
| Created at:
| in [elasticsearch[node_s2][search][T#3]][testSortMixedFieldTypes]
| org.elasticsearch.search.query.QuerySearchResult.<init>(QuerySearchResult.java:110)
| org.elasticsearch.search.DefaultSearchContext.addQueryResult(DefaultSearchContext.java:342)
| org.elasticsearch.search.SearchService$ResultsType$2.addResultsObject(SearchService.java:1820)
| org.elasticsearch.search.SearchService.createContext(SearchService.java:1330)
| org.elasticsearch.search.MockSearchService.createContext(MockSearchService.java:142)
| org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:869)
| org.elasticsearch.search.SearchService.lambda$executeQueryPhase$7(SearchService.java:710)
| org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:79)
| org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:76)
| org.elasticsearch.action.ActionRunnable$4.doRun(ActionRunnable.java:101)
| org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
| org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:35)
| org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1044)
| org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
| java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1095)
| java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:619)
| java.base/java.lang.Thread.run(Thread.java:1447)]>
| at __randomizedtesting.SeedInfo.seed([43DA5706726F932E:19AFEDF12748A0C6]:0)
| at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
| at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:6)
| at org.elasticsearch.test.ESTestCase.assertThat(ESTestCase.java:2666)
| at org.elasticsearch.test.ESTestCase.checkStaticState(ESTestCase.java:816)
| at org.elasticsearch.test.ESTestCase.after(ESTestCase.java:610)
| at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
| at java.base/java.lang.reflect.Method.invoke(Method.java:565)
| at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1763)
| at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:1004)
| at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
| at org.junit.rules.RunRules.evaluate(RunRules.java:20)
| at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
| at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
| at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
| at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
| at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
| at org.junit.rules.RunRules.evaluate(RunRules.java:20)
| at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
| at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
| at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
| at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
| at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
| at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
| at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
| at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
| at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
| at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
| at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
| at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
| at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
| at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
| at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
| at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
| at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
| at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
| at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
| at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
| at org.junit.rules.RunRules.evaluate(RunRules.java:20)
| at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
| at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
| at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
| at java.base/java.lang.Thread.run(Thread.java:1447)
|
<br class="Apple-interchange-newline">
This has been muted on branch main
Mute Reasons:
- [main] 2 failures in test testSortMixedFieldTypes (0.3% fail rate in 654 executions)
Build Scans:
I spent some more time with this on Friday but unfortunately haven't gotten closer to a reproduction or solution. Some findings and a few theories I haven't been able to prove or disprove though:
- I increased logging SearchQueryResult reference counting in “normal” non failing cases but the locations where we inc/decRef look the same as in the failure cases (i.e. )
Leaks
java.lang.AssertionError:
Expected: an empty collection
but: <[LEAK: resource was not cleaned up before it was garbage-collected.
Recent access records:
#1:
in [elasticsearch[node_s2][search][T#2]][testSortMixedFieldTypes]
org.elasticsearch.action.ActionListener.respondAndRelease(ActionListener.java:388)
org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:79)
org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:76)
org.elasticsearch.action.ActionRunnable$4.doRun(ActionRunnable.java:101)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:35)
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1044)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1095)
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:619)
java.base/java.lang.Thread.run(Thread.java:1447)
#2:
in [elasticsearch[node_s2][search][T#2]][testSortMixedFieldTypes]
org.elasticsearch.action.search.ArraySearchPhaseResults.consumeResult(ArraySearchPhaseResults.java:47)
org.elasticsearch.action.search.QueryPhaseResultConsumer.consumeResult(QueryPhaseResultConsumer.java:160)
org.elasticsearch.action.search.AbstractSearchAsyncAction.onShardResult(AbstractSearchAsyncAction.java:507)
org.elasticsearch.action.search.SearchQueryThenFetchAsyncAction.onShardResult(SearchQueryThenFetchAsyncAction.java:182)
org.elasticsearch.action.search.AbstractSearchAsyncAction$1.innerOnResponse(AbstractSearchAsyncAction.java:270)
org.elasticsearch.action.search.SearchActionListener.onResponse(SearchActionListener.java:34)
org.elasticsearch.action.search.SearchActionListener.onResponse(SearchActionListener.java:19)
org.elasticsearch.action.search.SearchExecutionStatsCollector.onResponse(SearchExecutionStatsCollector.java:63)
org.elasticsearch.action.search.SearchExecutionStatsCollector.onResponse(SearchExecutionStatsCollector.java:27)
org.elasticsearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:49)
org.elasticsearch.action.search.SearchTransportService$ConnectionCountingHandler.handleResponse(SearchTransportService.java:592)
org.elasticsearch.transport.TransportService$UnregisterChildTransportResponseHandler.handleResponse(TransportService.java:1718)
org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1496)
org.elasticsearch.transport.TransportService$DirectResponseChannel.processResponse(TransportService.java:1594)
org.elasticsearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:1568)
org.elasticsearch.transport.TaskTransportChannel.sendResponse(TaskTransportChannel.java:35)
org.elasticsearch.action.support.ChannelActionListener.onResponse(ChannelActionListener.java:33)
org.elasticsearch.action.support.ChannelActionListener.onResponse(ChannelActionListener.java:20)
org.elasticsearch.action.ActionListenerImplementations$DelegatingResponseActionListener.onResponse(ActionListenerImplementations.java:198)
org.elasticsearch.action.ActionListener.respondAndRelease(ActionListener.java:386)
org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:79)
org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:76)
org.elasticsearch.action.ActionRunnable$4.doRun(ActionRunnable.java:101)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:35)
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1044)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1095)
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:619)
java.base/java.lang.Thread.run(Thread.java:1447)
#3:
in [elasticsearch[node_s2][search][T#2]][testSortMixedFieldTypes]
org.elasticsearch.core.Releasables.close(Releasables.java:39)
org.elasticsearch.core.Releasables.closeExpectNoException(Releasables.java:72)
org.elasticsearch.action.ActionListenerImplementations$2.run(ActionListenerImplementations.java:51)
org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:218)
org.elasticsearch.action.ActionListener$3.onResponse(ActionListener.java:413)
org.elasticsearch.action.support.SubscribableListener$SuccessResult.complete(SubscribableListener.java:403)
org.elasticsearch.action.support.SubscribableListener.tryComplete(SubscribableListener.java:323)
org.elasticsearch.action.support.SubscribableListener.setResult(SubscribableListener.java:368)
org.elasticsearch.action.support.SubscribableListener.onResponse(SubscribableListener.java:259)
org.elasticsearch.search.internal.SearchContext.close(SearchContext.java:115)
org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:904)
org.elasticsearch.search.SearchService.lambda$executeQueryPhase$7(SearchService.java:710)
org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:79)
org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:76)
org.elasticsearch.action.ActionRunnable$4.doRun(ActionRunnable.java:101)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:35)
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1044)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1095)
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:619)
java.base/java.lang.Thread.run(Thread.java:1447)
#4:
in [elasticsearch[node_s2][search][T#2]][testSortMixedFieldTypes]
org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:901)
org.elasticsearch.search.SearchService.lambda$executeQueryPhase$7(SearchService.java:710)
org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:79)
org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:76)
org.elasticsearch.action.ActionRunnable$4.doRun(ActionRunnable.java:101)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:35)
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1044)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1095)
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:619)
java.base/java.lang.Thread.run(Thread.java:1447)
Created at:
in [elasticsearch[node_s2][search][T#2]][testSortMixedFieldTypes]
org.elasticsearch.search.query.QuerySearchResult.<init>(QuerySearchResult.java:110)
org.elasticsearch.search.DefaultSearchContext.addQueryResult(DefaultSearchContext.java:342)
org.elasticsearch.search.SearchService$ResultsType$2.addResultsObject(SearchService.java:1820)
org.elasticsearch.search.SearchService.createContext(SearchService.java:1330)
org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:869)
org.elasticsearch.search.SearchService.lambda$executeQueryPhase$7(SearchService.java:710)
org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:79)
org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:76)
org.elasticsearch.action.ActionRunnable$4.doRun(ActionRunnable.java:101)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:35)
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1044)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1095)
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:619)
java.base/java.lang.Thread.run(Thread.java:1447)]>
at __randomizedtesting.SeedInfo.seed([26C0313418883E0F:7CB58BC34DAF0DE7]:0)
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:6)
at org.elasticsearch.test.ESTestCase.assertThat(ESTestCase.java:2710)
at org.elasticsearch.test.ESTestCase.checkStaticState(ESTestCase.java:860)
at org.elasticsearch.test.ESTestCase.after(ESTestCase.java:654)
at jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
at java.lang.reflect.Method.invoke(Method.java:565)
at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1763)
at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:1004)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
at java.lang.Thread.run(Thread.java:1447)
- tried to (unsuccessfully) reproduce locally
- investigated role of expected “IllegalArgumentExceptions” on invalid sort
- the failing test expects searches to fail because of incompatible sort types (i.e. long vs. double). The IEA is thrown in SearchPhaseController#validateSameSortTypeAndMaybeRewrite I think should be caught and handled in the SearchQueryThenFetchAsyncAction#onShardDone. However in local test I ran I occasionally saw these messsages logged in the Error logger (most likely this handler with an additional "failed to merge on data node"). This doesn't show up in the CI logs to my knowledge so might be a red hering
- investigated role of MockSearchService showing up in some failure cases, doesn’t seem to be the issue though because there are failures with it in the object creation path as well and forcing all tests to use MocksearchService doesn't increase likelihood of failure
Its important to note that the begining of this failure correlates with the introduction of "batched query execution", so my current working theory is that we run into a specific edge case with these search failures and maybe have a race somewhere between error handling and closing and cleaning up resources.
One other thing thats maybe worth noting:
I tried increasing the number of documents used in the test (currently the test only indexes one doc per sort type) to e.g. 20 which largely increases the likelihood of this test failing. However, this is with a different error reason than the Leak/"Attempted to close with partial reduce in-flight" messages:
java.lang.AssertionError: [org.elasticsearch.action.ActionListenerImplementations$RunBeforeActionListener/org.elasticsearch.action.search.TransportSearchAction$SearchResponseActionListener/
[...]
Caused by: java.lang.AssertionError: [org.elasticsearch.action.ActionListenerImplementations$RunBeforeActionListener/org.elasticsearch.action.search.TransportSearchAction$SearchResponseActionListener/org.elasticsearch.action.ActionListenerImplementations$RunBeforeActionListener/org.elasticsearch.tasks.TaskManager$1{org.elasticsearch.client.internal.support.AbstractClient$RefCountedFuture@21383b4e}{CancellableTask{Task{id=329, type='transport', action='indices:data/read/search', description='null', parentTask=unset, startTime=1751890516860, headers={}, startTimeNanos=17299138524750}, reason='Fatal failure during search: failed to merge result [Can't sort on field [foo]; the field has incompatible sort types: [DOUBLE] and [LONG] across shards!]', isCancelled=true}}/org.elasticsearch.action.support.TransportAction$$Lambda/0x00001c00018e55a0@42058342/org.elasticsearch.action.search.AbstractSearchAsyncAction$$Lambda/0x00001c0001b16c28@6be94945]
... 42 more
Caused by: org.elasticsearch.ElasticsearchException: executed already
at org.elasticsearch.action.ActionListener$3.assertFirstRun(ActionListener.java:403)
... 41 more
This is coming from an assertion in ActionListener
Here's a local run of 10 with 4 of these failure: https://gradle-enterprise.elastic.co/s/oiufje2arvhla
This has been muted on branch main
Unmuted in #130755 to collect more information on failure circumstances
Latest one today on main https://gradle-enterprise.elastic.co/s/inmlbv6wwdjju/tests/overview?outcome=FAILED
Two more on main today:
https://gradle-enterprise.elastic.co/s/3nbviyla7vzbg/tests/task/:server:internalClusterTest/details/org.elasticsearch.search.sort.FieldSortIT/testFieldAliasesWithMissingValues?top-execution=1
https://gradle-enterprise.elastic.co/s/ddpnsvs4mcj5k/tests/task/:server:internalClusterTest/details/org.elasticsearch.search.sort.FieldSortIT/testIssue8226?top-execution=1
In order to understand the race condition this apparently is I think we need logging about were exactly the expected exceptions for the invalid sort types are happening (i.e. which nodes). Maybe with that understanding we can reproduce the issue better. Will open a PR shortly.
New failure, now with some additional logging: https://gradle-enterprise.elastic.co/s/biqntwh6sjaew/tests/overview
Some info from the last failure that might help reduce score a bit:
Cluster the suite was run on:
[2025-07-21T17:36:55,097][INFO ][o.e.t.InternalTestCluster][testSortMissingNumbers] Setup InternalTestCluster [SUITE-TEST_WORKER_VM=[435]-CLUSTER_SEED=[7000333836992015345]-HASH=[11F7FEEE5A7]-cluster] with seed [61262E890F0E6BF1] using [0] dedicated masters, [2] (data) nodes and [0] coord only nodes (master nodes are [auto-managed])
“testSortMixedFieldTypes” later times out because it got killed with
java.lang.AssertionError: java.lang.IllegalStateException: Attempted to close with partial reduce in-flight
at __randomizedtesting.SeedInfo.seed([40715FAB90036405]:0)
at org.elasticsearch.core.Releasables.closeExpectNoException(Releasables.java:74)
at org.elasticsearch.action.ActionListenerImplementations$2.run(ActionListenerImplementations.java:51)
at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:218)
at org.elasticsearch.action.ActionListener$3.onResponse(ActionListener.java:413)
at org.elasticsearch.action.support.SubscribableListener$SuccessResult.complete(SubscribableListener.java:403)
at org.elasticsearch.action.support.SubscribableListener.tryComplete(SubscribableListener.java:323)
at org.elasticsearch.action.support.SubscribableListener.setResult(SubscribableListener.java:352)
at org.elasticsearch.action.support.SubscribableListener.onResponse(SubscribableListener.java:259)
at org.elasticsearch.action.search.AbstractSearchAsyncAction.lambda$new$0(AbstractSearchAsyncAction.java:148)
at org.elasticsearch.action.ActionListenerImplementations$RunBeforeActionListener.onFailure(ActionListenerImplementations.java:356)
at org.elasticsearch.action.ActionListenerImplementations.safeAcceptException(ActionListenerImplementations.java:78)
at org.elasticsearch.action.ActionListenerImplementations.safeOnFailure(ActionListenerImplementations.java:89)
at org.elasticsearch.action.ActionListener$3.onFailure(ActionListener.java:423)
at org.elasticsearch.action.search.AbstractSearchAsyncAction.raisePhaseFailure(AbstractSearchAsyncAction.java:647)
at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:624)
at org.elasticsearch.action.search.SearchQueryThenFetchAsyncAction.onPhaseFailure(SearchQueryThenFetchAsyncAction.java:81)
at org.elasticsearch.action.search.SearchQueryThenFetchAsyncAction$1.handleException(SearchQueryThenFetchAsyncAction.java:526)
[...]
Caused by:
java.lang.IllegalStateException: Attempted to close with partial reduce in-flight
at org.elasticsearch.action.search.QueryPhaseResultConsumer.doClose(QueryPhaseResultConsumer.java:144)
at org.elasticsearch.action.search.ArraySearchPhaseResults.close(ArraySearchPhaseResults.java:55)
at org.elasticsearch.core.Releasables.close(Releasables.java:39)
at org.elasticsearch.core.Releasables.closeExpectNoException(Releasables.java:72)
... 52 more
The Attempted to close with partial reduce in-flight errors are something already seen before, maybe they are easier to explain than the Leaks but might have same reason.
From the recently added logs we can see the docs for the indices that get provoked so IllegalArgument sorting failures are living on these same primary node. Unfortunately some of the replicas of those shards were not created when logged but we can also see the shard/replica counts for “index_long” (4/1), “index_double” (5/1) and “index_keyword” (4/0), so maybe those can be focused on for local reproduction:
[2025-07-21T17:37:04,758][INFO ][o.e.s.s.FieldSortIT ][testSortMixedFieldTypes] FieldSortIT#testSortMixedFieldTypes document distribution: [{"index":"index_long","node":"node_s1","shard":"0","prirep":"p","state":"STARTED","docs":"1"},{"index":"index_long","node":null,"shard":"0","prirep":"r","state":"UNASSIGNED","docs":null},{"index":"index_long","node":"node_s0","shard":"1","prirep":"p","state":"STARTED","docs":"0"},{"index":"index_long","node":null,"shard":"1","prirep":"r","state":"UNASSIGNED","docs":null},{"index":"index_long","node":"node_s1","shard":"2","prirep":"p","state":"STARTED","docs":"0"},{"index":"index_long","node":null,"shard":"2","prirep":"r","state":"UNASSIGNED","docs":null},{"index":"index_long","node":"node_s0","shard":"3","prirep":"p","state":"STARTED","docs":"0"},{"index":"index_long","node":null,"shard":"3","prirep":"r","state":"UNASSIGNED","docs":null},{"index":"index_double","node":"node_s0","shard":"0","prirep":"r","state":"STARTED","docs":"0"},{"index":"index_double","node":"node_s1","shard":"0","prirep":"p","state":"STARTED","docs":"0"},{"index":"index_double","node":"node_s0","shard":"1","prirep":"p","state":"STARTED","docs":"0"},{"index":"index_double","node":"node_s1","shard":"1","prirep":"r","state":"STARTED","docs":"0"},{"index":"index_double","node":"node_s0","shard":"2","prirep":"r","state":"STARTED","docs":"0"},{"index":"index_double","node":"node_s1","shard":"2","prirep":"p","state":"STARTED","docs":"0"},{"index":"index_double","node":"node_s0","shard":"3","prirep":"p","state":"STARTED","docs":"0"},{"index":"index_double","node":"node_s1","shard":"3","prirep":"r","state":"STARTED","docs":"0"},{"index":"index_double","node":"node_s1","shard":"4","prirep":"p","state":"STARTED","docs":"1"},{"index":"index_double","node":null,"shard":"4","prirep":"r","state":"UNASSIGNED","docs":null},{"index":"index_keyword","node":"node_s1","shard":"0","prirep":"p","state":"STARTED","docs":"1"},{"index":"index_keyword","node":"node_s0","shard":"1","prirep":"p","state":"STARTED","docs":"0"},{"index":"index_keyword","node":"node_s1","shard":"2","prirep":"p","state":"STARTED","docs":"0"},{"index":"index_keyword","node":"node_s0","shard":"3","prirep":"p","state":"STARTED","docs":"0"}]
The IAE are happening then on “node_s1” for the Long/Double sorting mismatch
1> [2025-07-21T17:37:04,817][DEBUG][o.e.a.s.SearchPhaseController][node_s1][search][T#2] Failed to merge top docs:
1> java.lang.IllegalArgumentException: Can't sort on field [foo]; the field has incompatible sort types: [LONG] and [DOUBLE] across shards!
and are bubbling up to “node_s0” as this appears to be the coordinator:
1> [2025-07-21T17:37:04,820][DEBUG][o.e.a.s.SearchQueryThenFetchAsyncAction][node_s0][transport_worker][T#1] handling node search exception coming from [g0JK2l3NQFiWjOLoNm2GPQ]
1> java.lang.IllegalArgumentException: Can't sort on field [foo]; the field has incompatible sort types: [LONG] and [DOUBLE] across shards!
Same seems to be true for the Long/String sorting mismatch:
1> [2025-07-21T17:37:04,836][DEBUG][o.e.a.s.SearchPhaseController][node_s1][search][T#2] Failed to merge top docs:
1> java.lang.IllegalArgumentException: Can't sort on field [foo]; the field has incompatible sort types: [LONG] and [STRING] across shards!
1> at org.elasticsearch.action.search.SearchPhaseController.validateSameSortTypesAndMaybeRewrite(SearchPhaseController.java:219) ~[main/:?]
1> [2025-07-21T17:37:04,838][DEBUG][o.e.a.s.SearchQueryThenFetchAsyncAction][node_s0][transport_worker][T#2] handling node search exception coming from [g0JK2l3NQFiWjOLoNm2GPQ]
1> java.lang.IllegalArgumentException: Can't sort on field [foo]; the field has incompatible sort types: [LONG] and [STRING] across shards!
1> at org.elasticsearch.action.search.SearchPhaseController.validateSameSortTypesAndMaybeRewrite(SearchPhaseController.java:219) ~[main/:?]
1> at org.elasticsearch.action.search.SearchPhaseController.mergeTopDocs(SearchPhaseController.java:167) ~[main/:?]
The final AssertionError is coming from “node_s0” again, after the second failure, so I assume its that later error handling that might have gone wrong somehow:
1> [2025-07-21T17:37:04,840][ERROR][o.e.ExceptionsHelper ][node_s0][transport_worker][T#2] fatal error java.lang.AssertionError: java.lang.IllegalStateException: Attempted to close with partial reduce in-flight
1> at org.elasticsearch.ExceptionsHelper.lambda$maybeDieOnAnotherThread$4(ExceptionsHelper.java:449)
1> at java.base/java.util.Optional.ifPresent(Optional.java:178)
1> at org.elasticsearch.ExceptionsHelper.maybeDieOnAnotherThread(ExceptionsHelper.java:439)
1> at org.elasticsearch.transport.netty4.Netty4MessageInboundHandler.exceptionCaught(Netty4MessageInboundHandler.java:63)
1> at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:346)
1> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:447)
Another more recent one.
This has been muted on branch 9.1
Mute Reasons:
- [9.1] 2 failures in test testSortMixedFieldTypes (0.8% fail rate in 262 executions)
Build Scans:
A similar failure was raised in my PR as well ->
java.lang.AssertionError: |
-- | --
Expected: an empty collection |
but: <[LEAK: resource was not cleaned up before it was garbage-collected. |
Recent access records: |
#1: |
in [elasticsearch[node_s1][search][T#3]][testSortMixedFieldTypes] |
org.elasticsearch.action.ActionListener.respondAndRelease(ActionListener.java:388) |
org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:79) |
org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:76) |
org.elasticsearch.action.ActionRunnable$4.doRun(ActionRunnable.java:101) |
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27) |
org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:35) |
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1067) |
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27) |
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1090) |
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:614) |
java.base/java.lang.Thread.run(Thread.java:1474) |
#2: |
in [elasticsearch[node_s1][search][T#3]][testSortMixedFieldTypes] |
org.elasticsearch.action.search.ArraySearchPhaseResults.consumeResult(ArraySearchPhaseResults.java:47) |
org.elasticsearch.action.search.QueryPhaseResultConsumer.consumeResult(QueryPhaseResultConsumer.java:161) |
org.elasticsearch.action.search.AbstractSearchAsyncAction.onShardResult(AbstractSearchAsyncAction.java:507) |
org.elasticsearch.action.search.SearchQueryThenFetchAsyncAction.onShardResult(SearchQueryThenFetchAsyncAction.java:192) |
org.elasticsearch.action.search.AbstractSearchAsyncAction$1.innerOnResponse(AbstractSearchAsyncAction.java:270) |
org.elasticsearch.action.search.SearchActionListener.onResponse(SearchActionListener.java:34) |
org.elasticsearch.action.search.SearchActionListener.onResponse(SearchActionListener.java:19) |
org.elasticsearch.action.search.SearchExecutionStatsCollector.onResponse(SearchExecutionStatsCollector.java:63) |
org.elasticsearch.action.search.SearchExecutionStatsCollector.onResponse(SearchExecutionStatsCollector.java:27) |
org.elasticsearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:49) |
org.elasticsearch.action.search.SearchTransportService$ConnectionCountingHandler.handleResponse(SearchTransportService.java:592) |
org.elasticsearch.transport.TransportService$UnregisterChildTransportResponseHandler.handleResponse(TransportService.java:1726) |
org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1504) |
org.elasticsearch.transport.TransportService$DirectResponseChannel.processResponse(TransportService.java:1602) |
org.elasticsearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:1576) |
org.elasticsearch.transport.TaskTransportChannel.sendResponse(TaskTransportChannel.java:35) |
org.elasticsearch.action.support.ChannelActionListener.onResponse(ChannelActionListener.java:33) |
org.elasticsearch.action.support.ChannelActionListener.onResponse(ChannelActionListener.java:20) |
org.elasticsearch.action.ActionListenerImplementations$DelegatingResponseActionListener.onResponse(ActionListenerImplementations.java:198) |
org.elasticsearch.action.ActionListener.respondAndRelease(ActionListener.java:386) |
org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:79) |
org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:76) |
org.elasticsearch.action.ActionRunnable$4.doRun(ActionRunnable.java:101) |
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27) |
org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:35) |
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1067) |
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27) |
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1090) |
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:614) |
java.base/java.lang.Thread.run(Thread.java:1474) |
#3: |
in [elasticsearch[node_s1][search][T#3]][testSortMixedFieldTypes] |
org.elasticsearch.core.Releasables.close(Releasables.java:39) |
org.elasticsearch.core.Releasables.closeExpectNoException(Releasables.java:72) |
org.elasticsearch.action.ActionListenerImplementations$2.run(ActionListenerImplementations.java:51) |
org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:218) |
org.elasticsearch.action.ActionListener$3.onResponse(ActionListener.java:413) |
org.elasticsearch.action.support.SubscribableListener$SuccessResult.complete(SubscribableListener.java:406) |
org.elasticsearch.action.support.SubscribableListener.tryComplete(SubscribableListener.java:326) |
org.elasticsearch.action.support.SubscribableListener.setResult(SubscribableListener.java:371) |
org.elasticsearch.action.support.SubscribableListener.onResponse(SubscribableListener.java:262) |
org.elasticsearch.search.internal.SearchContext.close(SearchContext.java:115) |
org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:944) |
org.elasticsearch.search.SearchService.lambda$executeQueryPhase$7(SearchService.java:739) |
org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:79) |
org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:76) |
org.elasticsearch.action.ActionRunnable$4.doRun(ActionRunnable.java:101) |
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27) |
org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:35) |
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1067) |
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27) |
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1090) |
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:614) |
java.base/java.lang.Thread.run(Thread.java:1474) |
#4: |
in [elasticsearch[node_s1][search][T#3]][testSortMixedFieldTypes] |
org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:941) |
org.elasticsearch.search.SearchService.lambda$executeQueryPhase$7(SearchService.java:739) |
org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:79) |
org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:76) |
org.elasticsearch.action.ActionRunnable$4.doRun(ActionRunnable.java:101) |
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27) |
org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:35) |
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1067) |
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27) |
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1090) |
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:614) |
java.base/java.lang.Thread.run(Thread.java:1474) |
Created at: |
in [elasticsearch[node_s1][search][T#3]][testSortMixedFieldTypes] |
org.elasticsearch.search.query.QuerySearchResult.<init>(QuerySearchResult.java:110) |
org.elasticsearch.search.DefaultSearchContext.addQueryResult(DefaultSearchContext.java:342) |
org.elasticsearch.search.SearchService$ResultsType$2.addResultsObject(SearchService.java:1863) |
org.elasticsearch.search.SearchService.createContext(SearchService.java:1373) |
org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:898) |
org.elasticsearch.search.SearchService.lambda$executeQueryPhase$7(SearchService.java:739) |
org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:79) |
org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:76) |
org.elasticsearch.action.ActionRunnable$4.doRun(ActionRunnable.java:101) |
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27) |
org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:35) |
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1067) |
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27) |
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1090) |
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:614) |
java.base/java.lang.Thread.run(Thread.java:1474)]>
</div></div>
This has been muted on branch main
Mute Reasons:
- [main] 2 failures in test testSortMixedFieldTypes (0.2% fail rate in 879 executions)
Build Scans:
Revisiting, this is probably also related to the phase failures issues in batched execution. So we should try to get that in first and then see if the failures here are resolved then.
After https://github.com/elastic/elasticsearch/pull/136889 has been merged I would like to unmute this test and see if it still fails occasionally, wdyt @benchaplin? Let me know on https://github.com/elastic/elasticsearch/pull/137724 if you think that we should try this.
I'd expect this one to be solved by #137698.
My reasoning is the leak trace:
- QuerySearchResult created (refs = 1)
- explicit incRef in SearchService.executeQueryPhase (+1, refs = 2)
- SearchContext.close (-1, refs = 1)
- ArraySearchPhaseResults.consumeResult (+1, refs = 2)
- ActionListener.respondAndRelease - once SearchService.executeQueryPhase is complete (-1, refs = 1)
What's missing is the corresponding release from ArraySearchPhaseResults. We get this from closing the QueryPhaseResultConsumer on the data node.
^Scratch that, I'm a bit mixed up. I think you're right, this leak bug should be solved by https://github.com/elastic/elasticsearch/pull/136889.
I unmuted the test, leaving this open just to check back in a couple of days if we see the errors again.