elasticsearch icon indicating copy to clipboard operation
elasticsearch copied to clipboard

[CI] FieldSortIT testSortMixedFieldTypes failing

Open elasticsearchmachine opened this issue 5 months ago • 1 comments
trafficstars

Build Scans:

Reproduction Line:

./gradlew ":server:internalClusterTest" --tests "org.elasticsearch.search.sort.FieldSortIT.testSortMixedFieldTypes" -Dtests.seed=F6995E804508C6C3 -Dtests.locale=dsb -Dtests.timezone=Africa/Asmera -Druntime.java=24

Applicable branches: 8.19

Reproduces locally?: N/A

Failure History: See dashboard

Failure Message:

java.lang.AssertionError: invalid decRef call: already closed

Issue Reasons:

  • [8.19] 2 failures in test testSortMixedFieldTypes (0.6% fail rate in 335 executions)

Note: This issue was created using new test triage automation. Please report issues or feedback to es-delivery.

elasticsearchmachine avatar Jun 14 '25 05:06 elasticsearchmachine

Pinging @elastic/es-search-foundations (Team:Search Foundations)

elasticsearchmachine avatar Jun 14 '25 05:06 elasticsearchmachine

Test history suggests this test suite has been flaky on 8.19 (8.x) and main since the end of march

Image

All test methods are, symptoms seem to be timeouts, java.lang.AssertionError: java.lang.IllegalStateException: Attempted to close with partial reduce in-flight and "java.lang.AssertionError: invalid decRef call: already closed". Need to dig into those failures a bit more to see if they look related.

cbuescher avatar Jun 27 '25 13:06 cbuescher

One of the very early ones from March 31st already show Leak issues related to "testSortMixedFieldTypes"

https://gradle-enterprise.elastic.co/s/aqlr4gxvmas4o

Will spot-check some others to be sure this is a main cause

cbuescher avatar Jun 27 '25 13:06 cbuescher

All failures I spot checked had either a Leak message with access in "testSortMixedFieldTypes" or a java.lang.IllegalStateException: Attempted to close with partial reduce in-flight in "testSortMixedFieldTypes" in the logs somewhere, e.g. here

cbuescher avatar Jun 27 '25 13:06 cbuescher

This failed in my PR: https://buildkite.com/elastic/elasticsearch-pull-request/builds/78601#0197cb1c-8cc8-48f0-8bca-30fdeeebd656

 ./gradlew ":server:internalClusterTest" --tests "org.elasticsearch.search.sort.FieldSortIT.testSortMixedFieldTypes" -Dtests.seed=43DA5706726F932E -Dtests.locale=ff-SN -Dtests.timezone=Asia/Manila -Druntime.java=24

Trace:

FieldSortIT > testSortMixedFieldTypes FAILED
  | java.lang.AssertionError:
  | Expected: an empty collection
  | but: <[LEAK: resource was not cleaned up before it was garbage-collected.
  | Recent access records:
  | #1:
  | in [elasticsearch[node_s2][search][T#3]][testSortMixedFieldTypes]
  | org.elasticsearch.action.ActionListener.respondAndRelease(ActionListener.java:388)
  | org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:79)
  | org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:76)
  | org.elasticsearch.action.ActionRunnable$4.doRun(ActionRunnable.java:101)
  | org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
  | org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:35)
  | org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1044)
  | org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
  | java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1095)
  | java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:619)
  | java.base/java.lang.Thread.run(Thread.java:1447)
  | #2:
  | in [elasticsearch[node_s2][search][T#3]][testSortMixedFieldTypes]
  | org.elasticsearch.action.search.ArraySearchPhaseResults.consumeResult(ArraySearchPhaseResults.java:47)
  | org.elasticsearch.action.search.QueryPhaseResultConsumer.consumeResult(QueryPhaseResultConsumer.java:160)
  | org.elasticsearch.action.search.AbstractSearchAsyncAction.onShardResult(AbstractSearchAsyncAction.java:507)
  | org.elasticsearch.action.search.SearchQueryThenFetchAsyncAction.onShardResult(SearchQueryThenFetchAsyncAction.java:182)
  | org.elasticsearch.action.search.AbstractSearchAsyncAction$1.innerOnResponse(AbstractSearchAsyncAction.java:270)
  | org.elasticsearch.action.search.SearchActionListener.onResponse(SearchActionListener.java:34)
  | org.elasticsearch.action.search.SearchActionListener.onResponse(SearchActionListener.java:19)
  | org.elasticsearch.action.search.SearchExecutionStatsCollector.onResponse(SearchExecutionStatsCollector.java:63)
  | org.elasticsearch.action.search.SearchExecutionStatsCollector.onResponse(SearchExecutionStatsCollector.java:27)
  | org.elasticsearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:49)
  | org.elasticsearch.action.search.SearchTransportService$ConnectionCountingHandler.handleResponse(SearchTransportService.java:592)
  | org.elasticsearch.transport.TransportService$UnregisterChildTransportResponseHandler.handleResponse(TransportService.java:1718)
  | org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1496)
  | org.elasticsearch.transport.TransportService$DirectResponseChannel.processResponse(TransportService.java:1594)
  | org.elasticsearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:1568)
  | org.elasticsearch.transport.TaskTransportChannel.sendResponse(TaskTransportChannel.java:35)
  | org.elasticsearch.action.support.ChannelActionListener.onResponse(ChannelActionListener.java:33)
  | org.elasticsearch.action.support.ChannelActionListener.onResponse(ChannelActionListener.java:20)
  | org.elasticsearch.action.ActionListenerImplementations$DelegatingResponseActionListener.onResponse(ActionListenerImplementations.java:198)
  | org.elasticsearch.action.ActionListener.respondAndRelease(ActionListener.java:386)
  | org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:79)
  | org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:76)
  | org.elasticsearch.action.ActionRunnable$4.doRun(ActionRunnable.java:101)
  | org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
  | org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:35)
  | org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1044)
  | org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
  | java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1095)
  | java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:619)
  | java.base/java.lang.Thread.run(Thread.java:1447)
  | #3:
  | in [elasticsearch[node_s2][search][T#3]][testSortMixedFieldTypes]
  | org.elasticsearch.core.Releasables.close(Releasables.java:39)
  | org.elasticsearch.core.Releasables.closeExpectNoException(Releasables.java:72)
  | org.elasticsearch.action.ActionListenerImplementations$2.run(ActionListenerImplementations.java:51)
  | org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:218)
  | org.elasticsearch.action.ActionListener$3.onResponse(ActionListener.java:413)
  | org.elasticsearch.action.support.SubscribableListener$SuccessResult.complete(SubscribableListener.java:403)
  | org.elasticsearch.action.support.SubscribableListener.tryComplete(SubscribableListener.java:323)
  | org.elasticsearch.action.support.SubscribableListener.setResult(SubscribableListener.java:368)
  | org.elasticsearch.action.support.SubscribableListener.onResponse(SubscribableListener.java:259)
  | org.elasticsearch.search.internal.SearchContext.close(SearchContext.java:115)
  | org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:904)
  | org.elasticsearch.search.SearchService.lambda$executeQueryPhase$7(SearchService.java:710)
  | org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:79)
  | org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:76)
  | org.elasticsearch.action.ActionRunnable$4.doRun(ActionRunnable.java:101)
  | org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
  | org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:35)
  | org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1044)
  | org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
  | java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1095)
  | java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:619)
  | java.base/java.lang.Thread.run(Thread.java:1447)
  | #4:
  | in [elasticsearch[node_s2][search][T#3]][testSortMixedFieldTypes]
  | org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:901)
  | org.elasticsearch.search.SearchService.lambda$executeQueryPhase$7(SearchService.java:710)
  | org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:79)
  | org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:76)
  | org.elasticsearch.action.ActionRunnable$4.doRun(ActionRunnable.java:101)
  | org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
  | org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:35)
  | org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1044)
  | org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
  | java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1095)
  | java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:619)
  | java.base/java.lang.Thread.run(Thread.java:1447)
  | Created at:
  | in [elasticsearch[node_s2][search][T#3]][testSortMixedFieldTypes]
  | org.elasticsearch.search.query.QuerySearchResult.<init>(QuerySearchResult.java:110)
  | org.elasticsearch.search.DefaultSearchContext.addQueryResult(DefaultSearchContext.java:342)
  | org.elasticsearch.search.SearchService$ResultsType$2.addResultsObject(SearchService.java:1820)
  | org.elasticsearch.search.SearchService.createContext(SearchService.java:1330)
  | org.elasticsearch.search.MockSearchService.createContext(MockSearchService.java:142)
  | org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:869)
  | org.elasticsearch.search.SearchService.lambda$executeQueryPhase$7(SearchService.java:710)
  | org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:79)
  | org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:76)
  | org.elasticsearch.action.ActionRunnable$4.doRun(ActionRunnable.java:101)
  | org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
  | org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:35)
  | org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1044)
  | org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
  | java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1095)
  | java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:619)
  | java.base/java.lang.Thread.run(Thread.java:1447)]>
  | at __randomizedtesting.SeedInfo.seed([43DA5706726F932E:19AFEDF12748A0C6]:0)
  | at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
  | at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:6)
  | at org.elasticsearch.test.ESTestCase.assertThat(ESTestCase.java:2666)
  | at org.elasticsearch.test.ESTestCase.checkStaticState(ESTestCase.java:816)
  | at org.elasticsearch.test.ESTestCase.after(ESTestCase.java:610)
  | at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
  | at java.base/java.lang.reflect.Method.invoke(Method.java:565)
  | at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1763)
  | at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:1004)
  | at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  | at org.junit.rules.RunRules.evaluate(RunRules.java:20)
  | at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
  | at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  | at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
  | at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  | at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  | at org.junit.rules.RunRules.evaluate(RunRules.java:20)
  | at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  | at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  | at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
  | at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
  | at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
  | at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
  | at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
  | at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
  | at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  | at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  | at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
  | at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  | at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  | at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  | at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  | at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
  | at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  | at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  | at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  | at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
  | at org.junit.rules.RunRules.evaluate(RunRules.java:20)
  | at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  | at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  | at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
  | at java.base/java.lang.Thread.run(Thread.java:1447)
  |  

<br class="Apple-interchange-newline">

benwtrent avatar Jul 02 '25 13:07 benwtrent

This has been muted on branch main

Mute Reasons:

  • [main] 2 failures in test testSortMixedFieldTypes (0.3% fail rate in 654 executions)

Build Scans:

elasticsearchmachine avatar Jul 06 '25 13:07 elasticsearchmachine

I spent some more time with this on Friday but unfortunately haven't gotten closer to a reproduction or solution. Some findings and a few theories I haven't been able to prove or disprove though:

  • I increased logging SearchQueryResult reference counting in “normal” non failing cases but the locations where we inc/decRef look the same as in the failure cases (i.e. )
Leaks
java.lang.AssertionError: 	
Expected: an empty collection	
   but: <[LEAK: resource was not cleaned up before it was garbage-collected.	
Recent access records: 	
#1:	
	in [elasticsearch[node_s2][search][T#2]][testSortMixedFieldTypes]	
	org.elasticsearch.action.ActionListener.respondAndRelease(ActionListener.java:388)	
	org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:79)	
	org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:76)	
	org.elasticsearch.action.ActionRunnable$4.doRun(ActionRunnable.java:101)	
	org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)	
	org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:35)	
	org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1044)	
	org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)	
	java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1095)	
	java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:619)	
	java.base/java.lang.Thread.run(Thread.java:1447)	
#2:	
	in [elasticsearch[node_s2][search][T#2]][testSortMixedFieldTypes]	
	org.elasticsearch.action.search.ArraySearchPhaseResults.consumeResult(ArraySearchPhaseResults.java:47)	
	org.elasticsearch.action.search.QueryPhaseResultConsumer.consumeResult(QueryPhaseResultConsumer.java:160)	
	org.elasticsearch.action.search.AbstractSearchAsyncAction.onShardResult(AbstractSearchAsyncAction.java:507)	
	org.elasticsearch.action.search.SearchQueryThenFetchAsyncAction.onShardResult(SearchQueryThenFetchAsyncAction.java:182)	
	org.elasticsearch.action.search.AbstractSearchAsyncAction$1.innerOnResponse(AbstractSearchAsyncAction.java:270)	
	org.elasticsearch.action.search.SearchActionListener.onResponse(SearchActionListener.java:34)	
	org.elasticsearch.action.search.SearchActionListener.onResponse(SearchActionListener.java:19)	
	org.elasticsearch.action.search.SearchExecutionStatsCollector.onResponse(SearchExecutionStatsCollector.java:63)	
	org.elasticsearch.action.search.SearchExecutionStatsCollector.onResponse(SearchExecutionStatsCollector.java:27)	
	org.elasticsearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:49)	
	org.elasticsearch.action.search.SearchTransportService$ConnectionCountingHandler.handleResponse(SearchTransportService.java:592)	
	org.elasticsearch.transport.TransportService$UnregisterChildTransportResponseHandler.handleResponse(TransportService.java:1718)	
	org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1496)	
	org.elasticsearch.transport.TransportService$DirectResponseChannel.processResponse(TransportService.java:1594)	
	org.elasticsearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:1568)	
	org.elasticsearch.transport.TaskTransportChannel.sendResponse(TaskTransportChannel.java:35)	
	org.elasticsearch.action.support.ChannelActionListener.onResponse(ChannelActionListener.java:33)	
	org.elasticsearch.action.support.ChannelActionListener.onResponse(ChannelActionListener.java:20)	
	org.elasticsearch.action.ActionListenerImplementations$DelegatingResponseActionListener.onResponse(ActionListenerImplementations.java:198)	
	org.elasticsearch.action.ActionListener.respondAndRelease(ActionListener.java:386)	
	org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:79)	
	org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:76)	
	org.elasticsearch.action.ActionRunnable$4.doRun(ActionRunnable.java:101)	
	org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)	
	org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:35)	
	org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1044)	
	org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)	
	java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1095)	
	java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:619)	
	java.base/java.lang.Thread.run(Thread.java:1447)	
#3:	
	in [elasticsearch[node_s2][search][T#2]][testSortMixedFieldTypes]	
	org.elasticsearch.core.Releasables.close(Releasables.java:39)	
	org.elasticsearch.core.Releasables.closeExpectNoException(Releasables.java:72)	
	org.elasticsearch.action.ActionListenerImplementations$2.run(ActionListenerImplementations.java:51)	
	org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:218)	
	org.elasticsearch.action.ActionListener$3.onResponse(ActionListener.java:413)	
	org.elasticsearch.action.support.SubscribableListener$SuccessResult.complete(SubscribableListener.java:403)	
	org.elasticsearch.action.support.SubscribableListener.tryComplete(SubscribableListener.java:323)	
	org.elasticsearch.action.support.SubscribableListener.setResult(SubscribableListener.java:368)	
	org.elasticsearch.action.support.SubscribableListener.onResponse(SubscribableListener.java:259)	
	org.elasticsearch.search.internal.SearchContext.close(SearchContext.java:115)	
	org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:904)	
	org.elasticsearch.search.SearchService.lambda$executeQueryPhase$7(SearchService.java:710)	
	org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:79)	
	org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:76)	
	org.elasticsearch.action.ActionRunnable$4.doRun(ActionRunnable.java:101)	
	org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)	
	org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:35)	
	org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1044)	
	org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)	
	java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1095)	
	java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:619)	
	java.base/java.lang.Thread.run(Thread.java:1447)	
#4:	
	in [elasticsearch[node_s2][search][T#2]][testSortMixedFieldTypes]	
	org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:901)	
	org.elasticsearch.search.SearchService.lambda$executeQueryPhase$7(SearchService.java:710)	
	org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:79)	
	org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:76)	
	org.elasticsearch.action.ActionRunnable$4.doRun(ActionRunnable.java:101)	
	org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)	
	org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:35)	
	org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1044)	
	org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)	
	java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1095)	
	java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:619)	
	java.base/java.lang.Thread.run(Thread.java:1447)	
Created at:	
	in [elasticsearch[node_s2][search][T#2]][testSortMixedFieldTypes]	
	org.elasticsearch.search.query.QuerySearchResult.<init>(QuerySearchResult.java:110)	
	org.elasticsearch.search.DefaultSearchContext.addQueryResult(DefaultSearchContext.java:342)	
	org.elasticsearch.search.SearchService$ResultsType$2.addResultsObject(SearchService.java:1820)	
	org.elasticsearch.search.SearchService.createContext(SearchService.java:1330)	
	org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:869)	
	org.elasticsearch.search.SearchService.lambda$executeQueryPhase$7(SearchService.java:710)	
	org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:79)	
	org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:76)	
	org.elasticsearch.action.ActionRunnable$4.doRun(ActionRunnable.java:101)	
	org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)	
	org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:35)	
	org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1044)	
	org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)	
	java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1095)	
	java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:619)	
	java.base/java.lang.Thread.run(Thread.java:1447)]>	
	

at __randomizedtesting.SeedInfo.seed([26C0313418883E0F:7CB58BC34DAF0DE7]:0)	
	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)	
	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:6)	
	at org.elasticsearch.test.ESTestCase.assertThat(ESTestCase.java:2710)	
	at org.elasticsearch.test.ESTestCase.checkStaticState(ESTestCase.java:860)	
	at org.elasticsearch.test.ESTestCase.after(ESTestCase.java:654)	
	at jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)	
	at java.lang.reflect.Method.invoke(Method.java:565)	
	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1763)	
	at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:1004)	
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)	
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)	
	at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)	
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)	
	at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)	
	at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)	
	at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)	
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)	
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)	
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)	
	at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)	
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)	
	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)	
	at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)	
	at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)	
	at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)	
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)	
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)	
	at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)	
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)	
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)	
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)	
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)	
	at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)	
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)	
	at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)	
	at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)	
	at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)	
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)	
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)	
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)	
	at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)	
	at java.lang.Thread.run(Thread.java:1447)
  • tried to (unsuccessfully) reproduce locally
  • investigated role of expected “IllegalArgumentExceptions” on invalid sort
  • investigated role of MockSearchService showing up in some failure cases, doesn’t seem to be the issue though because there are failures with it in the object creation path as well and forcing all tests to use MocksearchService doesn't increase likelihood of failure

Its important to note that the begining of this failure correlates with the introduction of "batched query execution", so my current working theory is that we run into a specific edge case with these search failures and maybe have a race somewhere between error handling and closing and cleaning up resources.

One other thing thats maybe worth noting:

I tried increasing the number of documents used in the test (currently the test only indexes one doc per sort type) to e.g. 20 which largely increases the likelihood of this test failing. However, this is with a different error reason than the Leak/"Attempted to close with partial reduce in-flight" messages:

java.lang.AssertionError: [org.elasticsearch.action.ActionListenerImplementations$RunBeforeActionListener/org.elasticsearch.action.search.TransportSearchAction$SearchResponseActionListener/
[...]
Caused by: java.lang.AssertionError: [org.elasticsearch.action.ActionListenerImplementations$RunBeforeActionListener/org.elasticsearch.action.search.TransportSearchAction$SearchResponseActionListener/org.elasticsearch.action.ActionListenerImplementations$RunBeforeActionListener/org.elasticsearch.tasks.TaskManager$1{org.elasticsearch.client.internal.support.AbstractClient$RefCountedFuture@21383b4e}{CancellableTask{Task{id=329, type='transport', action='indices:data/read/search', description='null', parentTask=unset, startTime=1751890516860, headers={}, startTimeNanos=17299138524750}, reason='Fatal failure during search: failed to merge result [Can't sort on field [foo]; the field has incompatible sort types: [DOUBLE] and [LONG] across shards!]', isCancelled=true}}/org.elasticsearch.action.support.TransportAction$$Lambda/0x00001c00018e55a0@42058342/org.elasticsearch.action.search.AbstractSearchAsyncAction$$Lambda/0x00001c0001b16c28@6be94945]
	... 42 more
Caused by: org.elasticsearch.ElasticsearchException: executed already
	at org.elasticsearch.action.ActionListener$3.assertFirstRun(ActionListener.java:403)
	... 41 more

This is coming from an assertion in ActionListener

Here's a local run of 10 with 4 of these failure: https://gradle-enterprise.elastic.co/s/oiufje2arvhla

cbuescher avatar Jul 07 '25 12:07 cbuescher

This has been muted on branch main

Unmuted in #130755 to collect more information on failure circumstances

cbuescher avatar Jul 08 '25 06:07 cbuescher

Latest one today on main https://gradle-enterprise.elastic.co/s/inmlbv6wwdjju/tests/overview?outcome=FAILED

cbuescher avatar Jul 11 '25 09:07 cbuescher

Two more on main today:

https://gradle-enterprise.elastic.co/s/3nbviyla7vzbg/tests/task/:server:internalClusterTest/details/org.elasticsearch.search.sort.FieldSortIT/testFieldAliasesWithMissingValues?top-execution=1

https://gradle-enterprise.elastic.co/s/ddpnsvs4mcj5k/tests/task/:server:internalClusterTest/details/org.elasticsearch.search.sort.FieldSortIT/testIssue8226?top-execution=1

In order to understand the race condition this apparently is I think we need logging about were exactly the expected exceptions for the invalid sort types are happening (i.e. which nodes). Maybe with that understanding we can reproduce the issue better. Will open a PR shortly.

cbuescher avatar Jul 18 '25 08:07 cbuescher

New failure, now with some additional logging: https://gradle-enterprise.elastic.co/s/biqntwh6sjaew/tests/overview

cbuescher avatar Jul 22 '25 08:07 cbuescher

Some info from the last failure that might help reduce score a bit:

Cluster the suite was run on:

[2025-07-21T17:36:55,097][INFO ][o.e.t.InternalTestCluster][testSortMissingNumbers] Setup InternalTestCluster [SUITE-TEST_WORKER_VM=[435]-CLUSTER_SEED=[7000333836992015345]-HASH=[11F7FEEE5A7]-cluster] with seed [61262E890F0E6BF1] using [0] dedicated masters, [2] (data) nodes and [0] coord only nodes (master nodes are [auto-managed])

“testSortMixedFieldTypes” later times out because it got killed with

java.lang.AssertionError: java.lang.IllegalStateException: Attempted to close with partial reduce in-flight
        at __randomizedtesting.SeedInfo.seed([40715FAB90036405]:0)
        at org.elasticsearch.core.Releasables.closeExpectNoException(Releasables.java:74)
        at org.elasticsearch.action.ActionListenerImplementations$2.run(ActionListenerImplementations.java:51)
        at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:218)
        at org.elasticsearch.action.ActionListener$3.onResponse(ActionListener.java:413)
        at org.elasticsearch.action.support.SubscribableListener$SuccessResult.complete(SubscribableListener.java:403)
        at org.elasticsearch.action.support.SubscribableListener.tryComplete(SubscribableListener.java:323)
        at org.elasticsearch.action.support.SubscribableListener.setResult(SubscribableListener.java:352)
        at org.elasticsearch.action.support.SubscribableListener.onResponse(SubscribableListener.java:259)
        at org.elasticsearch.action.search.AbstractSearchAsyncAction.lambda$new$0(AbstractSearchAsyncAction.java:148)
        at org.elasticsearch.action.ActionListenerImplementations$RunBeforeActionListener.onFailure(ActionListenerImplementations.java:356)
        at org.elasticsearch.action.ActionListenerImplementations.safeAcceptException(ActionListenerImplementations.java:78)
        at org.elasticsearch.action.ActionListenerImplementations.safeOnFailure(ActionListenerImplementations.java:89)
        at org.elasticsearch.action.ActionListener$3.onFailure(ActionListener.java:423)
        at org.elasticsearch.action.search.AbstractSearchAsyncAction.raisePhaseFailure(AbstractSearchAsyncAction.java:647)
        at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:624)
        at org.elasticsearch.action.search.SearchQueryThenFetchAsyncAction.onPhaseFailure(SearchQueryThenFetchAsyncAction.java:81)
        at org.elasticsearch.action.search.SearchQueryThenFetchAsyncAction$1.handleException(SearchQueryThenFetchAsyncAction.java:526)
[...]
Caused by:
        java.lang.IllegalStateException: Attempted to close with partial reduce in-flight
            at org.elasticsearch.action.search.QueryPhaseResultConsumer.doClose(QueryPhaseResultConsumer.java:144)
            at org.elasticsearch.action.search.ArraySearchPhaseResults.close(ArraySearchPhaseResults.java:55)
            at org.elasticsearch.core.Releasables.close(Releasables.java:39)
            at org.elasticsearch.core.Releasables.closeExpectNoException(Releasables.java:72)
            ... 52 more

The Attempted to close with partial reduce in-flight errors are something already seen before, maybe they are easier to explain than the Leaks but might have same reason.

From the recently added logs we can see the docs for the indices that get provoked so IllegalArgument sorting failures are living on these same primary node. Unfortunately some of the replicas of those shards were not created when logged but we can also see the shard/replica counts for “index_long” (4/1), “index_double” (5/1) and “index_keyword” (4/0), so maybe those can be focused on for local reproduction:

[2025-07-21T17:37:04,758][INFO ][o.e.s.s.FieldSortIT      ][testSortMixedFieldTypes] FieldSortIT#testSortMixedFieldTypes document distribution: [{"index":"index_long","node":"node_s1","shard":"0","prirep":"p","state":"STARTED","docs":"1"},{"index":"index_long","node":null,"shard":"0","prirep":"r","state":"UNASSIGNED","docs":null},{"index":"index_long","node":"node_s0","shard":"1","prirep":"p","state":"STARTED","docs":"0"},{"index":"index_long","node":null,"shard":"1","prirep":"r","state":"UNASSIGNED","docs":null},{"index":"index_long","node":"node_s1","shard":"2","prirep":"p","state":"STARTED","docs":"0"},{"index":"index_long","node":null,"shard":"2","prirep":"r","state":"UNASSIGNED","docs":null},{"index":"index_long","node":"node_s0","shard":"3","prirep":"p","state":"STARTED","docs":"0"},{"index":"index_long","node":null,"shard":"3","prirep":"r","state":"UNASSIGNED","docs":null},{"index":"index_double","node":"node_s0","shard":"0","prirep":"r","state":"STARTED","docs":"0"},{"index":"index_double","node":"node_s1","shard":"0","prirep":"p","state":"STARTED","docs":"0"},{"index":"index_double","node":"node_s0","shard":"1","prirep":"p","state":"STARTED","docs":"0"},{"index":"index_double","node":"node_s1","shard":"1","prirep":"r","state":"STARTED","docs":"0"},{"index":"index_double","node":"node_s0","shard":"2","prirep":"r","state":"STARTED","docs":"0"},{"index":"index_double","node":"node_s1","shard":"2","prirep":"p","state":"STARTED","docs":"0"},{"index":"index_double","node":"node_s0","shard":"3","prirep":"p","state":"STARTED","docs":"0"},{"index":"index_double","node":"node_s1","shard":"3","prirep":"r","state":"STARTED","docs":"0"},{"index":"index_double","node":"node_s1","shard":"4","prirep":"p","state":"STARTED","docs":"1"},{"index":"index_double","node":null,"shard":"4","prirep":"r","state":"UNASSIGNED","docs":null},{"index":"index_keyword","node":"node_s1","shard":"0","prirep":"p","state":"STARTED","docs":"1"},{"index":"index_keyword","node":"node_s0","shard":"1","prirep":"p","state":"STARTED","docs":"0"},{"index":"index_keyword","node":"node_s1","shard":"2","prirep":"p","state":"STARTED","docs":"0"},{"index":"index_keyword","node":"node_s0","shard":"3","prirep":"p","state":"STARTED","docs":"0"}]

The IAE are happening then on “node_s1” for the Long/Double sorting mismatch

 1> [2025-07-21T17:37:04,817][DEBUG][o.e.a.s.SearchPhaseController][node_s1][search][T#2] Failed to merge top docs: 
  1> java.lang.IllegalArgumentException: Can't sort on field [foo]; the field has incompatible sort types: [LONG] and [DOUBLE] across shards!

and are bubbling up to “node_s0” as this appears to be the coordinator:

1> [2025-07-21T17:37:04,820][DEBUG][o.e.a.s.SearchQueryThenFetchAsyncAction][node_s0][transport_worker][T#1] handling node search exception coming from [g0JK2l3NQFiWjOLoNm2GPQ]
  1> java.lang.IllegalArgumentException: Can't sort on field [foo]; the field has incompatible sort types: [LONG] and [DOUBLE] across shards!

Same seems to be true for the Long/String sorting mismatch:

1> [2025-07-21T17:37:04,836][DEBUG][o.e.a.s.SearchPhaseController][node_s1][search][T#2] Failed to merge top docs: 
  1> java.lang.IllegalArgumentException: Can't sort on field [foo]; the field has incompatible sort types: [LONG] and [STRING] across shards!
  1> 	at org.elasticsearch.action.search.SearchPhaseController.validateSameSortTypesAndMaybeRewrite(SearchPhaseController.java:219) ~[main/:?]
  1> [2025-07-21T17:37:04,838][DEBUG][o.e.a.s.SearchQueryThenFetchAsyncAction][node_s0][transport_worker][T#2] handling node search exception coming from [g0JK2l3NQFiWjOLoNm2GPQ]
  1> java.lang.IllegalArgumentException: Can't sort on field [foo]; the field has incompatible sort types: [LONG] and [STRING] across shards!
  1> 	at org.elasticsearch.action.search.SearchPhaseController.validateSameSortTypesAndMaybeRewrite(SearchPhaseController.java:219) ~[main/:?]
  1> 	at org.elasticsearch.action.search.SearchPhaseController.mergeTopDocs(SearchPhaseController.java:167) ~[main/:?]

The final AssertionError is coming from “node_s0” again, after the second failure, so I assume its that later error handling that might have gone wrong somehow:

1> [2025-07-21T17:37:04,840][ERROR][o.e.ExceptionsHelper     ][node_s0][transport_worker][T#2] fatal error java.lang.AssertionError: java.lang.IllegalStateException: Attempted to close with partial reduce in-flight
  1> 	at org.elasticsearch.ExceptionsHelper.lambda$maybeDieOnAnotherThread$4(ExceptionsHelper.java:449)
  1> 	at java.base/java.util.Optional.ifPresent(Optional.java:178)
  1> 	at org.elasticsearch.ExceptionsHelper.maybeDieOnAnotherThread(ExceptionsHelper.java:439)
  1> 	at org.elasticsearch.transport.netty4.Netty4MessageInboundHandler.exceptionCaught(Netty4MessageInboundHandler.java:63)
  1> 	at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:346)
  1> 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:447)

cbuescher avatar Jul 25 '25 14:07 cbuescher

Another more recent one.

cbuescher avatar Jul 30 '25 17:07 cbuescher

This has been muted on branch 9.1

Mute Reasons:

  • [9.1] 2 failures in test testSortMixedFieldTypes (0.8% fail rate in 262 executions)

Build Scans:

elasticsearchmachine avatar Jul 31 '25 05:07 elasticsearchmachine

A similar failure was raised in my PR as well ->

java.lang.AssertionError: |  
-- | --
Expected: an empty collection |  
but: <[LEAK: resource was not cleaned up before it was garbage-collected. |  
Recent access records: |  
#1: |  
in [elasticsearch[node_s1][search][T#3]][testSortMixedFieldTypes] |  
org.elasticsearch.action.ActionListener.respondAndRelease(ActionListener.java:388) |  
org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:79) |  
org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:76) |  
org.elasticsearch.action.ActionRunnable$4.doRun(ActionRunnable.java:101) |  
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27) |  
org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:35) |  
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1067) |  
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27) |  
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1090) |  
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:614) |  
java.base/java.lang.Thread.run(Thread.java:1474) |  
#2: |  
in [elasticsearch[node_s1][search][T#3]][testSortMixedFieldTypes] |  
org.elasticsearch.action.search.ArraySearchPhaseResults.consumeResult(ArraySearchPhaseResults.java:47) |  
org.elasticsearch.action.search.QueryPhaseResultConsumer.consumeResult(QueryPhaseResultConsumer.java:161) |  
org.elasticsearch.action.search.AbstractSearchAsyncAction.onShardResult(AbstractSearchAsyncAction.java:507) |  
org.elasticsearch.action.search.SearchQueryThenFetchAsyncAction.onShardResult(SearchQueryThenFetchAsyncAction.java:192) |  
org.elasticsearch.action.search.AbstractSearchAsyncAction$1.innerOnResponse(AbstractSearchAsyncAction.java:270) |  
org.elasticsearch.action.search.SearchActionListener.onResponse(SearchActionListener.java:34) |  
org.elasticsearch.action.search.SearchActionListener.onResponse(SearchActionListener.java:19) |  
org.elasticsearch.action.search.SearchExecutionStatsCollector.onResponse(SearchExecutionStatsCollector.java:63) |  
org.elasticsearch.action.search.SearchExecutionStatsCollector.onResponse(SearchExecutionStatsCollector.java:27) |  
org.elasticsearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:49) |  
org.elasticsearch.action.search.SearchTransportService$ConnectionCountingHandler.handleResponse(SearchTransportService.java:592) |  
org.elasticsearch.transport.TransportService$UnregisterChildTransportResponseHandler.handleResponse(TransportService.java:1726) |  
org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1504) |  
org.elasticsearch.transport.TransportService$DirectResponseChannel.processResponse(TransportService.java:1602) |  
org.elasticsearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:1576) |  
org.elasticsearch.transport.TaskTransportChannel.sendResponse(TaskTransportChannel.java:35) |  
org.elasticsearch.action.support.ChannelActionListener.onResponse(ChannelActionListener.java:33) |  
org.elasticsearch.action.support.ChannelActionListener.onResponse(ChannelActionListener.java:20) |  
org.elasticsearch.action.ActionListenerImplementations$DelegatingResponseActionListener.onResponse(ActionListenerImplementations.java:198) |  
org.elasticsearch.action.ActionListener.respondAndRelease(ActionListener.java:386) |  
org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:79) |  
org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:76) |  
org.elasticsearch.action.ActionRunnable$4.doRun(ActionRunnable.java:101) |  
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27) |  
org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:35) |  
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1067) |  
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27) |  
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1090) |  
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:614) |  
java.base/java.lang.Thread.run(Thread.java:1474) |  
#3: |  
in [elasticsearch[node_s1][search][T#3]][testSortMixedFieldTypes] |  
org.elasticsearch.core.Releasables.close(Releasables.java:39) |  
org.elasticsearch.core.Releasables.closeExpectNoException(Releasables.java:72) |  
org.elasticsearch.action.ActionListenerImplementations$2.run(ActionListenerImplementations.java:51) |  
org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:218) |  
org.elasticsearch.action.ActionListener$3.onResponse(ActionListener.java:413) |  
org.elasticsearch.action.support.SubscribableListener$SuccessResult.complete(SubscribableListener.java:406) |  
org.elasticsearch.action.support.SubscribableListener.tryComplete(SubscribableListener.java:326) |  
org.elasticsearch.action.support.SubscribableListener.setResult(SubscribableListener.java:371) |  
org.elasticsearch.action.support.SubscribableListener.onResponse(SubscribableListener.java:262) |  
org.elasticsearch.search.internal.SearchContext.close(SearchContext.java:115) |  
org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:944) |  
org.elasticsearch.search.SearchService.lambda$executeQueryPhase$7(SearchService.java:739) |  
org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:79) |  
org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:76) |  
org.elasticsearch.action.ActionRunnable$4.doRun(ActionRunnable.java:101) |  
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27) |  
org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:35) |  
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1067) |  
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27) |  
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1090) |  
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:614) |  
java.base/java.lang.Thread.run(Thread.java:1474) |  
#4: |  
in [elasticsearch[node_s1][search][T#3]][testSortMixedFieldTypes] |  
org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:941) |  
org.elasticsearch.search.SearchService.lambda$executeQueryPhase$7(SearchService.java:739) |  
org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:79) |  
org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:76) |  
org.elasticsearch.action.ActionRunnable$4.doRun(ActionRunnable.java:101) |  
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27) |  
org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:35) |  
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1067) |  
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27) |  
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1090) |  
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:614) |  
java.base/java.lang.Thread.run(Thread.java:1474) |  
Created at: |  
in [elasticsearch[node_s1][search][T#3]][testSortMixedFieldTypes] |  
org.elasticsearch.search.query.QuerySearchResult.<init>(QuerySearchResult.java:110) |  
org.elasticsearch.search.DefaultSearchContext.addQueryResult(DefaultSearchContext.java:342) |  
org.elasticsearch.search.SearchService$ResultsType$2.addResultsObject(SearchService.java:1863) |  
org.elasticsearch.search.SearchService.createContext(SearchService.java:1373) |  
org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:898) |  
org.elasticsearch.search.SearchService.lambda$executeQueryPhase$7(SearchService.java:739) |  
org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:79) |  
org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:76) |  
org.elasticsearch.action.ActionRunnable$4.doRun(ActionRunnable.java:101) |  
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27) |  
org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:35) |  
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1067) |  
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27) |  
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1090) |  
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:614) |  
java.base/java.lang.Thread.run(Thread.java:1474)]>

</div></div>

pmpailis avatar Sep 23 '25 10:09 pmpailis

This has been muted on branch main

Mute Reasons:

  • [main] 2 failures in test testSortMixedFieldTypes (0.2% fail rate in 879 executions)

Build Scans:

elasticsearchmachine avatar Sep 24 '25 08:09 elasticsearchmachine

Revisiting, this is probably also related to the phase failures issues in batched execution. So we should try to get that in first and then see if the failures here are resolved then.

cbuescher avatar Oct 31 '25 11:10 cbuescher

After https://github.com/elastic/elasticsearch/pull/136889 has been merged I would like to unmute this test and see if it still fails occasionally, wdyt @benchaplin? Let me know on https://github.com/elastic/elasticsearch/pull/137724 if you think that we should try this.

cbuescher avatar Nov 07 '25 12:11 cbuescher

I'd expect this one to be solved by #137698.

My reasoning is the leak trace:

  • QuerySearchResult created (refs = 1)
  • explicit incRef in SearchService.executeQueryPhase (+1, refs = 2)
  • SearchContext.close (-1, refs = 1)
  • ArraySearchPhaseResults.consumeResult (+1, refs = 2)
  • ActionListener.respondAndRelease - once SearchService.executeQueryPhase is complete (-1, refs = 1)

What's missing is the corresponding release from ArraySearchPhaseResults. We get this from closing the QueryPhaseResultConsumer on the data node.

benchaplin avatar Nov 07 '25 14:11 benchaplin

^Scratch that, I'm a bit mixed up. I think you're right, this leak bug should be solved by https://github.com/elastic/elasticsearch/pull/136889.

benchaplin avatar Nov 07 '25 18:11 benchaplin

I unmuted the test, leaving this open just to check back in a couple of days if we see the errors again.

cbuescher avatar Nov 07 '25 18:11 cbuescher