heroic
heroic copied to clipboard
Heroic appears to mis-handle `EsRejectedExecutionException` errors from ElasticSearch in batched queries
I'm seeing an issue in production where a EsRejectedExecutionException
result from ElasticSearch will cause Heroic to return a correct-looking response but with empty results and empty errors. The responses look like this:
"redactedkey:1": {
"queryId": "21ee7a67-1023-495b-b2dd-55609d914f7a",
"range": {
"start": 1593316800000,
"end": 1595736600000
},
"trace": {
"what": {
"name": "com.spotify.heroic.CoreQueryManager#query"
},
"elapsed": 537117,
"children": [
{
"what": {
"name": "com.spotify.heroic.CoreQueryManager#query_shard[{site=us-east1}]"
},
"elapsed": 537157,
"children": [
{
"what": {
"name": "grpc://heroic-2.heroic.default.svc.cluster.local.:9698#query"
},
"elapsed": 537078,
"children": [
{
"what": {
"name": "com.spotify.heroic.metric.LocalMetricManager#query"
},
"elapsed": 327049,
"children": []
}
]
}
]
}
]
},
"limits": [],
"cached": false,
"cache": null,
"commonTags": {},
"commonResource": {},
"result": [],
"preAggregationSampleSize": 0,
"errors": []
},
And in our heroic logs:
Failed to execute phase [fetch],
21:47:18.350 [elasticsearch[_client_][listener][T#3]] ERROR com.spotify.heroic.rpc.grpc.GrpcRpcProtocolServer - 29c3cfe3-ca6c-4c16-bc16-b755c8867848: Request failed
... 48 more
at org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor.doExecute(EsThreadPoolExecutor.java:94)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
at org.elasticsearch.common.util.concurrent.EsAbortPolicy.rejectedExecution(EsAbortPolicy.java:50)
Caused by: EsRejectedExecutionException[rejected execution of org.elasticsearch.action.search.FetchSearchPhase$1@4606797a on EsThreadPoolExecutor[search, queue capacity = 1000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@433876a3[Running, pool size = 4, active threads = 4, queued tasks = 1000, completed tasks = 3259109]]]
at java.lang.Thread.run(Thread.java:748)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:458)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:498)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:544)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:134)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:926)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1334)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265)
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:413)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:297)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
Heroic should pass this response along to the client in the errors
array.
Thanks for reporting this! Just to confirm, are you running version 2.2.0 or an earlier one?
We are running 0.10.5
at the moment.