
Node drops out with OutOfMemoryError for reasons other than the VM having run out of memory

Open Bukhtawar opened this issue 3 years ago • 5 comments

Describe the bug A node can drop out of the cluster on an OutOfMemoryError that is not caused by the VM actually running out of memory, for example:

java.lang.OutOfMemoryError: UTF16 String size is 1089861360, should be less than 1073741823

This is coming from the implementation limits of StringUTF16.

    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[?:?]
    at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:53) ~[?:?]
    at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:763) ~[elasticsearch-6.8.0.jar:6.8.0]
    at org.elasticsearch.transport.TcpTransport.messageReceived(TcpTransport.java:952) ~[elasticsearch-6.8.0.jar:6.8.0]
    at org.elasticsearch.transport.TcpTransport.handleResponse(TcpTransport.java:977) ~[elasticsearch-6.8.0.jar:6.8.0]
    at org.elasticsearch.common.util.concurrent.EsExecutors$DirectExecutorService.execute(EsExecutors.java:216) ~[elasticsearch-6.8.0.jar:6.8.0]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.8.0.jar:6.8.0]
    at org.elasticsearch.transport.TcpTransport$1.doRun(TcpTransport.java:985) ~[elasticsearch-6.8.0.jar:6.8.0]
    at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1104) ~[elasticsearch-6.8.0.jar:6.8.0]
    at org.elasticsearch.action.search.SearchTransportService$ConnectionCountingHandler.handleResponse(SearchTransportService.java:454) ~[elasticsearch-6.8.0.jar:6.8.0]
    at org.elasticsearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:54) ~[elasticsearch-6.8.0.jar:6.8.0]
    at org.elasticsearch.action.search.SearchActionListener.onResponse(SearchActionListener.java:29) ~[elasticsearch-6.8.0.jar:6.8.0]
    at org.elasticsearch.action.search.SearchActionListener.onResponse(SearchActionListener.java:45) ~[elasticsearch-6.8.0.jar:6.8.0]
    at org.elasticsearch.action.search.FetchSearchPhase$2.innerOnResponse(FetchSearchPhase.java:163) ~[elasticsearch-6.8.0.jar:6.8.0]
    at org.elasticsearch.action.search.FetchSearchPhase$2.innerOnResponse(FetchSearchPhase.java:166) ~[elasticsearch-6.8.0.jar:6.8.0]
    at org.elasticsearch.action.search.CountedCollector.onResult(CountedCollector.java:64) ~[elasticsearch-6.8.0.jar:6.8.0]
    at org.elasticsearch.action.search.CountedCollector.countDown(CountedCollector.java:53) ~[elasticsearch-6.8.0.jar:6.8.0]
    at org.elasticsearch.action.search.FetchSearchPhase.lambda$innerRun$2(FetchSearchPhase.java:104) ~[elasticsearch-6.8.0.jar:6.8.0]
    at org.elasticsearch.action.search.FetchSearchPhase.moveToNextPhase(FetchSearchPhase.java:206) ~[elasticsearch-6.8.0.jar:6.8.0]
    at org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:153) ~[elasticsearch-6.8.0.jar:6.8.0]
    at org.elasticsearch.action.search.AbstractSearchAsyncAction.executePhase(AbstractSearchAsyncAction.java:160) ~[elasticsearch-6.8.0.jar:6.8.0]
    at org.elasticsearch.action.search.ExpandSearchPhase.run(ExpandSearchPhase.java:120) ~[elasticsearch-6.8.0.jar:6.8.0]
    at org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:153) ~[elasticsearch-6.8.0.jar:6.8.0]
    at org.elasticsearch.action.search.AbstractSearchAsyncAction.executePhase(AbstractSearchAsyncAction.java:160) ~[elasticsearch-6.8.0.jar:6.8.0]
    at org.elasticsearch.action.search.FetchSearchPhase$3.run(FetchSearchPhase.java:213) ~[elasticsearch-6.8.0.jar:6.8.0]
    at org.elasticsearch.action.search.AbstractSearchAsyncAction.onResponse(AbstractSearchAsyncAction.java:50) ~[elasticsearch-6.8.0.jar:6.8.0]
    at org.elasticsearch.action.search.AbstractSearchAsyncAction.onResponse(AbstractSearchAsyncAction.java:311) ~[elasticsearch-6.8.0.jar:6.8.0]
    at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:81) ~[elasticsearch-6.8.0.jar:6.8.0]
    at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:85) ~[elasticsearch-6.8.0.jar:6.8.0]
    at org.elasticsearch.rest.action.RestActionListener.onResponse(RestActionListener.java:47) ~[elasticsearch-6.8.0.jar:6.8.0]
    at org.elasticsearch.rest.action.RestResponseListener.processResponse(RestResponseListener.java:37) ~[elasticsearch-6.8.0.jar:6.8.0]
    at org.elasticsearch.rest.RestController$ResourceHandlingHttpChannel.sendResponse(RestController.java:497) ~[elasticsearch-6.8.0.jar:6.8.0]
    at org.elasticsearch.common.text.Text.toString(Text.java:94) ~[elasticsearch-6.8.0.jar:6.8.0]
    at org.elasticsearch.common.text.Text.string(Text.java:89) ~[elasticsearch-6.8.0.jar:6.8.0]
    at org.elasticsearch.common.bytes.BytesReference.utf8ToString(BytesReference.java:98) ~[elasticsearch-6.8.0.jar:6.8.0]
    at org.apache.lucene.util.BytesRef.utf8ToString(BytesRef.java:138) ~[lucene-core-7.7.0.jar:7.7.0-SNAPSHOT bb8faecf0a738cf5294e398973014b0090e9dc51 - akjain - 2020-02-14 16:08:48]
    at java.lang.String.<init>(String.java:276) ~[?:?]
    at java.lang.String.<init>(String.java:3222) ~[?:?]
    at java.lang.StringUTF16.toBytes(StringUTF16.java:151) ~[?:?]
    at java.lang.StringUTF16.newBytesFor(StringUTF16.java:49) ~[?:?]
Caused by: java.lang.OutOfMemoryError: UTF16 String size is 1089861360, should be less than 1073741823
    at java.lang.Thread.run(Thread.java:834) [?:?]
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:909) [netty-common-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:470) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:510) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:556) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:656) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:278) [netty-codec-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:441) [netty-codec-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:502) [netty-codec-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1247) [netty-handler-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1203) [netty-handler-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1436) [netty-handler-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241) [netty-handler-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:297) [netty-codec-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:323) [netty-codec-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:364) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.notifyHandlerException(AbstractChannelHandlerContext.java:856) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:285) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.exceptionCaught(Netty4MessageChannelHandler.java:66) [transport-netty4-client-6.8.0.jar:6.8.0]
java.lang.Exception: java.lang.OutOfMemoryError: UTF16 String size is 1089861360, should be less than 1073741823

https://bugs.openjdk.java.net/browse/JDK-8230744
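The numbers in the error message can be checked with a little arithmetic. The sketch below assumes the JDK 9+ string layout, where a UTF-16 string stores two bytes per char in a single `byte[]`, so its length is capped at `Integer.MAX_VALUE >> 1`. The class and method names are hypothetical and only illustrate the bound; this is not the JDK's own code.

```java
final class Utf16LimitSketch {
    // Same value as the bound printed in the error message: Integer.MAX_VALUE >> 1.
    static final int MAX_UTF16_CHARS = Integer.MAX_VALUE >> 1;

    // True when a string of charCount UTF-16 code units would exceed the bound.
    static boolean exceedsUtf16Limit(long charCount) {
        return charCount > MAX_UTF16_CHARS;
    }

    public static void main(String[] args) {
        long reported = 1_089_861_360L; // size taken from the stack trace above

        System.out.println("limit            = " + MAX_UTF16_CHARS);
        System.out.println("reported size    = " + reported);
        System.out.println("exceeds limit    = " + exceedsUtf16Limit(reported));
        // Two bytes per char: the backing array would need more than
        // Integer.MAX_VALUE elements, which a Java array cannot have.
        System.out.println("bytes needed     = " + (reported * 2));
    }
}
```

The error therefore reflects an array-size limit rather than heap exhaustion: the same request would fail no matter how much heap the node had.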

Expected behavior Nodes should not drop off the cluster because of a single request or response that does not itself exhaust the VM's memory.

Bukhtawar avatar Dec 03 '21 05:12 Bukhtawar

@Bukhtawar: Is there a way to reproduce this issue? That would help to debug, root-cause, and finally verify the fix.

dreamer-89 avatar Jan 18 '22 19:01 dreamer-89

I think the JDK issue mentioned in the description is incorrect. https://bugs.openjdk.org/browse/JDK-8190429

anandpatel9998 avatar Jul 06 '22 23:07 anandpatel9998

Compact strings have been enabled by default since JDK 9. I haven't tried it, but it seems that behavior can be disabled by specifying -XX:-CompactStrings in jvm.options.
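To confirm which setting a running JVM actually uses, the flag can be read at runtime. This is a minimal sketch that assumes a HotSpot-based JDK 9 or later, where `com.sun.management.HotSpotDiagnosticMXBean` is available; the class name is hypothetical. Note that with compact strings disabled every string uses the UTF-16 layout, so it is not clear that turning the flag off would lift the size limit.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

final class CompactStringsCheck {
    // Returns "true" or "false" depending on the effective CompactStrings flag.
    static String compactStringsSetting() {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption("CompactStrings").getValue();
    }

    public static void main(String[] args) {
        System.out.println("CompactStrings = " + compactStringsSetting());
    }
}
```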

consulthys avatar Sep 13 '22 12:09 consulthys

Since the issue (the OOM and the node drop) occurs because the response is larger than a String can hold, is there an option to fail such search requests gracefully when they produce extra-large responses, and so avoid system instability (i.e. node drops)?
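One way such a graceful failure could look is a size check before the bytes are decoded. The sketch below is not OpenSearch code: `GuardedUtf8Decoder`, `ResponseTooLargeException`, and the `maxChars` parameter are hypothetical names used for illustration. It relies on the fact that UTF-8 input never decodes to more UTF-16 code units than it has bytes, so the byte length is a safe (if conservative) upper bound.

```java
import java.nio.charset.StandardCharsets;

final class GuardedUtf8Decoder {
    // Upper bound for a UTF-16 backed String on JDK 9+.
    static final int JDK_UTF16_MAX_CHARS = Integer.MAX_VALUE >> 1;

    // Hypothetical exception type; a real fix would map this to a request failure.
    static final class ResponseTooLargeException extends RuntimeException {
        ResponseTooLargeException(String message) {
            super(message);
        }
    }

    // Decodes UTF-8 bytes, rejecting input that could exceed maxChars chars.
    static String decode(byte[] utf8, int maxChars) {
        // Conservative check: chars <= bytes for any UTF-8 input, so input
        // within the bound can never hit the JDK limit during decoding.
        if (utf8.length > maxChars) {
            throw new ResponseTooLargeException(
                    "payload of " + utf8.length + " bytes may exceed " + maxChars + " chars");
        }
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] small = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(small, JDK_UTF16_MAX_CHARS));
        try {
            decode(new byte[11], 10); // small limit to demonstrate the rejection path
        } catch (ResponseTooLargeException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Throwing an ordinary exception keeps the failure scoped to the one request, whereas an `OutOfMemoryError` is typically treated as fatal for the whole process. The check may reject some payloads that would have fit, which seems an acceptable trade-off at this size.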

manojfaria avatar Sep 14 '22 16:09 manojfaria

There are a bunch of ideas and some projects along those lines. @Bukhtawar, do you know a specific one that would address this exact issue?

dblock avatar Sep 14 '22 21:09 dblock