elasticsearch-prometheus-exporter
NullPointerException on /_prometheus/metrics with Elasticsearch 7.8.0
Hi, I am using Elasticsearch 7.8.0 and the elasticsearch-prometheus-exporter plugin 7.8.0.0. I see many warning logs in Elasticsearch reporting a NullPointerException. The logs are below.
```
"log":"path: /_prometheus/metrics, params: {}"
java.lang.NullPointerException: null
at org.compuscene.metrics.prometheus.PrometheusMetricsCollector.updatePerIndexContextMetrics(PrometheusMetricsCollector.java:468) ~[?:?]
at org.compuscene.metrics.prometheus.PrometheusMetricsCollector.updatePerIndexMetrics(PrometheusMetricsCollector.java:461) ~[?:?]
at org.compuscene.metrics.prometheus.PrometheusMetricsCollector.updateMetrics(PrometheusMetricsCollector.java:947) ~[?:?]
at org.elasticsearch.rest.prometheus.RestPrometheusMetricsAction$1.buildResponse(RestPrometheusMetricsAction.java:97) ~[?:?]
at org.elasticsearch.rest.prometheus.RestPrometheusMetricsAction$1.buildResponse(RestPrometheusMetricsAction.java:80) ~[?:?]
at org.elasticsearch.rest.action.RestResponseListener.processResponse(RestResponseListener.java:37) ~[elasticsearch-7.8.0.jar:7.8.0]
at org.elasticsearch.rest.action.RestActionListener.onResponse(RestActionListener.java:47) [elasticsearch-7.8.0.jar:7.8.0]
at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:89) [elasticsearch-7.8.0.jar:7.8.0]
at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:83) [elasticsearch-7.8.0.jar:7.8.0]
at org.elasticsearch.action.TransportNodePrometheusMetricsAction$AsyncAction.gatherRequests(TransportNodePrometheusMetricsAction.java:127) [prometheus-exporter-7.8.0.0.jar:7.8.0.0]
at org.elasticsearch.action.TransportNodePrometheusMetricsAction$AsyncAction.access$400(TransportNodePrometheusMetricsAction.java:76) [prometheus-exporter-7.8.0.0.jar:7.8.0.0]
at org.elasticsearch.action.TransportNodePrometheusMetricsAction$AsyncAction$1.onResponse(TransportNodePrometheusMetricsAction.java:136) [prometheus-exporter-7.8.0.0.jar:7.8.0.0]
at org.elasticsearch.action.TransportNodePrometheusMetricsAction$AsyncAction$1.onResponse(TransportNodePrometheusMetricsAction.java:132) [prometheus-exporter-7.8.0.0.jar:7.8.0.0]
at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:89) [elasticsearch-7.8.0.jar:7.8.0]
at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:83) [elasticsearch-7.8.0.jar:7.8.0]
at org.elasticsearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:54) [elasticsearch-7.8.0.jar:7.8.0]
at org.elasticsearch.transport.TransportService$6.handleResponse(TransportService.java:633) [elasticsearch-7.8.0.jar:7.8.0]
at com.floragunn.searchguard.transport.SearchGuardInterceptor$RestoringTransportResponseHandler.handleResponse(SearchGuardInterceptor.java:259) [search-guard-suite-security-7.8.0-43.0.0-144.jar:7.8.0-43.0.0-144]
at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1163) [elasticsearch-7.8.0.jar:7.8.0]
at org.elasticsearch.transport.InboundHandler$1.doRun(InboundHandler.java:213) [elasticsearch-7.8.0.jar:7.8.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.8.0.jar:7.8.0]
at org.elasticsearch.common.util.concurrent.EsExecutors$DirectExecutorService.execute(EsExecutors.java:226) [elasticsearch-7.8.0.jar:7.8.0]
at org.elasticsearch.transport.InboundHandler.handleResponse(InboundHandler.java:205) [elasticsearch-7.8.0.jar:7.8.0]
at org.elasticsearch.transport.InboundHandler.messageReceived(InboundHandler.java:117) [elasticsearch-7.8.0.jar:7.8.0]
at org.elasticsearch.transport.InboundHandler.inboundMessage(InboundHandler.java:78) [elasticsearch-7.8.0.jar:7.8.0]
at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:692) [elasticsearch-7.8.0.jar:7.8.0]
at org.elasticsearch.transport.InboundPipeline.forwardFragments(InboundPipeline.java:142) [elasticsearch-7.8.0.jar:7.8.0]
at org.elasticsearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:117) [elasticsearch-7.8.0.jar:7.8.0]
at org.elasticsearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:82) [elasticsearch-7.8.0.jar:7.8.0]
at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:73) [transport-netty4-client-7.8.0.jar:7.8.0]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:271) [netty-handler-4.1.49.Final.jar:4.1.49.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1518) [netty-handler-4.1.49.Final.jar:4.1.49.Final]
at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1267) [netty-handler-4.1.49.Final.jar:4.1.49.Final]
at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1314) [netty-handler-4.1.49.Final.jar:4.1.49.Final]
at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:501) [netty-codec-4.1.49.Final.jar:4.1.49.Final]
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:440) [netty-codec-4.1.49.Final.jar:4.1.49.Final]
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276) [netty-codec-4.1.49.Final.jar:4.1.49.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:615) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:578) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [netty-common-4.1.49.Final.jar:4.1.49.Final]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.49.Final.jar:4.1.49.Final]
at java.lang.Thread.run(Thread.java:834) [?:?]
```
How large and busy is your cluster? Could this be because internal metric requests time out? Do you see any timeout-related logs in the ES log?
Hi, the cluster is not very busy; I am only running some automated test cases in a pipeline. I don't have any logs other than those shared above.
Hi, Is there any update?
FWIW I also get it on Elasticsearch 7.5.2:
```
[2021-05-11T15:21:48,063][WARN ][r.suppressed ] [ivory4.X] path: /_prometheus/metrics, params: {}
java.lang.NullPointerException: null
at org.compuscene.metrics.prometheus.PrometheusMetricsCollector.updatePerIndexContextMetrics(PrometheusMetricsCollector.java:468) ~[?:?]
at org.compuscene.metrics.prometheus.PrometheusMetricsCollector.updatePerIndexMetrics(PrometheusMetricsCollector.java:461) ~[?:?]
at org.compuscene.metrics.prometheus.PrometheusMetricsCollector.updateMetrics(PrometheusMetricsCollector.java:947) ~[?:?]
at org.elasticsearch.rest.prometheus.RestPrometheusMetricsAction$1.buildResponse(RestPrometheusMetricsAction.java:87) ~[?:?]
at org.elasticsearch.rest.prometheus.RestPrometheusMetricsAction$1.buildResponse(RestPrometheusMetricsAction.java:70) ~[?:?]
at org.elasticsearch.rest.action.RestResponseListener.processResponse(RestResponseListener.java:37) ~[elasticsearch-7.5.2.jar:7.5.2]
at org.elasticsearch.rest.action.RestActionListener.onResponse(RestActionListener.java:47) [elasticsearch-7.5.2.jar:7.5.2]
at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:70) [elasticsearch-7.5.2.jar:7.5.2]
at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:64) [elasticsearch-7.5.2.jar:7.5.2]
at org.elasticsearch.action.TransportNodePrometheusMetricsAction$AsyncAction.gatherRequests(TransportNodePrometheusMetricsAction.java:127) [prometheus-exporter-7.5.2.0.jar:7.5.2.0]
at org.elasticsearch.action.TransportNodePrometheusMetricsAction$AsyncAction.access$400(TransportNodePrometheusMetricsAction.java:76) [prometheus-exporter-7.5.2.0.jar:7.5.2.0]
at org.elasticsearch.action.TransportNodePrometheusMetricsAction$AsyncAction$1.onResponse(TransportNodePrometheusMetricsAction.java:136) [prometheus-exporter-7.5.2.0.jar:7.5.2.0]
at org.elasticsearch.action.TransportNodePrometheusMetricsAction$AsyncAction$1.onResponse(TransportNodePrometheusMetricsAction.java:132) [prometheus-exporter-7.5.2.0.jar:7.5.2.0]
...
```
I've always been confused by Elasticsearch Prometheus exporters and Grafana dashboards; I guess there is no official one? This one also seems popular, and I wish I knew whether it would be worth switching: https://github.com/justwatchcom/elasticsearch_exporter
@jamshid Did you change the prometheus.indices setting in any way? Right now it defaults to true; have you changed it to false?
Can you share the output of:
```
curl '<elasticsearch>/_cluster/settings?include_defaults=true&filter_path=defaults.prometheus'
curl '<elasticsearch>/_cluster/settings?include_defaults=true&filter_path=transient.prometheus'
curl '<elasticsearch>/_cluster/settings?include_defaults=true&filter_path=persistent.prometheus'
```
Sure, thanks:
```
curl 'ivory4:9200/_cluster/settings?include_defaults=true&filter_path=defaults.prometheus' \
     'ivory4:9200/_cluster/settings?include_defaults=true&filter_path=transient.prometheus' \
     'ivory4:9200/_cluster/settings?include_defaults=true&filter_path=persistent.prometheus'
{"defaults":{"prometheus":{"cluster":{"settings":"true"},"indices":"true"}}}
{}
{}
```
FWIW, it doesn't happen all the time: sometimes 5 times in half an hour, sometimes twice within a minute.
@jamshid @chaitrahegde115 What scrape_interval do you use? Can increasing scrape_interval help?
@lukas-vlcek Any update? It seems we all have default settings and anywhere from average to high load on Elasticsearch.
I am also facing the same issue. While going through the code (https://github.com/vvanholl/elasticsearch-prometheus-exporter/blob/7.8.0.0/src/main/java/org/compuscene/metrics/prometheus/PrometheusMetricsCollector.java) I found that idx.getDocs() can return null (based on reading the code), so calling getCount() on it can cause the NullPointerException. Please help me figure out how I can fix this issue. FYI, the issue occurs intermittently.
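To make the failure mode concrete, here is a minimal, self-contained sketch. The classes below are mocks I wrote only for illustration (not the plugin's or Elasticsearch's real types); they just mimic the shape of the call that fails:

```java
// Mock types for illustration: IndexStatsMock plays the role of the per-index stats
// object ("idx" in the collector) whose getDocs() can apparently come back null.
public class NpeIllustration {
    static class DocsStats {
        long getCount() { return 42; }
    }

    static class IndexStatsMock {
        // Simulates the case where the Docs section of the index stats is missing.
        DocsStats getDocs() { return null; }
    }

    public static void main(String[] args) {
        IndexStatsMock idx = new IndexStatsMock();
        // Mirrors the unguarded call in updatePerIndexContextMetrics: getDocs() is null,
        // so the chained getCount() throws the NullPointerException seen in the logs above.
        long docs = idx.getDocs().getCount();
        System.out.println("docs=" + docs); // never reached
    }
}
```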

@Mohitj252 Interesting. A quick workaround would be to wrap the entire line (no. 468) in something like the following:
`try { ... } catch (Exception e) { log.warn("Skipping index_doc_number metric", e); }`
However, that would result in the metric being missing from the Prometheus sample whenever the exception occurs.
It would be better to check how the call to getCount() is handled in the Elasticsearch AL2 code.
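To illustrate that trade-off with a self-contained mock (this is not the plugin code; the map below just stands in for per-index stats where one index has no Docs section, and the index_doc_number name and label format are only illustrative), the try/catch lets the scrape return every other gauge and only the affected index is skipped:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: a null value plays the role of idx.getDocs() being null for one index.
public class TryCatchWorkaroundSketch {
    public static void main(String[] args) {
        Map<String, Long> docCounts = new LinkedHashMap<>();
        docCounts.put("index-a", 100L);
        docCounts.put("index-b", null);  // the problematic index
        docCounts.put("index-c", 250L);

        StringBuilder sample = new StringBuilder();
        for (Map.Entry<String, Long> entry : docCounts.entrySet()) {
            try {
                // mirrors the gauge update that currently throws on a missing Docs section
                long count = entry.getValue().longValue();
                sample.append("index_doc_number{index=\"").append(entry.getKey()).append("\"} ")
                      .append(count).append('\n');
            } catch (Exception e) {
                // the scrape keeps going; only this one series is absent from the sample
                System.err.println("Skipping index_doc_number metric for " + entry.getKey() + ": " + e);
            }
        }
        System.out.print(sample);  // index-b is simply missing from this scrape
    }
}
```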
Hi @lukas-vlcek, adding a try/catch will not solve our problem. It would be better to focus on handling the case where getDocs() is null: if getDocs() is null, we could create a new object (just an idea) and pass all the required info to that newly created object. What is your opinion on this approach? Also, if you think this is an issue in Elasticsearch itself, can you please get it fixed in an upcoming Elasticsearch version? Thanks in advance.
Oh sorry, I was misreading what you said. In that case I think it makes sense to do something like:
```java
if (idx.getDocs() != null) {
    // handle all metrics that require getDocs()
}
```
Why getDocs() can be null is another topic. I cannot tell off the top of my head whether it can be "fixed" directly in Elasticsearch without knowing the reason behind it.
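As a self-contained sketch of that null check (again mock types and an illustrative gauge format, not the actual plugin source):

```java
// Illustration only: DocsStats is a mock standing in for the object returned by idx.getDocs().
public class NullGuardSketch {
    static class DocsStats {
        private final long count;
        DocsStats(long count) { this.count = count; }
        long getCount() { return count; }
    }

    // Stands in for the per-index update: returns the gauge line, or null when the
    // Docs section is missing, instead of throwing a NullPointerException.
    static String indexDocNumberGauge(String indexName, DocsStats docs) {
        if (docs != null) {
            return "index_doc_number{index=\"" + indexName + "\"} " + docs.getCount();
        }
        return null; // metric simply absent for this index in this scrape
    }

    public static void main(String[] args) {
        System.out.println(indexDocNumberGauge("index-a", new DocsStats(100)));
        System.out.println(indexDocNumberGauge("index-b", null));
    }
}
```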
Hi @lukas-vlcek, can you please check and discuss the issue with the Elastic team to understand why it occurs intermittently, and how I can get past this NullPointerException?