[BUG] java.lang.OutOfMemoryError: GC overhead limit exceeded
### Is there an existing issue for this?
- [X] I have searched the existing issues
### Current Behavior

### Expected Behavior
Runs normally.
### Steps To Reproduce
The system runs normally at first, but after some time it crashes and we can no longer log in to the console. Checking the logs reveals the exception java.lang.OutOfMemoryError: GC overhead limit exceeded.
### Environment
HertzBeat version(s): 1.4.4
### Debug logs
### Anything else?
No response
Hi, thanks for the feedback. There may be a heap dump file in the logs directory; can you find and provide it?
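For context, HotSpot only writes a dump automatically when the heap-dump flags are enabled. A minimal sketch, assuming a plain `java` launch; the jar name and dump path are illustrative, not HertzBeat's actual startup command:

```shell
# ask the JVM to write a .hprof file whenever an OutOfMemoryError is thrown
java -XX:+HeapDumpOnOutOfMemoryError \
     -XX:HeapDumpPath=/opt/hertzbeat/logs \
     -jar hertzbeat.jar
```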
Hi, have you added nginx monitoring? Which monitors have you added? See https://github.com/dromara/hertzbeat/pull/1476. You can upgrade to HertzBeat 1.5.0 and try again.
- No, we do not have nginx monitoring.
- The heap dump file is 5.2 GB and the tar file is 1.2 GB; is there some way I can provide this file?
- I will try to upgrade to 1.5.0 and see if the problem persists.

The snapshot below shows the monitors we have added to HertzBeat:
> The heap dump file is 5.2 GB and the tar file is 1.2 GB; is there some way I can provide this file?
Hi, you can use https://cowtransfer.com/ to share it if possible.
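If the transfer service caps single-file size, one workaround (plain coreutils, nothing HertzBeat-specific; the filename and chunk size are illustrative) is to split the archive and reassemble it after download:

```shell
# split the 1.2 GB archive into 500 MB chunks for upload
split -b 500m java_pid10.hprof.tar.gz java_pid10.hprof.tar.gz.part-
# reassemble on the receiving side
cat java_pid10.hprof.tar.gz.part-* > java_pid10.hprof.tar.gz
```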
Also, make sure the hertzbeat-collector version and the hertzbeat version are the same.
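A quick way to check, assuming both sides run as Docker containers (adjust for other deployments):

```shell
# list the image tags of the running hertzbeat containers
docker ps --format '{{.Image}}' | grep hertzbeat
```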
```
2024-04-04 03:43:56 [netty-server-worker-3] INFO org.dromara.hertzbeat.manager.scheduler.netty.process.HeartbeatProcessor - the collector xxxxx-collector is not online.
2024-04-04 03:43:08 [netty-server-worker-0] ERROR org.dromara.hertzbeat.common.util.ProtoJsonUtil - Failed parsing JSON source: JsonReader at line 15 column 11 path $.fields[2].name to Json
com.google.protobuf.InvalidProtocolBufferException: Failed parsing JSON source: JsonReader at line 15 column 11 path $.fields[2].name to Json
at com.google.protobuf.util.JsonFormat$ParserImpl.merge(JsonFormat.java:1345)
at com.google.protobuf.util.JsonFormat$Parser.merge(JsonFormat.java:477)
at org.dromara.hertzbeat.common.util.ProtoJsonUtil.toProtobuf(ProtoJsonUtil.java:56)
at org.dromara.hertzbeat.manager.scheduler.netty.process.CollectCyclicDataResponseProcessor.handle(CollectCyclicDataResponseProcessor.java:20)
at org.dromara.hertzbeat.remoting.netty.NettyRemotingAbstract.processRequestMsg(NettyRemotingAbstract.java:73)
at org.dromara.hertzbeat.remoting.netty.NettyRemotingAbstract.processReceiveMsg(NettyRemotingAbstract.java:59)
at org.dromara.hertzbeat.remoting.netty.NettyRemotingServer$NettyServerHandler.channelRead0(NettyRemotingServer.java:192)
at org.dromara.hertzbeat.remoting.netty.NettyRemotingServer$NettyServerHandler.channelRead0(NettyRemotingServer.java:182)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:336)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:323)
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:444)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:280)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:336)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:308)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:800)
at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:499)
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:397)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at java.base/java.lang.Thread.run(Unknown Source)
Caused by: com.google.gson.JsonParseException: Failed parsing JSON source: JsonReader at line 15 column 11 path $.fields[2].name to Json
at com.google.gson.JsonParser.parseReader(JsonParser.java:89)
at com.google.protobuf.util.JsonFormat$ParserImpl.merge(JsonFormat.java:1340)
... 41 common frames omitted
```
- Yes, the versions were mismatched: we had collectors at v1.5.0 and hertzbeat at v1.4.4. Yesterday I upgraded hertzbeat to v1.5.0.
- I have uploaded the dump file here: https://cowtransfer.com/s/1d8595cd6b0b44 (click the link to view [ java_pid10.hprof.tar.gz ], or visit cowtransfer.com and enter the transfer code cx3cwz).
After upgrading hertzbeat to v1.5.0, the error occurred again.
https://cowtransfer.com/s/5de6df61a93648 (click the link to view [ java_pid10_0408.hprof.tar.gz ], or visit cowtransfer.com and enter the transfer code p88c52).
Got it!
Hi, how many monitors and cluster collectors have you added? We see lots of metrics data in memory.
Maybe you can use an external Kafka queue instead of the default in-memory queue in application.yml:
```yaml
common:
  queue:
    # memory or kafka
    type: memory
    # properties when queue type is kafka
    kafka:
      servers: 127.0.0.1:9092
      metrics-data-topic: async-metrics-data
      alerts-data-topic: async-alerts-data
```
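To actually switch over, set `type: kafka` and point `servers` at your broker; a minimal sketch (the broker address is a placeholder):

```yaml
common:
  queue:
    type: kafka
    kafka:
      servers: your-kafka-host:9092
      metrics-data-topic: async-metrics-data
      alerts-data-topic: async-alerts-data
```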
We have one HertzBeat master node and nearly 20 edge collectors.
There is one edge collector named "az-10"; I used it to take over all the tasks that used to run on the master node. After two days it is running well and no error has occurred.
Next week I will try the Kafka queue instead of the in-memory queue and see if the problem persists.
I started a dedicated edge node and migrated all the probe tasks that used to run on the master node over to it, so the master node only handles alerting. It ran stably for about 12 days, then the GC overhead limit exceeded exception appeared again. I have now configured the Kafka queue on the master node; the system has been running for 2 days and is normal so far. We will keep watching the master node's status.
After configuring the Kafka queue on the master node, it has run stably for half a month. The system is currently stable and no other exceptions have occurred.