Nacos Memory Leak (http connection pool leak).
Hi guys,
We have a nacos cluster (V2.0.3) with 3 nodes; one of them crashed with an out-of-memory error. Below is a screenshot of the heap dump analyzed with MAT.
Here is the information:
- JVM max heap size: 4G
- As shown in the screenshot, PoolingNHttpClientConnectionManager -> CPool -> LinkedList with 643,199 entries occupied 3.37 GB (86%) of the heap, which looks like a memory leak.
- Most of the items in the linked list were of type "LeaseRequest".

Please help us address this issue; reply to this post if you need more info. Thanks.
You may be using a 1.x client, which uses HTTP polling connections. When there is a lot of configuration this consumes a great deal of memory; you can read the source code for the details. You are advised to upgrade to 2.x.
Our clients are indeed 1.x connecting to a 2.x server, but the issue I raised is an OOM on the nacos server.
I don't quite understand the logic involved here; could you explain it?
Also, we only use nacos as dubbo's registry, and we don't register many services, only three to four thousand.
It may be that heartbeat checks are too frequent. Possible solutions:
1. Increase the memory of each nacos node.
2. Upgrade the clients to 2.x, which uses gRPC and saves network traffic and OS file descriptors.
3. Have the clients lengthen the heartbeat interval, e.g. for dubbo (you can look up the details yourself):
   cloud:
     nacos:
       discovery:
         heart-beat-interval: 10
   (The default is 5s; it can be set to 10s or more.)
Our nacos servers have hit the same problem again. Can anyone help analyze the cause? It is very likely a connection pool leak here; is anyone familiar with this part of the code?
@elvislou hello. From the screenshot and your description of a "nacos server oom", the initial judgment is a connection leak caused by the asynchronous client inside the nacos server. You can use MAT's "with incoming references" to find who references the leaking NacosAsyncRestTemplate instance, and from the fully qualified names of those references make an initial guess at which logic caused the leak (AsyncNotifyService, HealthCheck, ServerMemberManager, etc.) to narrow down the investigation.
We had the same problem. Following #7375, we added a connection-request timeout, which fixed the OOM. The memory exhaustion was not a connection leak but an accumulation of connection lease requests: no timeout was set on them, so they wait forever.
Each LeaseRequest's deadline is Long.MAX_VALUE, so they are never cleaned up automatically; entries only go in and never come out, until they blow up the heap.
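The mechanism described above can be sketched with nothing but the JDK (this is an illustration, not Nacos or Apache HttpClient code): a Semaphore stands in for the CPool connection pool, an un-timed acquire models a LeaseRequest whose deadline is Long.MAX_VALUE and therefore never expires, and a timed tryAcquire models the connection-request timeout that lets waiters give up and be removed.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Illustration only: a Semaphore standing in for the pool's lease queue.
public class LeaseTimeoutSketch {
    public static void main(String[] args) throws InterruptedException {
        Semaphore pool = new Semaphore(1); // a pool with one connection
        pool.acquire();                    // the only connection is busy

        // pool.acquire() here would block forever -- the equivalent of a
        // LeaseRequest with deadline == Long.MAX_VALUE. Every such caller
        // stays queued, which is how the LinkedList in the heap dump grew.

        // A connection-request timeout makes the waiter give up instead:
        boolean leased = pool.tryAcquire(100, TimeUnit.MILLISECONDS);
        System.out.println("lease granted: " + leased); // false

        pool.release(); // connection returned to the pool
        System.out.println("lease granted: " + pool.tryAcquire()); // true
    }
}
```

The timed-out waiter is dequeued instead of accumulating, which is why adding the timeout stops the heap from growing even though the pool is still undersized.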
Setting the connection-request timeout only avoids the OOM. In the end we also modified buildHttpClientConfig in HttpClientManager's ApacheSyncHttpClientFactory to enlarge the connection pool; that is what resolved the jitter in registration counts under large-scale ephemeral-instance registration. The root cause was that there were not enough connections, so heartbeats for ephemeral instances were not forwarded between nodes in time.
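A similar JDK-only sketch (again an illustration, not the actual Nacos change, which lives in ApacheSyncHttpClientFactory) of why enlarging the pool removes the jitter: with a small pool most of a burst of lease attempts must queue behind it, while a pool sized to the burst grants them all immediately. The burst size of 4000 below is just a stand-in for the "three to four thousand services" mentioned earlier.

```java
import java.util.concurrent.Semaphore;

public class PoolSizingSketch {
    // Count how many of 'demand' simultaneous lease attempts succeed
    // immediately against a pool of 'size' connections.
    static int immediateLeases(int size, int demand) {
        Semaphore pool = new Semaphore(size);
        int granted = 0;
        for (int i = 0; i < demand; i++) {
            if (pool.tryAcquire()) {
                granted++;
            }
        }
        return granted;
    }

    public static void main(String[] args) {
        // A small pool under heavy inter-node heartbeat forwarding:
        // most attempts must wait (or, without a timeout, wait forever).
        System.out.println(immediateLeases(50, 4000));   // 50
        // A pool sized for the burst absorbs it entirely.
        System.out.println(immediateLeases(4096, 4000)); // 4000
    }
}
```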
Does anyone know how to reproduce this issue?
Thanks for your feedback and contribution, but this issue/pull request has not had any recent activity for more than 180 days. It will be closed if no further activity occurs within 7 days. We may have solved this issue in a newer version, so could you upgrade to the newest version and retry? If issues remain, or you want to contribute again, please create a new issue or pull request.