Java heap out of memory

Open milkice233 opened this issue 2 years ago • 7 comments

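Problem description

Possibly related to #818. I had been hitting out-of-memory errors for a while, so this time I added -XX:+HeapDumpOnOutOfMemoryError and did a quick analysis of the dump. The dump file is too large and may contain private information, so I am not attaching it; for now I am uploading a few screenshots that might help resolve the problem.

  1. The Overview shows 530 net.mamoe.mirai.api.http.context.session.StandardSession objects whose retained heap takes up about 3.9 GB of memory. [image]

  2. The object at 0x707925810 has a retained heap of 87 MB. Below is a screenshot of "list objects with outgoing references" (it looks like the mirai http api unread message queue has piled up too many messages?). [image]

  3. Screenshot of "list objects with inbound references". [image]

I am not very familiar with Java profiling. If more information or screenshots are needed, I will add them.

Reproduction

Start with mcl and log in to QQ. After staying logged in for several days, the process runs out of memory.

mirai-core version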

2.11.0

bot-protocol

ANDROID_PAD

Other component versions

[INFO] Package: net.mamoe:mirai-console Channel: stable Type: libs Version: 2.11.0 Locked: false
[INFO] Package: net.mamoe:mirai-console-terminal Channel: stable Type: libs Version: 2.11.0 Locked: false
[INFO] Package: net.mamoe:mirai-core-all Channel: stable Type: libs Version: 2.11.0 Locked: false
[INFO] Package: org.itxtech:mcl-addon Channel: c2001 Type: plugins Version: 2.0.2 Locked: false
[INFO] Package: net.mamoe:mirai-api-http Channel: stable-v2 Type: plugins Version: 2.5.2 Locked: false

System log

Exceptions like the following appeared frequently before the actual crash:
E/Mah Debug: java.util.concurrent.CancellationException: ArrayChannel was cancelled
java.util.concurrent.CancellationException: ArrayChannel was cancelled
        at kotlinx.coroutines.channels.AbstractChannel.cancel(AbstractChannel.kt:656)
        at kotlinx.coroutines.channels.ReceiveChannel$DefaultImpls.cancel$default(Channel.kt:279)
        at io.ktor.http.cio.websocket.DefaultWebSocketSessionImpl$runOutgoingProcessor$1.invokeSuspend(DefaultWebSocketSessionImpl.kt:182)
        at io.ktor.http.cio.websocket.DefaultWebSocketSessionImpl$runOutgoingProcessor$1.invoke(DefaultWebSocketSessionImpl.kt)
        at io.ktor.http.cio.websocket.DefaultWebSocketSessionImpl$runOutgoingProcessor$1.invoke(DefaultWebSocketSessionImpl.kt)
        at kotlinx.coroutines.intrinsics.UndispatchedKt.startCoroutineUndispatched(Undispatched.kt:55)
        at kotlinx.coroutines.CoroutineStart.invoke(CoroutineStart.kt:112)
        at kotlinx.coroutines.AbstractCoroutine.start(AbstractCoroutine.kt:126)
        at kotlinx.coroutines.BuildersKt__Builders_commonKt.launch(Builders.common.kt:56)
        at kotlinx.coroutines.BuildersKt.launch(Unknown Source)
        at io.ktor.http.cio.websocket.DefaultWebSocketSessionImpl.runOutgoingProcessor(DefaultWebSocketSessionImpl.kt:168)
        at io.ktor.http.cio.websocket.DefaultWebSocketSessionImpl.start(DefaultWebSocketSessionImpl.kt:74)
        at io.ktor.websocket.RoutingKt.proceedWebSocket(Routing.kt:230)
        at io.ktor.websocket.RoutingKt.access$proceedWebSocket(Routing.kt:1)
        at io.ktor.websocket.RoutingKt$webSocket$2.invokeSuspend(Routing.kt:197)
        at io.ktor.websocket.RoutingKt$webSocket$2.invoke(Routing.kt)
        at io.ktor.websocket.RoutingKt$webSocket$2.invoke(Routing.kt)
        at io.ktor.websocket.RoutingKt$webSocketRaw$2$1$1$1$1.invokeSuspend(Routing.kt:104)
        at io.ktor.websocket.RoutingKt$webSocketRaw$2$1$1$1$1.invoke(Routing.kt)
        at io.ktor.websocket.RoutingKt$webSocketRaw$2$1$1$1$1.invoke(Routing.kt)
        at io.ktor.http.cio.websocket.RawWebSocketKt.start(RawWebSocket.kt:90)
        at io.ktor.websocket.WebSocketUpgrade$upgrade$2.invokeSuspend(WebSocketUpgrade.kt:98)
        at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
        at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
        at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:570)
        at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:749)
        at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:677)
        at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:664)

E/Mah Debug: java.util.concurrent.CancellationException: ArrayChannel was cancelled
java.util.concurrent.CancellationException: ArrayChannel was cancelled
        at kotlinx.coroutines.channels.AbstractChannel.cancel(AbstractChannel.kt:656)
        at kotlinx.coroutines.channels.ReceiveChannel$DefaultImpls.cancel$default(Channel.kt:279)
        at io.ktor.http.cio.websocket.DefaultWebSocketSessionImpl$runOutgoingProcessor$1.invokeSuspend(DefaultWebSocketSessionImpl.kt:182)
        at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
        at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
        at kotlinx.coroutines.EventLoop.processUnconfinedEvent(EventLoop.common.kt:69)
        at kotlinx.coroutines.DispatchedTaskKt.resumeUnconfined(DispatchedTask.kt:245)
        at kotlinx.coroutines.DispatchedTaskKt.dispatch(DispatchedTask.kt:161)
        at kotlinx.coroutines.CancellableContinuationImpl.dispatchResume(CancellableContinuationImpl.kt:397)
        at kotlinx.coroutines.CancellableContinuationImpl.cancel(CancellableContinuationImpl.kt:183)
        at kotlinx.coroutines.CancellableContinuationImpl.parentCancelled$kotlinx_coroutines_core(CancellableContinuationImpl.kt:190)
        at kotlinx.coroutines.ChildContinuation.invoke(JobSupport.kt:1474)
        at kotlinx.coroutines.JobSupport.notifyCancelling(JobSupport.kt:1499)
        at kotlinx.coroutines.JobSupport.tryMakeCancelling(JobSupport.kt:795)
        at kotlinx.coroutines.JobSupport.makeCancelling(JobSupport.kt:755)
        at kotlinx.coroutines.JobSupport.cancelImpl$kotlinx_coroutines_core(JobSupport.kt:671)
        at kotlinx.coroutines.JobSupport.parentCancelled(JobSupport.kt:637)
        at kotlinx.coroutines.ChildHandleNode.invoke(JobSupport.kt:1465)
        at kotlinx.coroutines.JobSupport.notifyCancelling(JobSupport.kt:1499)
        at kotlinx.coroutines.JobSupport.tryMakeCompletingSlowPath(JobSupport.kt:900)
        at kotlinx.coroutines.JobSupport.tryMakeCompleting(JobSupport.kt:863)
        at kotlinx.coroutines.JobSupport.cancelMakeCompleting(JobSupport.kt:696)
        at kotlinx.coroutines.JobSupport.cancelImpl$kotlinx_coroutines_core(JobSupport.kt:667)
        at kotlinx.coroutines.JobSupport.parentCancelled(JobSupport.kt:637)
        at kotlinx.coroutines.ChildHandleNode.invoke(JobSupport.kt:1465)
        at kotlinx.coroutines.JobSupport.notifyCancelling(JobSupport.kt:1499)
        at kotlinx.coroutines.JobSupport.makeCancelling(JobSupport.kt:747)
        at kotlinx.coroutines.JobSupport.cancelImpl$kotlinx_coroutines_core(JobSupport.kt:671)
        at kotlinx.coroutines.JobSupport.parentCancelled(JobSupport.kt:637)
        at kotlinx.coroutines.ChildHandleNode.invoke(JobSupport.kt:1465)
        at kotlinx.coroutines.JobSupport.notifyCancelling(JobSupport.kt:1499)
        at kotlinx.coroutines.JobSupport.tryMakeCancelling(JobSupport.kt:795)
        at kotlinx.coroutines.JobSupport.makeCancelling(JobSupport.kt:755)
        at kotlinx.coroutines.JobSupport.cancelImpl$kotlinx_coroutines_core(JobSupport.kt:671)
        at kotlinx.coroutines.JobSupport.parentCancelled(JobSupport.kt:637)
        at kotlinx.coroutines.ChildHandleNode.invoke(JobSupport.kt:1465)
        at kotlinx.coroutines.JobSupport.notifyCancelling(JobSupport.kt:1499)
        at kotlinx.coroutines.JobSupport.tryMakeCancelling(JobSupport.kt:795)
        at kotlinx.coroutines.JobSupport.makeCancelling(JobSupport.kt:755)
        at kotlinx.coroutines.JobSupport.cancelImpl$kotlinx_coroutines_core(JobSupport.kt:671)
        at kotlinx.coroutines.JobSupport.parentCancelled(JobSupport.kt:637)
        at kotlinx.coroutines.ChildHandleNode.invoke(JobSupport.kt:1465)
        at kotlinx.coroutines.JobSupport.notifyCancelling(JobSupport.kt:1499)
        at kotlinx.coroutines.JobSupport.tryMakeCancelling(JobSupport.kt:795)
        at kotlinx.coroutines.JobSupport.makeCancelling(JobSupport.kt:755)
        at kotlinx.coroutines.JobSupport.cancelImpl$kotlinx_coroutines_core(JobSupport.kt:671)
        at kotlinx.coroutines.JobSupport.cancelInternal(JobSupport.kt:632)
        at kotlinx.coroutines.JobSupport.cancel(JobSupport.kt:617)
        at kotlinx.coroutines.JobKt__JobKt.cancel(Job.kt:549)
        at kotlinx.coroutines.JobKt.cancel(Unknown Source)
        at kotlinx.coroutines.JobKt__JobKt.cancel$default(Job.kt:548)
        at kotlinx.coroutines.JobKt.cancel$default(Unknown Source)
        at io.ktor.server.cio.backend.ServerPipelineKt$startServerConnectionPipeline$1.invokeSuspend(ServerPipeline.kt:179)
        at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
        at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:104)
        at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:570)
        at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:749)
        at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:677)
        at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:664)


At the time of the actual crash:
2022-08-04 10:40:36 W/stderr: Exception in thread "DefaultDispatcher-worker-1" Exception in thread "nioEventLoopGroup-39-4" java.lang.OutOfMemoryError: Java heap space
2022-08-04 10:40:36 E/main: Exception in coroutine <unnamed>
java.lang.OutOfMemoryError: Java heap space

2022-08-04 10:40:46 W/stderr: Exception in thread "nioEventLoopGroup-39-2" java.lang.OutOfMemoryError: Java heap space
2022-08-04 10:41:59 W/stderr:   at java.base/java.util.concurrent.atomic.AtomicReferenceArray.<init>(AtomicReferenceArray.java:67)
2022-08-04 10:41:59 W/stderr:   at kotlinx.coroutines.internal.LockFreeTaskQueueCore.<init>(LockFreeTaskQueue.kt:83)
2022-08-04 10:41:59 W/stderr:   at kotlinx.coroutines.internal.LockFreeTaskQueueCore.allocateNextCopy(LockFreeTaskQueue.kt:230)
2022-08-04 10:41:59 W/stderr:   at kotlinx.coroutines.internal.LockFreeTaskQueueCore.allocateOrGetNextCopy(LockFreeTaskQueue.kt:225)
2022-08-04 10:41:59 W/stderr:   at kotlinx.coroutines.internal.LockFreeTaskQueueCore.next(LockFreeTaskQueue.kt:214)
2022-08-04 10:41:59 W/stderr:   at kotlinx.coroutines.internal.LockFreeTaskQueue.addLast(LockFreeTaskQueue.kt:51)
2022-08-04 10:41:59 W/stderr:   at kotlinx.coroutines.scheduling.CoroutineScheduler.addToGlobalQueue(CoroutineScheduler.kt:121)
2022-08-04 10:41:59 W/stderr:   at kotlinx.coroutines.scheduling.CoroutineScheduler.dispatch(CoroutineScheduler.kt:389)
2022-08-04 10:41:59 W/stderr:   at kotlinx.coroutines.scheduling.CoroutineScheduler.dispatch$default(CoroutineScheduler.kt:382)
2022-08-04 10:41:59 W/stderr:   at kotlinx.coroutines.scheduling.SchedulerCoroutineDispatcher.dispatch(Dispatcher.kt:97)
2022-08-04 10:41:59 W/stderr:   at kotlinx.coroutines.internal.DispatchedContinuationKt.resumeCancellableWith(DispatchedContinuation.kt:322)
2022-08-04 10:41:59 W/stderr:   at kotlinx.coroutines.intrinsics.CancellableKt.startCoroutineCancellable(Cancellable.kt:30)
2022-08-04 10:41:59 W/stderr:   at kotlinx.coroutines.intrinsics.CancellableKt.startCoroutineCancellable$default(Cancellable.kt:25)
2022-08-04 10:41:59 W/stderr:   at kotlinx.coroutines.CoroutineStart.invoke(CoroutineStart.kt:110)
2022-08-04 10:41:59 W/stderr:   at kotlinx.coroutines.AbstractCoroutine.start(AbstractCoroutine.kt:126)
2022-08-04 10:41:59 W/stderr:   at kotlinx.coroutines.BuildersKt__Builders_commonKt.launch(Builders.common.kt:56)
2022-08-04 10:41:59 W/stderr:   at kotlinx.coroutines.BuildersKt.launch(Unknown Source)
2022-08-04 10:41:59 W/stderr:   at kotlinx.coroutines.BuildersKt__Builders_commonKt.launch$default(Builders.common.kt:47)
2022-08-04 10:41:59 W/stderr:   at kotlinx.coroutines.BuildersKt.launch$default(Unknown Source)
2022-08-04 10:41:59 W/stderr:   at mirai-api-http-2.5.2.jar//net.mamoe.mirai.api.http.context.MahContext.handleBotEvent(MahContext.kt:112)
2022-08-04 10:41:59 W/stderr:   at mirai-api-http-2.5.2.jar//net.mamoe.mirai.api.http.context.session.manager.DefaultSessionManager$authSession$1.invoke(default.kt:43)
2022-08-04 10:41:59 W/stderr:   at mirai-api-http-2.5.2.jar//net.mamoe.mirai.api.http.context.session.manager.DefaultSessionManager$authSession$1.invoke(default.kt:43)
2022-08-04 10:41:59 W/stderr:   at mirai-api-http-2.5.2.jar//net.mamoe.mirai.api.http.context.session.ListenableSessionWrapper$startBotEventListener$element$1.invokeSuspend(session.kt:130)
2022-08-04 10:41:59 W/stderr:   at mirai-api-http-2.5.2.jar//net.mamoe.mirai.api.http.context.session.ListenableSessionWrapper$startBotEventListener$element$1.invoke(session.kt)
2022-08-04 10:41:59 W/stderr:   at mirai-api-http-2.5.2.jar//net.mamoe.mirai.api.http.context.session.ListenableSessionWrapper$startBotEventListener$element$1.invoke(session.kt)
2022-08-04 10:41:59 W/stderr:   at net.mamoe.mirai.event.EventChannel$subscribeAlways$1.invokeSuspend(EventChannel.kt:455)
2022-08-04 10:41:59 W/stderr:   at net.mamoe.mirai.event.EventChannel$subscribeAlways$1.invoke(EventChannel.kt)
2022-08-04 10:41:59 W/stderr:   at net.mamoe.mirai.event.EventChannel$subscribeAlways$1.invoke(EventChannel.kt)
2022-08-04 10:41:59 W/stderr:   at net.mamoe.mirai.event.EventChannel$subscribeAlways$1.invoke(EventChannel.kt)
2022-08-04 10:41:59 W/stderr:   at net.mamoe.mirai.event.EventChannel$filter$1$intercepted$thisIntercepted$1.invokeSuspend(EventChannel.kt:169)
2022-08-04 10:41:59 W/stderr:   at net.mamoe.mirai.event.EventChannel$filter$1$intercepted$thisIntercepted$1.invoke(EventChannel.kt)
2022-08-04 10:41:59 W/stderr:   at net.mamoe.mirai.event.EventChannel$filter$1$intercepted$thisIntercepted$1.invoke(EventChannel.kt)

Network log

No response

Additional information

No response

milkice233, Aug 06 '22 14:08

@ryoii

Him188, Aug 06 '22 15:08

In polling mode, unread messages that are never consumed will of course pile up.

ryoii, Aug 07 '22 16:08

In polling mode, unread messages that are never consumed will of course pile up.

There should probably be a maximum queue length.

wyapx, Aug 07 '22 16:08
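
To make the maximum-queue-length suggestion concrete, here is a minimal sketch in Java of an unread queue that drops its oldest entry once a cap is reached. The class name `BoundedUnreadQueue` and its methods are hypothetical and chosen for illustration; this is not how mirai-api-http implements its unread queue.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical bounded unread-message queue (not the mirai-api-http implementation).
 * When the queue is full, the oldest message is dropped to make room for the new one.
 */
final class BoundedUnreadQueue<T> {
    private final ArrayDeque<T> messages = new ArrayDeque<>();
    private final int maxSize;
    private long dropped = 0;

    BoundedUnreadQueue(int maxSize) {
        if (maxSize <= 0) {
            throw new IllegalArgumentException("maxSize must be positive");
        }
        this.maxSize = maxSize;
    }

    /** Adds a message, evicting the oldest one if the queue is already full. */
    synchronized void offer(T message) {
        if (messages.size() == maxSize) {
            messages.pollFirst();
            dropped++;
        }
        messages.addLast(message);
    }

    /** Removes and returns up to count of the oldest unread messages. */
    synchronized List<T> fetch(int count) {
        List<T> result = new ArrayList<>();
        while (result.size() < count && !messages.isEmpty()) {
            result.add(messages.pollFirst());
        }
        return result;
    }

    synchronized int size() {
        return messages.size();
    }

    /** Number of messages evicted so far because the cap was reached. */
    synchronized long droppedCount() {
        return dropped;
    }

    public static void main(String[] args) {
        BoundedUnreadQueue<String> queue = new BoundedUnreadQueue<>(3);
        for (int i = 1; i <= 5; i++) {
            queue.offer("msg-" + i);
        }
        System.out.println(queue.fetch(10));      // only the newest messages remain
        System.out.println(queue.droppedCount()); // how many were evicted
    }
}
```

With a cap like this, a session whose queue is never polled retains at most `maxSize` messages instead of growing until the heap is exhausted. Dropping the oldest entry is one possible policy; rejecting new messages would be another.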

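In polling mode, unread messages that are never consumed will of course pile up.

Currently some MAH-based frameworks receive pushed messages through the Websocket or Webhook adaptor while using the Http adaptor to send messages and make other requests. However, in the current design the message cache and the http unread queue cache share a single cache size. Would it be possible to add an option to configure the two separately, or to disable the http unread queue cache?

Numendacil, Aug 08 '22 05:08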

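In polling mode, unread messages that are never consumed will of course pile up.

Currently some MAH-based frameworks receive pushed messages through the Websocket or Webhook adaptor while using the Http adaptor to send messages and make other requests. However, in the current design the message cache and the http unread queue cache share a single cache size. Would it be possible to add an option to configure the two separately, or to disable the http unread queue cache?

I don't know whether these frameworks reuse the same session for http and ws. One option to consider is using the session generated by ws without binding through http, with sessions that are not bound through http not generating an unread queue.

ryoii, Aug 08 '22 05:08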
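
The idea that only http-bound sessions should keep an unread queue can be sketched as follows in Java. `SketchSession`, `bindHttp`, and `onEvent` are hypothetical names for illustration only and do not correspond to mirai-api-http classes or methods.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/**
 * Hypothetical session model (illustrative names, not mirai-api-http classes).
 * The unread queue exists only after the session has been bound through HTTP.
 */
final class SketchSession {
    private Queue<String> unread; // stays null until bindHttp() is called

    /** Marks the session as used by an HTTP client that may poll for messages. */
    synchronized void bindHttp() {
        if (unread == null) {
            unread = new ArrayDeque<>();
        }
    }

    /**
     * Called for every bot event. Returns true if the event was buffered,
     * false if it was skipped because no HTTP poller is bound.
     */
    synchronized boolean onEvent(String event) {
        if (unread == null) {
            return false; // websocket-only session: nothing is retained
        }
        unread.add(event);
        return true;
    }

    synchronized int unreadCount() {
        return unread == null ? 0 : unread.size();
    }

    public static void main(String[] args) {
        SketchSession wsOnly = new SketchSession();
        wsOnly.onEvent("event-1");
        System.out.println(wsOnly.unreadCount());

        SketchSession httpBound = new SketchSession();
        httpBound.bindHttp();
        httpBound.onEvent("event-1");
        System.out.println(httpBound.unreadCount());
    }
}
```

Under this assumption a websocket-only client never accumulates unread messages, while a client that binds through http still needs a cap or expiry on its queue.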

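Found the problem. A Python SDK I was using carries over the authentication logic of the old MAH version: it goes through the http adapter once to obtain a sessionKey, but all subsequent messages go over websocket. With the current design this inevitably leads to a memory leak.

That said, I do think there should be a mechanism that periodically discards queued messages.

milkice233, Aug 08 '22 07:08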
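
One way to picture the discard mechanism proposed above is a queue that evicts entries older than a time-to-live. The Java sketch below is an assumption-laden illustration: `ExpiringUnreadQueue` is a hypothetical class, eviction happens lazily on each access rather than on a timer, and the clock is injected and assumed to be non-decreasing.

```java
import java.util.ArrayDeque;
import java.util.function.LongSupplier;

/**
 * Hypothetical unread queue that discards entries older than a time-to-live.
 * Not the mirai-api-http implementation; the clock is injected for testability.
 */
final class ExpiringUnreadQueue<T> {
    private static final class Entry<E> {
        final E value;
        final long enqueuedAtMillis;

        Entry(E value, long enqueuedAtMillis) {
            this.value = value;
            this.enqueuedAtMillis = enqueuedAtMillis;
        }
    }

    private final ArrayDeque<Entry<T>> entries = new ArrayDeque<>();
    private final long ttlMillis;
    private final LongSupplier clock;

    ExpiringUnreadQueue(long ttlMillis, LongSupplier clock) {
        if (ttlMillis <= 0) {
            throw new IllegalArgumentException("ttlMillis must be positive");
        }
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Adds a message; expired entries are evicted first, even if nobody polls. */
    synchronized void offer(T value) {
        evictExpired();
        entries.addLast(new Entry<>(value, clock.getAsLong()));
    }

    /** Returns the oldest unexpired message, or null if none is left. */
    synchronized T poll() {
        evictExpired();
        Entry<T> head = entries.pollFirst();
        return head == null ? null : head.value;
    }

    synchronized int size() {
        evictExpired();
        return entries.size();
    }

    /** Entries are stored in arrival order, so expired ones sit at the head. */
    private void evictExpired() {
        long cutoff = clock.getAsLong() - ttlMillis;
        while (!entries.isEmpty() && entries.peekFirst().enqueuedAtMillis < cutoff) {
            entries.pollFirst();
        }
    }

    public static void main(String[] args) {
        ExpiringUnreadQueue<String> queue =
                new ExpiringUnreadQueue<>(60_000L, System::currentTimeMillis);
        queue.offer("hello");
        System.out.println(queue.size());
    }
}
```

Because eviction also runs inside `offer`, a queue that is written to but never read still sheds old entries as new events arrive, which is the situation described in this issue. A scheduled task calling `size()` or a dedicated cleanup method would be an alternative if events can stop arriving.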

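Found the problem. A Python SDK I was using carries over the authentication logic of the old MAH version: it goes through the http adapter once to obtain a sessionKey, but all subsequent messages go over websocket. With the current design this inevitably leads to a memory leak.

That said, I do think there should be a mechanism that periodically discards queued messages.

Let's handle it on both sides. If polling is not needed, the best approach is to obtain the session via ws and then use that same session for http requests. On this side, unread messages will also be discarded by default; please wait for the new release.

ryoii, Aug 08 '22 07:08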

This issue belongs to https://github.com/project-mirai/mirai-api-http. Please submit any follow-up issues to MAH.

Him188, Aug 26 '22 05:08