
[Question] Server crash caused by out of memory (org.thingsboard.server.actors.TbActorMailbox)

Open maksonlee opened this issue 3 years ago • 5 comments

Component

  • Generic

Description: The server crashes when running a performance test [1]; see the attached file for more details. Why does this happen?

[1] https://github.com/thingsboard/performance-tests

Environment

  • OS: Ubuntu 20.04 Server
  • ThingsBoard: 3.4.1

image

maksonlee avatar Sep 16 '22 14:09 maksonlee

Hi, add these Java options to your thingsboard.conf file for optimization and stable operation: export JAVA_OPTS="$JAVA_OPTS -Xms4G -Xmx4G" This lets the JVM garbage collector run more stably with less CPU usage, and should resolve OOM errors with ThingsBoard.
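
One way to confirm which heap limit a JVM actually picks up from such flags is to print `Runtime.maxMemory()`. The following is a minimal sketch, not part of ThingsBoard; the class name `HeapCheck` is hypothetical.

```java
public class HeapCheck {
    /** Returns the maximum heap size the JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Reports the effective -Xmx limit (or the JVM default if none was given).
        System.out.println("Max heap (MiB): " + maxHeapMiB());
    }
}
```

Launched as `java -Xms4G -Xmx4G HeapCheck.java` (single-file source launch, available since JDK 11), it should report a value close to 4096; the exact figure can be slightly lower depending on the garbage collector in use.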

ban2derlog avatar Sep 16 '22 17:09 ban2derlog

I already have export JAVA_OPTS="$JAVA_OPTS -Xms6G -Xmx6G" there. Should I change it to export JAVA_OPTS="$JAVA_OPTS -Xms4G -Xmx4G"?

maksonlee avatar Sep 16 '22 17:09 maksonlee

@maksonlee how much total memory does this instance have? If 6 GB is the total memory of the instance, you need to allocate less memory to ThingsBoard. Also, is the performance test launched on the same instance where the ThingsBoard service runs? Please also check the logs in /var/log/thingsboard/thingsboard.log and /var/log/syslog.

ban2derlog avatar Sep 16 '22 17:09 ban2derlog

  1. We have 16GB memory on this instance.
  2. The performance test is launched on another machine.
  3. /var/log/thingsboard/thingsboard.log:
java.lang.IllegalStateException: Deque full
        at java.base/java.util.concurrent.LinkedBlockingDeque.addLast(LinkedBlockingDeque.java:326)
        at java.base/java.util.concurrent.LinkedBlockingDeque.add(LinkedBlockingDeque.java:624)
        at org.thingsboard.server.dao.util.AbstractBufferedRateExecutor.submit(AbstractBufferedRateExecutor.java:146)
        at org.thingsboard.server.dao.nosql.CassandraAbstractDao.executeAsync(CassandraAbstractDao.java:99)
        at org.thingsboard.server.dao.nosql.CassandraAbstractDao.executeAsyncWrite(CassandraAbstractDao.java:80)
        at org.thingsboard.server.dao.timeseries.CassandraBaseTimeseriesDao.save(CassandraBaseTimeseriesDao.java:179)
        at org.thingsboard.server.dao.timeseries.BaseTimeseriesService.doSaveAndRegisterFuturesFor(BaseTimeseriesService.java:199)
        at org.thingsboard.server.dao.timeseries.BaseTimeseriesService.saveAndRegisterFutures(BaseTimeseriesService.java:186)
        at org.thingsboard.server.dao.timeseries.BaseTimeseriesService.doSave(BaseTimeseriesService.java:165)
        at org.thingsboard.server.dao.timeseries.BaseTimeseriesService.save(BaseTimeseriesService.java:149)
        at org.thingsboard.server.service.telemetry.DefaultTelemetrySubscriptionService.saveAndNotifyInternal(DefaultTelemetrySubscriptionService.java:171)
        at org.thingsboard.server.service.telemetry.DefaultTelemetrySubscriptionService.doSaveAndNotify(DefaultTelemetrySubscriptionService.java:138)
        at org.thingsboard.server.service.telemetry.DefaultTelemetrySubscriptionService.saveAndNotify(DefaultTelemetrySubscriptionService.java:125)
        at org.thingsboard.rule.engine.telemetry.TbMsgTimeseriesNode.onMsg(TbMsgTimeseriesNode.java:109)
        at org.thingsboard.server.actors.ruleChain.RuleNodeActorMessageProcessor.onRuleChainToRuleNodeMsg(RuleNodeActorMessageProcessor.java:135)
        at org.thingsboard.server.actors.ruleChain.RuleNodeActor.onRuleChainToRuleNodeMsg(RuleNodeActor.java:102)
        at org.thingsboard.server.actors.ruleChain.RuleNodeActor.doProcess(RuleNodeActor.java:61)
        at org.thingsboard.server.actors.service.ContextAwareActor.process(ContextAwareActor.java:45)
        at org.thingsboard.server.actors.TbActorMailbox.processMailbox(TbActorMailbox.java:142)
        at java.base/java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1426)
        at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
        at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
        at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
        at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
        at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
2022-09-17 01:57:29,786 [cassandra-callback-15-thread-702] ERROR c.g.c.u.concurrent.AggregateFuture - An additional input failed after the first. Logging it after adding the first failure as a suppressed exception.
java.util.concurrent.TimeoutException: null
        at org.thingsboard.server.dao.util.AbstractBufferedRateExecutor.dispatch(AbstractBufferedRateExecutor.java:224)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
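
For readers following the first trace: it ends in `LinkedBlockingDeque.add`, called from `AbstractBufferedRateExecutor.submit`. In the JDK, `add` on a bounded `LinkedBlockingDeque` throws `IllegalStateException("Deque full")` once the deque is at capacity, whereas `offer` returns `false`. That suggests, though the thread does not confirm it, that the Cassandra write buffer filled up because messages arrived faster than they were written. The sketch below only reproduces the JDK behaviour; the class name `DequeFullDemo` and the capacity of 2 are hypothetical and unrelated to ThingsBoard's actual settings.

```java
import java.util.concurrent.LinkedBlockingDeque;

public class DequeFullDemo {
    /** Fills a bounded deque, then calls add() once more; returns the exception message, or null if add() succeeded. */
    static String addToFullDeque(int capacity) {
        LinkedBlockingDeque<Integer> queue = new LinkedBlockingDeque<>(capacity);
        for (int i = 0; i < capacity; i++) {
            queue.add(i);
        }
        try {
            queue.add(capacity); // one element beyond capacity
            return null;
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    /** Fills a bounded deque, then calls offer() once more; offer() reports failure instead of throwing. */
    static boolean offerToFullDeque(int capacity) {
        LinkedBlockingDeque<Integer> queue = new LinkedBlockingDeque<>(capacity);
        for (int i = 0; i < capacity; i++) {
            queue.add(i);
        }
        return queue.offer(capacity);
    }

    public static void main(String[] args) {
        System.out.println("add on full deque: " + addToFullDeque(2));
        System.out.println("offer on full deque: " + offerToFullDeque(2));
    }
}
```

With a capacity of 2, the first line prints `add on full deque: Deque full` and the second prints `offer on full deque: false`.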

maksonlee avatar Sep 16 '22 20:09 maksonlee

It seems that the org.thingsboard.server.actors.TbActorMailbox objects are not being released?

maksonlee avatar Sep 16 '22 23:09 maksonlee

Hi, does the Rule Chain you use during the performance test send mail in its logic? I assume there may be an issue with the mail server: if you generate a large number of emails while performing the test, the mail server may return timeouts. Check the Rule Chain, and if it sends mail, enable Debug mode on the Mail node and check the Events - Debug tab for errors.

Also, do you use ThingsBoard with the in-memory queue? What exact load do you send to ThingsBoard with the performance test?

ban2derlog avatar Oct 17 '22 14:10 ban2derlog

Hi, does the Rule Chain you use during the performance test send mail in its logic?

No, we use the default rule chain.

I assume there may be an issue with the mail server: if you generate a large number of emails while performing the test, the mail server may return timeouts.

Even so, we would expect the server to keep working.

Check the Rule Chain, and if it sends mail, enable Debug mode on the Mail node and check the Events - Debug tab for errors.

Also, do you use ThingsBoard with the in-memory queue?

We are using Kafka.

What exact load do you send to ThingsBoard with the performance test?

MESSAGES_PER_SECOND=3000 DURATION_IN_SECONDS=259200
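
To put these two settings in perspective, the arithmetic can be sketched as follows. The class name `LoadEstimate` is hypothetical, and the calculation assumes the configured rate is sustained for the whole run.

```java
import java.time.Duration;

public class LoadEstimate {
    /** Total messages sent if the configured rate is sustained for the whole duration. */
    static long totalMessages(long messagesPerSecond, long durationInSeconds) {
        return Math.multiplyExact(messagesPerSecond, durationInSeconds);
    }

    /** Test duration expressed in whole hours. */
    static long durationHours(long durationInSeconds) {
        return Duration.ofSeconds(durationInSeconds).toHours();
    }

    public static void main(String[] args) {
        long messagesPerSecond = 3000;   // MESSAGES_PER_SECOND from the comment above
        long durationInSeconds = 259200; // DURATION_IN_SECONDS from the comment above
        System.out.println("Duration (hours): " + durationHours(durationInSeconds));
        System.out.println("Total messages: " + totalMessages(messagesPerSecond, durationInSeconds));
    }
}
```

This prints a duration of 72 hours (three days) and a total of 777600000 messages, so the test is a multi-day sustained load rather than a short burst.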

maksonlee avatar Oct 17 '22 14:10 maksonlee

I am seeing the same result when analysing a .hprof file. Did this:

export JAVA_OPTS="$JAVA_OPTS -Xms4G -Xmx4G"

work? Cheers, Max

Maxwh21 avatar Jan 03 '23 11:01 Maxwh21

Have you resolved this? Can you share which JDK version is used for ThingsBoard?

ban2derlog avatar Jan 13 '23 14:01 ban2derlog

I don't see this issue in version 3.4.4.

maksonlee avatar Apr 11 '23 14:04 maksonlee