[Question] Server crash caused by out of memory (org.thingsboard.server.actors.TbActorMailbox)
Component
- Generic
Description
The server crashes during a performance test [1]; see the attached file for details. Why does this happen?
[1] https://github.com/thingsboard/performance-tests
Environment
- OS: Ubuntu 20.04 Server
- ThingsBoard: 3.4.1

Hi,
add the following Java options to your thingsboard.conf file for optimized, stable operation:
export JAVA_OPTS="$JAVA_OPTS -Xms4G -Xmx4G"
These options let the JVM garbage collector work stably with less CPU usage, and will resolve the OOM errors in ThingsBoard.
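To confirm which heap flags a running JVM actually received, a small Java check can read the JVM's input arguments and its effective maximum heap. This is a minimal sketch using the standard `java.lang.management` API; the class name `HeapCheck` is hypothetical, and it reports on the JVM it runs in, not on the ThingsBoard process. For an already running ThingsBoard service, `jcmd <pid> VM.command_line` shows the same arguments from outside.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: shows which heap-size flags this JVM was started with.
final class HeapCheck {

    // Keep only the initial (-Xms) and maximum (-Xmx) heap flags from a list of JVM arguments.
    static List<String> heapFlags(List<String> jvmArgs) {
        return jvmArgs.stream()
                .filter(a -> a.startsWith("-Xms") || a.startsWith("-Xmx"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("Heap flags: " + heapFlags(jvmArgs));
        // maxMemory() is the maximum amount of memory the JVM will attempt to use (roughly -Xmx).
        long maxMiB = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("Max heap (MiB): " + maxMiB);
    }
}
```

Filtering on the `-Xms`/`-Xmx` prefixes also makes a malformed flag easy to spot, because it simply does not appear in the output.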
I already have export JAVA_OPTS="$JAVA_OPTS -Xms6G -Xmx6G" there. Should I change it to export JAVA_OPTS="$JAVA_OPTS -Xms4G -Xmx4G"?
@maksonlee, how much total memory does this instance have?
If 6 GB is the total memory on the instance, you need to allocate less memory to ThingsBoard.
Is the performance test running on the same instance as the ThingsBoard service?
Also, please check the logs in /var/log/thingsboard/thingsboard.log and /var/log/syslog.
- We have 16 GB of memory on this instance.
- The performance test is launched on another machine.
- From /var/log/thingsboard/thingsboard.log:
java.lang.IllegalStateException: Deque full
at java.base/java.util.concurrent.LinkedBlockingDeque.addLast(LinkedBlockingDeque.java:326)
at java.base/java.util.concurrent.LinkedBlockingDeque.add(LinkedBlockingDeque.java:624)
at org.thingsboard.server.dao.util.AbstractBufferedRateExecutor.submit(AbstractBufferedRateExecutor.java:146)
at org.thingsboard.server.dao.nosql.CassandraAbstractDao.executeAsync(CassandraAbstractDao.java:99)
at org.thingsboard.server.dao.nosql.CassandraAbstractDao.executeAsyncWrite(CassandraAbstractDao.java:80)
at org.thingsboard.server.dao.timeseries.CassandraBaseTimeseriesDao.save(CassandraBaseTimeseriesDao.java:179)
at org.thingsboard.server.dao.timeseries.BaseTimeseriesService.doSaveAndRegisterFuturesFor(BaseTimeseriesService.java:199)
at org.thingsboard.server.dao.timeseries.BaseTimeseriesService.saveAndRegisterFutures(BaseTimeseriesService.java:186)
at org.thingsboard.server.dao.timeseries.BaseTimeseriesService.doSave(BaseTimeseriesService.java:165)
at org.thingsboard.server.dao.timeseries.BaseTimeseriesService.save(BaseTimeseriesService.java:149)
at org.thingsboard.server.service.telemetry.DefaultTelemetrySubscriptionService.saveAndNotifyInternal(DefaultTelemetrySubscriptionService.java:171)
at org.thingsboard.server.service.telemetry.DefaultTelemetrySubscriptionService.doSaveAndNotify(DefaultTelemetrySubscriptionService.java:138)
at org.thingsboard.server.service.telemetry.DefaultTelemetrySubscriptionService.saveAndNotify(DefaultTelemetrySubscriptionService.java:125)
at org.thingsboard.rule.engine.telemetry.TbMsgTimeseriesNode.onMsg(TbMsgTimeseriesNode.java:109)
at org.thingsboard.server.actors.ruleChain.RuleNodeActorMessageProcessor.onRuleChainToRuleNodeMsg(RuleNodeActorMessageProcessor.java:135)
at org.thingsboard.server.actors.ruleChain.RuleNodeActor.onRuleChainToRuleNodeMsg(RuleNodeActor.java:102)
at org.thingsboard.server.actors.ruleChain.RuleNodeActor.doProcess(RuleNodeActor.java:61)
at org.thingsboard.server.actors.service.ContextAwareActor.process(ContextAwareActor.java:45)
at org.thingsboard.server.actors.TbActorMailbox.processMailbox(TbActorMailbox.java:142)
at java.base/java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1426)
at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
2022-09-17 01:57:29,786 [cassandra-callback-15-thread-702] ERROR c.g.c.u.concurrent.AggregateFuture - An additional input failed after the first. Logging it after adding the first failure as a suppressed exception.
java.util.concurrent.TimeoutException: null
at org.thingsboard.server.dao.util.AbstractBufferedRateExecutor.dispatch(AbstractBufferedRateExecutor.java:224)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
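The first trace fails inside `LinkedBlockingDeque.add`, which is standard JDK behaviour for a capacity-bounded deque: `add` delegates to `addLast`, which throws `IllegalStateException("Deque full")` once the capacity is reached, whereas `offer` returns `false` instead. So the buffer that `AbstractBufferedRateExecutor.submit` adds Cassandra writes to was already full, which suggests telemetry writes were arriving faster than they were being dispatched; the later `TimeoutException` from `dispatch` fits that reading, though this is an inference from the trace only. The sketch below reproduces just the JDK part, with a hypothetical class name and a tiny capacity:

```java
import java.util.concurrent.LinkedBlockingDeque;

// Reproduces the "Deque full" error from the log with a tiny bounded deque.
final class DequeFullDemo {

    // Returns the message of the exception thrown when adding past capacity, or null if none.
    static String overfill(int capacity) {
        LinkedBlockingDeque<String> buffer = new LinkedBlockingDeque<>(capacity);
        for (int i = 0; i < capacity; i++) {
            buffer.add("task-" + i); // fits: still below capacity
        }
        // offer() reports a full deque by returning false ...
        if (buffer.offer("extra")) {
            return null;
        }
        try {
            buffer.add("extra"); // ... while add() throws IllegalStateException
            return null;
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(overfill(2)); // prints "Deque full"
    }
}
```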
It seems that org.thingsboard.server.actors.TbActorMailbox objects are not being released?
Hi, during the performance test, do you use a rule chain that sends mail in its logic? I assume there may be an issue with the mail server: if you generate a large number of emails while running the test, the mail server may respond with timeouts. Check the rule chain, and if it sends mail, enable Debug mode on the Mail node and look for errors in the node's Events tab (Debug events).
Also, do you run ThingsBoard with the in-memory queue? What exact load does your performance test send to ThingsBoard?
Hi, during the performance test, do you use a rule chain that sends mail in its logic?
No, we use the default rule chain.
I assume there may be an issue with the mail server: if you generate a large number of emails while running the test, the mail server may respond with timeouts.
Even so, we would expect the server to keep working.
Check the rule chain, and if it sends mail, enable Debug mode on the Mail node and look for errors in the node's Events tab (Debug events).
Also, do you run ThingsBoard with the in-memory queue?
We are using Kafka.
What exact load does your performance test send to ThingsBoard?
MESSAGES_PER_SECOND=3000 DURATION_IN_SECONDS=259200
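For scale, the two test parameters imply the totals below (plain arithmetic on the quoted values, nothing measured): 3,000 messages per second for 259,200 seconds (72 hours) is 777,600,000 messages. If each message is telemetry saved by the default rule chain, the Cassandra write path has to keep up with at least that rate for three days. The `backlog` helper is a hypothetical illustration, with an assumed write rate: whenever the sustained processing rate is below the input rate, the difference accumulates linearly, which is how a bounded buffer eventually fills.

```java
// Back-of-the-envelope numbers for the performance-test load.
final class LoadMath {

    static long totalMessages(long messagesPerSecond, long durationSeconds) {
        return messagesPerSecond * durationSeconds;
    }

    static long hours(long durationSeconds) {
        return durationSeconds / 3600;
    }

    // Hypothetical: items left waiting after `seconds` if input outpaces processing.
    static long backlog(long inPerSecond, long outPerSecond, long seconds) {
        return Math.max(0, inPerSecond - outPerSecond) * seconds;
    }

    public static void main(String[] args) {
        System.out.println(totalMessages(3000, 259200)); // 777600000
        System.out.println(hours(259200));               // 72
        // Assumed sustained write rate of 2,500 msg/s, purely for illustration.
        System.out.println(backlog(3000, 2500, 60));     // 30000
    }
}
```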
I am seeing the same result when analysing a .hprof heap dump. Did this:
export JAVA_OPTS="$JAVA_OPTS -Xms4G -Xmx4G"
work for you? Cheers, Max
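For anyone else collecting an `.hprof` file: the standard HotSpot flags `-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=<dir>` can be added to `JAVA_OPTS` so the JVM writes a dump automatically when it runs out of memory. A dump can also be triggered from Java through the JDK's `HotSpotDiagnosticMXBean`, as in this minimal sketch (HotSpot-based JDKs only; the class name and file names are hypothetical). Opening such a dump in a heap analyser is what would show whether `TbActorMailbox` instances are what is holding the memory.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

// Writes a heap dump of the current JVM to an .hprof file.
final class HeapDumper {

    // live = true dumps only objects that are still reachable.
    static Path dump(Path target, boolean live) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(target.toString(), live); // throws IOException if the file already exists
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("heapdump");
        Path file = dump(dir.resolve("thingsboard-test.hprof"), true);
        System.out.println("Wrote " + Files.size(file) + " bytes to " + file);
    }
}
```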
Have you resolved this? Can you share which JDK version you are using for ThingsBoard?
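The JDK question can be answered precisely by having the JVM report itself. Running `java -version` with the same `java` binary ThingsBoard uses is the simplest way; the sketch below does the equivalent from Java 10+ code, using only `Runtime.version()` and standard system properties (the class name is hypothetical).

```java
// Prints the version and vendor of the JVM this code runs on.
final class JdkInfo {

    static String describe() {
        Runtime.Version v = Runtime.version();
        return "Java " + v.feature()
                + " (" + System.getProperty("java.version") + ", "
                + System.getProperty("java.vendor") + ", "
                + System.getProperty("java.vm.name") + ")";
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```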
Not seeing this issue in version 3.4.4.