
Out of memory disconnected.

Open PedroSFreitas opened this issue 7 years ago • 4 comments

We are currently seeing what looks like a memory leak when running smart_module.py. I think it is best to open an issue so others can send their suggestions. Any feedback would be great!

Here is a small log from the last run:

2017-05-14 08:19:09.598386 - alert.log - INFO - Fetching alert param. from database
Running command self.smart_module.on_query_status()
Running command self.smart_module.on_check_alert()
2017-05-14 08:19:11.291427 - smartmodule.log - INFO - Wrote to analytic database: [{'fields': {'unit': 'C', 'value': '22.25'}, 'tags': {'site': u'HPF-0', 'asset': 'Indoor Temperature'}, 'time': '2017-05-14 08:19:10.437610', 'measurement': 'Environment'}].
STATUS/QUERY I might need to know how you are!
ASSET/QUERY/1234567890987654 Is it warm here?
Device file: /sys/bus/w1/devices/28-03168af288ff/w1_slave
STATUS/RESPONSE [{'memory': {'cached': 125579264, 'used': 49889280, 'free': 235081728}, 'disk': {'total': 7622344704L, 'free': 2562277376L, 'used': 4689182720L}, 'network': {'packet_recv': 685138, 'packet_sent': 616943}, 'time': 1494749952.018644, 'hostname': 'RTU278768', 'boot': '2017-05-11 11:22:57', 'cpu': {'percentage': 1.4}, 'clients': 1}]
Running command self.smart_module.on_query_status()
Running command self.smart_module.on_check_alert()
Running command self.smart_module.on_query_status()
Running command self.smart_module.on_check_alert()
Running command self.smart_module.on_query_status()
Running command self.smart_module.on_check_alert()
Running command self.smart_module.on_query_status()
Running command self.smart_module.on_check_alert()
Running command self.smart_module.on_query_status()
Running command self.smart_module.on_check_alert()
Running command self.smart_module.on_query_status()
Running command self.smart_module.on_check_alert()
Running command self.smart_module.on_query_status()
Running command self.smart_module.on_check_alert()
Running command self.smart_module.on_query_status()
Running command self.smart_module.on_check_alert()
Running command self.smart_module.on_query_status()
Running command self.smart_module.on_check_alert()
Running command self.smart_module.on_query_status()
Running command self.smart_module.on_check_alert()
Running command self.smart_module.on_query_status()
Running command self.smart_module.on_check_alert()
Running command self.smart_module.on_query_status()
Running command self.smart_module.on_check_alert()
Running command self.smart_module.on_query_status()
Running command self.smart_module.on_check_alert()
Running command self.smart_module.on_query_status()
Running command self.smart_module.on_check_alert()
Running command self.smart_module.on_query_status()
Running command self.smart_module.on_check_alert()
Running command self.smart_module.on_query_status()
Running command self.smart_module.on_check_alert()
Running command self.smart_module.on_query_status()
Running command self.smart_module.on_check_alert()
Running command self.smart_module.on_query_status()
Running command self.smart_module.on_check_alert()
Running command self.smart_module.on_query_status()
Running command self.smart_module.on_check_alert()
Running command self.smart_module.on_query_status()
Running command self.smart_module.on_check_alert()
Running command self.smart_module.on_query_status()
Running command self.smart_module.on_check_alert()
Running command self.smart_module.on_query_status()
Running command self.smart_module.on_check_alert()
Running command self.smart_module.on_query_status()
Running command self.smart_module.on_check_alert()
Running command self.smart_module.on_query_status()
Running command self.smart_module.on_check_alert()
Running command self.smart_module.on_query_status()
Running command self.smart_module.on_check_alert()
Running command self.smart_module.on_query_status()
Running command self.smart_module.on_check_alert()
[Errno 32] Broken pipe
2017-05-14 08:21:25.400334 - communicator.log - INFO - Disconnected: Out of memory..
2017-05-14 08:21:26.412194 - communicator.log - INFO - Connected with result code 0
$SYS/broker/clients/total 0
Running command self.smart_module.on_query_status()
Running command self.smart_module.on_check_alert()
STATUS/QUERY I might need to know how you are!
ASSET/QUERY/1234567890987654 Is it warm here?
Device file: /sys/bus/w1/devices/28-03168af288ff/w1_slave
Running command self.smart_module.on_query_status()
STATUS/QUERY I might need to know how you are!
Running command self.smart_module.on_check_alert()

After the disconnect, it seems to connect again, but queries are no longer answered.

Here you can see how free memory drops steadily. [image]

And here you can see that used memory stays quite stable. [image]

It could be cached memory, but its growth doesn't match the drop in free memory. [image]
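System-wide free memory can fall for reasons that have nothing to do with one process, so it may help to log the resident set size of the smart_module.py process itself next to the figures above. The following is only a sketch, assuming a Linux host where `/proc/self/status` is available; the helper names `parse_proc_kb` and `process_rss_bytes` are made up for illustration and are not part of the project.

```python
def parse_proc_kb(text):
    """Parse 'Key:   123 kB' lines (as in /proc/self/status) into {key: bytes}."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only lines of the form "<key>: <digits> kB"; skip everything else.
        if sep and len(parts) == 2 and parts[1] == "kB" and parts[0].isdigit():
            values[key.strip()] = int(parts[0]) * 1024
    return values


def process_rss_bytes(status_path="/proc/self/status"):
    """Return this process's resident set size in bytes, or None if absent."""
    with open(status_path) as handle:
        return parse_proc_kb(handle.read()).get("VmRSS")
```

If the value returned by `process_rss_bytes()` climbs between status queries, the growth is inside the Python process; if it stays flat while free memory keeps dropping, the cause is more likely elsewhere on the system.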

PedroSFreitas, May 14 '17 13:05

After weeks of testing and tweaking, we believe the primary leak is in the schedule library. Another leak may occur after unintended disconnects from the MQTT broker. We are now looking at another scheduling package.
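One way to rule the scheduling package in or out would be to drive the two callbacks seen in the log from a plain fixed-interval loop and watch whether memory still grows. This is a hedged sketch using only the standard library, not the project's code; `run_every` and its parameters are hypothetical names.

```python
import time


def run_every(interval_s, jobs, iterations, sleep=time.sleep, clock=time.monotonic):
    """Call every job once per interval, for a fixed number of iterations."""
    next_run = clock()
    for _ in range(iterations):
        for job in jobs:
            try:
                job()
            except Exception as exc:
                # One failing job should not stop the others from running.
                print("job %r failed: %s" % (job, exc))
        # Schedule against the previous deadline so small delays do not add up.
        next_run += interval_s
        delay = next_run - clock()
        if delay > 0:
            sleep(delay)
```

With the real module this might be called as `run_every(5, [smart_module.on_query_status, smart_module.on_check_alert], iterations=1000)`, where the interval and iteration count are placeholders.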

TylerReedMC, May 24 '17 12:05

What is the smallest program that can provoke the problem? 10 to 30 lines is a good size for a demo.
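A demo of that size could look roughly like the sketch below. It assumes Python 3 (for `tracemalloc`), whereas the log above appears to come from Python 2, and `suspect_job` is a deliberately leaky stand-in, not the real smart module code. Replace it with the suspected call to see whether memory is retained across runs.

```python
import tracemalloc

_leaky_store = []  # hypothetical stand-in for state that is never released


def suspect_job():
    """Placeholder for the call under suspicion; keeps 1 KiB per call."""
    _leaky_store.append(bytearray(1024))


def measure_growth(job, runs):
    """Return the net bytes still allocated after calling job() `runs` times."""
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    for _ in range(runs):
        job()
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before


print("bytes retained after 1000 runs:", measure_growth(suspect_job, 1000))
```

A job that releases everything it allocates should report a figure close to zero, while one that keeps references should grow roughly in proportion to `runs`.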

james-prior, May 24 '17 23:05

Have you run the system under pdb? This is an example analysis using objgraph:

(Pdb) objgraph.show_most_common_types(limit=20)
dict                       378631
list                       184791
builtin_function_or_method 57542
tuple                      55478
Message                    48129
function                   45575
instancemethod             31949
NonBlockingSocket          31876
NonBlockingConnection      31876
_socketobject              31876
_Condition                 28320
AMQPReader                 14900
cell                       9678
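If installing objgraph on the device is not an option, a similar count can be approximated with the standard library. This is a sketch, not objgraph's implementation: it only sees objects tracked by the garbage collector, and `count_types` and `growth` are names invented here.

```python
import gc
from collections import Counter


def count_types():
    """Count live gc-tracked objects by type name."""
    return Counter(type(obj).__name__ for obj in gc.get_objects())


def growth(before, after, limit=20):
    """Return the types whose instance count increased, largest increase first."""
    # Counter subtraction keeps only positive differences.
    return (after - before).most_common(limit)
```

Taking one `count_types()` snapshot shortly after startup and another some minutes later, then passing both to `growth`, should show which types keep accumulating.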

moritz89, Dec 21 '17 09:12

@moritz89 to be honest, it has been quite some time since I last tested it. I believe the best approach now would be a completely fresh test.

What would really be best, in my humble opinion, is a rewrite of the smart module part.

PedroSFreitas, Dec 21 '17 10:12