sonic-swss-common

Fix swig template memory leak

Open praveenraja1 opened this issue 1 year ago • 0 comments

Problem:

The device gets stuck if it is left switched on for a prolonged period (more than 30 hours).

Observation:

The pmon docker container leaks memory over time, eventually leading to a system-wide out-of-memory condition. The Python processes in pmon, such as psud and thermalctld, were all leaking memory. The issue was traced back to the table.set() operations performed in these Python processes. We have swig template files that are used to generate the code mappings for Python-to-C++ conversions.

The constructor functions in these templates obtain new references through PySequence_GetItem(), but those references were never released with Py_DECREF once we were done with them. The stale objects stayed alive and leaked memory over time. tracemalloc statistics showed swsscommon.py at the top:

/usr/lib/python3/dist-packages/swsscommon/swsscommon.py:184: size=2851 B, count=49, average=58 B
/usr/lib/python3.9/tracemalloc.py:505: size=1456 B, count=24, average=61 B
/usr/lib/python3.9/tracemalloc.py:67: size=1344 B, count=21, average=64 B
/usr/lib/python3.9/tracemalloc.py:558: size=1224 B, count=24, average=51 B
/usr/lib/python3.9/tracemalloc.py:498: size=1104 B, count=23, average=48 B
/usr/lib/python3.9/tracemalloc.py:193: size=768 B, count=16, average=48 B
/usr/lib/python3.9/tracemalloc.py:533: size=576 B, count=1, average=576 B
/usr/lib/python3.9/threading.py:574: size=536 B, count=1, average=536 B

Fix:

Call Py_DECREF in the constructor functions on each reference returned by PySequence_GetItem() once it is no longer needed.
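To illustrate why the missing Py_DECREF matters, here is a minimal sketch that reproduces the same reference-count behaviour from plain Python through ctypes. It is not the swig template code from this repository: the helper name refcount_growth is hypothetical, and the sketch assumes a standard CPython build where ctypes.pythonapi exposes PySequence_GetItem and Py_DecRef (the function form of the Py_DECREF macro).

```python
import ctypes
import sys

# Raw C-API entry points exposed by CPython through ctypes.
_get_item = ctypes.pythonapi.PySequence_GetItem
_get_item.argtypes = [ctypes.py_object, ctypes.c_ssize_t]
# c_void_p keeps ctypes from taking ownership of the returned reference,
# so the caller must release it explicitly, just like C code would.
_get_item.restype = ctypes.c_void_p

_dec_ref = ctypes.pythonapi.Py_DecRef
_dec_ref.argtypes = [ctypes.c_void_p]
_dec_ref.restype = None


def refcount_growth(release, iterations=100):
    """Return how much the item's refcount grew after repeated lookups.

    Hypothetical helper for illustration only.
    """
    item = object()  # fresh object, so its refcount is an ordinary counter
    seq = [item]
    before = sys.getrefcount(item)
    for _ in range(iterations):
        ref = _get_item(seq, 0)  # returns a NEW reference (refcount + 1)
        if release:
            _dec_ref(ref)  # the fix: give the reference back when done
    return sys.getrefcount(item) - before
```

Calling refcount_growth(release=False) leaves one extra reference per lookup, so the object can never be freed; with release=True the count returns to where it started.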

Unit test:

Tested using a custom Python script that calls table.set in a while loop (and checks tracemalloc) to ensure the memory leak no longer happens.

Left the system up for some time to ensure that the memory usage of the pmon docker container does not keep increasing as it did before.

Checked show platform fan/psu to ensure that pmon functionality is not affected by this change.
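The loop-based tracemalloc check described above could look roughly like the following. This is a sketch and not the script used for this issue: traced_growth is a hypothetical helper, and operation is a stand-in callable that, on a device, would wrap the table.set() call.

```python
import tracemalloc


def traced_growth(operation, iterations=10_000, warmup=100):
    """Return the net bytes of traced allocations retained across the loop.

    `operation` is a zero-argument callable (hypothetical stand-in for the
    table.set() call). A value that keeps growing with `iterations`
    suggests that objects are being leaked.
    """
    # Warm up first so one-time caches are not counted as growth.
    for _ in range(warmup):
        operation()

    tracemalloc.start()
    try:
        before, _peak = tracemalloc.get_traced_memory()
        for _ in range(iterations):
            operation()
        after, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before
```

Running it with two or three different iteration counts and comparing the results helps separate a real leak (growth proportional to the count) from a small constant overhead.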

praveenraja1 Feb 22 '24 10:02