PyROOT memory leak reading TMap from a file
- [X] Checked for duplicates
Describe the bug
We have a long-running service which reads some information from ROOT files and we have noticed that this service regularly grows its memory use until it consumes all available memory and gets killed. I think I have tracked it down to reading one particular object type TMap. We have relatively large TMap object stored in files, the key in TMap is TObjString and values are TObjArray of TObjString. Every time when TMap is read from a file the memory use grows dramatically by many megabytes, even though the Python object is discarded almost immediately after use.
Expected behavior
Memory use should not grow indefinitely, memory should be deallocated when Python object is destroyed.
To Reproduce
I managed to reproduce this issue with a simple script, attached here:
import os
import tempfile
import psutil
import ROOT
def main():
proc = psutil.Process()
with tempfile.TemporaryDirectory() as tmpdir:
file_name = os.path.join(tmpdir, "testfile.root")
print("Creating test root file", file_name)
_make_file_with_tmap(file_name)
mem0 = proc.memory_info()
rootfile = ROOT.TFile.Open(file_name, "READ")
while True:
ann_obj = rootfile.Get("annotations")
# ann_obj.DeleteAll() # DeleteAll seems to slow leak but not 100%
del ann_obj
mem1 = proc.memory_info()
_memstat(mem0, mem1)
mem0 = mem1
def _make_file_with_tmap(file_name):
rootfile = ROOT.TFile.Open(file_name, "RECREATE")
map = ROOT.TMap()
for i in range(50_000):
key = ROOT.TObjString("long_histogram_name_" * 10 + str(i))
value = ROOT.TObjArray()
value.Add(ROOT.TObjString("annotation"))
value.Add(ROOT.TObjString(f"10000"))
map.Add(key, value)
rootfile.WriteTObject(map, "annotations")
rootfile.Close()
def _memstat(mem0, mem1):
MB = 1024 * 1024
rss_mb = mem1.rss // MB
vms_mb = mem1.vms // MB
rss_delta_mb = (mem1.rss - mem0.rss) / MB
vms_delta_mb = (mem1.vms - mem0.vms) / MB
print(f"Memory stat: RSS={rss_mb} MB, VMS={vms_mb} MB. Delta RSS={rss_delta_mb:.2f} MB VMS={vms_delta_mb:.2f} MB")
if __name__ == "__main__":
main()
just setup ROOT from a LCG release and run it, it will print memory use on each iteration, for me it grows by ~35MB on each loop.
Setup
Tested this with ROOT 6.24.06 from LCG_101 and ROOT 6.26.04 from LCG_102 on CentOS7 host (x86_64-centos7-gcc11-opt).
Additional context
Dear @andy-slac ,
Thank you for reaching out and for the easy reproducer!
Indeed, I also see the memory leak with the code you posted. The issue seems to be with TObjArray rather than with TMap itself. In fact, the documentation says that
By default the TObjArray does not own the objects it points to and will not delete them unless explicitly asked (via a call to the Delete member function). To assign ownership of the content to the array, call: myarr->SetOwner(kTRUE);
Thus, if I modify slightly your reproducer as follows:
import os
import tempfile
import psutil
import ROOT
def main():
proc = psutil.Process()
with tempfile.TemporaryDirectory() as tmpdir:
file_name = os.path.join(tmpdir, "testfile.root")
print("Creating test root file", file_name)
_make_file_with_tmap(file_name)
mem0 = proc.memory_info()
rootfile = ROOT.TFile.Open(file_name, "READ")
while True:
ann_obj = rootfile.Get("annotations")
ann_obj.DeleteAll() # This deletes the contents of the current map from memory
mem1 = proc.memory_info()
_memstat(mem0, mem1)
mem0 = mem1
def _make_file_with_tmap(file_name):
rootfile = ROOT.TFile.Open(file_name, "RECREATE")
map = ROOT.TMap()
for i in range(50_000):
key = ROOT.TObjString("long_histogram_name_" * 10 + str(i))
value = ROOT.TObjArray()
value.Add(ROOT.TObjString("annotation"))
value.Add(ROOT.TObjString(f"10000"))
value.SetOwner(True) # This makes sure that all the `TObjString`s are owned by the `TObjArray`
map.Add(key, value)
rootfile.WriteTObject(map, "annotations")
rootfile.Close()
def _memstat(mem0, mem1):
MB = 1024 * 1024
rss_mb = mem1.rss // MB
vms_mb = mem1.vms // MB
rss_delta_mb = (mem1.rss - mem0.rss) / MB
vms_delta_mb = (mem1.vms - mem0.vms) / MB
print(f"Memory stat: RSS={rss_mb} MB, VMS={vms_mb} MB. Delta RSS={rss_delta_mb:.2f} MB VMS={vms_delta_mb:.2f} MB")
if __name__ == "__main__":
main()
I get the following output
Creating test root file /tmp/tmp8vyocdk_/testfile.root
Memory stat: RSS=325 MB, VMS=1220 MB. Delta RSS=77.00 MB VMS=76.89 MB
Memory stat: RSS=325 MB, VMS=1220 MB. Delta RSS=0.00 MB VMS=0.00 MB
Memory stat: RSS=325 MB, VMS=1220 MB. Delta RSS=0.00 MB VMS=0.00 MB
Memory stat: RSS=325 MB, VMS=1220 MB. Delta RSS=0.00 MB VMS=0.00 MB
Memory stat: RSS=325 MB, VMS=1220 MB. Delta RSS=0.00 MB VMS=0.00 MB
Memory stat: RSS=325 MB, VMS=1220 MB. Delta RSS=0.00 MB VMS=0.00 MB
Memory stat: RSS=325 MB, VMS=1220 MB. Delta RSS=0.00 MB VMS=0.00 MB
Memory stat: RSS=325 MB, VMS=1220 MB. Delta RSS=0.00 MB VMS=0.00 MB
Memory stat: RSS=325 MB, VMS=1220 MB. Delta RSS=0.00 MB VMS=0.00 MB
Memory stat: RSS=325 MB, VMS=1220 MB. Delta RSS=0.00 MB VMS=0.00 MB
Memory stat: RSS=325 MB, VMS=1220 MB. Delta RSS=0.00 MB VMS=0.00 MB
Memory stat: RSS=325 MB, VMS=1220 MB. Delta RSS=0.00 MB VMS=0.00 MB
Memory stat: RSS=325 MB, VMS=1220 MB. Delta RSS=0.00 MB VMS=0.00 MB
Memory stat: RSS=325 MB, VMS=1220 MB. Delta RSS=0.00 MB VMS=0.00 MB
Memory stat: RSS=325 MB, VMS=1220 MB. Delta RSS=0.00 MB VMS=0.00 MB
Memory stat: RSS=325 MB, VMS=1220 MB. Delta RSS=0.00 MB VMS=0.00 MB
Memory stat: RSS=325 MB, VMS=1220 MB. Delta RSS=0.00 MB VMS=0.00 MB
Memory stat: RSS=325 MB, VMS=1220 MB. Delta RSS=0.00 MB VMS=0.00 MB
Memory stat: RSS=325 MB, VMS=1220 MB. Delta RSS=0.00 MB VMS=0.00 MB
[...]
TLDR: The TObjArrays that you were writing to the file did not own the TObjStrings you were giving them. When reading them back, the TObjStrings were leaked into memory because they were not owned by anybody.
Can you confirm that you see the same results as me?
@vepadulano, thanks for suggestion. Does it mean that SetOwner have to be called before writing object to a file? The problem appears when I read data produced by other application, should not ROOT always set ownership for the data which is read from a file? For Python it is quite unnatural to explicitly manage resources, everything should be deallocated when Python object is deleted.
The problem appears when I read data produced by other application
I see, does the other application produce data like in your reproducer above? If so, there is a bug in that application, it doesn't follow the documentation that I linked above. I would suggest to open an issue with the developers in that case.
should not ROOT always set ownership for the data which is read from a file?
I understand the confusion, for the case of objects that were stored on disk it might be better to already have ownership of everything inside of them. But, it depends on the type of data and most importantly on the particular user application. The behaviour of ROOT object ownership may not always be the expected one, but is documented in the manual and in the relevant class references. There are applications that rely on that so it can't always be changed.
For Python it is quite unnatural to explicitly manage resources, everything should be deallocated when Python object is deleted.
I can't say I completely agree with this sentence. Let me break it down:
- everything should be deallocated when Python object is deleted: do you mean when you call
del obj? Because in general that is not true, a Python object is deleted/deallocated when the garbage collector decides it is the right time. Then, here you don't have pure Python objects, but there are also C++ objects underneath. ROOT takes care of lifetime management of the object it knows about. In the code above the objects are not properly registered. - For Python it is quite unnatural to explicitly manage resources: in most cases I would agree. There are some notable exceptions, for example when you manage file I/O or even more when you manage databases (the standard Python
sqlite3library comes to mind). In the case ofTFile, the semantics and the behaviour are closer to managing access to a database than reading a text file.
Finally, let me highlight that usage of TObjArray and such is discouraged nowadays, when possible it would be better to migrate to std::vector or similar.
Hi @vepadulano,
I think I have to disagree on all counts. The application that writes the data to a file may have it's own memory management policy and it should not be required for it to always use SetOwner on its data. Memory management in the reading application should be responsibility of the reading application, not writing one.
Your examples of sqlite3 or Python files are misplaced, none of those is leaking memory like ROOT. Again, my point is that whatever Python extension does, it is not supposed to leak memory or require some special code to avoid memory leak. I do understand how Python garbage collection works but that has nothing to do with the memory leak in this case.
And of course it is not always possible to migrate from TObjArray to vector, the files are already written and archived for posteriority, no one is going to update or replace those files.
Hi @andy-slac!
Does it mean that
SetOwnerhave to be called before writing object to a file?
No, you can also do that at any time, also after opening:
ann_obj = rootfile.Get("annotations")
for key in ann_obj:
ann_obj.GetValue(key).SetOwner(True)
The problem appears when I read data produced by other application, should not ROOT always set ownership for the data which is read from a file?
Yes, and it does that indeed! The problem is that there is nested data inside the TMap that is read from the file: the TObjArray that points to some elements. PyROOT takes care of cleaning the TMaps for you. But the actual leak happens because the TObjArray on the C++ side, which has no idea that it should delete the elements it points to unless you tell it to with SetOwner() (yeah manual memory management in C++ is also not good, but these are old classes that can't be changed anymore because they are written to many existing files, as you say yourself).
For Python it is quite unnatural to explicitly manage resources, everything should be deallocated when Python object is deleted.
That's right, but when the underlying C++ objects have no clue what they should do at deletion time because ROOTs memory management is easy to get wrong, PyROOT has not a remote chance to do the right thing.
Closing this because it's not a PyROOT issue, just a user error (a very understandable one, nobody should expect to read the class docs just to get the memory management right :slightly_smiling_face: but hey, that's C++ from the 90s).