Memory leak in long running process
I swapped out heapq.merge for this on a continuously running process. A new call to mulitmerge.merge would be made often it was passed a *list of generators with a key. The instance of the multimerge.merge iterator would not be exhausted it would be used in a loop until a break and just be garbage collected and created anew in a future loop. This would slowly leak memory. Switching back to heapq.merge resolved it. Either something is not deleted In the C or the python gc is not able to fully clean up the multimerge instance when it removes it.
Can you provide a minimal reproducible example? Maybe use psutil to show memory usage over time.
I tested with the script below (key/no-key,forward/reverse,exhausting/stopping,various lengths,various numbers of iterables) and could not find good evidence of a leak. Since dynamic memory allocation is used, occasionally there is fragmentation that requires allocating new pages, but not in any consistent way: memory usage fluctuates, but is ultimately bounded. It may just be that multimerge.merge uses more dynamic memory allocation, so there is naturally more fragmentation and slight variations in memory usage over time.
I suppose another possibility is that multimerge.merge is holding onto some references slightly longer that heapq.merge does. For example, I believe the computed key(x) values might be preserved for one more iteration, which is within the realm of reasonable implementation details.
On the other hand, if you can demonstrate that something is systematically and permanently leaking, that would certainly be a bug worth fixing, but I can't know without a reproducer.
measure_memory.py
import random
from itertools import product
from operator import itemgetter
if 1:
print("using multimerge.merge")
from multimerge import merge
if 0:
print("using heapq.merge")
from heapq import merge
lengths = range(20)
keys = [None, itemgetter(0), itemgetter(1), itemgetter(1, 0)]
reverses = [False, True]
repeats = range(10)
pool = list(product('ABC', (-500, 500)))
def make_data():
data = []
for n, key, reverse, _ in product(lengths, keys, reverses, repeats):
inputs = []
for _ in range(n):
it = random.choices(pool, k=random.randrange(20))
it.sort(key=key, reverse=reverse)
inputs.append(it)
data.append((inputs, key, reverse))
return data
def check_all(data):
for inputs, key, reverse in data:
for take in [1, 2, 5, 10, 20, 50, 100]:
m = merge(*inputs, key=key, reverse=reverse)
for _ in range(take):
next(m, None)
m = merge(*inputs, key=key, reverse=reverse)
list(m)
list(m)
del m
def main():
if 1:
print("measuring memory usage")
import psutil
process = psutil.Process()
get_stat = lambda: process.memory_info().rss
if 0:
print("measuring refcounts")
# needs debug build of python
from sys import gettotalrefcount
get_stat = gettotalrefcount
prev_stat = 0
for _ in range(1000):
check_all(make_data())
stat = get_stat()
if stat == prev_stat:
print(".", end='', flush=True)
else:
print(f"\n{stat} (+{stat-prev_stat})")
prev_stat = stat
print("\ndone")
if __name__ == "__main__":
main()
output.txt
using multimerge.merge
measuring memory usage
23158784 (+23158784)
.........
23969792 (+811008)
....................................................................................................................................................
22974464 (+-995328)
....
23781376 (+806912)
...............................................................................................................................................................
22974464 (+-806912)
.....
23781376 (+806912)
.....................................
22974464 (+-806912)
23240704 (+266240)
...
23781376 (+540672)
..................................................................................
22974464 (+-806912)
....
23781376 (+806912)
.......................................................................
22974464 (+-806912)
23240704 (+266240)
...
23781376 (+540672)
...........................
22974464 (+-806912)
....
23781376 (+806912)
................
22974464 (+-806912)
....
23781376 (+806912)
.....
22974464 (+-806912)
....
23781376 (+806912)
.....
22974464 (+-806912)
....
23781376 (+806912)
..................................................................................
22974464 (+-806912)
....
23781376 (+806912)
.......................................................................
22974464 (+-806912)
....
23781376 (+806912)
...........................................................................................................................................................................................................................
done