cobrapy
memory leaks when creating models
Problem description
I have been trying to debug this for weeks now but never found a fix, so I am reporting it here for now.
The following code will slowly consume all memory on my machine:
```python
import gc

from cobra.test import create_test_model

mod = create_test_model()
mod.solver = "glpk"  # same behavior for "cplex" and "gurobi"
for i in range(10000):
    mod = mod.copy()
    gc.collect()
```
I would expect the code to use constant RAM, since the old objects are deleted and garbage-collected in every iteration. This is not related to the GLPK memory leak already reported in optlang, because I used my own branch of optlang in which that leak is fixed. The problem also affects all solver interfaces. I also tried loading the models from pickles inside the loop, or creating them directly instead of copying, all with the same effect.
Code Sample
Most minimal version:
```python
from cobra.test import create_test_model

for i in range(10000):
    mod = create_test_model()
```
Actual Output
When monitoring RAM usage, it increases linearly with time.
Expected Output
RAM usage should be constant.
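To turn "RAM usage grows linearly" into a number that can be compared across cobra versions or solvers, one option is to measure how much of the memory allocated during N iterations is still alive afterwards. The sketch below uses the standard-library `tracemalloc` module; this is an assumption on my part, not how the RAM was monitored in this report, and `tracemalloc` only sees allocations made through Python's allocator, so buffers that GLPK, CPLEX or Gurobi allocate directly in C stay invisible to it (watching the process RSS is still needed for those). `traced_growth`, `leaky_step` and `clean_step` are hypothetical names; in practice `step` would be the loop body from above, e.g. `create_test_model`.

```python
import tracemalloc


def traced_growth(step, iterations):
    """Return how many bytes of Python-level allocations made while calling
    step() `iterations` times are still alive afterwards."""
    tracemalloc.start()
    try:
        before, _ = tracemalloc.get_traced_memory()
        for _ in range(iterations):
            step()
        after, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before


# Hypothetical loop bodies: one keeps a global reference to everything
# it creates (a leak), the other lets each object be freed right away.
_registry = []


def leaky_step():
    _registry.append(bytearray(1024))


def clean_step():
    bytearray(1024)


if __name__ == "__main__":
    print("leaky:", traced_growth(leaky_step, 1000))  # grows with iterations
    print("clean:", traced_growth(clean_step, 1000))  # stays near zero
```

A leak shows up as growth that scales with `iterations`; a clean loop body stays close to zero however many times it runs.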
Output of cobra.show_versions()
```
System Information
OS          Linux
OS-release  4.11.10-300.fc26.x86_64
Python      3.6.2

Package Versions
pip             9.0.1
setuptools      36.2.4
cobra           0.8.1
future          0.16.0
swiglpk         1.4.3
optlang         1.2.1
ruamel.yaml     0.14.12
pandas          0.20.3
numpy           1.13.1
tabulate        0.7.7
python-libsbml  5.15.0
lxml            3.7.2
scipy           0.19.1
matplotlib      2.0.2
```
I'm currently probing when this leak was first introduced. I installed cobra 0.6.1 and observed the same behaviour. Interestingly, when I try to interrupt the loop above, the following comes up a lot:
```
^CException ignored in: <bound method glp_smcp.<lambda> of <swiglpk.swiglpk.glp_smcp; proxy of <Swig Object of type 'glp_smcp *' at 0x7f125a061cf0> >>
Traceback (most recent call last):
  File "/home/moritz/.virtualenvs/membug/lib/python3.5/site-packages/swiglpk/swiglpk.py", line 281, in <lambda>
    __del__ = lambda self: None
KeyboardInterrupt
^CException ignored in: <bound method glp_smcp.<lambda> of <swiglpk.swiglpk.glp_smcp; proxy of <Swig Object of type 'glp_smcp *' at 0x7f1259b167e0> >>
Traceback (most recent call last):
  File "/home/moritz/.virtualenvs/membug/lib/python3.5/site-packages/swiglpk/swiglpk.py", line 281, in <lambda>
    __del__ = lambda self: None
KeyboardInterrupt
^C^CException ignored in: <bound method glp_smcp.<lambda> of <swiglpk.swiglpk.glp_smcp; proxy of <Swig Object of type 'glp_smcp *' at 0x7f12596242a0> >>
Traceback (most recent call last):
  File "/home/moritz/.virtualenvs/membug/lib/python3.5/site-packages/swiglpk/swiglpk.py", line 281, in <lambda>
    __del__ = lambda self: None
KeyboardInterrupt
^CException ignored in: <bound method glp_smcp.<lambda> of <swiglpk.swiglpk.glp_smcp; proxy of <Swig Object of type 'glp_smcp *' at 0x7f125921fd20> >>
Traceback (most recent call last):
  File "/home/moritz/.virtualenvs/membug/lib/python3.5/site-packages/swiglpk/swiglpk.py", line 281, in <lambda>
    __del__ = lambda self: None
```
I will now try with a pre-optlang version.
Alright, it does not occur with cobra 0.5.9 but does occur with 0.6.0, so it's definitely an optlang problem.
With the cplex interface and cobra 0.6.0, the code is often at the following point when interrupted:
```
Traceback (most recent call last):
  File "leak.py", line 12, in <module>
    mod = mod.copy()
  File "/home/moritz/.virtualenvs/membug/lib/python3.5/site-packages/cobra/core/model.py", line 314, in copy
    new._solver = deepcopy(self.solver)
  File "/home/moritz/.virtualenvs/membug/lib/python3.5/copy.py", line 182, in deepcopy
    y = _reconstruct(x, rv, 1, memo)
  File "/home/moritz/.virtualenvs/membug/lib/python3.5/copy.py", line 299, in _reconstruct
    y.__setstate__(state)
  File "/home/moritz/.virtualenvs/membug/lib/python3.5/site-packages/optlang/cplex_interface.py", line 706, in __setstate__
    self.__init__(problem=problem)
  File "/home/moritz/.virtualenvs/membug/lib/python3.5/site-packages/optlang/cplex_interface.py", line 603, in __init__
    var = Variable(name, lb=lb, ub=ub, problem=self) # Type should also be in there
  File "/home/moritz/.virtualenvs/membug/lib/python3.5/site-packages/optlang/interface.py", line 171, in __new__
    obj.name = name
  File "/home/moritz/.virtualenvs/membug/lib/python3.5/site-packages/optlang/cplex_interface.py", line 214, in name
    if getattr(self, "problem", None) is not None:
KeyboardInterrupt
```
Using pympler, this comparison of 100 iterations versus 500 iterations hints at an issue with the optlang Container class:
```
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:48<00:00, 2.58it/s]
                                      types |   # objects |   total size
=========================================== | =========== | ============
                                <class 'set |      514814 |    203.61 MB
                               <class 'dict |     1213524 |    175.16 MB
            <class 'sympy.core.facts.FactKB |      509200 |    124.39 MB
                                <class 'str |      726840 |     43.07 MB
      <class 'optlang.glpk_interface.Variable |      509200 |     42.73 MB
                              <class 'float |     1215309 |     27.82 MB
                          <class 'uuid.UUID |      509200 |     27.19 MB
    <class 'optlang.glpk_interface.Constraint |      180200 |      9.62 MB
                               <class 'list |       24506 |      7.72 MB
   <class 'sympy.core.assumptions.StdFactKB |        1201 |    906.12 KB
                              <class 'tuple |        2658 |    184.93 KB
                                <class 'int |        5976 |    178.77 KB
        <class 'cobra.core.reaction.Reaction |        2546 |    139.23 KB
                               <class 'type |         100 |    118.09 KB
    <class 'cobra.core.metabolite.Metabolite |        1802 |     98.55 KB
```
```
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 500/500 [25:54<00:00, 2.64it/s]
                                      types |   # objects |   total size
=========================================== | =========== | ============
                                <class 'set |      642114 |    254.10 MB
                               <class 'dict |     1513768 |    217.82 MB
            <class 'sympy.core.facts.FactKB |      636500 |    155.49 MB
      <class 'optlang.glpk_interface.Variable |      636500 |     53.42 MB
                                <class 'str |      899234 |     53.22 MB
                              <class 'float |     1514952 |     34.67 MB
                          <class 'uuid.UUID |      636500 |     33.99 MB
    <class 'optlang.glpk_interface.Constraint |      225250 |     12.03 MB
                               <class 'list |       24730 |      9.10 MB
   <class 'sympy.core.assumptions.StdFactKB |        1499 |      1.10 MB
                              <class 'tuple |        3352 |    233.48 KB
                                <class 'int |        6574 |    198.74 KB
                               <class 'type |         125 |    147.00 KB
        <class 'cobra.core.reaction.Reaction |        2546 |    139.23 KB
                         function (<lambda>) |         750 |     99.61 KB
```
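The tables above come from pympler. Where pympler is not installed, a rough standard-library approximation is to diff the number of live objects per type before and after running the loop body. This is a sketch under a stated limitation: `gc.get_objects()` only returns objects tracked by the cyclic garbage collector (containers and class instances), so rows such as `str` and `float` in the pympler tables will not appear. `live_type_counts` and `growth_by_type` are hypothetical helper names.

```python
import gc
from collections import Counter


def live_type_counts():
    """Count live objects tracked by the garbage collector, keyed by type."""
    gc.collect()
    return Counter(
        f"{type(obj).__module__}.{type(obj).__qualname__}"
        for obj in gc.get_objects()
    )


def growth_by_type(step, iterations, top=10):
    """Call step() `iterations` times and return at most `top`
    (type name, growth) pairs for the types whose live count grew most."""
    before = live_type_counts()
    for _ in range(iterations):
        step()
    after = live_type_counts()
    after.subtract(before)  # per-type difference; may contain negatives
    return [(name, n) for name, n in after.most_common(top) if n > 0]
```

Here `step` would be the loop body from the issue, for example `create_test_model`; types whose count grows in proportion to `iterations` are the leak candidates.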
Copying just mod.solver seems to create a leak of the same magnitude, whereas copying reactions or metabolites does not. So it seems to be related to optlang; however, it is odd that it affects all interfaces.
Did you manage to reproduce it using only optlang?
Never at that magnitude. I did manage to reproduce it by pickling the solver object of a cplex cobra model and repeatedly unpickling it with pure optlang.
Has a fix been found for this problem?
I can report that I have also been having this issue using gurobi as the solver.
Output of cobra.show_versions()
```
System Information
OS          Linux
OS-release  4.4.0-83-generic
Python      2.7.12

Package Versions
pip             9.0.1
setuptools      29.0.1
cobra           0.8.2
future          0.16.0
swiglpk         1.4.4
optlang         1.2.1
ruamel.yaml     0.14.12
pandas          0.20.3
numpy           1.13.1
tabulate        0.7.7
python-libsbml  5.15.0
lxml            3.5.0
scipy           0.17.0
matplotlib      1.5.1
```
So this seems to be pretty complicated and probably requires some large changes in optlang, which are in the works. However, you can work around the problem by running each iteration in its own process using Process from the multiprocessing module. If you can tell us a bit more about what you are trying to do, I can post a snippet using this strategy :smile:
I am trying to obtain the flux profile using parsimonious FBA for every possible double reaction addition and deletion in the E. coli iJO1366 model. Some deletions make the model infeasible, which can cause issues further downstream in my code during subsequent optimisations. So as a quick fix, every time I delete or add a reaction, I restore the original model by copying it from a backup, i.e. by repeatedly calling
mod = mod_backup.copy().
In total there are roughly 10 million possible double 'mutants', and for each one I copy a model. Even if I split that up on an HPC into 2000 runs of 5000, each run still hits the memory limit. I get around this by doing only 50 optimizations per script run, but it's a rather inefficient hack.
An example work around for the minimal case you gave would be appreciated.
@vilacelestin, for this you may want to use the model's context manager instead, so that changes are reverted automatically, e.g. something like
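```python
from itertools import combinations
from cobra.flux_analysis import pfba
from cobra.test import create_test_model

model = create_test_model("ecoli")  # iJO1366
tuples_of_two_reactions = combinations(model.reactions, 2)
for rxn1, rxn2 in tuples_of_two_reactions:
    with model:
        rxn1.knock_out()
        rxn2.knock_out()
        sol = pfba(model)
        ...  # use sol.fluxes here
    # when exiting the context manager, the model is back to what it was before
```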
This way you don't have to copy the model at all (which is also a very slow operation), so you avoid the memory leak bug.
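For readers unfamiliar with the pattern, here is a toy illustration of how a `with model:` block can revert changes: every modification made inside the block registers an undo callback, and leaving the block replays those callbacks in reverse order. `ToyModel` and its `knock_out(rxn_id)` method are hypothetical and much simpler than cobra's actual implementation; the point is only that reverting is cheap bookkeeping, whereas `copy()` rebuilds the whole solver problem (see the `deepcopy`/`__setstate__` traceback above).

```python
class ToyModel:
    """Hypothetical model whose changes inside `with model:` are reverted."""

    def __init__(self, bounds):
        self.bounds = dict(bounds)  # reaction id -> (lower, upper)
        self._undo_stack = []       # one list of undo callbacks per open block

    def __enter__(self):
        self._undo_stack.append([])
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Undo this block's changes in reverse order, even on exceptions.
        for undo in reversed(self._undo_stack.pop()):
            undo()
        return False  # do not swallow exceptions

    def knock_out(self, rxn_id):
        old = self.bounds[rxn_id]
        if self._undo_stack:  # only record undo steps inside a with block
            self._undo_stack[-1].append(
                lambda: self.bounds.__setitem__(rxn_id, old))
        self.bounds[rxn_id] = (0.0, 0.0)


if __name__ == "__main__":
    model = ToyModel({"R1": (-10.0, 10.0), "R2": (0.0, 1000.0)})
    with model:
        model.knock_out("R1")
        model.knock_out("R2")
        print(model.bounds)  # {'R1': (0.0, 0.0), 'R2': (0.0, 0.0)}
    print(model.bounds)      # {'R1': (-10.0, 10.0), 'R2': (0.0, 1000.0)}
```

Because each block keeps its own list of undo steps, nested blocks only revert their own changes, and an exception raised inside a block still leaves the model in its original state.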
Hi,
could you give some details on
> Some deletions that cause the model to be infeasible can result in issues downstream in my code
Do you mean that, after some earlier deletion operations have been reverted, you get incorrect results later on? If so, that would be a separate bug. I agree with @hredestig that the context manager is probably the best solution here...
If you are affected by this bug, the following is a temporary workaround: use a separate Process for each iteration. Here is an example that also runs in parallel:
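```python
from multiprocessing import Process, Queue

from cobra.test import create_test_model

MAX_PROCS = 6  # maximum number of worker processes alive at once


def example_task(task_id, queue):
    # Runs in a child process: whatever this iteration leaks is returned
    # to the operating system when the process exits.
    mod = create_test_model("ecoli")
    res = mod.slim_optimize()
    queue.put((task_id, res))


def consume(queue, processes):
    """Collect one result per started process, then wait for all of them."""
    # Read from the queue before joining: a child that still has items
    # buffered for the queue may not exit until they have been consumed.
    results = [queue.get() for _ in processes]
    for p in processes:
        p.join()
    del processes[:]  # empty the list in place for the next batch
    return results


if __name__ == "__main__":  # required where processes are spawned (e.g. Windows)
    q = Queue()
    processes = []
    for i in range(100):
        p = Process(target=example_task, args=(i, q))
        p.start()
        processes.append(p)
        if len(processes) >= MAX_PROCS:
            print(consume(q, processes))
    print(consume(q, processes))
```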
So it looks like memory is leaking in multiple places (especially through the sympy cache), and differently for the different solver backends. For example, setting the environment variable SYMPY_USE_CACHE=no (which disables the sympy cache) seems to eliminate the leak completely for cplex, whereas with GLPK there is still some leakage that seems to be related to the __setstate__ and __getstate__ methods implemented to make models picklable. Unfortunately, using symengine seems to create a third leak.
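Because sympy reads `SYMPY_USE_CACHE` (and `SYMPY_CACHE_SIZE`, discussed further down) when it is first imported, these variables have to be set before cobra, optlang or sympy is imported: either in the shell (e.g. `SYMPY_USE_CACHE=no python leak.py`) or at the very top of the script. A minimal sketch of the in-script variant; `configure_sympy_cache` is a hypothetical helper, not part of sympy or cobra.

```python
import os
import sys


def configure_sympy_cache(use_cache=True, cache_size=None):
    """Set sympy's cache environment variables for this process.

    Must be called before cobra, optlang or sympy is imported, because
    sympy reads the variables once, when it is first imported.
    """
    if "sympy" in sys.modules:
        raise RuntimeError("sympy is already imported; set the cache options first")
    os.environ["SYMPY_USE_CACHE"] = "yes" if use_cache else "no"
    if cache_size is not None:
        # Bound the cache instead of disabling it (SYMPY_CACHE_SIZE).
        os.environ["SYMPY_CACHE_SIZE"] = str(cache_size)


if __name__ == "__main__":
    configure_sympy_cache(use_cache=False)  # disable the cache entirely
    # import cobra  # only import cobra/optlang/sympy after this point
```

Child processes started afterwards (e.g. with the multiprocessing workaround above) inherit the same environment.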
Is that already with weakrefs? I would expect symengine to leak if it is part of cyclic references like the ones that exist in optlang...
I am a little out of the loop here, so I am not sure what weakrefs refers to, but it seems that memory accumulation is expected in sympy and the suggested best practice is to clear the cache? https://github.com/sympy/sympy/issues/6321#issuecomment-37005578 Every variable/symbol that is created when a model is copied is probably never garbage-collected because it is held in the cache (mainly guessing here)? Edit: it seems this is indeed the problem: https://github.com/sympy/sympy/issues/6321#issuecomment-37005580
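The mechanism guessed at here can be shown without sympy: a strong reference from a process-wide cache keeps an object alive no matter how often `gc.collect()` runs, and the object becomes collectable only once its cache entry is dropped. In the sketch below `Symbol`, `make_symbol`, `clear_cache` and the module-level `_cache` are hypothetical stand-ins; sympy's own function for emptying its cache is `sympy.core.cache.clear_cache()`.

```python
import gc
import weakref


class Symbol:
    """Hypothetical stand-in for a symbolic variable."""

    def __init__(self, name):
        self.name = name


_cache = {}  # process-wide, unbounded cache: name -> Symbol


def make_symbol(name):
    """Return the cached Symbol for `name`, creating it on first use."""
    if name not in _cache:
        _cache[name] = Symbol(name)
    return _cache[name]


def clear_cache():
    _cache.clear()


if __name__ == "__main__":
    s = make_symbol("x_0")
    probe = weakref.ref(s)   # lets us check liveness without keeping s alive
    del s
    gc.collect()
    print(probe() is None)   # False: the cache still holds a strong reference
    clear_cache()
    print(probe() is None)   # True: freed as soon as the cache entry is dropped
```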
When we discussed the GLPK issues, the problem was that swiglpk allocates memory for a new GLPK problem that is never freed, since there was no call to glp_delete. However, you cannot just add a call to glp_delete in __del__, because then the cyclic dependencies in optlang are not resolved and the objects are never deleted by the garbage collector (which only works if you don't override __del__). For symengine I could imagine a similar case, where Python sees that the symengine objects have custom __del__ methods and therefore does not garbage-collect the optlang objects (that is just a wild guess). If you use weakrefs, you can break those cycles. Basically, you can tell an optlang model that it can be deleted even though its own variables still hold back-references to it (also see https://github.com/biosustain/optlang/issues/128).
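A minimal sketch of the weakref idea, with hypothetical `Problem` and `Variable` classes standing in for optlang's: the problem holds strong references to its variables, but each variable holds only a weak reference back to its problem. With no reference cycle, CPython frees the problem by reference counting as soon as the last outside reference disappears, so its `__del__` (where a wrapper would release the C-level problem, e.g. with GLPK's `glp_delete_prob`) runs immediately. For background: the rule that objects with `__del__` in a cycle are never collected applies to Python 2 and to Python 3 before 3.4 (PEP 442 lifted it); even on newer versions, cyclic garbage is only freed whenever the collector happens to run.

```python
import weakref

freed = []  # names of problems whose (pretend) native memory was released


class Problem:
    """Hypothetical stand-in for a solver problem that owns native memory."""

    def __init__(self, name):
        self.name = name
        self.variables = []  # strong references: problem -> variables

    def add_variable(self, var_name):
        var = Variable(var_name, self)
        self.variables.append(var)
        return var

    def __del__(self):
        # A real wrapper would release the C-level problem here.
        freed.append(self.name)


class Variable:
    """Hypothetical variable holding only a weak reference to its problem."""

    def __init__(self, name, problem):
        self.name = name
        self._problem_ref = weakref.ref(problem)  # does not keep problem alive

    @property
    def problem(self):
        # Returns None once the problem has been freed.
        return self._problem_ref()


if __name__ == "__main__":
    p = Problem("p1")
    x = p.add_variable("x")
    del p              # last strong reference gone, no cycle to wait for
    print(freed)       # ['p1']
    print(x.problem)   # None
```

The trade-off is that code holding only a variable must be prepared for `variable.problem` to return `None` once the problem is gone.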
The memory leak caused by the sympy cache can be fixed without disabling the cache by setting the SYMPY_CACHE_SIZE environment variable (https://github.com/sympy/sympy/pull/7464). This limits the cache to the given size, evicting items with a Least Recently Used (LRU) policy. If a lot of copying cannot be avoided, this is probably worth doing, and it sacrifices less speed than disabling the cache completely.
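A small illustration of why bounding the cache helps, with `functools.lru_cache` standing in for sympy's cache (an analogy, not sympy's code): if every model copy introduces symbols under new keys, an unbounded cache grows with each copy, while a bounded LRU cache stops at its maximum size. `make_cached`, `make_symbol` and the `x_{i}` names are hypothetical.

```python
from functools import lru_cache


def make_cached(maxsize):
    """Return a cached symbol factory; maxsize=None means an unbounded cache."""

    @lru_cache(maxsize=maxsize)
    def make_symbol(name):
        return ("Symbol", name)  # placeholder for an expensive object

    return make_symbol


unbounded = make_cached(None)
bounded = make_cached(1000)  # analogous to setting SYMPY_CACHE_SIZE=1000

# Hypothetical workload: every model copy introduces symbols with new names.
for i in range(10000):
    unbounded("x_%d" % i)
    bounded("x_%d" % i)

print(unbounded.cache_info().currsize)  # 10000: grows with every new name
print(bounded.cache_info().currsize)    # 1000: least recently used entries evicted
```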
Now tracked in https://github.com/opencobra/optlang/issues/128 .