memory leaks when creating models

Open cdiener opened this issue 8 years ago • 18 comments

Problem description

I have tried to debug this for weeks without finding a fix, so I am reporting it here for now.

The following code will slowly consume all memory on my machine:

from cobra.test import create_test_model
import gc

mod = create_test_model()
mod.solver = "glpk"    # same behavior for "cplex" and "gurobi"

for i in range(10000):
    mod = mod.copy()
    gc.collect()

The expected behavior is that the code uses constant RAM, since objects are deleted and cleaned up in every iteration. This is not related to the reported GLPK memory leak in optlang, since I used my already fixed branch of optlang. It also applies to all solver interfaces. I also tried loading models from pickled versions in the loop, or creating them directly instead of copying, all with the same effect.

Code Sample

Most minimal version:

from cobra.test import create_test_model

for i in range(10000):
    mod = create_test_model()

Actual Output

Monitoring the RAM usage shows that it increases linearly with time.

Expected Output

RAM usage should be constant.
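One way to put a number on "constant versus growing" is to compare traced allocations before and after a loop. The following is a minimal sketch using only the standard library; `traced_growth`, `leaky_factory` and `clean_factory` are hypothetical names introduced for illustration, and the factories are toy stand-ins for `create_test_model`. Note that `tracemalloc` only sees memory allocated through Python's allocators, so memory leaked inside a C library such as GLPK would not show up here and would have to be checked through the process RSS instead.

```python
import gc
import tracemalloc

_retained = []  # module-level list that simulates a hidden cache


def leaky_factory():
    """Toy stand-in for a constructor whose objects stay referenced elsewhere."""
    buf = bytearray(10_000)
    _retained.append(buf)
    return buf


def clean_factory():
    """Toy stand-in for a constructor whose objects can be freed."""
    return bytearray(10_000)


def traced_growth(make_object, iterations=50):
    """Return the net number of traced bytes still allocated after the loop."""
    gc.collect()
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    for _ in range(iterations):
        obj = make_object()
        del obj
        gc.collect()
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before


if __name__ == "__main__":
    print("leaky growth (bytes):", traced_growth(leaky_factory))
    print("clean growth (bytes):", traced_growth(clean_factory))
```

Passing `create_test_model` as `make_object` should then indicate whether the growth is visible at the Python level at all.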

Output of cobra.show_versions()

System Information

OS          Linux
OS-release  4.11.10-300.fc26.x86_64
Python      3.6.2

Package Versions

pip             9.0.1
setuptools      36.2.4
cobra           0.8.1
future          0.16.0
swiglpk         1.4.3
optlang         1.2.1
ruamel.yaml     0.14.12
pandas          0.20.3
numpy           1.13.1
tabulate        0.7.7
python-libsbml  5.15.0
lxml            3.7.2
scipy           0.19.1
matplotlib      2.0.2

cdiener avatar Aug 07 '17 20:08 cdiener

I'm currently probing when this leak was first introduced. I installed cobra 0.6.1 and observe the same behaviour. Interestingly when I try to interrupt the above loop, what comes up a lot is the following:

^CException ignored in: <bound method glp_smcp.<lambda> of <swiglpk.swiglpk.glp_smcp; proxy of <Swig Object of type 'glp_smcp *' at 0x7f125a061cf0> >>
Traceback (most recent call last):
  File "/home/moritz/.virtualenvs/membug/lib/python3.5/site-packages/swiglpk/swiglpk.py", line 281, in <lambda>
    __del__ = lambda self: None
KeyboardInterrupt
^CException ignored in: <bound method glp_smcp.<lambda> of <swiglpk.swiglpk.glp_smcp; proxy of <Swig Object of type 'glp_smcp *' at 0x7f1259b167e0> >>
Traceback (most recent call last):
  File "/home/moritz/.virtualenvs/membug/lib/python3.5/site-packages/swiglpk/swiglpk.py", line 281, in <lambda>
    __del__ = lambda self: None
KeyboardInterrupt
^C^CException ignored in: <bound method glp_smcp.<lambda> of <swiglpk.swiglpk.glp_smcp; proxy of <Swig Object of type 'glp_smcp *' at 0x7f12596242a0> >>
Traceback (most recent call last):
  File "/home/moritz/.virtualenvs/membug/lib/python3.5/site-packages/swiglpk/swiglpk.py", line 281, in <lambda>
    __del__ = lambda self: None
KeyboardInterrupt
^CException ignored in: <bound method glp_smcp.<lambda> of <swiglpk.swiglpk.glp_smcp; proxy of <Swig Object of type 'glp_smcp *' at 0x7f125921fd20> >>
Traceback (most recent call last):
  File "/home/moritz/.virtualenvs/membug/lib/python3.5/site-packages/swiglpk/swiglpk.py", line 281, in <lambda>
    __del__ = lambda self: None

I will try now with a pre-optlang version.

Midnighter avatar Aug 07 '17 21:08 Midnighter

Alright, does not occur with 0.5.9 but does occur with 0.6.0 so it's definitely an optlang problem.

Midnighter avatar Aug 07 '17 21:08 Midnighter

Using the cplex interface and cobra 0.6.0 the code is often interrupted at the following point.

Traceback (most recent call last):
  File "leak.py", line 12, in <module>
    mod = mod.copy()
  File "/home/moritz/.virtualenvs/membug/lib/python3.5/site-packages/cobra/core/model.py", line 314, in copy
    new._solver = deepcopy(self.solver)
  File "/home/moritz/.virtualenvs/membug/lib/python3.5/copy.py", line 182, in deepcopy
    y = _reconstruct(x, rv, 1, memo)
  File "/home/moritz/.virtualenvs/membug/lib/python3.5/copy.py", line 299, in _reconstruct
    y.__setstate__(state)
  File "/home/moritz/.virtualenvs/membug/lib/python3.5/site-packages/optlang/cplex_interface.py", line 706, in __setstate__
    self.__init__(problem=problem)
  File "/home/moritz/.virtualenvs/membug/lib/python3.5/site-packages/optlang/cplex_interface.py", line 603, in __init__
    var = Variable(name, lb=lb, ub=ub, problem=self)  # Type should also be in there
  File "/home/moritz/.virtualenvs/membug/lib/python3.5/site-packages/optlang/interface.py", line 171, in __new__
    obj.name = name
  File "/home/moritz/.virtualenvs/membug/lib/python3.5/site-packages/optlang/cplex_interface.py", line 214, in name
    if getattr(self, "problem", None) is not None:
KeyboardInterrupt

Midnighter avatar Aug 07 '17 21:08 Midnighter

Using pympler, this comparison of 100 iterations versus 500 iterations hints at an issue with the optlang Container class.

100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:48<00:00,  2.58it/s]
                                      types |   # objects |   total size
=========================================== | =========== | ============
                                <class 'set |      514814 |    203.61 MB
                               <class 'dict |     1213524 |    175.16 MB
            <class 'sympy.core.facts.FactKB |      509200 |    124.39 MB
                                <class 'str |      726840 |     43.07 MB
    <class 'optlang.glpk_interface.Variable |      509200 |     42.73 MB
                              <class 'float |     1215309 |     27.82 MB
                          <class 'uuid.UUID |      509200 |     27.19 MB
  <class 'optlang.glpk_interface.Constraint |      180200 |      9.62 MB
                               <class 'list |       24506 |      7.72 MB
   <class 'sympy.core.assumptions.StdFactKB |        1201 |    906.12 KB
                              <class 'tuple |        2658 |    184.93 KB
                                <class 'int |        5976 |    178.77 KB
       <class 'cobra.core.reaction.Reaction |        2546 |    139.23 KB
                               <class 'type |         100 |    118.09 KB
   <class 'cobra.core.metabolite.Metabolite |        1802 |     98.55 KB
(py36) ➜  cobrapy git:(fix/sdist) ✗ qe bug.py
(py36) ➜  cobrapy git:(fix/sdist) ✗ python bug.py
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 500/500 [25:54<00:00,  2.64it/s]
                                      types |   # objects |   total size
=========================================== | =========== | ============
                                <class 'set |      642114 |    254.10 MB
                               <class 'dict |     1513768 |    217.82 MB
            <class 'sympy.core.facts.FactKB |      636500 |    155.49 MB
    <class 'optlang.glpk_interface.Variable |      636500 |     53.42 MB
                                <class 'str |      899234 |     53.22 MB
                              <class 'float |     1514952 |     34.67 MB
                          <class 'uuid.UUID |      636500 |     33.99 MB
  <class 'optlang.glpk_interface.Constraint |      225250 |     12.03 MB
                               <class 'list |       24730 |      9.10 MB
   <class 'sympy.core.assumptions.StdFactKB |        1499 |      1.10 MB
                              <class 'tuple |        3352 |    233.48 KB
                                <class 'int |        6574 |    198.74 KB
                               <class 'type |         125 |    147.00 KB
       <class 'cobra.core.reaction.Reaction |        2546 |    139.23 KB
                        function (<lambda>) |         750 |     99.61 KB

hredestig avatar Aug 08 '17 16:08 hredestig

Just copying mod.solver seems to create a leak of the same magnitude, but copying reactions or metabolites does not, so it seems to be related to optlang. However, it is weird that it affects all interfaces.

cdiener avatar Aug 08 '17 17:08 cdiener

Did you manage to reproduce it using only optlang?

hredestig avatar Aug 08 '17 20:08 hredestig

Never in that magnitude. I did manage by pickling the solver object of a cplex cobra model and repeatedly unpickling it with pure optlang.

cdiener avatar Aug 08 '17 21:08 cdiener

Has a fix been found for this problem?

I can report that I have also been having this issue using gurobi as the solver.

Output of cobra.show_versions()

System Information

OS          Linux
OS-release  4.4.0-83-generic
Python      2.7.12

Package Versions

pip             9.0.1
setuptools      29.0.1
cobra           0.8.2
future          0.16.0
swiglpk         1.4.4
optlang         1.2.1
ruamel.yaml     0.14.12
pandas          0.20.3
numpy           1.13.1
tabulate        0.7.7
python-libsbml  5.15.0
lxml            3.5.0
scipy           0.17.0
matplotlib      1.5.1

jccvila avatar Sep 15 '17 18:09 jccvila

So this seems to be pretty complicated and probably requires some large changes in optlang that are in the works. However, you can work around the problem by running each iteration in its own process using Process from the multiprocessing module. If you can tell us a bit more about what you are trying to do, I can post a snippet using this strategy :smile:

cdiener avatar Sep 17 '17 15:09 cdiener

I am trying to obtain the flux profile using parsimonious FBA for every possible double reaction addition and deletion in the E. coli iJO1366 model. Some deletions that cause the model to be infeasible can result in issues downstream in my code in subsequent optimisations, so as a quick fix, every time I delete or add a reaction to the original model I restore the original model by copying it from a backup, i.e. by repeatedly calling

mod = mod_backup.copy().

In total there are roughly 10 million possible double 'mutants', and for each one I copy a model. Even if I divide that up on an HPC into 2000 runs of 5000, it still runs into the memory limit. I get around this by only doing 50 optimizations per script run, but it's a bit of an inefficient hack.

An example workaround for the minimal case you gave would be appreciated.

jccvila avatar Sep 17 '17 22:09 jccvila

@vilacelestin for this you may consider instead using the context manager to get changes auto-reverted, e.g. something like

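from cobra.flux_analysis import pfba

for rxn1, rxn2 in tuples_of_two_reactions:
    with model:
        # changes made inside the block are recorded by the context manager
        rxn1.knock_out()
        rxn2.knock_out()
        sol = pfba(model)
        ...
    # when exiting the context manager model is back to what it was before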

this way you don't have to copy the model at all (which is also a very slow operation), hence avoiding the memory leak bug
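To illustrate why no copy is needed, here is a toy sketch of the general idea behind a reverting context manager: every change made inside the `with` block registers an undo callback, and leaving the block runs those callbacks in reverse order. `ToyModel` and its `bounds` dictionary are hypothetical and only meant to show the pattern; this is not cobra's actual implementation.

```python
class ToyModel:
    """Hypothetical model that stores reaction bounds and can revert changes."""

    def __init__(self, bounds):
        self.bounds = dict(bounds)
        self._undo_stacks = []  # one stack of undo callbacks per open context

    def __enter__(self):
        self._undo_stacks.append([])
        return self

    def __exit__(self, exc_type, exc, tb):
        # Undo the recorded changes in reverse order, even if an error occurred.
        for undo in reversed(self._undo_stacks.pop()):
            undo()
        return False  # do not swallow exceptions

    def knock_out(self, rxn):
        old = self.bounds[rxn]
        if self._undo_stacks:
            # Remember how to restore the previous bounds on exit.
            self._undo_stacks[-1].append(
                lambda: self.bounds.__setitem__(rxn, old)
            )
        self.bounds[rxn] = (0.0, 0.0)


if __name__ == "__main__":
    model = ToyModel({"A": (-10.0, 10.0), "B": (0.0, 10.0)})
    with model:
        model.knock_out("A")
        model.knock_out("B")
        print("inside:", model.bounds)
    print("after: ", model.bounds)
```

Because only the changed values are stored and restored, the cost per iteration is proportional to the number of modifications rather than to the size of the model.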

hredestig avatar Sep 18 '17 12:09 hredestig

Hi,

could you give some details on

Some deletions that cause the model to be infeasible can result in issues downstream in my code

Do you mean that after reverting some earlier deletions you get incorrect results later on? If yes, that would be a bug of its own. I agree with @hredestig that the context manager should be the best solution here...

cdiener avatar Sep 18 '17 15:09 cdiener

If you are affected by this bug, the following is a temporary workaround: use a Process for each iteration. Here is an example that also runs in parallel:

from cobra.test import create_test_model
from multiprocessing import Process, Queue

max_procs = 6
processes = []

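def example_task(task_id, queue):
    # build and solve the model inside the worker so its memory is freed when the process exits
    mod = create_test_model("ecoli")
    res = mod.slim_optimize()
    queue.put((task_id, res))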

def consume(queue):
    global processes
    results = []
    for p in processes:
        results.append(queue.get())
    for p in processes:
        p.join()
    processes = []
    return results

q = Queue()
for i in range(100):
    p = Process(target=example_task, args=(i, q))
    p.start()
    processes.append(p)
    if len(processes) >= max_procs:
        print(consume(q))
print(consume(q))

cdiener avatar Sep 18 '17 15:09 cdiener

So, it looks like memory is leaking in multiple places (especially through the sympy cache) and differently for the different solver backends. For example, setting SYMPY_USE_CACHE=no as an environment variable (deactivating the sympy cache) seems to eliminate the memory leak completely for cplex, while in the GLPK case there still seems to be some memory leak that appears to be related to the __setstate__ and __getstate__ methods implemented to make models pickleable. Unfortunately, using symengine seems to create a third leak.

phantomas1234 avatar Oct 16 '18 00:10 phantomas1234

Is that already with weakrefs? I would expect symengine to leak if you have it in cyclic dependencies like they exist in optlang...

cdiener avatar Oct 16 '18 00:10 cdiener

I am a little out of the loop on this, so I am not sure what weakrefs refers to, but it seems like memory accumulation is expected in sympy and the suggested best practice is to clear the cache? https://github.com/sympy/sympy/issues/6321#issuecomment-37005578 Every variable/symbol that is created when a model is copied is probably never going to be garbage collected because it is in the cache (mainly guessing here)? Edit: seems like this is indeed the problem -> https://github.com/sympy/sympy/issues/6321#issuecomment-37005580

phantomas1234 avatar Oct 16 '18 21:10 phantomas1234

When we discussed the GLPK issues, the problem was that swiglpk allocates memory for a new GLPK problem that is never freed, since there was no call to glp_delete. However, you cannot just add a call to glp_delete in __del__, since then the cyclic dependencies in optlang will not be resolved and the objects will never be deleted by the garbage collector (that only works if you don't override __del__). For symengine I could imagine a similar case where Python sees that the symengine objects have custom __del__ methods and therefore does no garbage collection for the optlang objects (that is just a wild guess). If you use weakrefs you can break those cycles. Basically, you can tell an optlang model that it can be deleted even though its own variables still hold backreferences to it (also see https://github.com/biosustain/optlang/issues/128).
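The weakref idea can be sketched with two toy classes. `Model` and `Variable` below are hypothetical stand-ins, not optlang's classes: the variable keeps only a weak backreference to its owner, so no reference cycle exists and (in CPython) the model is freed by plain reference counting as soon as the last strong reference goes away.

```python
import gc
import weakref


class Variable:
    """Toy variable holding a weak backreference to its owning model."""

    def __init__(self, name, problem):
        self.name = name
        # weakref.ref does not increase the owner's reference count
        self._problem = weakref.ref(problem)

    @property
    def problem(self):
        # Returns None once the owning model has been destroyed.
        return self._problem()


class Model:
    """Toy model that owns its variables through strong references."""

    def __init__(self, names):
        self.variables = [Variable(n, self) for n in names]


def model_freed_without_gc():
    """Check that deleting the model frees it without the cycle collector."""
    was_enabled = gc.isenabled()
    gc.disable()  # rule out the cycle collector; rely on refcounting only
    try:
        model = Model(["x", "y"])
        probe = weakref.ref(model)
        var = model.variables[0]  # keep one variable alive on purpose
        del model
        return probe() is None and var.problem is None
    finally:
        if was_enabled:
            gc.enable()


if __name__ == "__main__":
    print("model freed immediately:", model_freed_without_gc())
```

With a strong backreference (`self._problem = problem`) the same check would fail, because the model and its variables would keep each other alive until the cycle collector runs.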

cdiener avatar Oct 16 '18 22:10 cdiener

The memory leak caused by the sympy cache can be fixed without disabling the cache by setting the SYMPY_CACHE_SIZE environment variable (https://github.com/sympy/sympy/pull/7464). This will limit the cache to the set size using a Least Recently Used policy for throwing away items. If a lot of copying cannot be avoided this would probably be worth doing and would not sacrifice as much speed as disabling the cache completely.
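A minimal sketch of how these settings could be applied from within Python, assuming that sympy reads the environment variables when it is first imported, so they have to be set before anything imports sympy (including cobra or optlang). `configure_sympy_cache` is a hypothetical helper and the size of 500 is an arbitrary illustration, not a recommended value.

```python
import os
import sys
import warnings


def configure_sympy_cache(size=None, disable=False, environ=os.environ):
    """Set sympy's cache environment variables (call before importing sympy).

    size    -- maximum number of cached entries (sets SYMPY_CACHE_SIZE)
    disable -- turn the cache off entirely (sets SYMPY_USE_CACHE=no)
    """
    if "sympy" in sys.modules:
        warnings.warn("sympy is already imported; the settings may be ignored")
    if disable:
        environ["SYMPY_USE_CACHE"] = "no"
    elif size is not None:
        environ["SYMPY_CACHE_SIZE"] = str(int(size))
    return environ


if __name__ == "__main__":
    configure_sympy_cache(size=500)
    # Import cobra/optlang/sympy only after this point.
    print(os.environ["SYMPY_CACHE_SIZE"])
```

The same effect can be had without any code by exporting the variable in the shell before starting Python, for example `SYMPY_CACHE_SIZE=500 python script.py`.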

KristianJensen avatar Oct 17 '18 13:10 KristianJensen

Now tracked in https://github.com/opencobra/optlang/issues/128.

cdiener avatar Nov 04 '22 19:11 cdiener