
Serializing and deserializing linopy.Model

Open tburandt opened this issue 1 year ago • 4 comments

Hi,

I am currently exploring setting up larger models in parallel (in individual processes) and passing them back to the main process, because the individual models are fairly large but can be prepared largely independently of each other. Later on, specific instances are linked through a few additional constraints.

However, although serializing the model with pickle or dill works fine, deserializing it again throws a recursion error. As a consequence, ProcessPoolExecutor cannot be used to prepare models in parallel, since it relies on serialization to hand data from one process to another. This can easily be checked with this example:

import dill
import pandas as pd

import linopy

m = linopy.Model()
time = pd.Index(range(10), name="time")

x = m.add_variables(
    lower=0,
    coords=[time],
    name="x",
) # to be done in parallel process
y = m.add_variables(lower=0, coords=[time], name="y") # to be done in parallel process

factor = pd.Series(time, index=time) # to be done in parallel process

con1 = m.add_constraints(3 * x + 7 * y >= 10 * factor, name="con1") # to be done in parallel process
con2 = m.add_constraints(5 * x + 2 * y >= 3 * factor, name="con2") # to be done in parallel process

m.add_objective(x + 2 * y) # to be done in parallel process

with open("test.pkl", 'wb') as f:
    dill.dump(m, f)

with open("test.pkl", 'rb') as f:
    m2 = dill.load(f)

m2.variables["x"].lower = 1 # or add whatever additional constraint
m2.solve()

Which throws the following error:

Traceback (most recent call last):
  File "C:\github\test\linopy\test.py", line 29, in <module>
    m2 = dill.load(f)
         ^^^^^^^^^^^^
  File "C:\github\test\.venv\Lib\site-packages\dill\_dill.py", line 289, in load
    return Unpickler(file, ignore=ignore, **kwds).load()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\github\test\.venv\Lib\site-packages\dill\_dill.py", line 444, in load
    obj = StockUnpickler.load(self)
          ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\github\test\.venv\Lib\site-packages\linopy\variables.py", line 1149, in __getattr__
    if name in self.data:
               ^^^^^^^^^
  File "C:\github\test\.venv\Lib\site-packages\linopy\variables.py", line 1149, in __getattr__
    if name in self.data:
               ^^^^^^^^^
  File "C:\github\test\.venv\Lib\site-packages\linopy\variables.py", line 1149, in __getattr__
    if name in self.data:
               ^^^^^^^^^
  [Previous line repeated 745 more times]
RecursionError: maximum recursion depth exceeded

tburandt avatar Aug 20 '24 12:08 tburandt

@tburandt thanks for raising the issue, that's quite unfortunate. Pickling is not tested at the moment. How about storing it as netcdf in the meanwhile? It should be about as fast as pickling.
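For reference, linopy ships its own netcdf round trip (Model.to_netcdf and linopy.read_netcdf). A minimal sketch of that workaround — the exact API may differ between linopy versions, and the snippet is guarded so it degrades gracefully where linopy or a netcdf backend is not installed:

```python
# Sketch of the suggested netcdf workaround, assuming linopy's
# Model.to_netcdf / linopy.read_netcdf API.
import os
import tempfile

try:
    import pandas as pd

    import linopy

    m = linopy.Model()
    time = pd.Index(range(10), name="time")
    x = m.add_variables(lower=0, coords=[time], name="x")
    m.add_objective(1 * x)

    path = os.path.join(tempfile.mkdtemp(), "model.nc")
    m.to_netcdf(path)  # serialize without going through pickle
    m2 = linopy.read_netcdf(path)  # restore, possibly in another process
    msg = "round trip ok"
except Exception as exc:  # e.g. linopy or a netcdf backend missing
    msg = f"skipped: {exc}"

print(msg)
```

Since the file is just a netcdf dataset, it can also be written by a worker process and read back by the main process, which sidesteps pickling the model object entirely.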

FabianHofmann avatar Aug 21 '24 08:08 FabianHofmann

This is most likely also the reason for the deepcopy issues within PyPSA on some networks. I had a look into this a while ago; this is a better starting point, so I will check again.

lkstrp avatar Aug 21 '24 08:08 lkstrp

I have a vague feeling that the __getitem__ and __getattr__ overrides could be related to this...
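The loop in the traceback is indeed a generic Python pitfall rather than anything linopy-specific: during unpickling the freshly created instance has an empty __dict__, so a __getattr__ override that reads self.data re-enters itself when looking up "data". A minimal, linopy-free sketch (class names are made up for illustration), including one possible guard:

```python
import pickle


class Broken:
    """Mimics an unguarded __getattr__ that reads an instance attribute."""

    def __init__(self):
        self.data = {"a": 1}

    def __getattr__(self, name):
        # During unpickling the fresh instance has an empty __dict__, so
        # `self.data` misses and re-enters __getattr__ for "data":
        # infinite recursion.
        if name in self.data:
            return self.data[name]
        raise AttributeError(name)


class Guarded:
    """Same idea, but safe: a missing 'data' raises instead of recursing."""

    def __init__(self):
        self.data = {"a": 1}

    def __getattr__(self, name):
        # object.__getattribute__ raises AttributeError when 'data' is not
        # set yet (e.g. mid-unpickling) instead of looping forever.
        data = object.__getattribute__(self, "data")
        if name in data:
            return data[name]
        raise AttributeError(name)


try:
    pickle.loads(pickle.dumps(Broken()))
except RecursionError:
    print("Broken: RecursionError")

print(pickle.loads(pickle.dumps(Guarded())).a)
```

The recursion fires because pickle probes the half-built instance for __setstate__ via getattr, which falls through to the override before __dict__ has been restored.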

FabianHofmann avatar Aug 21 '24 09:08 FabianHofmann

@FabianHofmann the problem is that multiprocessing and ProcessPoolExecutor (from concurrent.futures), for example, use pickle (or dill, I am not sure which) to hand over objects either from one process to another or back to the main process.

For storing the model manually, I can try netcdf. I might have an idea to solve my problem with that, at least :)

tburandt avatar Aug 23 '24 12:08 tburandt