AttributeError during unpickling DataContainer: '_allow_compute' attribute missing
Describe the bug I am encountering an AttributeError when unpickling a DataContainer object from the xeofs package. The error message is:
AttributeError: 'DataContainer' object has no attribute '_allow_compute'
This occurs when I attempt to unpickle a dictionary containing a DataContainer instance using dill. It seems that during the unpickling process, the __init__ method is not called, so the _allow_compute attribute is not initialized. As a result, when __setitem__ is called during unpickling, it tries to access _allow_compute, which doesn't exist.
Reproducible Minimal Working Example
import dill
from xeofs.data_container.data_container import DataContainer
# Create a DataContainer object and add an item
data_container = DataContainer()
data_container['key'] = 'value'
# Serialize the DataContainer object
with open('data_container.pkl', 'wb') as f:
dill.dump(data_container, f)
# Deserialize the DataContainer object
with open('data_container.pkl', 'rb') as f:
loaded_data_container = dill.load(f) # This line raises the AttributeError
Traceback
File ".../bwcore/forecasting/forecaster.py", line 76, in load_forecast_dict
forecast_dict = dill.load(f)
File ".../site-packages/dill/_dill.py", line 297, in load
return Unpickler(file, ignore=ignore, **kwds).load()
File ".../site-packages/dill/_dill.py", line 452, in load
obj = StockUnpickler.load(self)
File ".../site-packages/xeofs/data_container/data_container.py", line 24, in __setitem__
self._allow_compute[__key] = self._allow_compute.get(__key, True)
AttributeError: 'DataContainer' object has no attribute '_allow_compute'
Expected behavior
I expect the DataContainer object to be successfully unpickled without errors, with all its attributes properly initialized.
Desktop (please complete the following information):
- OS: Ubuntu 24.04 LTS aarch64
- Python [3.10]
xeofsversion [3.0.2]
Additional context
I believe the issue arises because __init__ is not called during unpickling, so any attributes initialized there (like _allow_compute) are missing. I came across this explanation which mentions that __init__ is not called during unpickling.
To address this, I did a monkey patch to the __setitem__ method in the DataContainer class to check for the existence of _allow_compute and initialize it if necessary:
class DataContainer(dict):
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
self._allow_compute = dict({k: True for k in self.keys()})
def __setitem__(self, __key: str, __value: DataArray) -> None:
super().__setitem__(__key, __value)
# Ensure _allow_compute exists
if not hasattr(self, '_allow_compute'):
self._allow_compute = {}
self._allow_compute[__key] = self._allow_compute.get(__key, True)
With this change, the unpickling process completes without errors.
I'm not entirely sure if this issue is due to something on my side, since my pipeline uses Python 3.10, and perhaps the pickling behavior differs in this version. If this is indeed a bug, perhaps implementing __getstate__ and __setstate__ methods in the DataContainer class could ensure that _allow_compute is properly handled during pickling and unpickling.
Could you please look into this issue? Any guidance or confirmation on whether this is a known issue or if there's a better way to handle it would be appreciated. If you need any more information let me know :)
I can confirm that this is not pickle-able. Seems like a reasonable feature request and solution. Inheriting from UserDict also seems to work but I don't really know the nuances.
Feel free to make a PR with tests to prevent regression. Besides the DataContainer, have you checked that we can pickle all the other model components?
Looking at the DataContainer again, I wonder if it makes sense to rework this to be a DataTree, since we're using that as a common structure elsewhere now. That would be a much more involved PR though. @nicrie do you have any particular reasons why you chose this subclassed dictionary structure?
@nicrie do you have any particular reasons why you chose this subclassed dictionary structure?
Nope I just wanted a simple, consistent way to handle data across all models. Totally agree with @slevang that reworking DataContainer to inherit from DataTree would be the ideal approach.
No strong feelings on the other parts -- I'd be fine with a temporary fix like @n0rdp0l suggested, which could later be replaced with a larger update using DataTrees.