attrs
Mixed inheritance between slotted and non-slotted classes leads to issues with parallelisation.
Mixed inheritance between slotted and non-slotted classes seems to cause issues when objects are copied for parallelisation.
The following code example does not work and raises: AttributeError: 'Child' object has no attribute 'z'
import attrs
import multiprocessing as mp

@attrs.define(slots=False, kw_only=True)
class Parent:
    name: str

@attrs.define
class Child(Parent):
    x: float

    def __attrs_post_init__(self):
        self.z = self.x + 2

    def arg_plus_z(self, arg):
        return self.z + arg

test = Child(3, name="as")
with mp.Pool(processes=4) as pool:
    result = pool.map(test.arg_plus_z, range(100))
whereas adding slots=False to the child class, or declaring z as a field with z: float = attrs.field(default=None, init=False), does work:
import attrs
import multiprocessing as mp

@attrs.define(slots=False, kw_only=True)
class Parent:
    name: str

@attrs.define(slots=False)
class Child(Parent):
    x: float

    def __attrs_post_init__(self):
        self.z = self.x + 2

    def arg_plus_z(self, arg):
        return self.z + arg

test = Child(3, name="as")
with mp.Pool(processes=4) as pool:
    result = pool.map(test.arg_plus_z, range(100))
It seems like the object copy of test passed to each one of the processes does not copy the __dict__ (which includes z and name) but only the __slots__.
I know that, according to the attrs documentation, mixed inheritance between slotted and non-slotted classes is considered bad practice. I stumbled across this by accident, and it might be more of a missing feature than a bug.
Meta:
- attrs-version: '21.4.0'
- python-version: '3.9.13'
My wild guess here is that this is a pickling issue, because we don't get involved in pickling of dict classes and we only care about our own attributes in slotted classes.
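To make that guess concrete, here is a minimal plain-Python sketch of the mechanism. It does not use attrs: the `_state_names` tuple and the hand-written `__getstate__`/`__setstate__` are simplified stand-ins (assumptions, not attrs's generated code) for hooks that only serialize declared attributes.

```python
import pickle


class Parent:
    """Plain (dict-based) base class, so instances get a __dict__."""

    def __init__(self, name):
        self.name = name


class Child(Parent):
    """Slotted subclass whose pickling hooks only know declared attributes."""

    __slots__ = ("x",)

    # Hypothetical list of "declared" attributes, including the inherited one.
    _state_names = ("name", "x")

    def __init__(self, x, name):
        super().__init__(name)
        self.x = x
        self.z = x + 2  # undeclared, so it lands in the inherited __dict__

    def __getstate__(self):
        # Simplified stand-in: only declared attributes are serialized.
        return {n: getattr(self, n) for n in self._state_names}

    def __setstate__(self, state):
        for attr_name, value in state.items():
            setattr(self, attr_name, value)


original = Child(3.0, name="as")
restored = pickle.loads(pickle.dumps(original))

print(restored.x, restored.name)  # declared attributes survive the round trip
print(hasattr(restored, "z"))     # the undeclared __dict__ entry was never serialized
```

Because `multiprocessing` pickles the bound method's instance to ship it to workers, a copy like `restored` is what each worker would see.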
I think this is expected behavior. If you do declare z as an attr, it will get copied (as you noticed). The pickling implementation (__setstate__, that is) only looks at those. Additionally, __init__ is not run on unpickling - this is not specific to how attrs overrides __setstate__ but is generally how pickle works. IOW, if you need to rely on pickle (say, for multiprocessing), you should avoid logic in your init and related hooks.
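The point about `__init__` can be checked with the standard library alone. A small sketch (the `Tracker` class is purely illustrative):

```python
import pickle


class Tracker:
    """Counts __init__ calls to show that unpickling bypasses it."""

    init_calls = 0

    def __init__(self, x):
        type(self).init_calls += 1
        self.x = x


original = Tracker(1)
restored = pickle.loads(pickle.dumps(original))

print(Tracker.init_calls)  # 1: only the explicit construction ran __init__
print(restored.x)          # state is restored directly, without __init__
```

Anything computed in `__init__` (or in a hook such as `__attrs_post_init__`) therefore has to be part of the pickled state if it should exist after unpickling.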
Another option is, of course, to provide your own implementation of set/getstate, which attrs will honor, but is arguably clunky/inelegant.
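As a rough illustration of what such a hand-written pair could look like, here is a plain-Python sketch with a manual `__slots__` declaration. It does not exercise attrs at all, and the state layout (`"slots"` and `"dict"` keys) is an arbitrary choice made for this example.

```python
import pickle


class Parent:
    def __init__(self, name):
        self.name = name


class Child(Parent):
    __slots__ = ("x",)

    def __init__(self, x, name):
        super().__init__(name)
        self.x = x
        self.z = x + 2  # lives in the inherited __dict__

    def __getstate__(self):
        # Capture slot values and the instance __dict__ together.
        slots = {s: getattr(self, s) for s in self.__slots__ if hasattr(self, s)}
        return {"slots": slots, "dict": dict(self.__dict__)}

    def __setstate__(self, state):
        for slot, value in state["slots"].items():
            setattr(self, slot, value)
        self.__dict__.update(state["dict"])

    def arg_plus_z(self, arg):
        return self.z + arg


restored = pickle.loads(pickle.dumps(Child(3.0, name="as")))
print(restored.arg_plus_z(1))  # z survived, so the method works on the copy
```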
Yeah, to be clear: the implementation is 100% correct; the q is how to help people avoid running into this.
This article by @tdsmith clued me in to this issue: https://blog.tim-smith.us/2024/01/pickle-slots/
and I think I disagree here?
Here's my reasoning; consider this example:
from attrs import define

@define
class NoInheritance:
    x: int
    y: int

class StatefulMixinAntipattern:
    def __init__(self) -> None:
        self.counter = 0

    def bump(self) -> None:
        self.counter += 1

@define()
class InheritsMixin(StatefulMixinAntipattern):
    parameter: str

    def __attrs_post_init__(self) -> None:
        super().__init__()

print("__dict__" in dir(NoInheritance))
print("__dict__" in dir(InheritsMixin))
Which produces:
False
True
This shows that attrs can easily know, at the time that a class is defined, that it has inherited a __dict__ attribute and the associated mess. Moreover, if I add getstate_setstate=False to the @define call in this example, then this works as expected:
from pickle import dumps, loads

def roundtrip(x):
    return loads(dumps(x))

im = InheritsMixin("hi")
im.bump()
rt = roundtrip(im)
print(rt.parameter, rt.counter)
So it would seem to me that the correct thing to do here is to simply set this flag by default when non-slotted inheritance is involved.
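For comparison, this is what the standard library does by itself for a mixed slotted/non-slotted hierarchy when no custom `__getstate__` is defined, which is presumably what the class falls back to with `getstate_setstate=False`. A minimal sketch using a hand-written `__slots__` instead of attrs (class names are illustrative):

```python
import pickle


class StatefulBase:
    """Non-slotted base: its state lives in the instance __dict__."""

    def __init__(self):
        self.counter = 0

    def bump(self):
        self.counter += 1


class SlottedChild(StatefulBase):
    """Slotted subclass with no custom pickling hooks."""

    __slots__ = ("parameter",)

    def __init__(self, parameter):
        super().__init__()
        self.parameter = parameter


obj = SlottedChild("hi")
obj.bump()

# Default pickling (protocol 2 or newer) records both the __dict__ and the slots.
restored = pickle.loads(pickle.dumps(obj))
print(restored.parameter, restored.counter)
```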
For completeness, you'll encounter the same problem if StatefulMixinAntipattern is slotted, so checking for __dict__ may be only a partial solution.
I mainly mentioned the non-slotted version since that more or less requires the "soup of state" fallback. When the base is slotted, there can still be an explicit enumeration.
Although this does prompt the question: is there any value in attrs providing its own __getstate__ at all? I don't see much in the docs about its semantics, or whether it differs from default pickling in some advantageous way.
Is there any precedent at all for attrs acting on third-party data? I don't think so?
As for the why:
Is there any precedent at all for attrs acting on third-party data? I don't think so?
I don't see much difference between this "third party" data that you would get from a non-attrs library and second-party data supplied by the user. The annotations being processed do not come from attrs, for example. And both the class and its bases may be defined by the user (as indeed they are, in Tim's case). Attrs does pass along the base class so it is "acting" on this data anyway.
I think the more interesting conversation is whether we need to care about Pickle protocol versions 0 and 1 in the year of the Cthulhu 2023?
Protocol version 2 was introduced in Python 2.3: https://realpython.com/python-pickle-module/
Do we have a lot of Python 2.1 and 2.2 users?
(Python 2.3 was released on July 29, 2003.)
I believe that in the 2.7 days it was, for some reason, still common; otherwise we wouldn't have added it. Maybe someone should do some spelunking into what the context was back then. Also paging @djipko in case he has any insights.
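One piece of context that can be checked today: with pickle protocols 0 and 1, the standard library refuses to pickle a slotted class that has no `__getstate__` of its own, while protocol 2 and newer handle it by default. Whether this was the original motivation is not confirmed anywhere in this thread. A small sketch (the `Slotted` class and `can_pickle` helper are illustrative):

```python
import pickle


class Slotted:
    """Slotted class that relies entirely on pickle's default behaviour."""

    __slots__ = ("x",)

    def __init__(self, x):
        self.x = x


def can_pickle(obj, protocol):
    """Return True if obj round-trips under the given pickle protocol."""
    try:
        return pickle.loads(pickle.dumps(obj, protocol=protocol)).x == obj.x
    except TypeError:
        # Protocols 0 and 1 reject slotted classes without __getstate__.
        return False


for protocol in (0, 1, 2):
    print(protocol, can_pickle(Slotted(1), protocol))
```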