dataclass overriding a property in the parent class is broken
Bug report
Bug description:
Consider the following snippet
```python
import dataclasses as dc

@dc.dataclass
class A:
    a: int

    @property
    def b(self) -> int:
        return 1

@dc.dataclass
class B(A):
    b: int = dc.field(default_factory=lambda: 0)
```
Dataclass apparently doesn't create a class attribute, so class `B` ends up with the property `b` inherited from class `A`, which seems surprising, especially given that there is an explicit assignment of field `b` in `class B(A)`.
So this results in:

```
>>> B.b
<property object at ...>
>>> B(a=2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 4, in __init__
AttributeError: property 'b' of 'B' object has no setter
```
If I replace the definition of `b` with `b: int = dc.field(default=0)`, or with `b: int = 0`, then everything works as expected:
```
>>> B.b
0
>>> B(a=2)
B(a=2, b=0)
```
I think it would be nice if dataclass always created a class attribute (especially when there is an explicit assignment!) so that overrides work as expected.
CPython versions tested on:
3.11, 3.12
Operating systems tested on:
Linux
Good catch, thank you! I will take a look.
I reproduced the same behaviour without dataclasses:
```python
class A1:
    def __init__(self, a):
        self.a = a

    @property
    def b(self) -> int:
        return 1

class B1(A1):
    def __init__(self, a, b=0) -> None:
        self.a = a
        self.b = b

print(B1.b)
print(B1(1).b)
```
produces:

```
<property object at 0x104fa6f20>
Traceback (most recent call last):
  File "/Users/sobolev/Desktop/cpython2/ex.py", line 17, in <module>
    print(B1(1).b)
    ~~^^^
  File "/Users/sobolev/Desktop/cpython2/ex.py", line 14, in __init__
    self.b = b
    ^^^^^^
AttributeError: property 'b' of 'B1' object has no setter
```
So, I think that there's no actual bug: dataclasses behave just like regular Python classes here, which is expected.
For your case you can do:
```python
@dc.dataclass
class B(A):
    _x: int | None = None

    @property
    def b(self) -> int:
        return self._x or 0

    @b.setter
    def b(self, value: int) -> None:
        self._x = value
```
Oh, b: int = dc.field(default=0) makes it a bit different.
Sorry, I've missed this part :)
b: int = dc.field(default=0) works, because it explicitly overrides a class-attribute with the default value: https://github.com/python/cpython/blob/db39bc42f90c151b298f97b780e62703adbf1221/Lib/dataclasses.py#L1027
I think what can be done for this case: if a field overrides a property, we can delete that property from the class namespace.
This way both b: int = dc.field(default=0) and b: int = dc.field(default_factory=lambda: 0) would work the same way. Right now it is indeed confusing.
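The difference between the two spellings can be observed directly. A small sketch (class names `WithDefault` and `WithFactory` are just illustrative, not from the issue):

```python
import dataclasses as dc

@dc.dataclass
class A:
    a: int

    @property
    def b(self) -> int:
        return 1

# default=0: the plain default value is written back into the class
# namespace, shadowing A's property in the MRO.
@dc.dataclass
class WithDefault(A):
    b: int = dc.field(default=0)

# default_factory: @dataclass deletes the class attribute, so lookup
# falls through to A's setter-less property and __init__ fails.
@dc.dataclass
class WithFactory(A):
    b: int = dc.field(default_factory=lambda: 0)

print("b" in vars(WithDefault))  # True  -> WithDefault(a=2) works
print("b" in vars(WithFactory))  # False -> WithFactory(a=2) raises
```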
Or we can leave this as-is, because:
- It will be quite time-consuming to perform this MRO traversal check for props
- It is a very niche use-case
- It works similarly with regular classes
cc @ericvsmith
@sobolevn Yes, your original example is WAI, since the property wasn't overridden. But here we explicitly create a new override, so naively I would expect it to override the property definition. E.g. I can override a property with another property or another class variable.
Would it make sense for dataclass to just always create a class attribute?
```python
class B(A):
    b: int = dc.field(...)
```

should have `B.b == dc.field(...)`, because this is what I would expect from such an assignment. This way you don't need to do any special handling of properties.
https://github.com/python/cpython/blob/db39bc42f90c151b298f97b780e62703adbf1221/Lib/dataclasses.py#L1025
There is a comment in dataclasses saying that if `default_factory` is specified, the attribute "should not be set at all in the post-processed class", but it is not clear why:
```python
if f.default is MISSING:
    # If there's no default, delete the class attribute.
    # This happens if we specify field(repr=False), for
    # example (that is, we specified a field object, but
    # no default value). Also if we're using a default
    # factory. The class attribute should not be set at
    # all in the post-processed class.
    delattr(cls, f.name)
```
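What that `delattr` branch does can be seen in a class with no inheritance involved; a quick sketch:

```python
import dataclasses as dc

@dc.dataclass
class C:
    x: int = dc.field(repr=False)  # a field object, but no default

# The field object was assigned in the class body, but since there is
# no default value, @dataclass deletes the class attribute entirely:
print(hasattr(C, "x"))  # False: nothing left in the class namespace
print(C(1))             # C() -- x is still a real init field, just hidden from repr
```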
The workaround suggested above isn't really very useful:
```python
@dc.dataclass
class B(A):
    _x: int | None = None

    @property
    def b(self) -> int:
        return self._x or 0

    @b.setter
    def b(self, value: int) -> None:
        self._x = value
```
This is not a good solution at all, since I won't be able to pass b=3 in the constructor, and so will need to expose the user to the implementation details of the class.
For now we have switched to `functools.cached_property`, which doesn't have this problem and works for our use case.
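For reference, a sketch of why `functools.cached_property` sidesteps the problem: it defines no `__set__`, so it is a non-data descriptor, and the `self.b = ...` assignment in the generated `__init__` lands in the instance dict, which then takes precedence on lookup:

```python
import dataclasses as dc
import functools

@dc.dataclass
class A:
    a: int

    @functools.cached_property
    def b(self) -> int:
        return 1

@dc.dataclass
class B(A):
    b: int = dc.field(default_factory=lambda: 0)

# cached_property has no __set__, so `self.b = ...` in the generated
# __init__ writes straight into the instance dict, and instance-dict
# entries win over non-data descriptors on attribute lookup.
print(B(a=2))  # B(a=2, b=0)
```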
I think the argument for not setting the class attribute is that field is supposed to be invisible to the class user.
My inclination would be to leave things as-is.
But we do set the attribute in some cases (e.g. when default is set), just not in others (when default_factory is used). Which is confusing, to say the least.
The argument there is that the default values would be useful, but not the default non-values (that is, field). It's entirely possible this was a bad decision, but it's where we are. I'm worried that if we don't delete the class attribute then something else will break.
Hello,
I was about to open an issue because I came across this in quite a different context, while realizing that dataclass handles non-data descriptors differently from regular classes. I'm telling you this because I think the behaviour you're observing has nothing to do with subclassing.
```
>>> desc = lambda: ...  # Random non-data descriptor
>>>
>>> class A:
...     x: ... = desc
...
>>> A().x
<bound method <lambda> of <__main__.A object at 0x10bea96d0>>
>>>
>>> @dataclass
... class B:
...     x: ... = desc
...
>>> B().x
<function <lambda> at 0x10bcf9f80>
```
Notice how the return value is bound in the vanilla case, and not bound in the dataclass case. This happens because of the attribute look-up mechanism and the fact that dataclass adds statements like `self.attr = val` to `__init__`, modifying the instance's dict directly:
```
>>> vars(A())
{}
>>> vars(B())
{'x': <function <lambda> at 0x10bcf9f80>}
```
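The precedence rules at play here can be shown with a minimal pair of descriptors (a sketch; the class names are illustrative):

```python
class NonData:
    def __get__(self, obj, objtype=None):
        return "from descriptor"

class Data:
    def __get__(self, obj, objtype=None):
        return "from descriptor"

    def __set__(self, obj, value):
        # store under a different key so the instance-dict entry
        # planted below stays visible for the demonstration
        obj.__dict__["_shadow"] = value

class C:
    nd = NonData()
    d = Data()

c = C()
# Bypass __set__ entirely by writing into the instance dict:
c.__dict__["nd"] = "from instance dict"
c.__dict__["d"] = "from instance dict"

print(c.nd)  # "from instance dict" -- non-data: instance dict wins
print(c.d)   # "from descriptor"    -- data descriptor wins anyway
```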
The problem now is that for data descriptors like `property()`, which implement `__set__()`, I think the statement `self.attr = val` will call `type(self).attr.__set__(self, val)`, which gives this error:
```
>>> @dataclass
... class C:
...     x: ... = property(lambda _: ...)
...
>>> C()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 3, in __init__
AttributeError: property 'x' of 'C' object has no setter
```
To keep behaviour consistent with the look-up that shortcuts `__get__` for non-data descriptors, maybe it should shortcut every descriptor, both for get and set (and maybe del too?). It would not be an issue for properties defined with `def` in class bodies, since those should never get annotated.
I think this may require a bit of rethinking for dataclasses.py though...