cpython icon indicating copy to clipboard operation
cpython copied to clipboard

dataclass overriding a property in the parent class is broken

Open marksandler2 opened this issue 1 year ago • 1 comments

Bug report

Bug description:

Consider the following snippet

@dc.dataclass
class A:
  a: int
  @property
  def b(self) -> int:
    return 1

@dc.dataclass
class B(A):  
  b: int = dc.field(default_factory=lambda: 0)

Dataclass apparently doesn't create a class attribute, so class B ends up with property , b of class A, which seems surprising, especially given that there is explicit assignent of field b in class B(A).

So this results in:

>>> B.b
<property object at ...>
>>> B(a=2)
Traceback (most recent call last):                                                                                                                            
  File "<stdin>", line 1, in <module>                                                                                                                         
  File "<string>", line 4, in __init__                                                                                                                        
AttributeError: property 'b' of 'B' object has no setter      

If i replace definition of b like so b: int = dc.field(default=0), or with b: int = 0 then everything works as expected:

>>> B.b
0
>>> B(a=2)
B(a=2, b=0)

I think it would be nice if dataclass would always create a class attribute (especially if there is explicit assignment!) so that overloads work as expected.

CPython versions tested on:

3.11, 3.12

Operating systems tested on:

Linux

marksandler2 avatar Jul 03 '24 23:07 marksandler2

Good catch, thank you! I will take a look.

sobolevn avatar Jul 04 '24 06:07 sobolevn

I reproduced the same behaviour without dataclasses:

class A1:
  def __init__(self, a):
    self.a = a

  @property
  def b(self) -> int:
    return 1

class B1(A1):
  def __init__(self, a, b=0) -> None:
    self.a = a
    self.b = b

print(B1.b)
print(B1(1).b)

produces:

<property object at 0x104fa6f20>
Traceback (most recent call last):
  File "/Users/sobolev/Desktop/cpython2/ex.py", line 17, in <module>
    print(B1(1).b)
          ~~^^^
  File "/Users/sobolev/Desktop/cpython2/ex.py", line 14, in __init__
    self.b = b
    ^^^^^^
AttributeError: property 'b' of 'B1' object has no setter

So, I think that there's no actual bug. When dataclasses behave as regular python's classes - it is expected.

For your case you can do:

@dc.dataclass
class B(A):
  _x: int | None = None
  @property
  def b(self) -> int:
    return self._x or 0
  @b.setter
  def b(self, value: int) -> None:
    self._x = value

sobolevn avatar Jul 05 '24 07:07 sobolevn

Oh, b: int = dc.field(default=0) makes it a bit different. Sorry, I've missed this part :)

sobolevn avatar Jul 05 '24 07:07 sobolevn

b: int = dc.field(default=0) works, because it explicitly overrides a class-attribute with the default value: https://github.com/python/cpython/blob/db39bc42f90c151b298f97b780e62703adbf1221/Lib/dataclasses.py#L1027

I think that what can be done for this case: we can check that if a field overrides a property, then we can delete that property from our class namespace.

This way both b: int = dc.field(default=0) and b: int = dc.field(default_factory=lambda: 0) would work the same way. Right now it is indeed confusing.

sobolevn avatar Jul 05 '24 07:07 sobolevn

Or we can leave this as-is, because:

  • It will be quite time-consuming to perform this mro traversal check for props
  • It is a very niche use-case
  • It works kinda similar with regular classes

cc @ericvsmith

sobolevn avatar Jul 05 '24 07:07 sobolevn

@sobolevn Yes, your original example is WAI - since property wasn't overriden. But here we explicity create a new override, so naively i would expect it to override the property definition. E.g. i can override property with another property or another class variable.

Would it make sense for dataclass to just always create a class attribute?

class B(a):
   b: int = dc.field(...)

should have B.b == dc.field, because this is what I would expect from such an assignment. THis way you don't need to do any special handling of properties.

https://github.com/python/cpython/blob/db39bc42f90c151b298f97b780e62703adbf1221/Lib/dataclasses.py#L1025

There is a comment in dataclass saying that if defautl_factory is specified "it should not be set at all in the post-processsed class", but it is not clear "why".

  if f.default is MISSING:
                # If there's no default, delete the class attribute.
                # This happens if we specify field(repr=False), for
                # example (that is, we specified a field object, but
                # no default value).  Also if we're using a default
                # factory.  The class attribute should not be set at
                # all in the post-processed class.
                delattr(cls, f.name)

marksandler2 avatar Jul 07 '24 00:07 marksandler2

The example above of the workaround, isn't really very useful:

@dc.dataclass
class B(A):
  _x: int | None = None
  @property
  def b(self) -> int:
    return self._x or 0
  @b.setter
  def b(self, value: int) -> None:
    self._x = value

This is not a good solution at all, since i won't be able to pass b=3 in the constructor, and so will need to expose user to the implementation details of the class.

Currently we just switched to use functools.cached_property, which doesn't have this problem and works for our use case

marksandler2 avatar Jul 07 '24 00:07 marksandler2

I think the argument for not setting the class attribute is that field is supposed to be invisible to the class user.

My inclination would be to leave things as-is.

ericvsmith avatar Sep 26 '24 19:09 ericvsmith

But we do set attribute in some cases (e.g. when default is set) just not the others (when default_facotry). Which is confusing to say the least.

On Thu, Sep 26, 2024 at 12:01 PM Eric V. Smith @.***> wrote:

I think the argument for not setting the class attribute is that field is supposed to be invisible to the class user.

My inclination would be to leave things as-is.

— Reply to this email directly, view it on GitHub https://github.com/python/cpython/issues/121354#issuecomment-2377716595, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABVIDWCXJNKWOH4D57QU5CTZYRK2DAVCNFSM6AAAAABKKOB2VGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGNZXG4YTMNJZGU . You are receiving this because you authored the thread.Message ID: @.***>

marksandler2 avatar Sep 26 '24 19:09 marksandler2

The argument there is that the default values would be useful, but not the default non-values (that is, field). It's entirely possible this was a bad decision, but it's where we are. I'm worried that if we don't delete the class attribute then something else will break.

ericvsmith avatar Sep 26 '24 20:09 ericvsmith

Hello,

I was about to open an issue because I came across this in quite a different context, while realizing dataclass handles non-data descriptors differently from classes. I'm telling you this because the behaviour you're observing has nothing to do with subclassing I think.

>>> desc = lambda: ...  # Random non-data descriptor
>>>
>>> class A:
...  x: ... = desc
... 
>>> A().x
<bound method <lambda> of <__main__.A object at 0x10bea96d0>>
>>> 
>>> @dataclass
... class B:
...  x: ... =  desc
... 
>>> B().x
<function <lambda> at 0x10bcf9f80>

Notice how the return value is "bound" in the vanilla case, and not bound in the dataclass case. This happens because of the parameter look-up mechanism and the fact that dataclass adds statements like self.attr = valto __init__, modifying instance's dict directly:

>>> vars(A())
{}
>>> vars(B())
{'x': <function <lambda> at 0x10bcf9f80>}

The problem now is that for data-descriptors like property(), which implements __set__(), I think the statement self.attr = val will call getattr(type(self), attr).__set__(attr, val), which gives this error:

>>> @dataclass
... class C:
...  x: ... = property(lambda _: ...)
... 
>>> C()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 3, in __init__
AttributeError: property 'x' of 'C' object has no setter

To keep consistent behaviour with the look-up that shortcuts __get__ for non-data descriptor, maybe it should shortcut every descriptor both for get and set (and maybe del too?). It would not be an issue for properties defined with def in the classes, since these should never get annotated.

I think this may require a bit of rethinking for dataclasses.py though...

eliegoudout avatar Dec 22 '24 00:12 eliegoudout