3.14 regression: slot dataclasses classes leak original class
Bug report
Bug description:
While trying to test cattrs on 3.14 I ran into this issue. Here's a simple reproducer that passes on 3.13, but doesn't on 3.14.
import gc
from dataclasses import dataclass
@dataclass(slots=True)
class B:
b: int
gc.collect()
assert [
o for o in gc.get_objects() if hasattr(o, "__name__") and o.__name__ == "B"
] == [B]
Originally I ran into this issue with slotted attrs classes but the core problem is the same. Since slotness cannot be added to a class, dataclasses and attrs classes create a class copy with slots instead. The old class used to hang around until a GC collection happened (guess there are some reference cycles, but I never investigated thoroughly).
On 3.14, a full GC collection does not clean up the original class, causing a leak. Apart from leaking, in case of subclassing the original class will remain in the list of the parent .__subclasses__(), causing an issue.
CPython versions tested on:
3.14
Operating systems tested on:
macOS
Linked PRs
- gh-135230
I think this is because the annotate function for the class holds on to the original class namespace, and that namespace dict contains the descriptors B.__dict__ and B.__weakref__ that refer back to the original class. Not too sure how to fix that yet.
This seems hard to avoid, maybe we should just live with the change? The reason is that the annotation scope needs to hold a reference to the actual class dict (for reasons explained in https://jellezijlstra.github.io/pep695#class-scopes) and the class dict contains two descriptors, for __dict__ and __weakref__, that contain references back to the class object.
In Python 3.12 and 3.13 this behavior is already observable if you use a type alias or TypeVar bound in the dataclass, with one of these class definitions:
@dataclass(slots=True)
class B:
def f[T: int](self, x: T) -> T: ...
b: int
@dataclass(slots=True)
class B:
type T = int
b: int
Both of these fail the assertion in the post above.
I put up a change in #135230 that fixes this issue, but messing with the descriptors feels like it's probably going to break something else.
@JelleZijlstra I think you're right that the leak itself is not a big deal, and we can just live with it.
What is a bigger deal is this:
from dataclasses import dataclass
class A:
pass
@dataclass(slots=True)
class B(A):
pass
print(A.__subclasses__()) # [<class '__main__.B'>, <class '__main__.B'>]
This will break a piece of cattrs and anyone using __subclasses__() with slotted dataclasses and attrs classes. Maybe we can work around it in the __subclassess__() implementation somehow?
release blocker status just to help force consideration. It may wind up deferred.
@JelleZijlstra I think you're right that the leak itself is not a big deal, and we can just live with it.
What is a bigger deal is this:
from dataclasses import dataclass class A: pass @dataclass(slots=True) class B(A): pass print(A.__subclasses__()) # [<class '__main__.B'>, <class '__main__.B'>]This will break a piece of cattrs and anyone using
__subclasses__()with slotted dataclasses and attrs classes. Maybe we can work around it in the__subclassess__()implementation somehow?
This actually is not true if you run gc.collect() first.
>>> from dataclasses import dataclass
>>> import gc
>>> class A: pass
...
>>> @dataclass(slots=True)
... class B(A):
... attr: int
...
>>> print(A.__subclasses__())
[<class '__main__.B'>, <class '__main__.B'>]
>>> gc.collect()
222
>>> print(A.__subclasses__())
[<class '__main__.B'>]
This confused me greatly when I was trying to write test cases just now, but what's going on seems to be that the __dict__ and __weakref__ slots are added only for direct subclasses of object.
In any case, I put up another possible fix in #136893. It has some potential compatibility implications, but it's something you could also do in e.g. attrs if you choose.
So I do actually run gc.collect() first, but this trick seems to have stopped working on 3.14. I'm not on my computer at the moment, but this is what causes the cattrs test suite to fail on 3.14 and what initially motivated this issue.
To be clear, gc.collect() doesn't help if the slotted dataclass is a direct child of object (which is probably the more common case in practice). It helps if the slotted class is a child of another Python class.
In any case, just to reiterate: I put up a proposed fix for dataclasses that should also be applicable to attrs. It's rather hacky; let me know if you think it's an acceptable solution.
release blocker status just to help force consideration. It may wind up deferred.
https://github.com/python/cpython/pull/136960 has been merged with a fix. Let's move this to deferred to see how it goes in 3.14 RC1.
Here's another possible direction to solve this: #136966
We decided to revert the #136893 fix because modifying the class dict directly is playing with fire.
I think an approach like Petr's in #136966 is promising but it feels a bit late to do that for 3.14, because we're making changes deep in core descriptor logic. We should try that for 3.15 though.
For cases where you use introspection, there's another workaround possible, which relies on the fact that @dataclass in fact dataclassifies the non-slotted class before it creates the slots wrapper: you can filter out any classes that (a) don't have __slots__ and (b) have cls.__dataclass_params__.slots = True. This should fix use cases like https://catt.rs/en/stable/strategies.html#include-subclasses-strategy .
We could potentially make this a bit easier by setting an attribute like .__dataclass_replaced_with_wrapper_class__ on the original class.
We have the same problem without dataclasses.
>>> class A: pass
...
>>> class B(A): pass
...
>>> A.__subclasses__()
[<class '__main__.B'>]
>>> del B
>>> A.__subclasses__()
[<class '__main__.B'>]
>>> import gc; gc.collect(); gc.collect()
0
>>> A.__subclasses__()
[<class '__main__.B'>]
We have the same problem without dataclasses.
@serhiy-storchaka It doesn't repro if I run this script as a file (python.bat .\test.py). Maybe it related to REPL?
[2025-07-22 21:30:28] [D:\Sources\_pythonish\cpython\main] [main ≡ +29 ~0 -0 !] [ 3:32.666]
➜ .\python.bat -i .\test.py
Running Debug|x64 interpreter...
[<class '__main__.B'>]
[<class '__main__.B'>]
[]
>>> exit()
[2025-07-22 21:30:36] [D:\Sources\_pythonish\cpython\main] [main ≡ +29 ~0 -0 !] [ 2.23]
➜ .\python.bat .\test.py
Running Debug|x64 interpreter...
[<class '__main__.B'>]
[<class '__main__.B'>]
[]
@JelleZijlstra explained to me that this is because the REPL holds a reference to the result of A.__subclasses__().
Yeah. After this note it looks logical.
To be sure that I understand this correct:
A.__subclasses__() returns list of types with strong refs. And REPL holds a strong ref to this list. So list holds an extra ref to B even if we delete it from REPL global.
If we delete B and call gc.collect() before first call of __subclasses__ then it returns [].
Yes. If you're playing with this in the REPL you can prevent the problem by using print(A.__subclasses__()) instead of A.__subclasses__().
Note that __subclasses__ may always include classes that are currently unreachable, because classes are always part of a reference cycle (even before 3.14) and therefore can only get collected when GC runs, which happens at an indeterminate time.
may always include classes that are currently unreachable, because classes are always part of a reference cycle (even before 3.14)
I'm not sure I get you correctly. IIUC call of __subclasses__ return list that holds strong refs (from B to A via base, but not from A to B) and there are no reference cycles. tp_subclasses contains weakrefs so no reference cycles too. Can you correct me where I'm wrong?
Yes, the subclass list is stored as a list of weakrefs (which are materialized into strong refs if you actually call __subclasses__). However, every Python class is necessarily part of a reference cycle unrelated to the subclass list; I explained why in https://github.com/python-attrs/attrs/issues/1047#issuecomment-3104006808
Ough! Thanks, I completely forgot about MRO!
Another attempt: #137047. This is similar to the reverted version, but no longer relies on hacks to edit the __dict__ directly. Instead, we expose a private function in sys that clears out the problematic descriptors in C code.
If the problem is the annotate function that holds reference to the original class namespace, can we create a new annotate function that holds reference to the new class namespace? If this impossible in Python, would not a private function in sys that creates the annotate function be more safe?
There are multiple annotate functions potentially involved (the one for the class annotations, those for any methods with annotations, and potentially more for TypeVar bounds in generic methods). We'd have to walk through the class dictionary to find all of them, which might not even be possible if the methods are wrapped by decorators. This will be difficult to get right and more complicated than the approach in #137047.
Just out of curiosity, since we're now delving into C-land, would it be possible to have a C function that would rework a class in-place to make it a slotted class? I feel like it would solve a number of issues at the root.
@Tinche yes that is probably a better long-term solution. My understanding is that it's possible in C to change "slottedness" after the fact, but it's safe only if no instances of the class have been created yet, and proving that condition is not trivial. Maybe there's a way but it's not something we can do for 3.14 at this point.
It looks like if the annotation is a ForwardRef, the original example still fails.
With int it now runs, but this still raises the AssertionError:
import gc
from dataclasses import dataclass
@dataclass(slots=True)
class B:
b: undefined
gc.collect()
assert [
o for o in gc.get_objects() if hasattr(o, "__name__") and o.__name__ == "B"
] == [B]
This appears to be because ForwardRefs in dataclass fields still refer to the original class.
Intentionally keeping the old class for demonstration:
from dataclasses import dataclass, fields
class OLD_B:
b: undefined
NEW_B = dataclass(OLD_B, slots=True)
for f in fields(NEW_B):
owner = f.type.__owner__
print(f"{owner is NEW_B = }") # False
print(f"{owner is OLD_B = }") # True
That one is likely fixable in dataclasses by only checking the annotations after we reslotify the class.
Yes, I have a potential solution for that but I'm trying to separate it from the solution for #137530 as the current __init__ method and a generated __annotate__ function can both maintain references to the original class.
In 3.14 we have #137047, a workaround; in 3.15 we have #136966, a workaround that's also an improvement (though the improvement is not directly related to the issue).
Shall we leave this open for something like allowing decorators to fiddle with __build_class__ arguments?
Shall we leave this open for something like allowing decorators to fiddle with
__build_class__arguments?
I think that's a good idea but it strays a bit too far from the original issue here, so probably better in a new issue (plus, that change is going to need a PEP).
However, let's keep this open for other cases where annotations keep the unslotted class alive, like the case @DavidCEllis came up with involving ForwardRefs.