attrs
Reference cycle for slot classes
Even the birds on the branches nowadays know we do a class switcheroo when defining slot classes.
The initial class (the one we throw away) is part of a reference cycle with something else, so it doesn't get GC'd right away. A tiny example:
from attrs import define

@define
class A:
    a: int

@define
class B(A):
    b: int

# collect()
print(A.__subclasses__())  # [<class '__main__.B'>, <class '__main__.B'>]
If gc.collect() is called right afterwards, the old class gets cleaned up, so it's almost certainly a reference cycle.
So, a good issue for someone getting started with attrs: find out what the reference cycle is and break it so this doesn't happen.
Alternatively, we could call gc.collect() ourselves. We could also check for the old class in __subclasses__() and emit a warning if we find it there; it probably means someone is using a bare super() somewhere and we didn't rewrite it.
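The "check `__subclasses__()` and warn" idea could look roughly like the sketch below. It is stdlib-only and does not use attrs: `rebuild` is a hypothetical stand-in for the slots switcheroo, and `warn_about_stale_classes` is an invented helper name, not anything attrs provides. The old class is kept alive on purpose so the behaviour is deterministic.

```python
import warnings


def rebuild(cls):
    """Hypothetical stand-in for the slots switcheroo: return a new class
    with the same name, bases, and attributes as cls."""
    namespace = {
        key: value
        for key, value in vars(cls).items()
        if key not in ("__dict__", "__weakref__")
    }
    namespace["__qualname__"] = cls.__qualname__
    return type(cls)(cls.__name__, cls.__bases__, namespace)


def warn_about_stale_classes(new_cls):
    """Warn if a base still lists an older class with new_cls's name."""
    stale = [
        sub
        for base in new_cls.__bases__
        for sub in base.__subclasses__()
        if sub is not new_cls
        and (sub.__module__, sub.__qualname__)
        == (new_cls.__module__, new_cls.__qualname__)
    ]
    if stale:
        warnings.warn(
            f"{len(stale)} stale copy of {new_cls.__qualname__} is still alive",
            RuntimeWarning,
            stacklevel=2,
        )
    return stale


class Base:
    pass


class Sub(Base):
    pass


original = Sub  # keep the old class alive on purpose, as the cycle would
Sub = rebuild(Sub)
print(Base.__subclasses__())  # both the original and the rebuilt class
```

Calling `warn_about_stale_classes(Sub)` here returns the original class and emits a `RuntimeWarning`. Matching on module and qualified name is a heuristic chosen for this sketch; a real implementation would more likely compare against the exact class object it threw away.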
Any chance gc.collect() could be called by attrs itself, right after the slotted class is created?
That seems like a lot of extra overhead for programs with lots of classes.
right...
Unfortunately, the gc.collect() trick doesn't work if you want to do something with the subclasses inside __attrs_init_subclass__:
from gc import collect

from attrs import define, fields

@define
class A:
    a: int

    @classmethod
    def __attrs_init_subclass__(cls) -> None:
        collect()
        scl = A.__subclasses__()
        print(scl)  # [<class '__main__.B'>, <class '__main__.B'>]
        print([f.name for f in fields(scl[0])])  # ['a']
        print([f.name for f in fields(scl[1])])  # ['a', 'b']

@define
class B(A):
    b: int
Both B classes are actually attrs classes, but the first one doesn't have the field b yet. Very strange. Unfortunately that prevents me from using e.g. cattrs.strategies.include_subclasses on A in __attrs_init_subclass__, as in some cases the algorithm in include_subclasses gets tripped up on the duplicate classes (which will at some point cease to exist).
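One possible stopgap for code that walks `__subclasses__()` is to keep only the most recently defined class for each qualified name. This is a sketch with an invented helper, `latest_subclasses`; it relies on `__subclasses__()` returning classes in definition order and assumes the replacement class is always created after the one it replaces. The two `Sub` classes are built by hand with `type()` to imitate the duplicate.

```python
def latest_subclasses(parent):
    """Return parent's direct subclasses, keeping only the newest class
    for each (module, qualified name) pair."""
    latest = {}
    for sub in parent.__subclasses__():
        # Later definitions overwrite earlier ones with the same name.
        latest[(sub.__module__, sub.__qualname__)] = sub
    return list(latest.values())


class Base:
    pass


old = type("Sub", (Base,), {})  # imitates the class that gets thrown away
new = type("Sub", (Base,), {})  # imitates the final slotted class

print(latest_subclasses(Base))
```

Whether this is safe to feed into something like `include_subclasses` is untested here; it only filters the list, it does not make the old class go away.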
Chiming in here because I spent some time trying to fix this, but I didn't make progress quickly enough to keep going. The vestigial class that's being kept alive is the cls passed into the attrs class decorator; I verified that with id(cls). I turned to gc.get_referrers() first, which led me to _ClassBuilder. My first attempt was to remove its self._cls slot entirely, which meant packing all the wrapped class's metadata (__module__, __name__, __qualname__, and more) into a separate container. The idea was to see whether a bound method was being held somewhere. The change worked and the tests passed, but it didn't break the cycle. It was a pretty unfocused attempt at a fix, and I'm sure one of the actual attrs contributors would go about this with more direction and purpose.
Here's a minimal example with gc output in case it helps any onlookers.
>>> import attrs, gc, pprint
>>> @attrs.define
... class Base: ...
... @attrs.define
... class Sub(Base): ...
...
>>>
>>> subs = Base.__subclasses__()
>>> subs
[<class '__main__.Sub'>, <class '__main__.Sub'>]
>>>
>>> # Note that this is the vestigial class that we want to be collected
>>> pprint.pprint(gc.get_referrers(subs[0]))
[(<class '__main__.Sub'>, <class '__main__.Base'>, <class 'object'>),
 <attribute '__dict__' of 'Sub' objects>,
 <_ClassBuilder(cls=Sub)>,
 [<class '__main__.Sub'>, <class '__main__.Sub'>]]
>>>
>>> # This is the final product of the `@attrs.define` decorator
>>> pprint.pprint(gc.get_referrers(subs[1]))
[(<class '__main__.Sub'>, <class '__main__.Base'>, <class 'object'>),
 [<class '__main__.Sub'>, <class '__main__.Sub'>],
 {'Base': <class '__main__.Base'>,
  'Sub': <class '__main__.Sub'>,
  '__annotations__': {},
  '__builtins__': <module 'builtins' (built-in)>,
  '__cached__': '/opt/homebrew/Cellar/python@3.13/3.13.5/Frameworks/Python.framework/Versions/3.13/lib/python3.13/_pyrepl/__pycache__/__main__.cpython-313.pyc',
  '__doc__': None,
  '__file__': '/opt/homebrew/Cellar/python@3.13/3.13.5/Frameworks/Python.framework/Versions/3.13/lib/python3.13/_pyrepl/__main__.py',
  '__loader__': None,
  '__name__': '__main__',
  '__package__': '_pyrepl',
  '__spec__': None,
  'attrs': <module 'attrs' from '/Users/tad/prog/ventral-sacs/env/lib/python3.13/site-packages/attrs/__init__.py'>,
  'gc': <module 'gc' (built-in)>,
  'pprint': <module 'pprint' from '/opt/homebrew/Cellar/python@3.13/3.13.5/Frameworks/Python.framework/Versions/3.13/lib/python3.13/pprint.py'>,
  'subs': [<class '__main__.Sub'>, <class '__main__.Sub'>]}]
>>>
>>> # This helps us tell the difference between the vestigial `Sub` and the one we want
>>> list(map(id, subs))
[5032132800, 5060444960]
>>> id(Sub)
5060444960
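For anyone repeating this kind of investigation, the raw `gc.get_referrers()` output gets noisy quickly. A small summary helper can make it easier to spot the unexpected holder. This is only a sketch; `referrer_kinds` is a name made up for illustration, and it works on any object, not just attrs classes.

```python
import gc


def referrer_kinds(obj):
    """Return the sorted type names of the objects that refer to obj."""
    return sorted({type(referrer).__name__ for referrer in gc.get_referrers(obj)})


class Target:
    pass


holder = [Target]  # an extra, deliberate reference
print(referrer_kinds(Target))
```

For a plain class, the result includes the MRO tuple and the `__dict__`/`__weakref__` descriptors; anything beyond those and the namespaces you know about (here, the list) is a candidate for what keeps the class alive.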
My next plan would have been equally unfocused and broad: rewrite _ClassBuilder in a plainer, more imperative way, with closures to manage state. Ultimately, though, I think that kind of deep dive is probably not welcome, and it would be wiser to let an attrs expert fix this. Still, I consider this issue a pretty serious one. Breaking the user's expectations around __subclasses__() is no small thing, and I think it also suggests that the underlying code should be simplified. Perhaps attrs is at the stage where some older features and backwards compatibility can be slowly deprecated, and complex internals like _ClassBuilder can be improved after shedding some of that baggage. Just my two cents as an outsider to the project.
Thanks for the great library.
Oh, and I tried running gc.collect() in the code. As somebody suggested up thread, it's very slow. The test suite took 56s with forced collection, up from 4.2s on my machine. But of course it would be a mistake to touch the GC in a project like this anyway.
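To get a rough feel for that overhead without touching attrs, one can time class creation with and without a forced collection after each class. This is a stdlib-only sketch with an invented function name; absolute numbers depend heavily on how many objects are alive in the process, so it prints timings rather than promising a ratio.

```python
import gc
import time


def time_class_creation(count, force_collect):
    """Create `count` throwaway classes, optionally collecting after each."""
    start = time.perf_counter()
    for index in range(count):
        type(f"C{index}", (), {})
        if force_collect:
            gc.collect()  # full collection, as a decorator would have to do
    return time.perf_counter() - start


plain = time_class_creation(100, force_collect=False)
forced = time_class_creation(100, force_collect=True)
print(f"without gc.collect(): {plain:.4f}s, with gc.collect(): {forced:.4f}s")
```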
On 3.14 the GC doesn't help any more anyway.
As I understand it, it doesn't help for dataclasses either?
Yes, dataclasses suffer from the same predicament.
What happens if we explicitly del the old class after all its data has been transferred to the new slotted class? Shouldn't that trigger the reference cleanup?
Here's a kind of contrived example to illustrate why that won't do anything. I tried it just for fun with attrs and, indeed, it doesn't work.
>>> class ClassHolder:
...     def __init__(self, cls: type):
...         self.cls = cls
...
>>>
>>> class Example: ...
...
>>> cls_holder = ClassHolder(Example)
>>> del Example
>>> cls_holder.cls
<class '__main__.Example'>
>>> Example
Traceback (most recent call last):
  File "<python-input-35>", line 1, in <module>
    Example
NameError: name 'Example' is not defined
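The same point can be checked without a REPL by watching the class through a weak reference. This is a sketch under the assumption of a plain CPython class with no annotations; `demo` and `probe` are names invented for the example. `del` only removes a name, and even after the last outside reference is gone, the class needs a collector run because of its own internal cycles.

```python
import gc
import weakref


class ClassHolder:
    def __init__(self, cls: type):
        self.cls = cls


def demo():
    class Example:
        pass

    holder = ClassHolder(Example)
    probe = weakref.ref(Example)  # observes the class without keeping it alive

    del Example  # removes the local name only; holder.cls still refers to it
    alive_after_del = probe() is not None

    del holder  # drop the last outside reference
    gc.collect()  # the class's internal cycles need the cycle collector
    alive_after_collect = probe() is not None
    return alive_after_del, alive_after_collect


print(demo())
```

On CPython I would expect the class to survive the `del` and to be gone after the explicit collection, which matches what the thread describes for versions before 3.14.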
So, a good issue for someone getting started with attrs: find out what the reference cycle is and break it so this doesn't happen.
Every class is part of a cycle, because classes contain a reference to their MRO tuple, which includes the class itself. I don't think there's a way to get around this in pure Python.
>>> import gc
>>> class A: pass
...
>>> gc.get_referents(A)
[{'__module__': '__main__', '__firstlineno__': 1, '__static_attributes__': (), '__dict__': <attribute '__dict__' of 'A' objects>, '__weakref__': <attribute '__weakref__' of 'A' objects>, '__doc__': None}, (<class '__main__.A'>, <class 'object'>), (<class 'object'>,), <class 'object'>]
Classes that inherit directly from object, but not classes that inherit from other Python classes (not sure that's a fully precise description), have an additional reference cycle involving the __dict__ and __weakref__ entries in the class dict; those are descriptors that contain a reference back to the class.
>>> A.__dict__["__dict__"].__objclass__
<class '__main__.A'>
This cycle also cannot be safely broken in pure Python code because there's no way to get rid of the __dict__ descriptor.
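Both cycles described above can be confirmed with a few identity checks. This is a small sketch using only the stdlib; the variable names are just for illustration.

```python
import gc


class A:
    pass


class B(A):
    pass


# Cycle 1: the class refers to its MRO tuple, whose first item is the class.
mro_is_referent = any(ref is A.__mro__ for ref in gc.get_referents(A))
mro_points_back = A.__mro__[0] is A

# Cycle 2: the __dict__ descriptor stored in A's namespace points back at A.
descriptor_points_back = A.__dict__["__dict__"].__objclass__ is A

# B inherits the descriptors from A, so it does not get its own copies.
b_has_own_descriptors = "__dict__" in vars(B) or "__weakref__" in vars(B)

print(mro_is_referent, mro_points_back, descriptor_points_back, b_has_own_descriptors)
```

The last check illustrates the "inherits directly from object" distinction: only the first Python class in a hierarchy carries the `__dict__` and `__weakref__` descriptors in its own namespace.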
In Python 3.14, there are more cycles (python/cpython#135228), because the code objects for annotations now contain a reference back to the class dict, which (through the __dict__/__weakref__ descriptors) contains a reference back to the class.
Python 3.14.0rc2 adds a workaround for __subclasses__ of slotted dataclasses.
(If you end up using sys._clear_type_descriptors too, please put in e.g. an except AttributeError, so it's easier for CPython to remove it if we find a better fix. And as with any private attribute, if you must use it, test early & stay in touch ;)
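A guarded call along those lines might look like the sketch below. It assumes `sys._clear_type_descriptors` takes the class to clean up as its only argument; that signature is an assumption here, not something confirmed in this thread, and the wrapper name is invented. On interpreters without the helper, the function does nothing.

```python
import sys


def clear_descriptors_if_available(old_cls):
    """Call the private CPython helper if it exists; otherwise do nothing.

    Returns True when the helper was found and called, False otherwise.
    """
    try:
        clear = sys._clear_type_descriptors  # private; may be removed again
    except AttributeError:
        return False
    clear(old_cls)  # assumed signature: the class is the sole argument
    return True


class Old:
    pass


print(clear_descriptors_if_available(Old))
```

Looking the attribute up inside the `try` and calling it outside keeps an `AttributeError` raised by the helper itself from being silently swallowed.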