attrs
attrs copied to clipboard
Missing recursion detection in attr.asdict
Suppose we use attrs instances to represent graphs:
>>> @attr.s
... class Node:
... val = attr.ib()
... linked_node = attr.ib(default=None)
...
>>> n = Node(1)
>>> attr.asdict(n)
{'val': 1, 'linked_node': None}
>>> n.linked_node = n
>>> n
Node(val=1, linked_node=...)
Although the repr handles it, attr.asdict
does not detect and resolve cycles - should it?
>>> d = attr.asdict(n)
...
---------------------------------------------------------------------------
RecursionError Traceback (most recent call last)
...
.venv/lib/python3.9/site-packages/attr/_funcs.py in asdict(inst, recurse, filter, dict_factory, retain_collection_types, value_serializer)
60 if recurse is True:
61 if has(v.__class__):
---> 62 rv[a.name] = asdict(
63 v,
64 True,
RecursionError: maximum recursion depth exceeded while calling a Python object
Maybe this the intended behavior (if so, please close this issue), I'm not sure. But I half-expected a self-referencing dict to be returned:
>>> d
{'val': 1, 'linked_node': {...}}
This wasn't clear the first time I looked at your example, but it is indeed infinitely recursive, and as you point out there would need to be some cycle detection here. You are re-using the same instance for linked_node
, if you made a new Node(1)
it would not occur, or used attr.evolve
to copy it. All of which I am sure you realize, just pointing it out for others that come across this issue.
from __future__ import annotations
from typing import Optional
import attr
@attr.s(auto_attribs=True)
class Node:
val: float
linked: Optional[Node] = None
n = Node(1)
print(attr.asdict(n))
n.linked = Node(1)
print(attr.asdict(n))
Output:
{'val': 1, 'linked': None}
{'val': 1, 'linked': {'val': 1, 'linked': None}}
tbh i don't think this is worth ‘fixing’. in the case of repr()
, mostly useful for developing/debugging, not crashing is useful behaviour, and consistent with how repr()
behaves for built-ins like dictionaries:
>>> d = {"a": 123}
>>> d["a"] = d
>>> d
{'a': {...}}
however, throw some json
serialisation into the mix and it will 💥:
>>> import json
>>> json.dumps(d)
Traceback (most recent call last):
Input In [14] in <cell line: 1>
json.dumps(d)
File /usr/lib/python3.10/json/__init__.py:231 in dumps
return _default_encoder.encode(obj)
File /usr/lib/python3.10/json/encoder.py:199 in encode
chunks = self.iterencode(o, _one_shot=True)
File /usr/lib/python3.10/json/encoder.py:257 in iterencode
return _iterencode(o, 0)
ValueError: Circular reference detected
it raises ValueError
(b/c json
does its own detection), while attr.asdict()
raises RecursionError
, but that's an implementation detail.
the key difference here is that, unlike repr()
, there is no single obvious solution here, and hence raising is the right way to go.
tempted to ‘wontfix’. thoughts?
I don't think json encoding is a fair analogy, because it is not possible to represent circular references in a json serialization. It is possible in a Python dict, so why shouldn't attr.asdict
just go ahead and do that?
If raising an exception is the right way to go, it should still be a controlled failure mode rather than consuming resources and eventually blowing the call stack out with recursion error - this way, there's the possibility to give some context about where the circular reference is coming from in the error message.
So, I think whether attrs returns a dict here or raises an exception, the cycle detection would still be beneficial either way.
fair enough, producing a recursive dict would be a reasonable behaviour indeed. but would this always be desired?
(though in practice dict conversion is often used as a quick way to serialise which may fail anyway)
@Tinche what does cattrs
do here? (and why?)
I don't think we do anything special apart from raising a RecursionError. Serializing recursive data is tricky and requires a completely different schema approach (you need to switch to some sort of pointer abstraction) than what's normally expected in JSON/bson etc. And this behavior would have to be opt-in so 99% of other, more usual cases don't take a speed hit.
I guess the question is how much more complex and slower would asdict have to become to support this?
I don’t think it is that tricky in practice, you just keep track of visited ids as you go. The repr is already doing this, as mentioned, so take a look in there for an example.
If there’s interest, a PR can be prepared for profiling
sure, a pr for testing may help.
the current repr()
implementation uses threading.local()
, with a .already_repring
that holds a set of currently ‘repr'ed’ id(...)
values, because:
# Thread-local global to track attrs instances which are already being repr'd.
# This is needed because there is no other (thread-safe) way to pass info
# about the instances that are already being repr'd through the call stack
# in order to ensure we don't perform infinite recursion.
i suspect for .asdict()
this may not be needed, and a ‘memo’ argument that's passed down would work, similar to how copy.copy()
works?
The same problem also exists for __eq__
. However dicts with self references are also not able to compare and fail with RecursionError (on Python 3.10.11)
>>> a = dict()
>>> a[1] = a
>>> b = dict()
>>> b[1] = b
>>> a == b
RecursionError: maximum recursion depth exceeded in comparison
It would be really nice to have a recursion detection for all these recursive methods :-)