attrs
attrs copied to clipboard
Missing recursion detection in attr.asdict
Suppose we use attrs instances to represent graphs:
>>> @attr.s
... class Node:
... val = attr.ib()
... linked_node = attr.ib(default=None)
...
>>> n = Node(1)
>>> attr.asdict(n)
{'val': 1, 'linked_node': None}
>>> n.linked_node = n
>>> n
Node(val=1, linked_node=...)
Although the repr handles it, attr.asdict does not detect and resolve cycles - should it?
>>> d = attr.asdict(n)
...
---------------------------------------------------------------------------
RecursionError Traceback (most recent call last)
...
.venv/lib/python3.9/site-packages/attr/_funcs.py in asdict(inst, recurse, filter, dict_factory, retain_collection_types, value_serializer)
60 if recurse is True:
61 if has(v.__class__):
---> 62 rv[a.name] = asdict(
63 v,
64 True,
RecursionError: maximum recursion depth exceeded while calling a Python object
Maybe this the intended behavior (if so, please close this issue), I'm not sure. But I half-expected a self-referencing dict to be returned:
>>> d
{'val': 1, 'linked_node': {...}}
This wasn't clear the first time I looked at your example, but it is indeed infinitely recursive, and as you point out there would need to be some cycle detection here. You are re-using the same instance for linked_node, if you made a new Node(1) it would not occur, or used attr.evolve to copy it. All of which I am sure you realize, just pointing it out for others that come across this issue.
from __future__ import annotations
from typing import Optional
import attr
@attr.s(auto_attribs=True)
class Node:
val: float
linked: Optional[Node] = None
n = Node(1)
print(attr.asdict(n))
n.linked = Node(1)
print(attr.asdict(n))
Output:
{'val': 1, 'linked': None}
{'val': 1, 'linked': {'val': 1, 'linked': None}}
tbh i don't think this is worth ‘fixing’. in the case of repr(), mostly useful for developing/debugging, not crashing is useful behaviour, and consistent with how repr() behaves for built-ins like dictionaries:
>>> d = {"a": 123}
>>> d["a"] = d
>>> d
{'a': {...}}
however, throw some json serialisation into the mix and it will 💥:
>>> import json
>>> json.dumps(d)
Traceback (most recent call last):
Input In [14] in <cell line: 1>
json.dumps(d)
File /usr/lib/python3.10/json/__init__.py:231 in dumps
return _default_encoder.encode(obj)
File /usr/lib/python3.10/json/encoder.py:199 in encode
chunks = self.iterencode(o, _one_shot=True)
File /usr/lib/python3.10/json/encoder.py:257 in iterencode
return _iterencode(o, 0)
ValueError: Circular reference detected
it raises ValueError (b/c json does its own detection), while attr.asdict() raises RecursionError, but that's an implementation detail.
the key difference here is that, unlike repr(), there is no single obvious solution here, and hence raising is the right way to go.
tempted to ‘wontfix’. thoughts?
I don't think json encoding is a fair analogy, because it is not possible to represent circular references in a json serialization. It is possible in a Python dict, so why shouldn't attr.asdict just go ahead and do that?
If raising an exception is the right way to go, it should still be a controlled failure mode rather than consuming resources and eventually blowing the call stack out with recursion error - this way, there's the possibility to give some context about where the circular reference is coming from in the error message.
So, I think whether attrs returns a dict here or raises an exception, the cycle detection would still be beneficial either way.
fair enough, producing a recursive dict would be a reasonable behaviour indeed. but would this always be desired?
(though in practice dict conversion is often used as a quick way to serialise which may fail anyway)
@Tinche what does cattrs do here? (and why?)
I don't think we do anything special apart from raising a RecursionError. Serializing recursive data is tricky and requires a completely different schema approach (you need to switch to some sort of pointer abstraction) than what's normally expected in JSON/bson etc. And this behavior would have to be opt-in so 99% of other, more usual cases don't take a speed hit.
I guess the question is how much more complex and slower would asdict have to become to support this?
I don’t think it is that tricky in practice, you just keep track of visited ids as you go. The repr is already doing this, as mentioned, so take a look in there for an example.
If there’s interest, a PR can be prepared for profiling
sure, a pr for testing may help.
the current repr() implementation uses threading.local(), with a .already_repring that holds a set of currently ‘repr'ed’ id(...) values, because:
# Thread-local global to track attrs instances which are already being repr'd.
# This is needed because there is no other (thread-safe) way to pass info
# about the instances that are already being repr'd through the call stack
# in order to ensure we don't perform infinite recursion.
i suspect for .asdict() this may not be needed, and a ‘memo’ argument that's passed down would work, similar to how copy.copy() works?
The same problem also exists for __eq__. However dicts with self references are also not able to compare and fail with RecursionError (on Python 3.10.11)
>>> a = dict()
>>> a[1] = a
>>> b = dict()
>>> b[1] = b
>>> a == b
RecursionError: maximum recursion depth exceeded in comparison
It would be really nice to have a recursion detection for all these recursive methods :-)