attrs icon indicating copy to clipboard operation
attrs copied to clipboard

Missing recursion detection in attr.asdict

Open wimglenn opened this issue 3 years ago • 8 comments

Suppose we use attrs instances to represent graphs:

>>> @attr.s
... class Node:
...     val = attr.ib()
...     linked_node = attr.ib(default=None)
... 
>>> n = Node(1)
>>> attr.asdict(n)
{'val': 1, 'linked_node': None}
>>> n.linked_node = n
>>> n
Node(val=1, linked_node=...)

Although the repr handles it, attr.asdict does not detect and resolve cycles - should it?

>>> d = attr.asdict(n)
...
---------------------------------------------------------------------------
RecursionError                            Traceback (most recent call last)
...
.venv/lib/python3.9/site-packages/attr/_funcs.py in asdict(inst, recurse, filter, dict_factory, retain_collection_types, value_serializer)
     60         if recurse is True:
     61             if has(v.__class__):
---> 62                 rv[a.name] = asdict(
     63                     v,
     64                     True,

RecursionError: maximum recursion depth exceeded while calling a Python object

Maybe this the intended behavior (if so, please close this issue), I'm not sure. But I half-expected a self-referencing dict to be returned:

>>> d
{'val': 1, 'linked_node': {...}}

wimglenn avatar Dec 22 '20 09:12 wimglenn

This wasn't clear the first time I looked at your example, but it is indeed infinitely recursive, and as you point out there would need to be some cycle detection here. You are re-using the same instance for linked_node, if you made a new Node(1) it would not occur, or used attr.evolve to copy it. All of which I am sure you realize, just pointing it out for others that come across this issue.

from __future__ import annotations
from typing import Optional

import attr


@attr.s(auto_attribs=True)
class Node:
    val: float
    linked: Optional[Node] = None


n = Node(1)
print(attr.asdict(n))
n.linked = Node(1)
print(attr.asdict(n))

Output:

{'val': 1, 'linked': None}
{'val': 1, 'linked': {'val': 1, 'linked': None}}

dam5h avatar Jul 14 '21 15:07 dam5h

tbh i don't think this is worth ‘fixing’. in the case of repr(), mostly useful for developing/debugging, not crashing is useful behaviour, and consistent with how repr() behaves for built-ins like dictionaries:

>>> d = {"a": 123}
>>> d["a"] = d
>>> d
{'a': {...}}

however, throw some json serialisation into the mix and it will 💥:

>>> import json
>>> json.dumps(d)
Traceback (most recent call last):
  Input In [14] in <cell line: 1>
    json.dumps(d)
  File /usr/lib/python3.10/json/__init__.py:231 in dumps
    return _default_encoder.encode(obj)
  File /usr/lib/python3.10/json/encoder.py:199 in encode
    chunks = self.iterencode(o, _one_shot=True)
  File /usr/lib/python3.10/json/encoder.py:257 in iterencode
    return _iterencode(o, 0)
ValueError: Circular reference detected

it raises ValueError (b/c json does its own detection), while attr.asdict() raises RecursionError, but that's an implementation detail.

the key difference here is that, unlike repr(), there is no single obvious solution here, and hence raising is the right way to go.

tempted to ‘wontfix’. thoughts?

wbolster avatar Mar 21 '22 15:03 wbolster

I don't think json encoding is a fair analogy, because it is not possible to represent circular references in a json serialization. It is possible in a Python dict, so why shouldn't attr.asdict just go ahead and do that?

If raising an exception is the right way to go, it should still be a controlled failure mode rather than consuming resources and eventually blowing the call stack out with recursion error - this way, there's the possibility to give some context about where the circular reference is coming from in the error message.

So, I think whether attrs returns a dict here or raises an exception, the cycle detection would still be beneficial either way.

wimglenn avatar Mar 22 '22 02:03 wimglenn

fair enough, producing a recursive dict would be a reasonable behaviour indeed. but would this always be desired?

(though in practice dict conversion is often used as a quick way to serialise which may fail anyway)

@Tinche what does cattrs do here? (and why?)

wbolster avatar Mar 22 '22 07:03 wbolster

I don't think we do anything special apart from raising a RecursionError. Serializing recursive data is tricky and requires a completely different schema approach (you need to switch to some sort of pointer abstraction) than what's normally expected in JSON/bson etc. And this behavior would have to be opt-in so 99% of other, more usual cases don't take a speed hit.

I guess the question is how much more complex and slower would asdict have to become to support this?

Tinche avatar Mar 22 '22 12:03 Tinche

I don’t think it is that tricky in practice, you just keep track of visited ids as you go. The repr is already doing this, as mentioned, so take a look in there for an example.

If there’s interest, a PR can be prepared for profiling

wimglenn avatar Mar 22 '22 15:03 wimglenn

sure, a pr for testing may help.

the current repr() implementation uses threading.local(), with a .already_repring that holds a set of currently ‘repr'ed’ id(...) values, because:

# Thread-local global to track attrs instances which are already being repr'd.
# This is needed because there is no other (thread-safe) way to pass info
# about the instances that are already being repr'd through the call stack
# in order to ensure we don't perform infinite recursion.

i suspect for .asdict() this may not be needed, and a ‘memo’ argument that's passed down would work, similar to how copy.copy() works?

wbolster avatar Apr 07 '22 19:04 wbolster

The same problem also exists for __eq__. However dicts with self references are also not able to compare and fail with RecursionError (on Python 3.10.11)

>>> a = dict()
>>> a[1] = a

>>> b = dict()
>>> b[1] = b
>>> a == b
RecursionError: maximum recursion depth exceeded in comparison

It would be really nice to have a recursion detection for all these recursive methods :-)

Darkdragon84 avatar Jun 01 '23 15:06 Darkdragon84