dataclass-wizard icon indicating copy to clipboard operation
dataclass-wizard copied to clipboard

Cylic data dependencies cause infinite recursion

Open buckley-w-david opened this issue 3 years ago • 7 comments

  • Dataclass Wizard version: 0.22.1
  • Python version: 3.10.5
  • Operating System: Linux 5.18.7-arch1-1

Description

I am modeling some data that is self-referential, as in it may contain references to other object of its own type. In attempting to use dataclass-wizard to parse a dict of this data I received the following error

...
    return typing._eval_type(base_type, base_globals, _TYPING_LOCALS)
  File "/usr/lib/python3.10/typing.py", line 327, in _eval_type
    return t._evaluate(globalns, localns, recursive_guard)
  File "/usr/lib/python3.10/typing.py", line 693, in _evaluate
    type_ = _type_check(
  File "/usr/lib/python3.10/typing.py", line 164, in _type_check
    arg = _type_convert(arg, module=module, allow_special_forms=allow_special_forms)
  File "/usr/lib/python3.10/typing.py", line 141, in _type_convert
    if isinstance(arg, str):
RecursionError: maximum recursion depth exceeded while calling a Python object

What I Did

Here is a simple script to reproduce the issue

from typing import Optional
from dataclasses import dataclass
from dataclass_wizard import asdict, fromdict

@dataclass
class A:
    a: Optional['A'] = None

a = A(a=A(a=A(a=A())))
d = asdict(a)
print(d) # {'a': {'a': {'a': {'a': None}}}}
print(fromdict(A, d))

Any potential cyclic data dependency will trigger this issue, not just strictly self-referential ones.

from typing import Optional
from dataclasses import dataclass
from dataclass_wizard import asdict, fromdict

@dataclass
class A:
    b: Optional['B'] = None

@dataclass
class B:
    a: Optional['A'] = None

a = A(b=B(a=A(b=B(a=None))))
d = asdict(a)
print(d)  # {'b': {'a': {'b': {'a': None}}}}
print(fromdict(A, d))

I did a bit of digging in the code before submitting the issue to get an understanding of why this was happening, and I'm not sure it's an easy fix :grimacing:

Not really expecting speedy resolution assuming you even want to support this kind of use case.

buckley-w-david avatar Jul 02 '22 06:07 buckley-w-david

Hi @buckley-w-david, thanks for opening this issue. I had thought about it much earlier for a while, like really thought about it, and I'm hopeful because I think the fix should be really simple. I feel like I was overcomplicating it in my head, because I've always struggled with understanding recursion in any form.

Just a stop-gap solution for now, I think once I look into the performance milestones I created, that will involve dynamically generating the (de)serialization code like for from_dict so it might be a little simpler down the road when I get around to tackling that (knock on wood).

The simple workaround I had alluded to involves a few things:

  • add a new Meta flag, recursive_dataclasses or something similar.
  • when this flag is enabled, don't immediately generate and cache the load function for a (nested) dataclass, as this will cause recursion if it's self-referential, either at top-level somewhere down the road as mentioned.
  • instead, we want to return a delayed load function (essentially a function to return a function), which will be generated on the first time it's called rather than immediately, and then preferably cached so that subsequent loads are faster.

Sorry if that all sounds confusing, but the good news is that I've effectively put together a hotfix for this in under half an hour (not completely there), so I can confirm that the recursion error is able to be resolved at least. I'll see if I can put together something soon that will hopefully shed a little more light on the direction I had in mind to solve the (tricky) self-referential problem.

rnag avatar Aug 06 '22 04:08 rnag

A little bump - I am facing the exact same issue. Happy to test a fix if you have one. Thanks for an awesome library btw.

trulite avatar Aug 07 '22 12:08 trulite

Also a quick question. A work around for me would be to add a custom from_dict for a particular field. I ve read the docs a bit, but am unable to understand how to do this.

@dataclass
class A:
    b: Optional['B'] = None # Can I add a custom from_dict for this property?

@dataclass
class B:
    a: Optional['A'] = None

trulite avatar Aug 07 '22 12:08 trulite

I've checked in a WIP branch with the working changes. I was able to test with enabling a new meta field recursive_classes and confirmed that the use cases above now work as expected when de-serializing with fromdict. I also added new test cases, based on the above.

FWIW, I've also updated the first comment; probably a typo, but I think the expected dict object should be instead:

{'b': {'a': {'b': {'a': None}}}}

rnag avatar Aug 15 '22 03:08 rnag

@trulite I'll have to dig a bit deeper to confirm, but my initial suspicion is that it won't be possible currently to set a per-field parser to use in from_dict. This is probably a good idea for a feature request though - not saying that it would be needed in this particular case, but I imagine there are other situations where it could be quite useful.

rnag avatar Aug 15 '22 03:08 rnag

Thanks for now - I am leaving it as a dict and doing a post process to convert to the class I want

trulite avatar Aug 21 '22 08:08 trulite

Is there any update?

It looks like a basic issue : recursion should stop when there a None or an empty list is encountered. In the example below for dataclass A the problem can be circumvented by introducing a custom classmethod from_dict(). However it does not catch on for AA that contains A and you would be getting the notorious recursionError: maximum recursion depth exceeded in __instancecheck__ problem.

from dataclasses import dataclass, field
from typing import List, Type

from dataclass_wizard import JSONWizard
from dataclass_wizard.abstractions import W
from dataclass_wizard.type_def import JSONObject


@dataclass
class A(JSONWizard):
    value: int
    children: List["A"] = field(default_factory=list)

    @classmethod
    def from_dict(cls: Type[W], o: JSONObject) -> W:
        value = o.pop("value")
        children = [A.from_dict(oo) for oo in o.pop("children", [])]
        return A(value=value, children=children)


@dataclass
class AA(JSONWizard):
    a: A


nested_dict = {
    "value": 1,
    "children": [{"value": 2, "children": []}, {"value": 3, "children": []}],
}

a_obj = A.from_dict(nested_dict)
aa_obj = AA.from_dict({"a": nested_dict})

print(a_obj)
print(aa_obj)

It would be great to have a solution soon! Do you have any updates @rnag ?

alexander-belikov avatar Dec 19 '23 11:12 alexander-belikov