dataclass-wizard
Cyclic data dependencies cause infinite recursion
- Dataclass Wizard version: 0.22.1
- Python version: 3.10.5
- Operating System: Linux 5.18.7-arch1-1
Description
I am modeling some data that is self-referential, in that it may contain references to other objects of its own type. In attempting to use dataclass-wizard to parse a dict of this data, I received the following error:
...
    return typing._eval_type(base_type, base_globals, _TYPING_LOCALS)
  File "/usr/lib/python3.10/typing.py", line 327, in _eval_type
    return t._evaluate(globalns, localns, recursive_guard)
  File "/usr/lib/python3.10/typing.py", line 693, in _evaluate
    type_ = _type_check(
  File "/usr/lib/python3.10/typing.py", line 164, in _type_check
    arg = _type_convert(arg, module=module, allow_special_forms=allow_special_forms)
  File "/usr/lib/python3.10/typing.py", line 141, in _type_convert
    if isinstance(arg, str):
RecursionError: maximum recursion depth exceeded while calling a Python object
What I Did
Here is a simple script to reproduce the issue:
from typing import Optional
from dataclasses import dataclass
from dataclass_wizard import asdict, fromdict

@dataclass
class A:
    a: Optional['A'] = None

a = A(a=A(a=A(a=A())))
d = asdict(a)
print(d)  # {'a': {'a': {'a': {'a': None}}}}
print(fromdict(A, d))
Any potential cyclic data dependency will trigger this issue, not just strictly self-referential ones.
from typing import Optional
from dataclasses import dataclass
from dataclass_wizard import asdict, fromdict

@dataclass
class A:
    b: Optional['B'] = None

@dataclass
class B:
    a: Optional['A'] = None

a = A(b=B(a=A(b=B(a=None))))
d = asdict(a)
print(d)  # {'b': {'a': {'b': {'a': None}}}}
print(fromdict(A, d))
I did a bit of digging in the code before submitting the issue to get an understanding of why this was happening, and I'm not sure it's an easy fix :grimacing:
Not really expecting a speedy resolution, assuming you even want to support this kind of use case.
Hi @buckley-w-david, thanks for opening this issue. I actually thought about this for a while some time back, like really thought about it, and I'm hopeful, because I think the fix should be really simple. I feel like I was overcomplicating it in my head, since I've always struggled with understanding recursion in any form.
This is just a stop-gap solution for now. Once I look into the performance milestones I created, which will involve dynamically generating the (de)serialization code for things like from_dict, a proper fix might be a little simpler down the road when I get around to tackling that (knock on wood).
The simple workaround I had alluded to involves a few things:
- add a new Meta flag, recursive_dataclasses or something similar.
- when this flag is enabled, don't immediately generate and cache the load function for a (nested) dataclass, as this will cause recursion if it's self-referential, either at the top level or somewhere further down, as mentioned.
- instead, return a delayed load function (essentially a function that returns a function), which is generated the first time it's called rather than immediately, and then preferably cached so that subsequent loads are faster; see the sketch below.
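In code, the idea is roughly the following (a minimal sketch; delayed_load_fn and generate_load_fn are made-up names for illustration, not the actual internals):

def delayed_load_fn(cls, generate_load_fn):
    """Defer building the real load function for `cls` until the first call,
    then cache it so subsequent loads don't pay the generation cost again."""
    load_fn = None

    def load(d):
        nonlocal load_fn
        if load_fn is None:
            # Generated lazily, so a self-referential class never tries to
            # build its own loader while that loader is still being built.
            load_fn = generate_load_fn(cls)
        return load_fn(d)

    return load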
Sorry if that all sounds confusing, but the good news is that I've effectively put together a hotfix for this in under half an hour (not completely there yet), so I can confirm that the recursion error can be resolved at least. I'll see if I can put together something soon that will hopefully shed a little more light on the direction I had in mind to solve the (tricky) self-referential problem.
A little bump - I am facing the exact same issue. Happy to test a fix if you have one. Thanks for an awesome library btw.
Also a quick question: a workaround for me would be to add a custom from_dict for a particular field. I've read the docs a bit, but am unable to figure out how to do this.
@dataclass
class A:
    b: Optional['B'] = None  # Can I add a custom from_dict for this property?

@dataclass
class B:
    a: Optional['A'] = None
I've checked in a WIP branch with the working changes. I was able to test by enabling a new meta field, recursive_classes, and confirmed that the use cases above now work as expected when de-serializing with fromdict. I also added new test cases based on the above.
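For reference, enabling it looks roughly like this - a sketch against the WIP branch, so the exact name recursive_classes could still change:

from dataclasses import dataclass
from typing import Optional
from dataclass_wizard import JSONWizard, fromdict

@dataclass
class A(JSONWizard):
    class _(JSONWizard.Meta):
        # opt in to support for self-referential / cyclic dataclasses
        recursive_classes = True

    a: Optional['A'] = None

print(fromdict(A, {'a': {'a': {'a': {'a': None}}}}))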
FWIW, I've also updated the first comment; it was probably a typo, but I think the expected dict object should instead be:
{'b': {'a': {'b': {'a': None}}}}
@trulite I'll have to dig a bit deeper to confirm, but my initial suspicion is that it won't be possible currently to set a per-field parser to use in from_dict. This is probably a good idea for a feature request though - not saying that it would be needed in this particular case, but I imagine there are other situations where it could be quite useful.
Thanks for now - I am leaving it as a dict and doing a post-processing step to convert to the class I want.
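For anyone else hitting this, a rough sketch of what I mean - the cyclic field is typed as Any so fromdict never recurses into it, and a small (hypothetical) post_process helper converts the raw dicts afterwards:

from dataclasses import dataclass
from typing import Any, Optional
from dataclass_wizard import fromdict

@dataclass
class B:
    a: Any = None  # left untyped instead of Optional['A'] to break the cycle

@dataclass
class A:
    b: Optional['B'] = None

def post_process(a: A) -> A:
    # convert the raw nested dict back into the dataclass it represents
    if a.b is not None and isinstance(a.b.a, dict):
        a.b.a = post_process(fromdict(A, a.b.a))
    return a

a = post_process(fromdict(A, {'b': {'a': {'b': {'a': None}}}}))
print(a)  # A(b=B(a=A(b=B(a=None))))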
Is there any update?
It looks like a basic issue: recursion should stop when a None or an empty list is encountered.
In the example below, the problem can be circumvented for dataclass A by introducing a custom classmethod from_dict(). However, that does not carry over to AA, which contains A, and you still get the notorious RecursionError: maximum recursion depth exceeded in __instancecheck__.
from dataclasses import dataclass, field
from typing import List, Type
from dataclass_wizard import JSONWizard
from dataclass_wizard.abstractions import W
from dataclass_wizard.type_def import JSONObject

@dataclass
class A(JSONWizard):
    value: int
    children: List["A"] = field(default_factory=list)

    @classmethod
    def from_dict(cls: Type[W], o: JSONObject) -> W:
        value = o.pop("value")
        children = [A.from_dict(oo) for oo in o.pop("children", [])]
        return A(value=value, children=children)

@dataclass
class AA(JSONWizard):
    a: A

nested_dict = {
    "value": 1,
    "children": [{"value": 2, "children": []}, {"value": 3, "children": []}],
}
a_obj = A.from_dict(nested_dict)
aa_obj = AA.from_dict({"a": nested_dict})
print(a_obj)
print(aa_obj)
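That said, one possible way around it (assuming the RecursionError only comes from the loader that gets generated for AA, and reusing the imports and A from the snippet above) is to override from_dict on AA as well and delegate to A's custom one:

@dataclass
class AA(JSONWizard):
    a: A

    @classmethod
    def from_dict(cls: Type[W], o: JSONObject) -> W:
        # Delegate to A's custom from_dict so the generated loader for the
        # recursive field is never built.
        return AA(a=A.from_dict(o["a"]))

With that in place, AA.from_dict({"a": nested_dict}) should no longer go through the generated loader for the a field, but it obviously doesn't scale beyond a couple of fields.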
It would be great to have a solution soon! Do you have any updates, @rnag?