dataclasses-json
dataclasses-json copied to clipboard
[BUG] Nested tuple with union of tuple does not deserialize correctly
Description
With this code:
from dataclasses import dataclass
from dataclasses_json import Undefined, dataclass_json
@dataclass_json(undefined=Undefined.RAISE)
@dataclass(eq=True, kw_only=True, order=True, frozen=True)
class TestInput1:
cols: tuple[(tuple[str, str] | str), ...] = ()
print(TestInput1.from_json('{"cols": [["a", "b"], "c"]}'))
@dataclass_json(undefined=Undefined.RAISE)
@dataclass(eq=True, kw_only=True, order=True, frozen=True)
class TestInput2:
cols: tuple[tuple[str, str], ...] = ()
print(TestInput2.from_json('{"cols": [["a", "b"], ["c", "d"]]}'))
The output should contain only tuples. However, with the tuple[(tuple[str, str] | str), ...]
type, the inner tuple is actually converted as a list. Output:
TestInput1(cols=(['a', 'b'], 'c'))
TestInput2(cols=(('a', 'b'), ('c', 'd')))
Describe the results you expected
TestInput1(cols=(('a', 'b'), 'c'))
TestInput2(cols=(('a', 'b'), ('c', 'd')))
Python version you are using
Python 3.11.5
Environment description
dataclasses-json==0.6.2
Will take a look soon, sorry bit slow on our side here. Tough times.
No worries, thank you for your work.
Apparently this is how python converts a compatible collection into tuples (3.11):
> tuple(["a", "b"])
('a', 'b')
> tuple([["a", "b"], "b"])
(['a', 'b'], 'b')
So the library in general will get into trouble with nested generics like this one. It can technically be fixed by traversing the type args, but I fear things might go wrong there. Currently we do this:
try:
materialize_type = _get_type_cons(type_)
except (TypeError, AttributeError):
pass
res = materialize_type(xs)
where type_
is the field type hint. Fixing this issue would require some effort and I idk about v0 if that's worth it. Fun fact:
@dataclass_json(undefined=Undefined.RAISE)
@dataclass(eq=True, kw_only=True, order=True, frozen=True)
class TestInput1:
cols: tuple[str, ...] = ()
gives TestInput1(cols=(['a', 'b'], 'c'))
as well which seems like a blown up coercion :D
le me know your thoughts @Edgeworth
It looks to me like the library currently works for nested tuples, see the OP
print(TestInput2.from_json('{"cols": [["a", "b"], ["c", "d"]]}'))
gives TestInput2(cols=(('a', 'b'), ('c', 'd')))
, but the union case gives it trouble. Not sure if that is related to the tuple([["a", "b"], "b"])
behaviour (just dunno).
For my personal use case, I worked around this. But I think it's reasonable to expect cols: tuple[(tuple[str, str] | str), ...]
to deserialize into something that matches the type hint (and works with hashable etc dataclasses)
TestInput1(cols=(['a', 'b'], 'c'))
does seem very weird also lmao
w.r.t. whether to implement this now or in the new version, personally I don't mind. Since I have a workaround, I think it's reasonable to leave it unless there are other people who need this to work.
WDYT?
It is because DCJ relies on outer type only. So when you have coherent types inside (only tuples), it will work fine. But when there is a variation - stuff will look ugly. I'll think if there is an easy way to fix this w/o blowing the rest of the lib
This works
@dataclass_json(undefined=Undefined.RAISE)
@dataclass(eq=True, kw_only=True, order=True, frozen=True)
class TestInput3:
cols: tuple[(tuple[str, str] | str), ...] = ()
_: InitVar[bool] = False
def __post_init__(self, _: bool = False):
if not _:
cols = tuple(tuple(x) if not isinstance(x, str) else x for x in self.cols)
config = dict(cols=cols, _=True)
self.__init__(**config)
print(TestInput3.from_json('{"cols": [["a", "b"], "c"]}'))