dataclasses-json icon indicating copy to clipboard operation
dataclasses-json copied to clipboard

[BUG] Nested tuple with union of tuple does not deserialize correctly

Open Edgeworth opened this issue 1 year ago • 7 comments

Description

With this code:

from dataclasses import dataclass

from dataclasses_json import Undefined, dataclass_json


@dataclass_json(undefined=Undefined.RAISE)
@dataclass(eq=True, kw_only=True, order=True, frozen=True)
class TestInput1:
    cols: tuple[(tuple[str, str] | str), ...] = ()


print(TestInput1.from_json('{"cols": [["a", "b"], "c"]}'))


@dataclass_json(undefined=Undefined.RAISE)
@dataclass(eq=True, kw_only=True, order=True, frozen=True)
class TestInput2:
    cols: tuple[tuple[str, str], ...] = ()


print(TestInput2.from_json('{"cols": [["a", "b"], ["c", "d"]]}'))

The output should contain only tuples. However, with the tuple[(tuple[str, str] | str), ...] type, the inner tuple is actually converted as a list. Output:

TestInput1(cols=(['a', 'b'], 'c'))
TestInput2(cols=(('a', 'b'), ('c', 'd')))

Describe the results you expected

TestInput1(cols=(('a', 'b'), 'c'))
TestInput2(cols=(('a', 'b'), ('c', 'd')))

Python version you are using

Python 3.11.5

Environment description

dataclasses-json==0.6.2

Edgeworth avatar Nov 19 '23 06:11 Edgeworth

Will take a look soon, sorry bit slow on our side here. Tough times.

george-zubrienko avatar Nov 27 '23 20:11 george-zubrienko

No worries, thank you for your work.

Edgeworth avatar Nov 28 '23 13:11 Edgeworth

Apparently this is how python converts a compatible collection into tuples (3.11):

> tuple(["a", "b"])
('a', 'b')
> tuple([["a", "b"], "b"])
(['a', 'b'], 'b')

So the library in general will get into trouble with nested generics like this one. It can technically be fixed by traversing the type args, but I fear things might go wrong there. Currently we do this:

        try:
            materialize_type = _get_type_cons(type_)
        except (TypeError, AttributeError):
            pass
        res = materialize_type(xs)

where type_ is the field type hint. Fixing this issue would require some effort and I idk about v0 if that's worth it. Fun fact:

@dataclass_json(undefined=Undefined.RAISE)
@dataclass(eq=True, kw_only=True, order=True, frozen=True)
class TestInput1:
    cols: tuple[str, ...] = ()

gives TestInput1(cols=(['a', 'b'], 'c')) as well which seems like a blown up coercion :D

george-zubrienko avatar Jan 26 '24 13:01 george-zubrienko

le me know your thoughts @Edgeworth

george-zubrienko avatar Jan 26 '24 13:01 george-zubrienko

It looks to me like the library currently works for nested tuples, see the OP

print(TestInput2.from_json('{"cols": [["a", "b"], ["c", "d"]]}'))

gives TestInput2(cols=(('a', 'b'), ('c', 'd'))), but the union case gives it trouble. Not sure if that is related to the tuple([["a", "b"], "b"]) behaviour (just dunno).

For my personal use case, I worked around this. But I think it's reasonable to expect cols: tuple[(tuple[str, str] | str), ...] to deserialize into something that matches the type hint (and works with hashable etc dataclasses)

TestInput1(cols=(['a', 'b'], 'c')) does seem very weird also lmao

w.r.t. whether to implement this now or in the new version, personally I don't mind. Since I have a workaround, I think it's reasonable to leave it unless there are other people who need this to work.

WDYT?

Edgeworth avatar Jan 26 '24 14:01 Edgeworth

It is because DCJ relies on outer type only. So when you have coherent types inside (only tuples), it will work fine. But when there is a variation - stuff will look ugly. I'll think if there is an easy way to fix this w/o blowing the rest of the lib

george-zubrienko avatar Jan 26 '24 14:01 george-zubrienko

This works

@dataclass_json(undefined=Undefined.RAISE)
@dataclass(eq=True, kw_only=True, order=True, frozen=True)
class TestInput3:
    cols: tuple[(tuple[str, str] | str), ...] = ()
    _: InitVar[bool] = False

    def __post_init__(self, _: bool = False):
        if not _:
            cols = tuple(tuple(x) if not isinstance(x, str) else x for x in self.cols)
            config = dict(cols=cols, _=True)
            self.__init__(**config)



print(TestInput3.from_json('{"cols": [["a", "b"], "c"]}'))

jfloyd1697 avatar Mar 07 '24 20:03 jfloyd1697