dataclasses-json

Ability to deserialize JSON with type properties as created by Jackson @JsonTypeName or @JsonSubTypes

Open WAManiatty opened this issue 3 years ago • 1 comments

The dataclasses-json tool is very impressive, but I'm currently having difficulty adopting it for a real-world use case at work. The JSON data I need to encode/decode (serialize/deserialize) consists of JSON objects tagged with type properties using Jackson's @JsonTypeInfo and @JsonSubTypes with a collection of @JsonSubTypes.Type entries.

I've been trying to figure out how to do this in the current release (0.5.2 at the time of writing). My requirements are as follows:

  1. The classes I want to deserialize have members with a base class type (frequently abstract), which at runtime can resolve to an instance of any of the available derived class types.
  2. The generated JSON has a field whose key and value are inspected at runtime to determine which class was serialized and which class should be used for deserialization. Note that the key is not consistent across base classes, and the value is a string that uniquely identifies the subtype but is often not suitable as a class name.
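
To illustrate requirement 2, here is a minimal sketch using only the standard library. The payloads, the keys "kind" and "type", the tag values, and the names DISCRIMINATOR_KEYS and discriminator are all made up for illustration; they are not taken from our real services or from dataclasses-json.

```python
import json

# Hypothetical payloads: keys and values are invented for illustration only.
shape_json = '{"kind": "round-shape", "radius": 2.0}'
event_json = '{"type": "user.login/v2", "user": "alice"}'

# Each base class hierarchy uses its own discriminator key.
DISCRIMINATOR_KEYS = {"Shape": "kind", "Event": "type"}


def discriminator(base_name: str, payload: str) -> str:
    """Return the subtype tag stored under the base class's discriminator key."""
    data = json.loads(payload)
    return data[DISCRIMINATOR_KEYS[base_name]]


shape_tag = discriminator("Shape", shape_json)
event_tag = discriminator("Event", event_json)

# Neither tag is a valid Python identifier, so it cannot double as a class name.
print(shape_tag.isidentifier(), event_tag.isidentifier())
```

So any solution needs both a per-base-class key and an explicit mapping from tag value to class.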

I am currently unaware of a robust and simple way to encode or decode JSON using dataclasses-json that meets the above requirements. I've considered several approaches:

  1. The Union field approach of #93 looked promising, but the type information seems to be hardwired: the key is always "__type" and the value must be the name of the Python class.
  2. The Jackson-style subtype parsing look and feel proposed in #84 would solve the problem, but the PR seems old and I'm not sure how the lead developer(s) feel about it.
  3. I considered overriding the base class's __new__ method, but I haven't found a clean way (fiddling with the CatchAll field and monkey patching to add the extra fields in the derived class seems inelegant).

I was wondering about the following.

  1. Is there a smart way to do this with the existing code?
  2. If not, is there a preferred/advocated approach that would likely get approval from @lidatong and any other lead developer(s) for this project?

If a change is needed to get this functionality, possible approaches include the following, but I would be open to other approaches:

  1. Extending the Union field approach of #93 with an Optional[str] override for the key (currently hardwired to '__type') and an Optional[Mapping[str, str]] user-defined mapping from the type value to the class (or class name).
  2. Reviving #84, although to be honest I haven't looked closely at the code yet.
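
To make option 1 more concrete, here is a rough sketch of the behaviour I have in mind, written with the standard library only. The names decode_subtype, to_snake, and SUBTYPES are hypothetical and are not part of dataclasses-json; the real change would presumably live in the library's Union handling rather than in a free function like this.

```python
import json
from dataclasses import dataclass, fields
from typing import Any, Mapping, Optional, Type


@dataclass
class Derived1:
    type: str = "derived1"
    data_field_a: Optional[bool] = None


@dataclass
class Derived2:
    type: str = "derived2"
    data_field_b: Optional[str] = None


def to_snake(name: str) -> str:
    """Convert a camelCase JSON key to snake_case."""
    return "".join("_" + c.lower() if c.isupper() else c for c in name).lstrip("_")


def decode_subtype(data: Mapping[str, Any], type_key: str,
                   subtypes: Mapping[str, Type[Any]]) -> Any:
    """Pick the class registered for data[type_key] and build it from matching keys."""
    tag = data[type_key]
    try:
        cls = subtypes[tag]
    except KeyError:
        raise ValueError(f"unknown subtype tag {tag!r} for key {type_key!r}") from None
    known = {f.name for f in fields(cls)}
    kwargs = {to_snake(k): v for k, v in data.items()}
    # Keys that do not match a dataclass field are silently dropped in this sketch.
    return cls(**{k: v for k, v in kwargs.items() if k in known})


# User-supplied mapping from the Jackson type value to the Python class.
SUBTYPES = {"derived1": Derived1, "derived2": Derived2}

payload = json.loads('{"type": "derived1", "dataFieldA": true}')
obj = decode_subtype(payload, type_key="type", subtypes=SUBTYPES)
print(obj)  # Derived1(type='derived1', data_field_a=True)
```

The point is that both the key ("type" here) and the value-to-class mapping are supplied by the user, and no "__type" field is required in the JSON.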

Thanks for your help.

EDITED: added some sample code and output to try to better illustrate the issue:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# vim: ts=4 sw=4 expandtab
"""Test case for dataclasses, want to see if generic derived classes can be deserialized"""

import abc
from dataclasses import dataclass
from typing import Optional, Union

from dataclasses_json import dataclass_json, LetterCase  # , DataClassJsonMixin


@dataclass_json(letter_case=LetterCase.CAMEL)
@dataclass
class BaseClass(metaclass=abc.ABCMeta):
    """The base class"""
    type: str  # the @JsonSubType.type name


@dataclass_json(letter_case=LetterCase.CAMEL)
@dataclass
class Derived1(BaseClass):
    """A derived class"""
    type: str = "derived1"
    data_field_a: Optional[bool] = None


@dataclass_json(letter_case=LetterCase.CAMEL)
@dataclass
class Derived2(BaseClass):
    """The other derived class"""
    type: str = "derived2"
    data_field_b: Optional[str] = None


@dataclass_json(letter_case=LetterCase.CAMEL)
@dataclass
class HowDoIExpressThis:
    """Container whose field may hold either derived class"""
    derived1_or_derived2: Union[Derived1, Derived2]


how_do_i_express_this_schema = HowDoIExpressThis.schema()

expected_object = HowDoIExpressThis(Derived1("derived1", True))
print(f'expected_object={expected_object}')
actual_json = how_do_i_express_this_schema.dumps(expected_object)
# note the hard wired __type key and the Class Name as its value in the derived1OrDerived2 value
# prints actual_json={"derived1OrDerived2": {"type": "derived1", "dataFieldA": true, "__type": "Derived1"}}
print(f'actual_json={actual_json}')
actual_object = how_do_i_express_this_schema.loads(actual_json)
print(f'actual_object={actual_object}')
# the server should generate something like this, not the actual_json
# note the absence of __type field
desired_json = """{"derived1OrDerived2": {"type": "derived1", "DataFieldA": true}}"""
fails_to_parse = how_do_i_express_this_schema.loads(desired_json)
print(f'fails_to_parse={fails_to_parse}')

When I run it, I see:

$ python ./apps/tutorials/dataclassesjson-issue252-example.py 
expected_object=HowDoIExpressThis(derived1_or_derived2=Derived1(type='derived1', data_field_a=True))
actual_json={"derived1OrDerived2": {"type": "derived1", "dataFieldA": true, "__type": "Derived1"}}
actual_object=HowDoIExpressThis(derived1_or_derived2=Derived1(type='derived1', data_field_a=True))
/Users/bill.maniatty/dev/worktreerepos/MAC-3827-Machinify-Scripting-Should-Use-TypeSafe-Json-API/venv/lib/python3.8/site-packages/dataclasses_json/mm.py:108: UserWarning: The type "dict" (value: "{'type': 'derived1', 'DataFieldA': True}") is not in the list of possible types of typing.Union (dataclass: HowDoIExpressThis, field: derived1_or_derived2). Value cannot be deserialized properly.
  warnings.warn(
fails_to_parse=HowDoIExpressThis(derived1_or_derived2={'type': 'derived1', 'DataFieldA': True})

WAManiatty, Oct 10 '20 08:10