dataclasses-json

Why is a class decorator or inheritance approach required?

Open alanfranz opened this issue 3 years ago • 10 comments

Hello, first things first: thanks for this project! I had a similar idea many years ago, but without type annotations it failed miserably.

I have a question: why are we absolutely required to change the class that we want to map to/from json by using a decorator or inheritance?

Counter-example: in Java, when using libraries like Jackson (https://github.com/FasterXML/jackson, example usage at https://www.stubbornjava.com/posts/practical-jackson-objectmapper-configuration ), we can define a mapper and use it to convert plain Java objects to JSON and vice versa. We don't need to touch our Java objects unless we need something special (and even then, Jackson and similar tools usually let you configure the mapper instead, even though that's a bit harder than using an annotation).

This makes it easy to work with third-party libraries where you don't control the classes you get back, and you don't need to change your dataclasses throughout the code. Also, if you need different serialization options in various parts of your library, you can just define multiple mappers and call it a day (imagine I need to serialize my objects to JSON, XML, and protobuf: I'd need to decorate my dataclass three times instead of defining three separate mappers).

I took a peek at the code and couldn't find an answer; is there something that really requires the current implementation approach?

alanfranz, Aug 27 '20 07:08

Totally agree. I don't want my dataclasses to have to_json(), from_dict() or any other such methods. I just want a library that can do this:

>>> person = Person(name='Hans', age=26)
>>> library.to_dict(person)
{'name': 'Hans', 'age': 26}

What reason is there to make the interface any more complicated than that? If I have 15 dataclasses in my project, why do I have to decorate all 15 of them with @dataclass_json? That's just inconvenient.
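For reference, the standard library already offers a plain function with the shape of the REPL example above: `dataclasses.asdict` works on any dataclass instance without decorating the class. This is a minimal sketch using only the stdlib; it does not apply any dataclasses-json features (letter-case mapping, custom encoders), and values such as datetime or Decimal would still need a `default=` hook in `json.dumps`.

```python
import dataclasses
import json
from dataclasses import dataclass


@dataclass
class Person:  # plain dataclass: no decorator, no mixin
    name: str
    age: int


person = Person(name="Hans", age=26)

# dataclasses.asdict recursively converts a dataclass instance (including
# nested dataclasses, lists, tuples and dicts) into plain dicts and lists.
as_dict = dataclasses.asdict(person)
print(as_dict)  # {'name': 'Hans', 'age': 26}

# json.dumps performs the final step to a JSON string.
print(json.dumps(as_dict))  # {"name": "Hans", "age": 26}
```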

Aran-Fey, Apr 06 '21 09:04

Hi @alanfranz, sorry for not responding sooner. The answer is that you can achieve what you want with the library as-is and plain Python:

DataClassJsonMixin.to_json(MyClass, my_class_instance)

DataClassJsonMixin.from_json(MyClass, my_class_instance)

If the above is too verbose, you can alias those methods in your own code, as in:

to_json = DataClassJsonMixin.to_json

lidatong, Apr 06 '21 23:04

@lidatong uhmm, I don't see how that can work. Maybe in Python2? There's no "unbound method" concept in Python3, and it wouldn't work anyway (where is a plain dataclass to find the to_dict() method which is required in to_json()?) See the following example, tested with Python 3.9.2.

from dataclasses import dataclass
from dataclasses_json import DataClassJsonMixin

@dataclass
class ConfiguredSimpleExample:
    int_field: int


a = ConfiguredSimpleExample(10)
print(a)
# 1st call: doesn't work
DataClassJsonMixin.to_json(ConfiguredSimpleExample, a)
# 2nd call: doesn't work either
#DataClassJsonMixin.to_json(a)

Error if 1st line uncommented:

$ python poc.py
ConfiguredSimpleExample(int_field=10)
Traceback (most recent call last):
  File "/private/tmp/asd/poc.py", line 12, in <module>
    DataClassJsonMixin.to_json(ConfiguredSimpleExample, a)
TypeError: to_json() takes 1 positional argument but 2 were given

Error if 2nd line uncommented:

$ python poc.py
ConfiguredSimpleExample(int_field=10)
Traceback (most recent call last):
  File "/private/tmp/asd/poc.py", line 14, in <module>
    DataClassJsonMixin.to_json(a)
  File "/private/tmp/asd/lib/python3.9/site-packages/dataclasses_json/api.py", line 50, in to_json
    return json.dumps(self.to_dict(encode_json=False),
AttributeError: 'ConfiguredSimpleExample' object has no attribute 'to_dict'

I apologize if I misunderstood anything.

alanfranz, Apr 07 '21 10:04

@lidatong I suppose you meant something like this:

from dataclasses import dataclass
from dataclasses_json import dataclass_json

@dataclass
class ConfiguredSimpleExample:
    int_field: int


a = ConfiguredSimpleExample(10)
print(a)
print(dataclass_json(a.__class__).to_json(a))

This seems to work, but it relies on an implementation detail: apparently no postprocessing of the decorated class happens in this example. However, the dataclass_json decorator DOES perform postprocessing in certain cases (see https://github.com/lidatong/dataclasses-json/blob/3dc59e01ccdfec619ee4e4c3502b9759b67c3fa8/dataclasses_json/api.py#L138 ), so I don't think the approach is reliable.

I think the serialization logic could be extracted into standalone functions and then called from the mixin or the decorator. If you think that would be useful and could be merged, I'm willing to write a PR.
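One possible shape for such a refactoring, as a minimal sketch: the logic lives in module-level functions and a thin mixin only forwards to them. All names here (`to_dict`, `to_json`, `JsonMixin`) are hypothetical, and `dataclasses.asdict` stands in for the library's real encoder, so none of the dataclasses-json configuration (letter case, undefined-parameter handling, custom encoders) is modeled.

```python
import dataclasses
import json
from dataclasses import dataclass
from typing import Any


# Hypothetical standalone functions holding the serialization logic.
# They only need a dataclass instance, not a decorated or subclassed class.
def to_dict(obj: Any) -> dict:
    if not dataclasses.is_dataclass(obj) or isinstance(obj, type):
        raise TypeError(f"expected a dataclass instance, got {obj!r}")
    return dataclasses.asdict(obj)


def to_json(obj: Any, **dumps_kwargs: Any) -> str:
    return json.dumps(to_dict(obj), **dumps_kwargs)


# Hypothetical mixin kept for convenience/backwards compatibility: it holds
# no logic of its own. Inside the method bodies, `to_dict`/`to_json` resolve
# to the module-level functions, not to the methods.
class JsonMixin:
    def to_dict(self) -> dict:
        return to_dict(self)

    def to_json(self, **dumps_kwargs: Any) -> str:
        return to_json(self, **dumps_kwargs)


@dataclass
class Plain:  # untouched class
    int_field: int


@dataclass
class WithMixin(JsonMixin):  # opt-in convenience methods
    int_field: int


print(to_json(Plain(10)))       # {"int_field": 10}
print(WithMixin(10).to_json())  # {"int_field": 10}
```

With this split, the mixin and the decorator become optional sugar, and third-party dataclasses can be passed straight to the functions.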

alanfranz, Apr 07 '21 10:04

@alanfranz sorry, my mistake: when I responded I didn't have my laptop with me and was going off memory.

I meant to suggest your 2nd example, DataClassJsonMixin.to_json(a), but it leads to the error you ran into: AttributeError: 'ConfiguredSimpleExample' object has no attribute 'to_dict'

I forgot that internally to_json delegates to self.to_dict(...). I can fix this by just replacing that call with DataClassJsonMixin.to_dict(self).

I'm OK with making that change so you can call DataClassJsonMixin.to_json(...) on any object without having to modify its class. I just need to test that it doesn't break anything (I don't expect it to).
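A minimal sketch of the proposed change, using a hypothetical `ToyJsonMixin` in place of the real DataClassJsonMixin (whose methods take more parameters and do more work): once to_json looks up to_dict on the mixin class instead of on `self`, passing a plain dataclass instance as `self` works.

```python
import dataclasses
import json
from dataclasses import dataclass


class ToyJsonMixin:
    """Hypothetical stand-in for DataClassJsonMixin, not the real class."""

    def to_dict(self) -> dict:
        return dataclasses.asdict(self)

    def to_json(self) -> str:
        # Before: `self.to_dict()` raises AttributeError when `self` is a
        # plain dataclass that does not inherit from the mixin.
        # After: look the helper up on the mixin explicitly, so any
        # dataclass instance can be passed as `self`.
        return json.dumps(ToyJsonMixin.to_dict(self))


@dataclass
class ConfiguredSimpleExample:  # plain dataclass, no decorator or mixin
    int_field: int


a = ConfiguredSimpleExample(10)
print(ToyJsonMixin.to_json(a))  # {"int_field": 10}
```

One consequence of this design: because to_json no longer goes through `self.to_dict`, a subclass that overrides to_dict would not have its override used by to_json.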

> There's no "unbound method" concept in Python3, and it wouldn't work anyway (where is a plain dataclass to find the to_dict() method which is required in to_json()?)

I'm not sure what you mean by "unbound method", but a method is just a function that is implicitly passed the instance as self. It can be called on any object like a regular function, by explicitly passing that object as the self parameter:

class SomeMixin:
    def do_something(self):
        print("doing something")


class A:
    pass


a = A()
SomeMixin.do_something(a)

lidatong, Apr 07 '21 17:04

Please don't go down the DataClassJsonMixin.to_json(some_obj) route. Anyone who reads that code would assume that calling a DataClassJsonMixin method on an object that is not a DataClassJsonMixin instance is a bug or, in the best case, something that only works by sheer coincidence. As you've already discovered, such a call is likely to fail because it internally calls another method of the same class. DataClassJsonMixin.to_json(some_obj) is really, really bad code.

If you want to support this feature, dataclass_json.to_json(some_obj) is the proper way to implement it.

Aran-Fey, Apr 08 '21 18:04

@lidatong now I feel old :-) An unbound method is a Python 2 thing: a function defined in a class body, when accessed through the class, becomes an unbound method, which means it's not "just a function" anymore:

Class.unbound_method(random_object)

Would raise a TypeError along the lines of "unbound method unbound_method() must be called with Class instance as first argument". You could, BTW, only call such a method by passing an instance of the correct class (or a subclass), which made it rather useless.

When accessed through an instance, it would become a "bound method", meaning the instance is passed automatically as the first argument. Bound methods work the same way in Python 3, but the concept of "unbound method" was removed, so the function can be called on the class with any object as its first argument, as if it were a static method.
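The Python 3 behavior described above can be checked directly. This short sketch reuses the SomeMixin/A pair from the earlier comment (with do_something returning a string instead of printing, so the result is easy to inspect) and needs only the standard library.

```python
import types


class SomeMixin:
    def do_something(self):
        return f"doing something with {type(self).__name__}"


class A:
    pass


# Accessed through the class, Python 3 returns the plain function object:
# there is no "unbound method" wrapper and no check on the type of `self`.
print(type(SomeMixin.do_something))  # <class 'function'>
print(SomeMixin.do_something(A()))   # doing something with A

# Accessed through an instance, the function becomes a bound method whose
# __self__ is that instance, passed automatically as the first argument.
m = SomeMixin()
bound = m.do_something
print(isinstance(bound, types.MethodType))        # True
print(bound.__self__ is m)                        # True
print(bound.__func__ is SomeMixin.do_something)   # True
```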

By the way, to both of you: I think the approach should be Jackson-like; we need to separate the mapper from the object hierarchy. My idea would be something like this:

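class JsonMapper:
    def __init__(self, config=None):
        self.config = config  # serialization options live here, not on the dataclass

    def to_json(self, obj):
        ...

    def to_dict(self, obj):
        ...

    def from_dict(self, cls, data):
        ...

    def from_json(self, cls, json_str):
        ...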

So anyone is free to configure one, two, or thirty mappers as they see fit. Of course there would be a defaultMapper instance that does the default things. The current mixins and functions could stay the same, but would configure/call a mapper instead of doing the work directly.
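A minimal, runnable sketch of the mapper idea, under several assumptions: `JsonMapper`, `MapperConfig` and `DEFAULT_MAPPER` are hypothetical names (not a dataclasses-json API), the two options (`camel_case`, `omit_none`) are illustrative, and only flat dataclasses with JSON-native field types are handled. The point is that configuration lives in the mapper instance rather than on the dataclass, so the same class can be serialized differently by different mappers.

```python
import dataclasses
import json
from dataclasses import dataclass
from typing import Any, Optional, Type, TypeVar

T = TypeVar("T")


def _camel(name: str) -> str:
    # "int_field" -> "intField"
    head, *rest = name.split("_")
    return head + "".join(part.title() for part in rest)


@dataclass(frozen=True)
class MapperConfig:
    # Hypothetical options, loosely based on the questions raised in this thread.
    camel_case: bool = False  # rename keys to camelCase
    omit_none: bool = False   # drop keys whose value is None


class JsonMapper:
    """Holds serialization settings so dataclasses stay undecorated.

    Sketch limitation: flat dataclasses with JSON-native field types only.
    """

    def __init__(self, config: MapperConfig = MapperConfig()):
        self.config = config

    def _key(self, name: str) -> str:
        return _camel(name) if self.config.camel_case else name

    def to_dict(self, obj: Any) -> dict:
        out = {}
        for f in dataclasses.fields(obj):
            value = getattr(obj, f.name)
            if value is None and self.config.omit_none:
                continue
            out[self._key(f.name)] = value
        return out

    def to_json(self, obj: Any, **dumps_kwargs: Any) -> str:
        return json.dumps(self.to_dict(obj), **dumps_kwargs)

    def from_dict(self, cls: Type[T], data: dict) -> T:
        kwargs = {}
        for f in dataclasses.fields(cls):
            key = self._key(f.name)
            if key in data:  # missing keys fall back to the field default
                kwargs[f.name] = data[key]
        return cls(**kwargs)

    def from_json(self, cls: Type[T], text: str) -> T:
        return self.from_dict(cls, json.loads(text))


# Shared instance for callers who need no configuration.
DEFAULT_MAPPER = JsonMapper()


@dataclass
class Example:  # plain dataclass: no decorator, no mixin
    int_field: int
    note: Optional[str] = None


api_mapper = JsonMapper(MapperConfig(camel_case=True, omit_none=True))
print(DEFAULT_MAPPER.to_json(Example(10)))  # {"int_field": 10, "note": null}
print(api_mapper.to_json(Example(10)))      # {"intField": 10}
print(api_mapper.from_json(Example, '{"intField": 10}'))  # Example(int_field=10, note=None)
```

Because `MapperConfig` is frozen, a mapper's behavior cannot change after construction, which keeps "one or two configured mappers per app" predictable.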

@lidatong let me know what you think of this approach.

alanfranz, Apr 08 '21 21:04

I agree the logic that converts instances to dicts/json should be separated from the class hierarchy. I don't really care how exactly that's implemented (I'll be happy as long as the amount of boilerplate code doesn't grow linearly with the number of dataclasses in my code base), but I figure I might as well share my thoughts regarding the design.

IMO, a Mapper class is overkill. Converting instances to JSON is a straightforward process with 2 steps:

  1. Convert the instance attributes to JSON-serializable types (dicts, lists, etc.) by calling to_dict
  2. Call json.dump on that.

It's trivial and intuitive to customize this process by passing the relevant arguments to to_dict or json.dump. Anyone can do that without having to be told how. A Mapper class, on the other hand, is not nearly as intuitive: it's something you have to read the documentation for to figure out what it does and how it's used. As far as I'm concerned, it makes things needlessly complicated. A dataclasses_json.to_dict and a dataclasses_json.from_dict function are all we need.
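The two-step process can be sketched with the standard library alone, assuming `dataclasses.asdict` as a stand-in for the to_dict step: each step is customized by passing arguments (here `dict_factory` to drop None values, and `sort_keys` for json.dumps), with no mapper object involved. `Person` and `drop_none` are illustrative names.

```python
import dataclasses
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    name: str
    age: int
    nickname: Optional[str] = None


def drop_none(items):
    # asdict calls dict_factory with a list of (field_name, value) pairs.
    return {key: value for key, value in items if value is not None}


person = Person(name="Hans", age=26)

# Step 1: convert to JSON-serializable types, customized via an argument.
data = dataclasses.asdict(person, dict_factory=drop_none)

# Step 2: serialize, customized via json.dumps arguments.
text = json.dumps(data, sort_keys=True)
print(text)  # {"age": 26, "name": "Hans"}
```

If the same arguments recur at many call sites, `functools.partial(dataclasses.asdict, dict_factory=drop_none)` is one way to bundle them without introducing a class.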

Aran-Fey, Apr 12 '21 21:04

Pardon me, @Aran-Fey, but I disagree.

There is usually some configuration involved in serialization (I'm not sure about this specific library; I'm taking a general point of view) that decides how the output differs from a plain Python dictionary: e.g. do we serialize None as null, or do we just omit the key? Do we write integers as numbers or strings? Do we map names somehow, e.g. camelCase vs. snake_case? Currently, as far as I can understand, such configuration is stored in the decorated class and/or instance. As soon as we drop that link, we need another place to store it.

Passing a configuration dictionary to free functions on every call is tedious and error-prone, isn't it? What problem do you see with a Mapper? Take a look at Jackson's own mapper (https://fasterxml.github.io/jackson-databind/javadoc/2.7/com/fasterxml/jackson/databind/ObjectMapper.html ): it has exactly zero required configuration. If you want or need to, you can configure it differently.

Would this really be so difficult?

Mapper().to_json(...)

Of course there can be a global, default Mapper instance and default static to_dict/to_json functions, maybe with configuration overrides. But I think it's quite unlikely that you'd have to change config options on every call; you'll probably configure one or two mappers and use them consistently throughout your app.

alanfranz, Apr 12 '21 21:04

I don't think comparing Java libs with Python libs is a productive way to get stuff done. In Scala, case classes are (de)serialized with a single implicit import; that doesn't mean we should implement Scala's type system. Python stores object attributes in a dict (__dict__), so technically it is possible (and easy) to write a generic to_json method. A from_json could be implemented with generics, but there can be issues getting fields mapped. The main benefit of the current approach is safety and reliability across Python versions. The question is: are you willing to trade that for slightly cleaner-looking class definitions?
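To make the from_json point concrete, here is a minimal sketch of a generic, decorator-free decoder, assuming only the standard library (`typing.get_type_hints`, `get_origin`, `get_args`, Python 3.8+); `from_dict`, `from_json`, `Item` and `Order` are illustrative names. The to_* direction needs nothing but the instance, whereas decoding has to walk the declared field types, and every type form this sketch skips (Optional, unions, dicts, datetimes) is a place where fields can be mapped wrongly.

```python
import dataclasses
import json
from dataclasses import dataclass
from typing import Any, List, Type, TypeVar, get_args, get_origin, get_type_hints

T = TypeVar("T")


def _convert(tp: Any, value: Any) -> Any:
    # Nested dataclass: recurse using the field's declared type.
    if dataclasses.is_dataclass(tp) and isinstance(value, dict):
        return from_dict(tp, value)
    # List[X] / list[X]: convert each element with X.
    if get_origin(tp) is list and isinstance(value, list):
        (item_tp,) = get_args(tp)
        return [_convert(item_tp, v) for v in value]
    # Everything else (int, str, Optional[...], unions, ...) is passed
    # through unchanged; real libraries need much more logic here.
    return value


def from_dict(cls: Type[T], data: dict) -> T:
    # get_type_hints also resolves string annotations (for example under
    # `from __future__ import annotations`), unlike reading Field.type.
    hints = get_type_hints(cls)
    kwargs = {
        f.name: _convert(hints[f.name], data[f.name])
        for f in dataclasses.fields(cls)
        if f.name in data
    }
    return cls(**kwargs)


def from_json(cls: Type[T], text: str) -> T:
    return from_dict(cls, json.loads(text))


@dataclass
class Item:
    sku: str
    qty: int


@dataclass
class Order:
    order_id: int
    items: List[Item]


order = from_json(Order, '{"order_id": 1, "items": [{"sku": "a", "qty": 2}]}')
print(order)  # Order(order_id=1, items=[Item(sku='a', qty=2)])
```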

george-zubrienko, Jul 02 '23 17:07