dataclasses-json
Why is a class decorator or inheritance approach required?
Hello, first things first: thanks for this project! I had a similar idea many years ago, but without type annotations it failed miserably.
I have a question: why are we absolutely required to change the class that we want to map to/from json by using a decorator or inheritance?
Counter-example: in Java, when using things like jackson ( https://github.com/FasterXML/jackson - example usage https://www.stubbornjava.com/posts/practical-jackson-objectmapper-configuration ) we can define a mapper and use it to convert plain Java objects to JSON and vice-versa. We don't need to do anything with our Java object unless we need something special (and, even there, Jackson and similar tools usually allow you to configure the mapper instead, even though it's a bit harder than using an annotation).
This makes it easy to work with third-party libraries where you don't control the class you're returned, and it means you don't need to change your dataclasses all around the code. Also, if you need different serialization options in various parts of your library, you can just define multiple mappers and call it a day (imagine I need to serialize my objects to JSON, XML, and protobuf: I'd need to decorate my dataclass three times instead of defining three separate mappers).
I took a peek at the code and I couldn't answer; do you think there's something really binding for the current implementation approach?
Totally agree. I don't want my dataclasses to have `to_json()` or `from_dict` or whatever methods. I just want a library that can do this:

```python
>>> person = Person(name='Hans', age=26)
>>> library.to_dict(person)
{'name': 'Hans', 'age': 26}
```

What reason is there to make the interface any more complicated than that? If I have 15 dataclasses in my project, why do I have to decorate all 15 of them with `@dataclass_json`? That's just inconvenient.
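For the flat case shown above, the standard library's `dataclasses.asdict` already gives exactly this shape of API (without, of course, the type-aware JSON decoding that dataclasses-json adds):

```python
from dataclasses import dataclass, asdict

@dataclass
class Person:
    name: str
    age: int

person = Person(name='Hans', age=26)
print(asdict(person))  # {'name': 'Hans', 'age': 26}
```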
Hi @alanfranz I'm sorry for not responding. The answer is you can achieve what you want with the library as is and plain Python:

```python
DataClassJsonMixin.to_json(MyClass, my_class_instance)
DataClassJsonMixin.from_json(MyClass, my_class_instance)
```

You can alias those methods in your own code if the above is too verbose, as in:

```python
to_json = DataClassJsonMixin.to_json
```
@lidatong uhmm, I don't see how that can work. Maybe in Python2? There's no "unbound method" concept in Python3, and it wouldn't work anyway (where would a plain dataclass find the to_dict() method that to_json() requires?). See the following example, tested with Python 3.9.2.
```python
from dataclasses import dataclass
from dataclasses_json import DataClassJsonMixin

@dataclass
class ConfiguredSimpleExample:
    int_field: int

a = ConfiguredSimpleExample(10)
print(a)

# 1st doesn't work
DataClassJsonMixin.to_json(ConfiguredSimpleExample, a)

# 2nd doesn't work
# DataClassJsonMixin.to_json(a)
```
Error if 1st line uncommented:

```
$ python poc.py
ConfiguredSimpleExample(int_field=10)
Traceback (most recent call last):
  File "/private/tmp/asd/poc.py", line 12, in <module>
    DataClassJsonMixin.to_json(ConfiguredSimpleExample, a)
TypeError: to_json() takes 1 positional argument but 2 were given
```
Error if 2nd line uncommented:

```
$ python poc.py
ConfiguredSimpleExample(int_field=10)
Traceback (most recent call last):
  File "/private/tmp/asd/poc.py", line 14, in <module>
    DataClassJsonMixin.to_json(a)
  File "/private/tmp/asd/lib/python3.9/site-packages/dataclasses_json/api.py", line 50, in to_json
    return json.dumps(self.to_dict(encode_json=False),
AttributeError: 'ConfiguredSimpleExample' object has no attribute 'to_dict'
```
I apologize if I misunderstood anything.
@lidatong I suppose you might have wanted to say something like this:
```python
from dataclasses import dataclass
from dataclasses_json import dataclass_json

@dataclass
class ConfiguredSimpleExample:
    int_field: int

a = ConfiguredSimpleExample(10)
print(a)
print(dataclass_json(a.__class__).to_json(a))
```

Which seems to work. But it relies on an implementation detail: it just happens that there's no postprocessing of the decorated class in this example. The dataclass_json decorator DOES perform postprocessing in certain cases (see https://github.com/lidatong/dataclasses-json/blob/3dc59e01ccdfec619ee4e4c3502b9759b67c3fa8/dataclasses_json/api.py#L138 ), so I don't think the approach is reliable.
I think the serialization logic could be extracted, then called from the mixin or decorator. If you think it can be useful and merged, I'm willing to write a PR.
@alanfranz sorry, my mistake - when I responded I didn't have my laptop with me and was going off memory.

I meant to suggest your 2nd example, `DataClassJsonMixin.to_json(a)`, but it leads to the error you ran into: `AttributeError: 'ConfiguredSimpleExample' object has no attribute 'to_dict'`. I forgot that internally `to_json` delegates to `self.to_dict(...)`. I can fix this by just replacing that line with `DataClassJsonMixin.to_dict(self)`.

I'm ok with making that change so you can call `DataClassJsonMixin.to_json(...)` on any object without having to modify the class itself. I will just need to test that it doesn't break anything (I don't expect it to).
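The change described above - routing the internal call through the class instead of through `self` - can be sketched in plain Python. `Serializer` and `Plain` are made-up names for illustration, not the real library:

```python
import json

class Serializer:
    def to_dict(self):
        # works on any object that carries its state in __dict__
        return dict(self.__dict__)

    def to_json(self):
        # calling the sibling method through the class (not self.to_dict())
        # means `self` doesn't need to inherit Serializer at all
        return json.dumps(Serializer.to_dict(self))

class Plain:  # note: completely unrelated to Serializer
    def __init__(self, x):
        self.x = x

print(Serializer.to_json(Plain(1)))  # {"x": 1}
```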
> There's no "unbound method" concept in Python3, and it wouldn't work anyway (where would a plain dataclass find the to_dict() method that to_json() requires?)

I'm not sure what you mean by "unbound method"... but a method is just a function that is implicitly passed the instance `self`. It can be called on any object like a regular function, by explicitly passing in the instance for the `self` parameter:
```python
class SomeMixin:
    def do_something(self):
        print("doing something")

class A:
    pass

a = A()
SomeMixin.do_something(a)
```
Please don't go down the `DataClassJsonMixin.to_json(some_obj)` route. Any person who reads that code would assume that calling a `DataClassJsonMixin` method on an object that is not a `DataClassJsonMixin` instance is a bug, or in the best case, something that only happens to work by sheer coincidence. As you've already discovered, such a call is likely to fail because it internally calls another method of the same class. `DataClassJsonMixin.to_json(some_obj)` is really, really bad code.

If you want to support this feature, `dataclass_json.to_json(some_obj)` is the proper way to implement it.
@lidatong now I feel old :-) An unbound method is a Python2 thing: when a function was defined in a class body, it became an unbound method, which meant it was not "just a function" anymore. In Python2, `Class.unbound_method(random_object)` would raise an error complaining that the method must be called with an instance of the right class. You could, BTW, pass an instance of the correct class to call such a method that way - it was quite useless, though.

When looked up on an instance, it becomes a "bound method", which means the instance is passed as the first argument. That part works the same in Python3, but the concept of "unbound method" was removed, so the function can be called through the class as if it were a plain function.
By the way, for both of you: I think the right approach is Jackson-like - we need to separate the mapper from the object hierarchy. My idea would be something like this:

```python
class JsonMapper:
    def __init__(self, config):
        ...

    def to_json(self, obj):
        ...

    def to_dict(self, obj):
        ...

    def from_dict(self, cls, data):
        ...

    def from_json(self, cls, s):
        ...
```
So anyone is free to configure one, two, or thirty mappers as they see fit. Of course there would be a default mapper instance which would do the default things. The current mixins and functions could stay the same, but would configure/call a mapper instead of operating directly.
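A minimal, runnable sketch of what such a mapper could look like - all names here (`JsonMapper`, `skip_none`, `default_mapper`) are hypothetical, not part of the current dataclasses-json API:

```python
import json
from dataclasses import dataclass, asdict, fields
from typing import Optional

class JsonMapper:
    """Hypothetical mapper: serialization config lives here, not on the dataclass."""
    def __init__(self, skip_none=False):
        self.skip_none = skip_none

    def to_dict(self, obj):
        d = asdict(obj)
        if self.skip_none:
            d = {k: v for k, v in d.items() if v is not None}
        return d

    def to_json(self, obj):
        return json.dumps(self.to_dict(obj))

    def from_dict(self, cls, data):
        # naive: assumes flat fields and no nested dataclasses
        names = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in names})

    def from_json(self, cls, s):
        return self.from_dict(cls, json.loads(s))

# a default instance for users who don't need custom configuration
default_mapper = JsonMapper()

@dataclass
class Person:
    name: str
    age: Optional[int] = None

print(default_mapper.to_json(Person('Hans', 26)))          # {"name": "Hans", "age": 26}
print(JsonMapper(skip_none=True).to_json(Person('Hans')))  # {"name": "Hans"}
```

Note that `Person` needs no decorator or base class, and serializing to three formats would just mean three mapper instances.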
@lidatong let me know what you think of this approach.
I agree the logic that converts instances to dicts/json should be separated from the class hierarchy. I don't really care how exactly that's implemented (I'll be happy as long as the amount of boilerplate code doesn't grow linearly with the number of dataclasses in my code base), but I figure I might as well share my thoughts regarding the design.
IMO, a `Mapper` class is overkill. Converting instances to JSON is a straightforward process with 2 steps:

- Convert the instance attributes to json-serializable types (dicts, lists, etc.) by calling `to_dict`
- Call `json.dump` on that

It's trivial and intuitive to customize this process by passing the relevant arguments into `to_dict` or `json.dump`. Anyone can do that without having to be told how. A `Mapper` class, on the other hand, is not nearly as intuitive - this is something you have to read the documentation for to figure out what it does and how it's used. It makes things needlessly complicated, as far as I'm concerned. A `dataclasses_json.to_dict` and a `dataclasses_json.from_dict` method is all we need.
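That free-function shape can be sketched as follows - these are hypothetical module-level functions illustrating the suggestion, not the existing dataclasses_json API; customization is delegated to plain `json.dumps` keyword arguments:

```python
import json
from dataclasses import dataclass, asdict, is_dataclass

def to_dict(obj):
    # reject non-dataclasses early with a clear error
    if not is_dataclass(obj):
        raise TypeError(f"{obj!r} is not a dataclass instance")
    return asdict(obj)

def to_json(obj, **json_kwargs):
    # any customization (indent, sort_keys, default, ...) is passed
    # straight through to json.dumps
    return json.dumps(to_dict(obj), **json_kwargs)

@dataclass
class Person:
    name: str
    age: int

print(to_json(Person('Hans', 26), sort_keys=True))  # {"age": 26, "name": "Hans"}
```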
pardon me @Aran-Fey but I disagree.
There usually is some configuration regarding serialization (I'm not sure about this specific library, I'm taking a general POV), which can differ from a plain Python dictionary: e.g. do we serialize None as null, or do we just omit the key? Do we write integers as numbers or strings? Do we map names somehow, e.g. camelCase vs under_score? Currently, such configuration is stored (as far as I can understand) in the decorated class and/or instance. As soon as we drop that link, we need a place to store such configuration.
Passing a configuration dictionary every time to free functions is tedious and error-prone, isn't it? What problem do you see with a Mapper? Take a look at Jackson's own mapper - https://fasterxml.github.io/jackson-databind/javadoc/2.7/com/fasterxml/jackson/databind/ObjectMapper.html - it's got exactly zero required configuration. If you want and need to, you can configure it differently.

Would `Mapper().to_json(...)` be so difficult? Of course there can be a global, default Mapper instance and default static to_dict/to_json methods, maybe with configuration overrides. But I think it's quite unlikely that you'd have to change config options at each call; you'll probably configure one or two mappers and use them consistently throughout your app.
I don't think comparing Java libs with Python libs is a way to get stuff done. In Scala, case classes are serialized/deserialized with a single implicit import; that doesn't mean we should implement Scala's type system. Python stores all object metadata in `__dict__`, so technically it is possible (and easy) to do a generic `to_json` method. `from_json` could be implemented with generics, but there can be issues getting fields mapped. The main benefit of the current approach is safety and reliability across Python versions. The question is: are you willing to trade this for slightly cleaner-looking class definitions?