marshmallow_dataclass
marshmallow_dataclass copied to clipboard
When type hint is not recognized (e.g. `pd.Timestamp`), we recursively try to convert into a dataclass which mutates the constructor throughout the codebase
Reproducible error:
import pandas as pd
import dataclasses
import marshmallow_dataclass
@dataclasses.dataclass
class Something:
date_i_care_about: pd.Timestamp
print(f"here's a timestamp {pd.Timestamp('2017-01-01')}")
# this mutates the constructor for pd.Timestamp by recursively trying to convert to dataclass
marshmallow_dataclass.class_schema(Something)
print(f"here's a timestamp {pd.Timestamp('2017-01-01')}")
versions: pandas 0.23.4 marshmallow 2.x marshmallow_dataclass: 0.6.6
@lovasoa what is the intution behind trying to convert clazz
to dataclass here after we fail to get the fields?
https://github.com/lovasoa/marshmallow_dataclass/blob/master/marshmallow_dataclass/init.py#L281
I feel that an error (or some kind of strict
mode) would be more helpful
marshmallow_dataclass allows you to have simple classes, like :
class A:
a: int
class B:
b: A
marshmallow_dataclass.class_schema(B)
In order to create a schema for B, it also has to create a schema for A. And in order to apply dataclasses.fields
to A, it has to make it a dataclass: https://github.com/lovasoa/marshmallow_dataclass/blob/master/marshmallow_dataclass/init.py#L281
I don't think it's possible to fix this issue without breaking backward compatibility... What behavior would you expect for your example above ?
If you want a custom type in your class, the best way is to use a NewType declaration
Ok I see your point. In my (and my colleagues) use case, we would already have made both A
and B
a dataclass
(because we want all the benefits of dataclasses in our application code), and then when it comes to serializing an instance of B, we'd generate a marshmallow schema by doing:
marshmallow_dataclass.class_schema(B).dump(...)
However I guess you've primarily designed for the case where folks are using the marshmallow_dataclass.dataclass
decorator to do both at once....
What if we modified the logic for checking whether B
should be converted into dataclass to check if it has already has an __init__
defined. I can't think of many cases where someone has a class with a constructor and wants to make it a dataclass
I think that would be acceptable...
In order to avoid breaking compatibility, I am simply adding a new warning that will be displayed before marshmallow_dataclass tries to alter a class :
UserWarning: marshmallow_dataclass was called on the class Path, which is not a dataclass. It is going to try and convert the class into a dataclass, which may have undesirable side effects. To avoid this message, make sure all your classes and all the classes of their fields are either explicitly supported by marshmallow_datcalass, or are already dataclasses. For more information, see https://github.com/lovasoa/marshmallow_dataclass/issues/51
To anyone who would end up here after seeing this warning: to fix this , you should make sure that all the fields of all your classes are either simple base types, or dataclasses themselves.
For instance, the following is good :
import marshmallow_dataclass
import dataclasses
@dataclasses.dataclass
class A:
a: int
class B:
b: A
marshmallow_dataclass.class_schema(B)
and the following is bad :
import marshmallow_dataclass
class A:
a: int
class B:
b: A
marshmallow_dataclass.class_schema(B)
and this is the worse, because marshmallow_dataclass will change the Path class, and break it globally in your whole application :
from pathlib import Path
class B:
b: Path
marshmallow_dataclass.class_schema(B)
Hi all, I think this is quite a serious side effect, for example, it breaks JSON encoding of classes that are handled by a custom JSON encoder, because they are now considered to be dataclasses so they are converted natively.
Can you add an even bigger warning? I didn't actually notice it because it's printed with other messages at flask startup. Also, the warning could contain a suggestion to fix it by adding a marshmallow_field
to the field metadata, because that's ultimately the only way to support fields with custom non-dataclass classes, right?
@prihoda : Could you make a PR with the changes you want to see on the message ? We'll discuss the changes there.
I got burned by this one trying to use pathlib.Path
as a type. Seeing as pathlib
is seeing increased usage is there any way to have this type treated correctly? As a workaround I can make a base class like this:
class _HasPaths:
"""Workaround to ensure we get ``pathlib.Path`` types where needed."""
def __post_init__(self) -> None:
for fld in fields(self):
if fld.type == Path:
attr = getattr(self, fld.name)
setattr(self, fld.name, Path(attr))
and have any dataclass that uses Path
as a type inherit from it but this is less than ideal when pathlib.Path
should probably be a type that's handled by default.
The warnings introduced in #130 were helpful in figuring this out but for several hours I was confounded by a bunch of pytest INTERNALERROR
messages that were pretty impenetrable.
I think this is a request for marshmallow itself.
I think this auto-dataclassification is too much magic, and likely to create hard-to-suss bugs.
Note that the dataclass
decorator does not construct and return a new class, rather, it munges the decorated class in-place. So current behavior results in mangling other peoples code in a hidden, fairly magic, manner.
Here's a possible fix.
====
I think we can handle "simple annotated classes"(*) as in the above example without converting them to dataclasses.
For the sake of concreteness, let's say "simple annotated classes" are defined as objects, cls
, for which:
-
isinstance(cls, type)
-
cls.__init__
isobject.__init__
-
cls.__new__
isobject.__new__
-
typing.get_type_hints(cls)
returns non-empty
(This definition could perhaps be tweaked/widened a bit.)
Our class_schema
currently uses the fact that its argument is a dataclass for the following purposes.
-
We currently use
dataclasses.fields
to get at:a. field names — for "simple classes" we can get these from
typing.get_type_hints
instead b. field default values — for "simple classes", the value of the class variable (if any) will do c. field metadata — we can safely assume there is no metadata for simple classes -
We use the dataclasses constructor to construct the return value in the
load
method of our generated schemas. Instead, for "simple classes", we can construct them manually: create an instance by calling their__new__
, then manually initialize its attributes.
So the proposed fix is to change class_schema
so that:
- If the argument is a dataclass, do like we do now.
- Otherwise, if the argument is a "simple annotated class", construct a schema using the alternate methods described above (notably, without mangling the argument into a dataclass).
- Otherwise, raise a TypeError("do not know how to construct schema for X")
- Otherwise, raise a TypeError("do not know how to construct schema for X")
I agree that the library should not be optimistically trying to create dataclasses out of types it does not understand. I didn't even realise the library could do that until today. Raise an error and let the user wrap with @dataclass
themselves - it's so trivial.
Today I lost several hours to Path
getting clobbered and not getting the warning on the console because of reasons.
I couldn't even debug it properly because the debugger was crashing trying to use the clobbered type to show values.
Maybe to keep backwards compatibility there could be some kind of strict mode flag / settable module option?