marshmallow_dataclass icon indicating copy to clipboard operation
marshmallow_dataclass copied to clipboard

suggestion: functionality for *requiring* @marshmallow_dataclass instantiation passes schema

Open sinback opened this issue 4 years ago • 2 comments

I'm converting code from using traditional dataclasses to using marshmallow_dataclasses. My use case is to parse hierarchical config jsons into nested dataclasses, and do config validation at the same time.

In our unit tests, we use individual subconfigs. It's more natural to write these tests by creating the subconfig classes by hand than by loading up a string of a json dictionary. But you have to remember to use the class_schema to parse a string of jsons every time. I think it would be really useful to have a type of dataclass that ran the marshmallow validation not just on input strings but also in the __post_init__ method.

Here is an example of a fairly unpleasant workaround that achieves what I would really like to have:

from copy import copy

from dataclasses import field

from marshmallow import ValidationError
from marshmallow.validate import Range
from marshmallow_dataclass import dataclass


def test_bad_dict_load(klass):
    try:
        klass.Schema().load({'foo': 0})
    except ValidationError:
        return 'good!'
    else:
        return 'bad!'

def test_bad_explicit_creation(klass):
    try:
        klass(foo=0)
    except ValidationError:
        return 'good!'
    else:
        return 'bad!'


@dataclass
class Example:
    foo: int = field(metadata={'validate': Range(min=1)})


print(f'loading dict: {test_bad_dict_load(Example)}')
print(f'just making dataclass: {test_bad_explicit_creation(Example)}')


@dataclass
class ExampleHack:
    foo: int = field(metadata={'validate': Range(min=1)})

    # kinda gross suggestion for getting around recursion depth issues in __post_init__
    # hopefully no creators of ExampleHack other than ExampleHack.__post_init__ will make this
    # False ¯\_(ツ)_/¯ - a better solution could surely be implemented as well, right?
    new: bool = True

    def __post_init__(self):
        if self.new:
            schema = self.Schema()
            test = copy(self)
            test.new = False
            serialized = schema.dump(test)
            # ensure it's ok to create this instance of ExampleHack
            # (...hopefully we always expect dump-then-load to be idempotent)
            schema.load(serialized)
        else:
            pass

print(f'loading dict: {test_bad_dict_load(ExampleHack)}')
print(f'just making dataclass: {test_bad_explicit_creation(ExampleHack)}')

output:

loading dict: good!
just making dataclass: bad!
loading dict: good!
just making dataclass: good!

I think that this is likely to be a behavior that basically everyone always wants when using this library, but it would break some backwards compatibility (though imo by catching very likely bugs in others' code / unit tests! - so worth it!). If this is not something the maintainers like, we could implement behavior like this using a creation arg to marshmallow_dataclass.dataclass for example (something like @marshmallow_dataclass.dataclass(always_validate=True) eg.)

I have written a custom library similar to marshmallow_dataclass before (it was pretty hacky though), so I think I could probably achieve this in a PR if that would be welcome. If I'm remembering correctly, we also used a serialize-then-deserialize type check to get the job done. I don't think that's a bad way to do this, just that setting up the field new explicitly in ExampleHack is not really ok - my planned implementation would probably look like this, but obscure new from the user if possible.

sinback avatar Jul 20 '21 20:07 sinback

Could this be achieved with a mixin that validates the data ?

lovasoa avatar Jul 20 '21 20:07 lovasoa

Sure, that seems possible, but I think that's not the nicest way to do it. One pitfall I see coming up if the implementation of that mixin is done exactly as I originally suggested is that if you want to extend a class with only non-default values using a child class that also only has non-default values, you're going to be confused. Tacking on to my original example:

@dataclass
class ExampleChild(Example):
    bar: int

try:
    ExampleChild(foo=0, bar=0)
except ValidationError:
    print('good!')
else:
    print('bad!')

Creation of ExampleChild succeeds but we print 'bad!' because foo is allowed to be 0 still.

but, unintuitively:

@dataclass
class ExampleHackChild(ExampleHack):
    bar: int

try:
    ExampleHackChild(foo=0, bar=0)
except ValidationError:
    print('good!')
else:
    print('bad!')

raises TypeError: non-default argument 'bar' follows default argument on definition of ExampleHackChild due to the nature of the dataclass mro (this StackOverflow post has great discussion of this if you want to read more).

That's one reason why I'm thinking the implementation is best done more internal to the marshmallow_dataclass.dataclass implementation than my original suggestion.

sinback avatar Jul 20 '21 21:07 sinback