Object to pass to Nested as missing

Open cancan101 opened this issue 5 years ago • 12 comments

In deserialization, it seems like the missing value is what gets returned when there is no value, as opposed to the value that gets passed to the nested Schema. Is there any way to have a "default" value for the nested schema? E.g.:

from marshmallow import Schema, fields

class A(Schema):
    a = fields.String()

class B(Schema):
    b = fields.Nested(A, many=True, missing=[])

class C(Schema):
    c = fields.Nested(B, missing={})

C().load({})  # returns {'c': {}}

I would like it to return {'c': {'b': []}}, i.e. have it run the nested schema on an empty object and then let the nested schema apply its own missing logic.

cancan101, Nov 07 '18

This could be a downside of https://github.com/marshmallow-code/marshmallow/issues/378 / https://github.com/marshmallow-code/marshmallow/pull/756.

I didn't try, but AFAIU, when the default value was passed in serialized form, it was deserialized, so in this case the Nested field would deserialize {} and apply the nested schema's default values.

Since the change in #756, the deserialization does not happen.

I understand the need, but I don't see a simple way to get around this.

lafrech, Nov 07 '18
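To make the difference lafrech describes concrete, here is a toy, stdlib-only model of the two behaviours. It is not marshmallow code: `ToyField`, `load`, `load_value`, and the `deserialize_missing` flag are hypothetical, with `deserialize_missing=True` standing in for the pre-#756 behaviour (the missing value treated as serialized input and run through the field) and `False` for the post-#756 one (the missing value returned as is).

```python
# Toy model of the two `missing` semantics discussed in this thread.
# NOT marshmallow's code: ToyField, load, load_value and deserialize_missing
# are hypothetical names used only for illustration.

MISSING = object()  # sentinel: the key is absent from the input


class ToyField:
    def __init__(self, nested=None, many=False, missing=MISSING):
        self.nested = nested  # dict of name -> ToyField, or None for a plain field
        self.many = many
        self.missing = missing


def load_value(field, value, deserialize_missing):
    """Deserialize one input value; nested fields recurse into their schema."""
    if field.nested is None:
        return value
    if field.many:
        return [load(field.nested, item, deserialize_missing) for item in value]
    return load(field.nested, value, deserialize_missing)


def load(schema, data, deserialize_missing):
    out = {}
    for name, field in schema.items():
        if name in data:
            out[name] = load_value(field, data[name], deserialize_missing)
        elif field.missing is not MISSING:
            value = field.missing() if callable(field.missing) else field.missing
            if deserialize_missing:
                # pre-#756, as lafrech understands it: `missing` is serialized
                # input, so it goes through the field (and any nested schema)
                out[name] = load_value(field, value, deserialize_missing)
            else:
                # post-#756: `missing` is already in loaded form, used as is
                out[name] = value
    return out


# The A / B / C schemas from the issue, as plain dicts of fields
A = {"a": ToyField()}
B = {"b": ToyField(nested=A, many=True, missing=[])}
C = {"c": ToyField(nested=B, missing={})}

print(load(C, {}, deserialize_missing=False))  # {'c': {}}
print(load(C, {}, deserialize_missing=True))   # {'c': {'b': []}}
```

On the schemas from the issue, the post-#756 mode reproduces the reported `{'c': {}}`, while the pre-#756 mode gives the requested `{'c': {'b': []}}`. Passing `{'c': {}}` explicitly yields `{'c': {'b': []}}` in both modes, which is the effect the workarounds later in the thread recreate.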

Any reason not to change missing such that it gets passed to the respective fields rather than being returned by them? Or add a new missing concept that does that?

cancan101, Nov 07 '18

Any reason not to change missing such that it gets passed to the respective fields rather than being returned by them?

AFAIU, "passed to the respective fields" means "passed as serialized data to deserialize", which was the old behaviour, before #756, wasn't it?

BTW, I suppose we had the symmetric issue with default before that change. It would be interesting to check.

Or add a new missing concept that does that?

One reason not to do it would be API complexity, unless we find an elegant way to do it without complicating things too much.

lafrech, Nov 07 '18

I'm really sad that this happened. I was using this a lot to initialize nested fields. Is there really no workaround to get this to work again?

Edited to add: I do think this is a very big change in behaviour, and it would be good to document this consequence of #756 in the changelog.

RosanneZe, Dec 04 '18

This is odd. I would imagine that setting defaults on nested fields is not an uncommon use case.

Kareeeeem, Dec 04 '18

Here is a quick and dirty workaround:

class C(Schema):
    # B().load({}) runs once, when the class is defined; every load reuses that dict
    c = fields.Nested(B, missing=B().load({}))

If a callable value for missing had a way to access the nested schema instance, this would be a more robust option:

class C(Schema):
    # hypothetical API: marshmallow calls a callable `missing` with no arguments
    c = fields.Nested(B, missing=lambda self: self.schema.load({}))

deckar01, Dec 04 '18

This solution works most of the time, but it doesn't propagate the context. Here's a solution @Kareeeeem and I came up with:

from marshmallow import Schema, fields, pre_load

class A(Schema):
    x = fields.String(missing='x')
    y = fields.String(missing='y')
    z = fields.String()

class B(Schema):
    a = fields.Nested(A, missing=dict)
    b = fields.Nested(A, missing=lambda: {'y': 'not y'})
    c = fields.Nested(A)

    @pre_load
    def load_missing_nested(self, data, **kwargs):
        # For each absent Nested field with a callable `missing`, load the
        # nested schema on that value so the nested field defaults are applied.
        for fieldname, field in self.fields.items():
            if (fieldname not in data and isinstance(field, fields.Nested) and
                    callable(field.missing)):
                data[fieldname] = field.schema.load(field.missing())
        return data

B().load({})
# {'a': {'x': 'x', 'y': 'y'}, 'b': {'x': 'x', 'y': 'not y'}}

RosanneZe, Feb 01 '19

BTW, I suppose we had the symmetric issue with default before that change. It would be interesting to check.

Confirmed.

This code, before #756 (I actually tried it on the 2.x line), prints {'c': {}}:

from marshmallow import Schema, fields

class A(Schema):
    a = fields.String()

class B(Schema):
    b = fields.Nested(A, many=True, default=[])

class C(Schema):
    c = fields.Nested(B, default={})

print(C().dump({}))

So the issue is not new. It now impacts missing, whereas it used to impact default. I guess default is used less often, but my point is that even reverting #756 wouldn't really be a satisfying answer.

lafrech, Jun 04 '19

This issue looks like it'll need deeper investigation, but I'd really hate to delay 3.0 any further. Since there are existing workarounds posted above, how do we feel about deferring this until after 3.0.0? @lafrech @deckar01

sloria, Jul 07 '19

In the end we had to go with a different workaround than the one above; I forget exactly why. The most practical thing was to create our own Nested field and use it everywhere:

from marshmallow import fields, missing as missing_

class Nested(fields.Nested):
    """
    Field that fills in `missing` before loading, so that the nested
    schema's own missing values are initialized.
    """
    def deserialize(self, value, attr=None, data=None, **kwargs):
        self._validate_missing(value)
        if value is missing_:
            # substitute the missing value, then deserialize it like input
            _miss = self.missing
            value = _miss() if callable(_miss) else _miss
        return super().deserialize(value, attr, data, **kwargs)

RosanneZe, Jul 08 '19
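The override above can be mirrored in a small stdlib-only toy. The classes `ToyField`, `ToyNested`, `FillingNested`, and `ToySchema` are hypothetical and not marshmallow's: a plain nested field returns its `missing` value as is, while the filling variant substitutes `missing` first and then runs the normal deserialization path, so the nested schema's own missing values get applied. On the A / B schemas from the pre_load version earlier in the thread it gives the same output, but each nested schema is loaded only once per field.

```python
# Toy mirror of the "fill in missing, then deserialize" override.
# NOT marshmallow code: all class names here are hypothetical.

MISSING = object()  # sentinel: the key is absent from the input


class ToyField:
    """Plain field with post-#756 semantics: `missing` is returned as is."""

    def __init__(self, missing=MISSING):
        self.missing = missing

    def deserialize(self, value):
        if value is MISSING:
            return self.missing() if callable(self.missing) else self.missing
        return self._deserialize(value)

    def _deserialize(self, value):
        return value


class ToyNested(ToyField):
    def __init__(self, schema, missing=MISSING):
        super().__init__(missing)
        self.schema = schema

    def _deserialize(self, value):
        return self.schema.load(value)


class FillingNested(ToyNested):
    """Substitutes `missing` for an absent value, then deserializes it."""

    def deserialize(self, value):
        if value is MISSING and self.missing is not MISSING:
            value = self.missing() if callable(self.missing) else self.missing
        return super().deserialize(value)


class ToySchema:
    def __init__(self, **fields):
        self.fields = fields

    def load(self, data):
        out = {}
        for name, field in self.fields.items():
            value = field.deserialize(data.get(name, MISSING))
            if value is not MISSING:  # fields with no missing value are skipped
                out[name] = value
        return out


A = ToySchema(x=ToyField(missing="x"), y=ToyField(missing="y"), z=ToyField())
B_plain = ToySchema(a=ToyNested(A, missing=dict))
B_filling = ToySchema(
    a=FillingNested(A, missing=dict),
    b=FillingNested(A, missing=lambda: {"y": "not y"}),
    c=FillingNested(A),
)

print(B_plain.load({}))    # {'a': {}}
print(B_filling.load({}))  # {'a': {'x': 'x', 'y': 'y'}, 'b': {'x': 'x', 'y': 'not y'}}
```

Because the substitution happens inside the field, no schema-level hook is needed, and a field without a `missing` value (like `c`) is simply left out, as before.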

Due to the former (MA2) behaviour, when people use {} as nested_field.missing, they may not mean "{}" but "this is an empty value, please give me the schema's default load value, i.e. the value the schema outputs when loading {}".

Unfortunately, these are not the new semantics of missing: the default value is returned as is, so {} means {}.

OTOH, there seems to be demand for a feature that defaults to the schema's default load value.

We could add another parameter to allow passing a missing value in serialized form. I'd rather avoid exposing two APIs for the same thing, but if it's just a shortcut for the workaround above, it could be nice to provide. At least it is worth investigating, and it would be a non-breaking change.

There may be other ways to provide this in a non-breaking manner.

Note that default now acts like missing used to: when passing {} as default, Nested outputs the schema's default dump (including field defaults), not an empty dict. That's consistent with the value being expressed in object form, not serialized form. There is no way to specify an empty default if the schema has field defaults.

Overall, I don't object to postponing this to 3.x. But I agree this use case is legit and it would be a nice feature to have. And I understand it can be seen as a regression when coming from MA2.

lafrech, Jul 17 '19

Following @RosanneZe's approach seems to work in general, but initializing a DateTime field results in odd behavior. Here is a small example:

from marshmallow import Schema, fields, pre_load
import datetime

class NestedDateSchema(Schema):
    date_time = fields.DateTime(missing=lambda: datetime.datetime.now().isoformat())
    operator = fields.String(missing=">=")

class ParentSchema(Schema):
    name = fields.String(missing="TEST")
    time = fields.Nested(NestedDateSchema, missing=dict)

    @pre_load
    def load_missing_nested(self, data, **kwargs):
        for fieldname, field in self.fields.items():
            if (fieldname not in data and isinstance(field, fields.Nested) and callable(field.missing)):
                data[fieldname] = field.schema.load(field.missing())
        return data

# [...]

ParentSchema().load({'name': 'case1 - date_time="2022-03-01T11:05:52.142158"', 'time': {'operator': '<='}})
# date_time looks like a string

# vs

ParentSchema().load({'name': 'case2 - date_time=datetime.datetime(2022, 3, 1, 11, 28, 35, 80158)'})
# date_time is a datetime instance

Not using the lambda expression seems to prevent this behavior, but then the time is always the same one, computed at import. Any ideas how this could be solved?

In my actual use case I get the error "'str' object has no attribute 'isoformat'", which may be related.

JimmyPesto, Mar 01 '22
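One possible explanation, sketched as a toy model rather than marshmallow code: the functions `load_nested`, `load_parent_pre_load`, and `load_parent_single_pass` are hypothetical, and `datetime.fromisoformat` stands in for DateTime deserialization. Following lafrech's comment that a missing value is returned as is while input values are deserialized: in case 1, `time` is present, so the pre_load hook does nothing and `date_time` keeps the ISO string from the lambda unchanged. In case 2, the hook loads the nested schema once, storing that string, and the Nested field then deserializes the hook's output a second time, turning the string into a datetime.

```python
import datetime

# Toy model, NOT marshmallow code. Assumption: input values are
# deserialized, while missing values are returned as is (post-#756).


def now_iso():
    # plays the role of missing=lambda: datetime.datetime.now().isoformat()
    return datetime.datetime.now().isoformat()


def load_nested(data, date_missing):
    # name -> (deserializer, missing value); fromisoformat stands in for DateTime
    schema = {
        "date_time": (datetime.datetime.fromisoformat, date_missing),
        "operator": (str, ">="),
    }
    out = {}
    for name, (deserialize, missing) in schema.items():
        if name in data:
            out[name] = deserialize(data[name])  # input: deserialized
        else:
            out[name] = missing() if callable(missing) else missing  # as is
    return out


def load_parent_pre_load(data, date_missing):
    data = dict(data)
    if "time" not in data:
        # the @pre_load hook: load the nested schema on {} ...
        data["time"] = load_nested({}, date_missing)
    # ... and the Nested field then deserializes that output a second time
    return {"time": load_nested(data["time"], date_missing)}


def load_parent_single_pass(data, date_missing):
    # the overridden-Nested approach: fill in {} when absent, deserialize once
    return {"time": load_nested(data.get("time", {}), date_missing)}


case1 = load_parent_pre_load({"time": {"operator": "<="}}, now_iso)
case2 = load_parent_pre_load({}, now_iso)
print(type(case1["time"]["date_time"]).__name__)  # str
print(type(case2["time"]["date_time"]).__name__)  # datetime

# Single pass + a missing callable that returns a datetime (passed uncalled):
# both cases get a fresh datetime object on every load.
single1 = load_parent_single_pass({"time": {"operator": "<="}}, datetime.datetime.now)
single2 = load_parent_single_pass({}, datetime.datetime.now)
print(type(single1["time"]["date_time"]).__name__)  # datetime
print(type(single2["time"]["date_time"]).__name__)  # datetime
```

If this model matches what marshmallow does, combining the overridden `Nested` field from earlier in the thread with `missing=datetime.datetime.now` would avoid both the stale import-time value and the double pass. In this toy, the pre_load version cannot simply switch to `datetime.datetime.now`, because its second pass would then receive a datetime instead of a string and fail. The `'str' object has no attribute 'isoformat'` error is also consistent with case 1: `date_time` stays a string, and anything that later calls `.isoformat()` on it (plausibly a dump through a DateTime field) raises exactly that error.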