marshmallow icon indicating copy to clipboard operation
marshmallow copied to clipboard

Context and instantiated Netsted schemas.

Open Thomas-Z opened this issue 3 years ago • 4 comments

I like to use schema context to transmit information from one schema to another (child or parent) and the following behavior seems a little strange to me.

import marshmallow as mrsh

class SchemaA(mrsh.Schema):
    a = mrsh.fields.Integer()

    @mrsh.pre_load
    def _pre_load(self, data, **_):
        self.context["from_a"] = "a"

        return data

class SchemaB1(mrsh.Schema):
    ac = mrsh.fields.Nested(SchemaA)

    @mrsh.pre_load
    def _pre_load(self, data, **_):
        self.context["from_b1"] = "b1"

        return data

class SchemaB2(mrsh.Schema):
    ac = mrsh.fields.Nested(SchemaA())

    @mrsh.pre_load
    def _pre_load(self, data, **_):
        self.context["from_b2"] = "b2"

        return data

The only difference between SchemaB1 and SchemaB2 is that I provide the SchemaA class to the first one and an instance of SchemaA to the other. This is the kind of things I usually do to provide a specific context to SchemaA when generated from SchemaB. Something like that : ac = mrsh.fields.Nested(SchemaA(context={"x": 1))

sb1 = SchemaB1()
sb2 = SchemaB2()

sb1.load({"ac": {"a": 5}})
sb2.load({"ac": {"a": 5}})

print(sb1.context)
>> {'from_b1': 'b1', 'from_a': 'a'}

print(sb2.context)
>> {'from_b2': 'b2'}

Using an instance of SchemaA result in the context not being fully shared between the Nested schema and its parent.

sb1.fields["ac"].schema.context is sb1.context
>> True

Same object if using a class.

sb2.fields["ac"].schema.context is sb2.context
>> False

Different objects if using an instance.

sb2.fields["ac"].schema.context
>> {'from_b2': 'b2', 'from_a': 'a'}

SchemaA correctly inherited from its parent context.

I'm not sure if that's a bug or not but that behavior somehow feels inconsistent to me (or at least hard to predict).

Is this the expected behavior ?

Thanks for your great library!

# Name                    Version                   Build  Channel
marshmallow               3.11.1             pyhd3eb1b0_0    defaults
python                    3.8.5                h7579374_1    defaults

Thomas-Z avatar Apr 11 '21 16:04 Thomas-Z

Thanks for reporting and sorry about the delay.

Indeed there is an inconsistency. See related discussion in #1617 and an implementation proposal for marshmallow 4 in #1826.

What would you expect?

  • {'from_b2': 'b2', 'from_a': 'a'}: parent updates nested schema context
  • {'from_a': 'a'} : parent overrides nested schema context

The former brings thread-safety issues, so I'm willing to chose the latter. This is what I do in #1833.

The update case is something I didn't think about when proposing #1826.

lafrech avatar Jun 22 '21 08:06 lafrech

I'm not sure to fully understand your proposal and its implications but here is what I understand/expect.

I don't know what the original intent was when designing the "context" attribute but I use it a lot to provide different contexts to nested schemas:

import marshmallow as mm

DO_SOMETHING = "DO_SOMETHING"


class SchemaA(mm.Schema):
    a = mm.fields.Integer(required=True)

    @mm.post_load
    def _post_load(self, data, **_):
        do_smt = self.context.get(DO_SOMETHING, False)

        if do_smt:
            data["a"] = data["a"] * 4

        return data


class SchemaB(mm.Schema):
    ac1 = mm.fields.Nested(SchemaA(context={DO_SOMETHING: True}))
    ac2 = mm.fields.Nested(SchemaA(context={DO_SOMETHING: False}))
sch = SchemaB()

data = {"ac1": {"a": 1}, "ac2": {"a": 1}}

sch.load(data)
>> {'ac1': {'a': 4}, 'ac2': {'a': 1}}
sch = SchemaB(context={DO_SOMETHING: True})

data = {"ac1": {"a": 1}, "ac2": {"a": 1}}

sch.load(data)
>> {'ac1': {'a': 4}, 'ac2': {'a': 4}}

Both of these results seem consistent to me and are what I would expect. I think your proposal is breaking this kind of things.

In my initial example, it was not about parents updating nested schema's context but nested schema updating their parents' context. This behavior seems dangerous and hardly predictable.

The way I see it, each nested schema's context should inherit from its parent's context by making a copy of it (but not sharing it nor having the possibility to update it) and add its own context elements the way it's done today.

Thomas-Z avatar Jun 26 '21 16:06 Thomas-Z

I don't know the original intent for this feature either. I've never used it.

To me, the point is to allow modifying the behaviour at runtime.

Your sample code uses it to modify the schema at init (import) time. This could easily be achieved by just subclassing Schema.

class SchemaA(mm.Schema):
    def __init__(self, do_smt=False, *args, **kwargs):
        self.do_smt = do_smt

    a = mm.fields.Integer(required=True)

    @mm.post_load
    def _post_load(self, data, **_):
        if self.do_smt:
            data["a"] = data["a"] * 4
        return data

class SchemaB(mm.Schema):
    ac1 = mm.fields.Nested(SchemaA(do_smt=True}))
    ac2 = mm.fields.Nested(SchemaA())

Is this correct? Or was your example too contrived and it wouldn't work in your real use case?

The rework I propose in #1826 indeed breaks your use case but I believe this is fine as you should be able to achieve what you want like I'm showing above and context would be used only to modify things at runtime, not init.


The question that remains is what to do about marshmallow 3. The fix in #1833 also breaks your use case. But it fixes a thread-safety issue that can catch users by surprise. We could argue that the way you're using it is not the way it is documented in the docs but to be fair, the doc for this feature is a bit light. We could also argue that the breaking change is likely to be caught in unit tests while the thread-safety issue is silent. We could merge the fix raise a warning when initiating a schema with a context in a Nested field. But I still don't like the idea of a breaking change in 3.x.

Or we go the other way, don't merge the fix but raise a warning to alert about the thread-safety issue. No breaking change in marshmallow 3.x. Users will have to use a schema class rather than an instance to avoid the thread-safety issue, which means they won't be able to use schema modifiers while passing a context. We could live with this corner case limitation until 4.0.

@sloria @deckar01, thoughts?

lafrech avatar Jul 21 '21 21:07 lafrech

This is correct and looks cleaner than the way I'm doing it.

I think (not 100% sure) this would allow me to not use contexts this way anymore and not be too much affected by the #1833 fix.

Changing your code to this

import marshmallow as mm

DO_SOMETHING = "DO_SOMETHING"

class SchemaA(mm.Schema):
    def __init__(self, do_smt=False, *args, **kwargs):
        super().__init__(*args, **kwargs)

        self.do_smt = do_smt

    a = mm.fields.Integer(required=True)

    @mm.post_load
    def _post_load(self, data, **_):
        do_smt = self.context.get(DO_SOMETHING, self.do_smt)

        if do_smt:
            data["a"] = data["a"] * 4

        return data


class SchemaB(mm.Schema):
    ac1 = mm.fields.Nested(SchemaA(do_smt=True))
    ac2 = mm.fields.Nested(SchemaA())

Would still allow the following usage:

sch = SchemaB(context={DO_SOMETHING: True})

data = {"ac1": {"a": 1}, "ac2": {"a": 1}}

sch.load(data)
>> {'ac1': {'a': 4}, 'ac2': {'a': 4}}

#1833 would also fix the inconsistency I mentioned in my initial message.

Like you, I'm not sure a breaking change in 3.x would be a good thing (except if I'm the only one using contexts like this, I wouldn't mind).

Thomas-Z avatar Aug 01 '21 17:08 Thomas-Z