marshmallow icon indicating copy to clipboard operation
marshmallow copied to clipboard

Propagate `unknown` when an explicit flag is set, in the form `unknown=...|PROPAGATE`

Open sirosen opened this issue 3 years ago • 10 comments

This is based upon the work in #1490 , but with some significant modifications. In retrospect, it might have been simpler to base it on #1429 , or just do the work from scratch, but I'm not sure. closes #1428, #1429, #1490, and #1575

The biggest change from #1490 or #1429 I've made is that the behavior is never implied by certain usages -- it's always explicit. There's no distinction between unknown being passed at load time, to Nested, at schema instantiation, or via schema options. Instead, all of these contexts will support propagate_unknown=True, which turns on propagation. This approach might not be all that elegant -- it somewhat bloats the public interface -- but it is guaranteed to be backwards compatible. Because the behavior is opt-in, nobody should have behavior change on them unexpectedly with this update.

Quick note: I wasn't sure if it is desirable to allow propagate_unknown in all of these contexts, especially Nested. I included it for now, but would be open to removing it.

propagate_unknown as written does not respect setting propagate_unknown=False in a nested schema. It's not something I considered particularly important (since the whole point is to "turn the feature on") but I can look at tweaking that if it's likely to be confusing.

One important bit of this is that I wanted propagate_unknown to understand the difference between unknown=RAISE being set because it's the default, and someone writing fields.Nested(MySchema(unknown=RAISE)), in which case they're asking for it explicitly. To handle this, __init__ checks before setting unknown. If no value was passed, in addition to setting unknown=RAISE, it will set a new value, auto_unknown=True. During loading, whenever propagate_unknown=True and auto_unknown=False, the unknown value for the nested schema in question will be respected. Furthermore, that value will start to propagate. The same happens, though without use of auto_unknown, wherever fields.Nested.unknown is set.

Reading this comment on #1429, my understanding was that handling these kinds of scenarios was considered to be necessary. With the auto_unknown tracking and behavior, it's possible to support cases like this one:

class D(Schema):
    x = fields.Str()
class C(Schema):
    x = fields.Str()
class B(Schema):
    d = fields.Nested(D(unknown=EXCLUDE))
    c = fields.Nested(C)
class A(Schema):
    b1 = fields.Nested(B(unknown=INCLUDE))
    b2 = fields.Nested(B())

A(propagate_unknown=True, unknown=RAISE).load(...)

In the above, everything should "percolate down" in a relatively intuitive way. Importantly, that top-level value for unknown=RAISE doesn't override the use of unknown=INCLUDE and unknown=EXCLUDE in descendant schemas.

I've added a few simple tests, but I haven't aimed to be completely comprehensive because there are too many scenarios to think about. It's hard to tell what's important to test this way and what is not. If there's something I ought to be testing which is missing, just say that word and I'm happy to add more test cases.


This is my first time touching marshmallow, and I'm changing the interfaces for Schema, fields.Nested, and schema opts. So I would not be at all surprised if I've missed something significant -- please just let me know if so.

sirosen avatar Jul 17 '20 05:07 sirosen

@sloria, could I push for your (or another marshmallow maintainer's) feedback on this approach? The implementation aside, I'm not sure if this is the way to solve this problem.

sirosen avatar Aug 11 '20 22:08 sirosen

Apologies for the delay. Unfortunately I won't be able to give this the focused attention this needs in the next week or so--lots of work and non-work commitments coming up fast.

I thought we'd decided against this behavior in past proposals:

An instance method argument is not the only way to define schema meta options in bulk. Practical solutions were linked to in #1490. Maximizing convenience does not seem to be a strong reason to bake this functionality into the API since the work arounds are simple and the use case is obscure.

https://github.com/marshmallow-code/marshmallow/pull/1575#issuecomment-669643931

I think this proposed change would make it too easy for developers to implicitly opt into undesired behavior. I don't think there is any precedent for method args behaving this much differently than their Meta counterparts, which would make it unexpected behavior for most developers.

https://github.com/marshmallow-code/marshmallow/issues/1428#issuecomment-558297299

And potential solutions requiring no changes to marshmallow:

  • Utility function to set unknown recursively: https://github.com/marshmallow-code/marshmallow/issues/1428#issuecomment-558297299
  • Overriding the default Schema.Meta.unknown: https://github.com/marshmallow-code/marshmallow/issues/1367#issuecomment-524988482

sloria avatar Aug 12 '20 14:08 sloria

This does work around some of the issues with past proposals, but it illustrates how much API surface area has to be added to get this behavior. It takes significantly less code to use a factory pattern, and it covers more complex use cases.

def get_schema(unknown=None):
   class D(Schema):
       x = fields.Str()
   class C(Schema):
       x = fields.Str()
   class B(Schema):
       d = fields.Nested(D(unknown=unknown or EXCLUDE))
       c = fields.Nested(C)
   class A(Schema):
       b1 = fields.Nested(B(unknown=unknown or INCLUDE))
       b2 = fields.Nested(B())
  return A

A = get_schema(unknown=RAISE)
A().load(...)

deckar01 avatar Aug 12 '20 16:08 deckar01

The factory pattern or a custom base schema probably covers all usage. I think the 99% case for users wanting this is this simple: a user wants to set EXCLUDE or INCLUDE on a whole tree, they specify schema.load(unknown=...), and then they are surprised by the result not being what they wanted.

I thought we'd decided against this behavior in past proposals

Not having participated in the past threads, the fact that other PRs for this have been left open led me to believe the door was being intentionally left open to a better approach.

Even that aside, users are somewhat consistently surprised by the fact that unknown doesn't automatically propagate down. IMO, it merits a second look (at least, since #980) on that account alone.

it illustrates how much API surface area has to be added to get this behavior.

I take your comment to also mean "changing the API this much makes things worse / more confusing". Which is something with which I agree.

I can work to smooth out the public API by making the values for unknown into specialized objects. Usage like unknown=EXCLUDE | PROPAGATE would be nice, from a user-perspective. The API doesn't change at all that way (if we consider repr(EXCLUDE) to be non-public, which I would). I didn't bother originally because it's more work to support, but I think it provides a better API.

sample of an or-able UnknownParam class
class UnknownParam:
    def __init__(self, value=None, propagate=False):
        self.value = value
        self.propagate = propagate

    def __or__(self, other):
        return UnknownParam(
            self.value or other.value, self.propagate or other.propagate
        )

    def __str__(self):
        if self.value:
            return self.value
        if self.propagate:
            return "propagate"
        return "null"

    def __repr__(self):
        return "UnknownParam(value={!r}, propagate={!r})".format(
            self.value, self.propagate
        )

    def is_good(self):
        return self.value is not None


EXCLUDE = UnknownParam("exclude")
INCLUDE = UnknownParam("include")
RAISE = UnknownParam("raise")
PROPAGATE = UnknownParam(propagate=True)

I'm happy to rework this existing PR to use that, if it would stand a chance at being accepted.

As far as I'm concerned, that's just sugar -- which I'm happy to work on, since the public API is important! -- but I'd just be doing the same thing internally. I don't want to do that work unless we agree that allowing users to opt-in to propagation like this is desirable in the first place.


If no functional change is going to be made, I think something explicit needs to be added somewhere in the docs. I would be inclined to phrase it positively, in terms of how to apply unknown=EXCLUDE or unknown=INCLUDE to a tree of schemas. For example, I would be happy to add to the relevant quickstart section in a new PR.

suggested addition to quickstart docs
Setting `unknown` for all schemas in a tree
-----

Note that the `unknown` setting does not automatically propagate to nested schemas.
To set `unknown` for an entire schema tree, use a shared base class.

For example:

class ExcludingSchema(Schema):
    class Meta:
        unknown = EXCLUDE

class C(ExcludingSchema):
    x = fields.Str()
class B(ExcludingSchema):
    c = fields.Nested(C)
class A(ExcludingSchema):
    b = fields.Nested(B)

A().load({"b": {"c": {"x": "foo", "y": "bar"}, "d": "baz"}})
# => {"b": {"c": {"x": "foo"}}}

sirosen avatar Aug 12 '20 23:08 sirosen

I take your comment to also mean "changing the API this much makes things worse / more confusing".

To clarify: I think the number of method arguments added is disproportionate to the benefit it provides. I generally promote providing all the data needed to satisfy special use cases rather than implementing them in the core. A leaner core is better for performance and reduces the complexity of maintaining and contributing to the library.

I think the 99% case for users wanting this is this simple: a user wants to set EXCLUDE or INCLUDE on a whole tree

I think this can be done without modifying the core. If you are open to solving this use case with a utility and think a convenient interface for this functionality would be useful to other users in the community, would you consider publishing it as a package? I think that would be a good addition to the "Fields and Custom Schemas" section of the Ecosystem page of the wiki.

deckar01 avatar Aug 13 '20 19:08 deckar01

It's possible to do this in a separate lib, but if it's not going into the core, then the solutions of using a custom base schema or writing a schema builder are much better.

I think I misunderstood the situation -- with the existing PRs and issues still open, I thought this was something where I could close out a series of issues. It seems like none of these are actually wanted. Can they all be closed, along with this?

If it's useful, I can open a new PR with docs as I suggested. That way, the unknown documentation will state, up-front, how to solve this use-case. And that should help anyone being surprised by the behavior.

sirosen avatar Aug 20 '20 14:08 sirosen

I don't want to nag about this issue, but I also feel that the first draft of this put it on poor footing by adding a lot more API surface and including an inessential feature in the auto_unknown behavior.

I've refactored it to add PROPAGATE with the bitwise-or based interface proposed above, and to remove the auto_unknown behavior entirely in order to simplify. I'm not sure if unknown="raise" is considered valid and worth supporting or not, so I included support for it.

I'm still not strongly attached to making the change, but I think this is the simplest/smallest version of it I can imagine which won't break any normal usages today.

sirosen avatar Sep 04 '20 20:09 sirosen

@wanderrful, you commented while I was doing a refactor to make this (hopefully) worthy of consideration. But my intent here is to provide propagation as an option which you can opt-in to. Not making it opt-in would be badly backwards-incompatible.

If you want to try this branch, the current usage is to set unknown=EXCLUDE | PROPAGATE. However, based on your comment in #1490 , you may want to try using this workaround instead: https://github.com/marshmallow-code/marshmallow/issues/1428#issuecomment-558297299

sirosen avatar Sep 04 '20 20:09 sirosen

Completely agreed, sounds good! +1 will greatly appreciate having Propagate as an option.

wanderrful avatar Sep 04 '20 20:09 wanderrful

For the people looking how to solve the issue, this helps me:

"user": fields.Nested(                  
    {"id": fields.Int(required=True),   
     "login": fields.Str(required=True),
     "url": fields.Str()                
    },                                  
    required=True,                      
    unknown=EXCLUDE),                   

Adding unknown=EXCLUDE inside the fields.Nested to make sure that the Nested field ignore other data coming from the request.

caio-vinicius avatar Apr 04 '22 01:04 caio-vinicius