marshmallow
marshmallow copied to clipboard
Propagate `unknown` when an explicit flag is set, in the form `unknown=...|PROPAGATE`
This is based upon the work in #1490 , but with some significant modifications. In retrospect, it might have been simpler to base it on #1429 , or just do the work from scratch, but I'm not sure. closes #1428, #1429, #1490, and #1575
The biggest change from #1490 or #1429 I've made is that the behavior is never implied by certain usages -- it's always explicit. There's no distinction between unknown
being passed at load time, to Nested
, at schema instantiation, or via schema options. Instead, all of these contexts will support propagate_unknown=True
, which turns on propagation.
This approach might not be all that elegant -- it somewhat bloats the public interface -- but it is guaranteed to be backwards compatible. Because the behavior is opt-in, nobody should have behavior change on them unexpectedly with this update.
Quick note: I wasn't sure if it is desirable to allow propagate_unknown
in all of these contexts, especially Nested
. I included it for now, but would be open to removing it.
propagate_unknown
as written does not respect setting propagate_unknown=False
in a nested schema. It's not something I considered particularly important (since the whole point is to "turn the feature on") but I can look at tweaking that if it's likely to be confusing.
One important bit of this is that I wanted propagate_unknown
to understand the difference between unknown=RAISE
being set because it's the default, and someone writing fields.Nested(MySchema(unknown=RAISE))
, in which case they're asking for it explicitly. To handle this, __init__
checks before setting unknown
. If no value was passed, in addition to setting unknown=RAISE
, it will set a new value, auto_unknown=True
.
During loading, whenever propagate_unknown=True
and auto_unknown=False
, the unknown
value for the nested schema in question will be respected. Furthermore, that value will start to propagate. The same happens, though without use of auto_unknown
, wherever fields.Nested.unknown
is set.
Reading this comment on #1429, my understanding was that handling these kinds of scenarios was considered to be necessary. With the auto_unknown
tracking and behavior, it's possible to support cases like this one:
class D(Schema):
x = fields.Str()
class C(Schema):
x = fields.Str()
class B(Schema):
d = fields.Nested(D(unknown=EXCLUDE))
c = fields.Nested(C)
class A(Schema):
b1 = fields.Nested(B(unknown=INCLUDE))
b2 = fields.Nested(B())
A(propagate_unknown=True, unknown=RAISE).load(...)
In the above, everything should "percolate down" in a relatively intuitive way. Importantly, that top-level value for unknown=RAISE
doesn't override the use of unknown=INCLUDE
and unknown=EXCLUDE
in descendant schemas.
I've added a few simple tests, but I haven't aimed to be completely comprehensive because there are too many scenarios to think about. It's hard to tell what's important to test this way and what is not. If there's something I ought to be testing which is missing, just say that word and I'm happy to add more test cases.
This is my first time touching marshmallow
, and I'm changing the interfaces for Schema
, fields.Nested
, and schema opts. So I would not be at all surprised if I've missed something significant -- please just let me know if so.
@sloria, could I push for your (or another marshmallow maintainer's) feedback on this approach? The implementation aside, I'm not sure if this is the way to solve this problem.
Apologies for the delay. Unfortunately I won't be able to give this the focused attention this needs in the next week or so--lots of work and non-work commitments coming up fast.
I thought we'd decided against this behavior in past proposals:
An instance method argument is not the only way to define schema meta options in bulk. Practical solutions were linked to in #1490. Maximizing convenience does not seem to be a strong reason to bake this functionality into the API since the work arounds are simple and the use case is obscure.
https://github.com/marshmallow-code/marshmallow/pull/1575#issuecomment-669643931
I think this proposed change would make it too easy for developers to implicitly opt into undesired behavior. I don't think there is any precedent for method args behaving this much differently than their Meta counterparts, which would make it unexpected behavior for most developers.
https://github.com/marshmallow-code/marshmallow/issues/1428#issuecomment-558297299
And potential solutions requiring no changes to marshmallow:
- Utility function to set
unknown
recursively: https://github.com/marshmallow-code/marshmallow/issues/1428#issuecomment-558297299 - Overriding the default
Schema.Meta.unknown
: https://github.com/marshmallow-code/marshmallow/issues/1367#issuecomment-524988482
This does work around some of the issues with past proposals, but it illustrates how much API surface area has to be added to get this behavior. It takes significantly less code to use a factory pattern, and it covers more complex use cases.
def get_schema(unknown=None):
class D(Schema):
x = fields.Str()
class C(Schema):
x = fields.Str()
class B(Schema):
d = fields.Nested(D(unknown=unknown or EXCLUDE))
c = fields.Nested(C)
class A(Schema):
b1 = fields.Nested(B(unknown=unknown or INCLUDE))
b2 = fields.Nested(B())
return A
A = get_schema(unknown=RAISE)
A().load(...)
The factory pattern or a custom base schema probably covers all usage. I think the 99% case for users wanting this is this simple: a user wants to set EXCLUDE
or INCLUDE
on a whole tree, they specify schema.load(unknown=...)
, and then they are surprised by the result not being what they wanted.
I thought we'd decided against this behavior in past proposals
Not having participated in the past threads, the fact that other PRs for this have been left open led me to believe the door was being intentionally left open to a better approach.
Even that aside, users are somewhat consistently surprised by the fact that unknown
doesn't automatically propagate down. IMO, it merits a second look (at least, since #980) on that account alone.
it illustrates how much API surface area has to be added to get this behavior.
I take your comment to also mean "changing the API this much makes things worse / more confusing". Which is something with which I agree.
I can work to smooth out the public API by making the values for unknown
into specialized objects.
Usage like unknown=EXCLUDE | PROPAGATE
would be nice, from a user-perspective. The API doesn't change at all that way (if we consider repr(EXCLUDE)
to be non-public, which I would). I didn't bother originally because it's more work to support, but I think it provides a better API.
sample of an or-able UnknownParam class
class UnknownParam:
def __init__(self, value=None, propagate=False):
self.value = value
self.propagate = propagate
def __or__(self, other):
return UnknownParam(
self.value or other.value, self.propagate or other.propagate
)
def __str__(self):
if self.value:
return self.value
if self.propagate:
return "propagate"
return "null"
def __repr__(self):
return "UnknownParam(value={!r}, propagate={!r})".format(
self.value, self.propagate
)
def is_good(self):
return self.value is not None
EXCLUDE = UnknownParam("exclude")
INCLUDE = UnknownParam("include")
RAISE = UnknownParam("raise")
PROPAGATE = UnknownParam(propagate=True)
I'm happy to rework this existing PR to use that, if it would stand a chance at being accepted.
As far as I'm concerned, that's just sugar -- which I'm happy to work on, since the public API is important! -- but I'd just be doing the same thing internally. I don't want to do that work unless we agree that allowing users to opt-in to propagation like this is desirable in the first place.
If no functional change is going to be made, I think something explicit needs to be added somewhere in the docs. I would be inclined to phrase it positively, in terms of how to apply unknown=EXCLUDE
or unknown=INCLUDE
to a tree of schemas.
For example, I would be happy to add to the relevant quickstart section in a new PR.
suggested addition to quickstart docs
Setting `unknown` for all schemas in a tree
-----
Note that the `unknown` setting does not automatically propagate to nested schemas.
To set `unknown` for an entire schema tree, use a shared base class.
For example:
class ExcludingSchema(Schema):
class Meta:
unknown = EXCLUDE
class C(ExcludingSchema):
x = fields.Str()
class B(ExcludingSchema):
c = fields.Nested(C)
class A(ExcludingSchema):
b = fields.Nested(B)
A().load({"b": {"c": {"x": "foo", "y": "bar"}, "d": "baz"}})
# => {"b": {"c": {"x": "foo"}}}
I take your comment to also mean "changing the API this much makes things worse / more confusing".
To clarify: I think the number of method arguments added is disproportionate to the benefit it provides. I generally promote providing all the data needed to satisfy special use cases rather than implementing them in the core. A leaner core is better for performance and reduces the complexity of maintaining and contributing to the library.
I think the 99% case for users wanting this is this simple: a user wants to set EXCLUDE or INCLUDE on a whole tree
I think this can be done without modifying the core. If you are open to solving this use case with a utility and think a convenient interface for this functionality would be useful to other users in the community, would you consider publishing it as a package? I think that would be a good addition to the "Fields and Custom Schemas" section of the Ecosystem page of the wiki.
It's possible to do this in a separate lib, but if it's not going into the core, then the solutions of using a custom base schema or writing a schema builder are much better.
I think I misunderstood the situation -- with the existing PRs and issues still open, I thought this was something where I could close out a series of issues. It seems like none of these are actually wanted. Can they all be closed, along with this?
If it's useful, I can open a new PR with docs as I suggested. That way, the unknown
documentation will state, up-front, how to solve this use-case. And that should help anyone being surprised by the behavior.
I don't want to nag about this issue, but I also feel that the first draft of this put it on poor footing by adding a lot more API surface and including an inessential feature in the auto_unknown
behavior.
I've refactored it to add PROPAGATE
with the bitwise-or based interface proposed above, and to remove the auto_unknown
behavior entirely in order to simplify. I'm not sure if unknown="raise"
is considered valid and worth supporting or not, so I included support for it.
I'm still not strongly attached to making the change, but I think this is the simplest/smallest version of it I can imagine which won't break any normal usages today.
@wanderrful, you commented while I was doing a refactor to make this (hopefully) worthy of consideration. But my intent here is to provide propagation as an option which you can opt-in to. Not making it opt-in would be badly backwards-incompatible.
If you want to try this branch, the current usage is to set unknown=EXCLUDE | PROPAGATE
.
However, based on your comment in #1490 , you may want to try using this workaround instead: https://github.com/marshmallow-code/marshmallow/issues/1428#issuecomment-558297299
Completely agreed, sounds good! +1 will greatly appreciate having Propagate as an option.
For the people looking how to solve the issue, this helps me:
"user": fields.Nested(
{"id": fields.Int(required=True),
"login": fields.Str(required=True),
"url": fields.Str()
},
required=True,
unknown=EXCLUDE),
Adding unknown=EXCLUDE inside the fields.Nested to make sure that the Nested field ignore other data coming from the request.