marshmallow
Add priority to decorator processors
In the documentation, it's quite explicit about there being no prioritization of processors. I believe this would be a great addition for a few reasons. It allows more complex post-processing flows.
For example, the case where I need priority is one where an abstract schema post-processes into a model object, but before that some specialized post-processing must run in an implementation of the schema. This is less than ideal to implement using the suggested pattern, because the post-processor belongs to the abstract class.
This will retain backward compatibility, given that every processor will default to the same priority and the disclaimer in the documentation.
I'm open to suggestions for an alternative interface for this. Defaulting all processors to 0 would force processors that need to run before them to use negative values, so a non-zero default priority is likely more appropriate.
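To make the proposal concrete, here is a minimal plain-Python sketch of the ascending-priority interface being proposed (the `processor` decorator, `FakeSchema`, and its methods are all hypothetical; marshmallow itself is not involved). It shows why a default of 0 pushes earlier-running processors into negative numbers:

```python
# Hypothetical sketch (not real marshmallow API): processors default to
# priority 0 and run in ascending order, so a processor that must run
# earlier needs a negative priority.
def processor(priority=0):
    def wrapper(fn):
        fn.__processor_priority__ = priority
        return fn
    return wrapper

class FakeSchema:
    @processor()             # defaults to priority 0
    def make_object(self, data):
        data["object"] = True
        return data

    @processor(priority=-1)  # must go negative to run first
    def specialize(self, data):
        data["specialized"] = True
        return data

    def run(self, data):
        # Discover tagged methods and run them in ascending priority order.
        hooks = sorted(
            (getattr(self, name) for name in dir(self)
             if hasattr(getattr(self, name), "__processor_priority__")),
            key=lambda fn: fn.__processor_priority__,
        )
        for hook in hooks:
            data = hook(data)
        return data

result = FakeSchema().run({})
```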
ps. love marshmallow, and all of the work that y'all do for it! The documentation is among the best that I've experienced, and every time I read through I discover a new solution to an anti-pattern I've had :).
Thanks for the PR and the kind words, @xLegoz . Looks potentially useful and the implementation is straightforward. It seems, however, that it would be straightforward to meet your use-case using a BaseSchema that exposes hooks, like so:
from marshmallow import Schema, post_load

class BaseSchema(Schema):
    @post_load
    def post_load(self, data):
        postprocessed = self._postprocess(data)
        return self._make_object(postprocessed)

    def _postprocess(self, data):
        return data

    def _make_object(self, data):
        return data
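A plain-Python sketch of how a subclass would then specialize those hooks (`UserSchema` and its steps are hypothetical, and marshmallow itself is left out so the sketch stands alone):

```python
# Plain-Python sketch of the hook pattern above: the base class defines
# the pipeline once; subclasses override only the steps they need.
class BaseSchema:
    def post_load(self, data):
        postprocessed = self._postprocess(data)
        return self._make_object(postprocessed)

    def _postprocess(self, data):
        return data

    def _make_object(self, data):
        return data

class UserSchema(BaseSchema):
    # Hypothetical specialized post-processing step.
    def _postprocess(self, data):
        data["name"] = data["name"].title()
        return data

    # Hypothetical model-construction step.
    def _make_object(self, data):
        return ("User", data)

obj = UserSchema().post_load({"name": "ada"})
```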
Would this meet your use case?
Something like that does work out, but with quite a bit of hacking for the project I have, and we've got a lot (probably too many) post-processors we'd need to change for it. At the moment, this has been the best solution and the project is using the forked version so it's not a big deal :).
Sounds good. I'm not opposed to merging this, but I'd like to take some time to collect my thoughts and get feedback on this.
For one, I think it might make sense for higher-priority methods to execute first, i.e. execute in descending priority order. This seems more intuitive, and also addresses the issue of having 0 as the default priority, as you mentioned in your original post.
Interesting idea, @xLegoz!
I agree that it would be more intuitive for higher priority methods to execute first. This just makes sense in English. "High priority" is more urgent than "low priority".
However, I lean toward @sloria's view that the same behavior could be accomplished by writing one @post_load handler that calls several other functions. Where possible, this would be easier to follow than scattered methods executed in order of priority decorators.
You could even define your own @post_load_priority(priority) decorator that would register any number of methods on the Schema as (for example) self._post_load_methods. Your @post_load function could then call all those methods in sorted order, without needing to know all the methods when the @post_load method is defined. Subclasses could define their own methods and specify their own post_load_priority.
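A sketch of that idea in plain Python (the names `post_load_priority`, `Schema.load`, and the example schemas are hypothetical, and marshmallow's machinery is stubbed out; methods run in descending priority, matching the higher-first intuition above):

```python
# Sketch of a user-level @post_load_priority(priority) decorator: it only
# tags the method; one dispatcher then runs the tagged methods in priority
# order, without needing to know them all up front.
def post_load_priority(priority):
    def wrapper(fn):
        fn.__post_load_priority__ = priority
        return fn
    return wrapper

class Schema:
    def load(self, data):
        # Collect tagged methods (including ones added by subclasses)
        # and run them from highest to lowest priority.
        hooks = [
            getattr(self, name) for name in dir(self)
            if hasattr(getattr(self, name), "__post_load_priority__")
        ]
        hooks.sort(key=lambda fn: fn.__post_load_priority__, reverse=True)
        for hook in hooks:
            data = hook(data)
        return data

class MySchema(Schema):
    @post_load_priority(10)
    def strip_name(self, data):
        data["name"] = data["name"].strip()
        return data

class SpecialSchema(MySchema):
    # The subclass inserts its own step after the inherited one.
    @post_load_priority(5)
    def upper_name(self, data):
        data["name"] = data["name"].upper()
        return data

result = SpecialSchema().load({"name": "  ada  "})
```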
Another thought is, is there any way in your design to provide named hooks rather than numeric priorities? If you have a situation where subclasses are inserting processing steps, I wonder if it'd be easier to read and maintain if you define specific steps that subclasses could hook into (which may be what @sloria meant).
This is useful, however, when subclassing other schemas that may already have some processors defined on them (say, some post_load). There are some ugly hacks to get around that, but it would be nice if, given the priorities of existing processors (assuming their priority levels are reasonably well spaced out), a subclass could add additional processors in the same category in a reasonable order with respect to the existing ones.
An alternative to this might be to add something like before= or after= arguments to a processor tag, where an existing processor is given. Then, rather than relying on admittedly shaky priority levels, one could explicitly order processors with respect to each other.
@embray I do agree with your reasoning for the alternative approach with before= and after=. As has already been mentioned, defining an ordering with numbers is not very intuitive.
There is some ambiguity about how you might specify the functions, so there definitely needs to be some thought there for the most approachable interface. e.g. a few different ways the interface could work could be:
- @post_processor(before=ThisClass.processor_that_runs_after)
- @post_processor(before="processor_that_runs_after")
(Of course, if this could functionally be provided as a library that would also be pretty wicked for an experimental interface!)
Yeah, I might give that a shot. There are definitely areas where one can run into trouble with this (e.g. cycles). Normally that wouldn't happen but it would probably be good to check.
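One way to sketch the ordering-and-cycle-check part is a topological sort over the declared constraints. Python 3.9+ ships graphlib for exactly this; the processor names and constraints below are hypothetical:

```python
# Sketch: resolving before=/after= constraints with a topological sort,
# turning a cycle into a clear error. The constraint map is hypothetical:
# constraints[p] is the set of processors that must run before p.
from graphlib import CycleError, TopologicalSorter

constraints = {
    "make_object": {"normalize"},    # make_object runs after normalize
    "normalize": {"strip_fields"},   # normalize runs after strip_fields
    "strip_fields": set(),
}

try:
    order = list(TopologicalSorter(constraints).static_order())
except CycleError as exc:
    raise ValueError(f"processor ordering cycle: {exc.args[1]}") from exc

print(order)  # strip_fields, then normalize, then make_object
```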
Greetings. I really think the latter method is too complicated for such a task. Pros: it's more explicit. Cons: more work to support and implement (the cycles mentioned by @embray, for example). If the processor you depend on is renamed or doesn't exist, should an error be thrown? And what if a processor is applied dynamically on schema instance creation?
You need priorities (or order) only for schema sub-classing and mixins. It's really not such a big job to determine an integer for it (and you need it in your own code in most use cases). But with "after/before" it would not be obvious when two processors without "after/before" are executed (at least I can't get a straight picture right now). So in the end we would get more explicit (+) but more complicated (-) behavior.
About third-party libraries with several processors: in such rare cases they may use "order" with a step of 10 to allow user code to insert its processors where needed (for example, popper.js uses a step of 50 to allow user code to insert "modifiers"). So my vote is to merge this pull request as is (maybe renaming "priority" to "order"). Thank you guys for all your work!
The more we answer various use cases with "this is a job for [pre|post]_[dump|load] decorators", the more this is needed.
I'll mark it as 3.0. Feel free to postpone if you think it is not so crucial.
I also agree that using integers has a theoretical limitation that is not a real issue in practice. Python's logging module also uses integer logging levels, and defines default levels only at multiples of ten, leaving more than enough space to define custom levels.
At least, integer priorities would be better than any twisted workaround to enforce the order of validators from different code layers or even different libraries.
Setting 3.0 milestone on this so as not to forget it as it seems feasible. Guys, feel free to postpone.
Try overriding the abstract hook with the specialized hook and explicitly declare the dependency order by choosing where to call super.
from marshmallow import Schema, fields, post_load

class BaseModel:
    def __init__(self, data):
        self.data = data

class Base(Schema):
    id = fields.Str()

    @post_load
    def make_model(self, data):
        return BaseModel(data)

class Special(Base):
    name = fields.Str()

    @post_load
    def make_model(self, data):
        data['name'] = data['name'].title()
        return super().make_model(data)

result = Special().load(dict(id='1', name='test'))
print(result, result.data)
Note: I couldn't actually reproduce the reported issue when building this minimal example. It just demonstrates that inherited hooks can control the execution order of their base class's hooks, because the base hook is only called once.
This does not solve the case of having a make_model hook in the base class and an unrelated post_load hook in a child class. Well, it does if you make sure to name the unrelated hook make_model, but maybe you don't want to do that.
Also, priorities would allow a marshmallow-based framework to use hooks while application code could use hooks as well. The application would just have to watch the priority numbers in the framework. Hence the 10-step idea described above by @vgavro.
If most people are satisfied with current implementation (where base class needs to define hooks for the application and children classes override them), I won't struggle for this feature to be integrated.
Python has a builtin way to solve method resolution order in inherited classes. Naming the methods the same thing has the benefit of explicitly declaring the dependency relationship. This is much more explicit than a number. You can keep the special method name, just don't add the decorator to it and you can call it from inside the overridden hook.
class Special(Base):
    name = fields.Str()

    def fancy_name(self, data):
        data['name'] = data['name'].title()
        return data

    @post_load
    def make_model(self, data):
        data = self.fancy_name(data)
        return super().make_model(data)
Good point, @deckar01. Pushing this logic even further, the whole decoration mechanism could be removed, the feature being accomplished by hooks with fixed names in the base schema.
class BaseSchema(base.SchemaABC):
    def pre_load(self, data):
        """Override this to add pre-load logic"""
        return data
The invocation of the hooks would be greatly simplified.
This kind of hook is quite common, simpler than current decoration mechanism, and as you say, it relies on usual Python method resolution order.
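For instance, the invocation could reduce to plain method calls (a sketch with marshmallow stubbed out; SchemaABC here is just a stand-in class, and the field-deserialization step is elided):

```python
# Sketch: with fixed-name hooks, invocation is just ordinary method calls,
# and subclasses override via normal Python method resolution order.
class SchemaABC:
    def pre_load(self, data):
        """Override this to add pre-load logic."""
        return data

    def post_load(self, data):
        """Override this to add post-load logic."""
        return data

    def load(self, data):
        data = self.pre_load(data)
        # ... field deserialization would happen here ...
        return self.post_load(data)

class UserSchema(SchemaABC):
    def post_load(self, data):
        data["loaded"] = True
        return data

result = UserSchema().load({"id": 1})
```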
Maybe it's not so obvious, due to the many/pass_many combinations. I'd need to investigate a little more, but I like the idea.
It would be a little regression for people using multiple decorated methods with no care about the order (e.g. using two independent post_load methods), as they'd have to merge their processors into a common method, but I think this is acceptable. At least we'd have a predictable order.
I agree that using Python MRO and decorators at the same time is counterintuitive, because you would have to rely on the parent's custom method names in that case, which may look fine in a one-parent-one-child example, but can be a real pain if you're defining complex schemas with mixins.
But I like the whole idea to remove decorators and use MRO instead.
The problem with many/pass_many can be resolved with a special method name: post_load_many, which would just apply self.post_load to every item in the collection by default.
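A minimal sketch of that default (plain Python; the schema and hook bodies are hypothetical, marshmallow is not involved):

```python
# Sketch of the post_load_many idea: the base class applies the
# single-item hook to every item by default; subclasses can override
# either hook independently.
class BaseSchema:
    def post_load(self, item):
        return item

    def post_load_many(self, items):
        # Default: delegate to the single-item hook for each item.
        return [self.post_load(item) for item in items]

class UserSchema(BaseSchema):
    def post_load(self, item):
        item["loaded"] = True
        return item

result = UserSchema().post_load_many([{"id": 1}, {"id": 2}])
```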
And if we agree on this https://github.com/marshmallow-code/marshmallow/pull/1034#issuecomment-436556833 (always passing **kwargs with original, partial, etc. instead of requesting arguments in decorator) - we could drop current complex implementation and have clear priority with only disadvantage of changed api, which is expected in new major release anyway.
The validates_schema option skip_on_field_errors=False can be achieved by explicitly processing the ValidationError exception. The @validates(field_name) decorator would then have to be replaced by validate_field(name, value, field=field_obj), so as not to break consistency and to keep order control (if name == 'my_field': do_something).
So, this looks like a big change. As I see it, either we rely on MRO and drop decorators, or we add "priority" to keep the current decorator API.
The argument that "you can define custom names and then use one post_load, and if post_load is already defined in parents you should override all of them to keep things under control" just doesn't look good to me.
Decorators are nice, because they explicitly communicate that the method is part of the schema life cycle and not just another method. It is also convenient for use cases that are order independent to be able to declare hooks without having to declare a dependency order. Feel free to open an issue to discuss removing the decorators, but I don't think it is necessary to satisfy this original use case.
This pattern is very similar to overriding methods with static/class decorators. We could document this pattern to make it more intuitive and cover it with tests to ensure we maintain support.
Is there still interest in this? Seems like the OP can be met using the methods above. Unless there are strong feelings about adding this API, I'll plan to close this in a few days.
I still think things would be easier if we had hooks instead of decorators. easier to maintain, easier to understand. Simple Python and inheritance. The code in this lib to support hooks would be very minimal.
In my app, I'm sometimes declaring post_load methods then overriding them in child classes and I'm reaching a point where it would be saner to define hooks in my base Schema using decorators and then don't use the decorators in rest of the code to avoid messing things up.
I don't have the time to draft an implementation right now.
This is a response to @deckar01's comment, which happened to be the last one for a long time, and now I see that it looks like everyone is happy about it, which is not the case :-) Also, in the last @lafrech comment I believe when he says "hooks" he means MRO, while in my and @deckar01's comments "hooks" means the current decorator implementation (just to be clear).
[In short: I completely support the idea of removing decorators in favor of clean MRO, but if that's not to be done, at least add "order" to hooks, because for hooks that modify the returned data, which is passed on to the next hooks, order is crucial.]
- "Decorators are nice, because they explicitly communicate that the method is part of the schema life cycle and not just another method." -- A special method name is not the same as "just another method". Again, when you override a parent method in Python you don't mark it in any special way, and the Python community seems fine with that.
- "It is also convenient for use cases that are order independent to be able to declare hooks without having to declare a dependency order." -- Well, it is, but when hooks can return data on which other hooks depend - like pre_load, post_load, pre_dump, post_dump - order is crucial, and these decorators are used mostly for that. So hooks are cool for validation, which doesn't change the value, but for input/output changes unordered hooks are evil.
- "Feel free to open an issue to discuss removing the decorators, but I don't think it is necessary to satisfy this original use case." -- That's not my point. My point is: IF we use decorators (hooks), we SHOULD allow ordering them, because it's normal practice and it's even more crucial here, since other hooks depend on the returned data. If we use MRO, then the API must act like MRO, without mixing MRO and hooks together. "There should be one-- and preferably only one --obvious way to do it."
- "This pattern is very similar to overriding methods with static/class decorators. We could document this pattern to make it more intuitive and cover it with tests to ensure we maintain support." -- Really, I'm also using this pattern to avoid messing things up:
class BaseModel:
    @post_load()
    def post_load(self, data, **kwargs):
        return data

    # likewise for pre_load, pre_dump, post_dump
But if we're writing a library which is not used in one small project, but has its own ecosystem around it, I expect one obvious way to interact with schemas in other libraries without pain.
Thanks @vgavro for clarifying what I meant.
FYI this is already possible by properly alphabetising processors, since resolve_hooks uses dir: https://github.com/marshmallow-code/marshmallow/blob/dev/src/marshmallow/schema.py#L166
Not sure if it's omitted from the documentation on purpose or the initial implementation didn't intend it to work this way ;)
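A plain-Python sketch of what that buys you (the stand-in Schema below mimics dir()-based hook discovery; the hook names are hypothetical). Because dir() returns names sorted alphabetically, numeric prefixes in the method names impose an order:

```python
# Sketch: hooks discovered via dir(), which returns an alphabetically
# sorted list, run in name order - so prefixing names controls ordering.
# Stand-in for marshmallow's resolve_hooks; not the real implementation.
class Schema:
    def load(self, data):
        for name in dir(self):            # dir() is alphabetical
            if name.startswith("hook_"):
                data = getattr(self, name)(data)
        return data

class MySchema(Schema):
    def hook_2_make_object(self, data):
        data.append("make_object")
        return data

    def hook_1_normalize(self, data):
        data.append("normalize")
        return data

trace = MySchema().load([])   # normalize runs before make_object
```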
No it wasn't intended, but it's nice to know.
This dates back to when dicts were not ordered, I suppose dir wasn't either.
Ok. Though dir has always returned a sorted list: https://docs.python.org/2.6/library/functions.html#dir