marshmallow
marshmallow copied to clipboard
Consider moving `marshmallow-oneofschema` into the core library?
OneOfSchema
is the go-to answer for questions about polymorphism.
I would like to see it on-course to have the following two things happen:
- Avoid overriding
load
anddump
, so that hooks work
As I have proposed elsewhere, I think it should hook in at _deserialize
and _serialize
. It's hard to make the case for extending marshmallow
to support this use-case right now.
- Built-in support in apispec
This is possible without bringing it into marshmallow
, but harder to justify.
The current marshmallow-oneofschema code is quite small in size, only a little over 100 LLOC. Plus, I think we can make it even smaller and simpler.
We could boil the whole thing down to something like this:
class OneOfSchema(Schema):
oneof: Dict[str, Type[Schema]] = {}
discriminator: Optional[str] = None
def discriminate_on_load(self, data):
return self.oneof[self.discriminator]
def discriminate_on_dump(self, data):
return self.oneof[data["discriminator"]]
def _serialize(...): ...
def _deserialize(...): ...
I also think moving it into marshmallow
would result in more active maintenance.
@sloria, @lafrech, @deckar01, has this been discussed in the past? I wasn't able to find any useful clues other than some comments about not wanting to do this in the past. I understand not wanting to grow the public surface of a package and take on the related doc, bugfix, Q&A, etc workload, but in this case I think it's counterproductive.
I'd be happy to work on this if it stood a chance at being accepted.
I like the idea. I always thought marshmallow lacked this feature. The recurring questions about polymorphism show there is a need. Although this does not mean oneofschema would answer the need on each of these questions.
There is another lib doing this: https://github.com/Bachmann1234/marshmallow-polyfield. Perhaps we should pay it a look to see if it has things oneofschema doesn't.
I agree it makes it easier to support in third-party such as apispec (even if practically, I don't have any other example than apispec and it can be supported conditionaly in apispec so not a blocker).
Perhaps time did its job and it is not more stable and easier to support now, so if the maintenance work is minimal, not worse than maintaining it separately, then why not?
I don't have that need anymore because I removed polymorphism in the app I was using it... because it was too complicated to achieve for the benefit. I won't be the one working on it. But if you come up with something minimal (in code complexity) and generic (not too opinionated) enough, then it could be a nice addition.
I think polyfield has better a better API in terms of hooks for letting you define the dispatch from input data to a schema to use for loading or dumping. I'd like to incorporate that into any rewrite: defining the "discriminate" methods to return a Schema
rather than a string which is used to index into a dict.
IMO oneofschema does a better job of defining a reasonable default mode of use: a "type"
field in your data names the schema to use in a dict, stored on the oneofschema class. That also makes it pretty easy to hook into apispec in the normal case.
polyfield also requires that the data be wrapped in a field, whereas oneofschema can be used at the "top level" of an object structure. I much prefer to not force the expression of oneOf
to go in a field class, since it inevitably asks users to wrap their input data with pre_load
and post_dump
hooks.
There's also a lib out there -- I forget which one -- which defines a field that combines multiple fields. That suffers from issues where, for example, an Int | String
field can behave unexpectedly on "1"
. And doing try...except chaining to try fields in series can behave strangely when custom fields or nested schemas have side-effects on load.
You can sometimes simulate field-level polymorphism with oneofschema, but I don't think there's a single solution which covers all use-cases perfectly.
apispec
is the only library which I think benefits from this directly. It could start to support marshmallow.OneOfSchema
without having to play favorites among the various non-marshmallow libraries.
That also has a positive downstream impact on flask-smorest
; maybe other projects too.
Of course, end-users benefit from this too. marshmallow.OneOfSchema
would become the de-facto standard solution, with the existing suite of 3rd party solutions still available for those whose needs aren't covered by it.
There doesn't seem to be a good standard to lean on for type metadata upstream (json/python). Building on top of a downstream standard (apispec) indicates to me that this feature (oneofschema) should be a separate package and possibly bundled downstream.
If you have control over your data structure, I would recommend avoiding polymorphic schemas. I generally advocate against promoting anti-patterns with core features.
If you don't have control over the data, it is simple to make custom fields for each abstract class. It starts to get sticky when you need to make a custom schema to handle polymorphism at the top level. I think it is reasonable to require an extra package to sweeten the syntax of polymorphism for this use case. I think improving the schema interface to make building custom schemas easier would provide more of a benefit to a wider array of use cases.
There doesn't seem to be a good standard to lean on for type metadata upstream (json/python).
I think the closest thing you'll find is jsonschema's oneOf
. OpenAPI extends jsonschema oneOf with the notion of a "discriminator".
Building on top of a downstream standard (apispec) ...
I cite apispec because it demonstrates the possible positive downstream impact of including oneofschema
in marshmallow
.
The idea of polymorphic schemas exists independently from any marshmallow implementation, apispec, jsonschema, or OpenAPI.
If you have control over your data structure, I would recommend avoiding polymorphic schemas. I generally advocate against promoting anti-patterns with core features.
This is a very strong statement, and one with which I disagree. I don't think use of oneOf
should be discouraged.
I don't know if this is the time and place to try to make a convincing argument, so I'm just noting the point of disagreement for now. I'm happy to share thoughts on this if that's wanted.
If you don't have control over the data, it is simple to make custom fields for each abstract class. It starts to get sticky when you need to make a custom schema to handle polymorphism at the top level.
I'm not interested in field-level polymorphism. If you want a union field, write a custom field class. That works and covers the need.
Similarly, jsonschema/openapi oneOf
can be expressed in marshmallow today. For example:
class DispatchingSchema(Schema):
v1doc = fields.Nested(V1Schema)
v2doc = fields.Nested(V2Schema)
@pre_load
def dispatch(self, data, **kwargs):
version = data.pop("version", 1)
return {f"v{version}doc": data}
@post_load
def unpack(self, data, **kwargs):
return next(iter(data.values()))
However, this approach is full of holes (what if the input data is an immutable mapping, etc, etc). I wouldn't tell someone to "just go do that" as readily as I would for custom fields.
I don't think use of oneOf should be discouraged.
That is fair. I was mainly thinking about SQL schemas, but I can imagine other use cases where polymorphism would be desirable.
make custom fields for each abstract class.
I'm not interested in field-level polymorphism.
Not field-level, a custom field to wrap the abstract class's schema and encapsulate the type checking logic.
class EventField(fields.Field):
alert_schema = AlertSchema()
meeting_schema = MeetingSchema()
def _serialize(self, value, attr, obj, **kwargs):
if isinstance(value, Alert):
return self.alert_schema.dump(value)
if isinstance(value, Meeting):
return self.meeting_schema.dump(value)
def _deserialize(self, value, attr, data, **kwargs):
if value['type'] == 'alert':
return self.alert_schema.load(value)
if value['type'] == 'meeting':
return self.meeting_schema.load(value)
raise ValidationError("Invalid event")
It would be nice if making a custom schema was this easy, which brings me back to the original proposal:
Avoid overriding load and dump, so that hooks work
I think this is an indication that Schema
needs to be easier to customize, not that we need to bake more custom functionality into the base schema.
Built-in support in apispec This is possible without bringing it into marshmallow, but harder to justify. We could boil the whole thing down to something like this:
I think the core could support this use case better, but I don't think it should implement it.
Not field-level, a custom field to wrap the abstract class's schema and encapsulate the type checking logic.
Ah, sorry for misunderstanding.
Your example is a good one; it's what I wish marshmallow-oneofschema
looked like at its core.
If marshmallow isn't going to have support for this, and the reasoning is that we want writing a custom Schema subclass for the job very easy, then we should aspire to similar simplicity for such a use-case.
Ideally, I could just write
class OneOfSchema(Schema):
type_map = {}
discriminator = "type"
@concrete_load # <--- imaginary new hook
def do_load(self, data, many, **kwargs):
if self.discriminator not in data: ... # ValidationError
type_val = self.type_map.get(data[self.discriminator])
if not type_val: ... # ValidationError
return self.type_map[type_val].load(data, **kwargs)
class FancySchema(OneOfSchema):
type_map = {"foo": FooSchema(), "bar": BarSchema()}
If I could do that, maybe decorate it as
# note: apispec support for me doesn't really hinge on `OneOfSchema` being in the core
# good core support would let me stop using the `marshmallow-oneofschema` library and just roll my own
@apispec.oneof(discriminator="type", schemas=(FooSchema, BarSchema))
class FancySchema(OneOfSchema):
type_map = {"foo": FooSchema(), "bar": BarSchema()}
I'd be thrilled.
Obviously the interface doesn't need to look like that exactly. But somehow it's necessary for marshmallow to take care of pre- and post-load hooks, the validate hook, and handling of many
. Like existing methods, pass_many
or other options might be needed for some use-cases.
I'd suggest putting the final, itty-bitty example into the docs as "this is how you could handle oneOf
".
I'm not sure what to do with this issue.
On the one hand, I'm ready to be convinced that marshmallow.OneOfSchema
isn't needed if there's going to be some approach that makes doing it yourself super-easy.
On the other hand, without any kind of plan for what that should look like, I'm reluctant to call this settled.
Should we use this space to discuss what an improved interface for extending Schema
would look like?
I think it would be a nice idea to handle polymorphism in marshmallow core.
However, I don't really like the above FancySchema
approach that requires a schema to be derived from a special class to be able to handle polymorphism in one field. I think this should be handled on the field level.
Moreover, I think an envelope should be considered in addition to a discriminator field. Whenever possible, an envelope should be preferred, as this approach is used widely in YAML configuration files, because it is more intuitive for the human reader.
I.e. the following would be nice:
input:
input_a:
<input_a configuration>
# alternatively:
input_b:
<input_b configuration>
# but not both, of cause
class ConfigSchema(Schema):
input = OneOf({"input_a": InputASchema, "input_b": InputBSchema})
ConfigSchema.load({"input": {"input_a": {...}}}) == {"input": InputA(...)}