marshmallow icon indicating copy to clipboard operation
marshmallow copied to clipboard

Consider moving `marshmallow-oneofschema` into the core library?

Open sirosen opened this issue 3 years ago • 7 comments

OneOfSchema is the go-to answer for questions about polymorphism.

I would like to see it on-course to have the following two things happen:

  1. Avoid overriding load and dump, so that hooks work

As I have proposed elsewhere, I think it should hook in at _deserialize and _serialize. It's hard to make the case for extending marshmallow to support this use-case right now.

  1. Built-in support in apispec

This is possible without bringing it into marshmallow, but harder to justify.

The current marshmallow-oneofschema code is quite small in size, only a little over 100 LLOC. Plus, I think we can make it even smaller and simpler.

We could boil the whole thing down to something like this:

class OneOfSchema(Schema):
    oneof: Dict[str, Type[Schema]] = {}
    discriminator: Optional[str] = None
    def discriminate_on_load(self, data):
        return self.oneof[self.discriminator]
    def discriminate_on_dump(self, data):
        return self.oneof[data["discriminator"]]
    def _serialize(...): ...
    def _deserialize(...): ...

I also think moving it into marshmallow would result in more active maintenance.

@sloria, @lafrech, @deckar01, has this been discussed in the past? I wasn't able to find any useful clues other than some comments about not wanting to do this in the past. I understand not wanting to grow the public surface of a package and take on the related doc, bugfix, Q&A, etc workload, but in this case I think it's counterproductive.

I'd be happy to work on this if it stood a chance at being accepted.

sirosen avatar Feb 18 '21 21:02 sirosen

I like the idea. I always thought marshmallow lacked this feature. The recurring questions about polymorphism show there is a need. Although this does not mean oneofschema would answer the need on each of these questions.

There is another lib doing this: https://github.com/Bachmann1234/marshmallow-polyfield. Perhaps we should pay it a look to see if it has things oneofschema doesn't.

I agree it makes it easier to support in third-party such as apispec (even if practically, I don't have any other example than apispec and it can be supported conditionaly in apispec so not a blocker).

Perhaps time did its job and it is not more stable and easier to support now, so if the maintenance work is minimal, not worse than maintaining it separately, then why not?

I don't have that need anymore because I removed polymorphism in the app I was using it... because it was too complicated to achieve for the benefit. I won't be the one working on it. But if you come up with something minimal (in code complexity) and generic (not too opinionated) enough, then it could be a nice addition.

lafrech avatar Feb 19 '21 00:02 lafrech

I think polyfield has better a better API in terms of hooks for letting you define the dispatch from input data to a schema to use for loading or dumping. I'd like to incorporate that into any rewrite: defining the "discriminate" methods to return a Schema rather than a string which is used to index into a dict.

IMO oneofschema does a better job of defining a reasonable default mode of use: a "type" field in your data names the schema to use in a dict, stored on the oneofschema class. That also makes it pretty easy to hook into apispec in the normal case.

polyfield also requires that the data be wrapped in a field, whereas oneofschema can be used at the "top level" of an object structure. I much prefer to not force the expression of oneOf to go in a field class, since it inevitably asks users to wrap their input data with pre_load and post_dump hooks.

There's also a lib out there -- I forget which one -- which defines a field that combines multiple fields. That suffers from issues where, for example, an Int | String field can behave unexpectedly on "1". And doing try...except chaining to try fields in series can behave strangely when custom fields or nested schemas have side-effects on load.

You can sometimes simulate field-level polymorphism with oneofschema, but I don't think there's a single solution which covers all use-cases perfectly.


apispec is the only library which I think benefits from this directly. It could start to support marshmallow.OneOfSchema without having to play favorites among the various non-marshmallow libraries. That also has a positive downstream impact on flask-smorest; maybe other projects too.

Of course, end-users benefit from this too. marshmallow.OneOfSchema would become the de-facto standard solution, with the existing suite of 3rd party solutions still available for those whose needs aren't covered by it.

sirosen avatar Feb 19 '21 21:02 sirosen

There doesn't seem to be a good standard to lean on for type metadata upstream (json/python). Building on top of a downstream standard (apispec) indicates to me that this feature (oneofschema) should be a separate package and possibly bundled downstream.

If you have control over your data structure, I would recommend avoiding polymorphic schemas. I generally advocate against promoting anti-patterns with core features.

If you don't have control over the data, it is simple to make custom fields for each abstract class. It starts to get sticky when you need to make a custom schema to handle polymorphism at the top level. I think it is reasonable to require an extra package to sweeten the syntax of polymorphism for this use case. I think improving the schema interface to make building custom schemas easier would provide more of a benefit to a wider array of use cases.

deckar01 avatar Feb 20 '21 00:02 deckar01

There doesn't seem to be a good standard to lean on for type metadata upstream (json/python).

I think the closest thing you'll find is jsonschema's oneOf. OpenAPI extends jsonschema oneOf with the notion of a "discriminator".

Building on top of a downstream standard (apispec) ...

I cite apispec because it demonstrates the possible positive downstream impact of including oneofschema in marshmallow.

The idea of polymorphic schemas exists independently from any marshmallow implementation, apispec, jsonschema, or OpenAPI.

If you have control over your data structure, I would recommend avoiding polymorphic schemas. I generally advocate against promoting anti-patterns with core features.

This is a very strong statement, and one with which I disagree. I don't think use of oneOf should be discouraged.

I don't know if this is the time and place to try to make a convincing argument, so I'm just noting the point of disagreement for now. I'm happy to share thoughts on this if that's wanted.

If you don't have control over the data, it is simple to make custom fields for each abstract class. It starts to get sticky when you need to make a custom schema to handle polymorphism at the top level.

I'm not interested in field-level polymorphism. If you want a union field, write a custom field class. That works and covers the need.

Similarly, jsonschema/openapi oneOf can be expressed in marshmallow today. For example:

class DispatchingSchema(Schema):
    v1doc = fields.Nested(V1Schema)
    v2doc = fields.Nested(V2Schema)

    @pre_load
    def dispatch(self, data, **kwargs):
        version = data.pop("version", 1)
        return {f"v{version}doc": data}

    @post_load
    def unpack(self, data, **kwargs):
        return next(iter(data.values()))

However, this approach is full of holes (what if the input data is an immutable mapping, etc, etc). I wouldn't tell someone to "just go do that" as readily as I would for custom fields.

sirosen avatar Feb 20 '21 09:02 sirosen

I don't think use of oneOf should be discouraged.

That is fair. I was mainly thinking about SQL schemas, but I can imagine other use cases where polymorphism would be desirable.

make custom fields for each abstract class.

I'm not interested in field-level polymorphism.

Not field-level, a custom field to wrap the abstract class's schema and encapsulate the type checking logic.

class EventField(fields.Field):
    alert_schema = AlertSchema()
    meeting_schema = MeetingSchema()

    def _serialize(self, value, attr, obj, **kwargs):
        if isinstance(value, Alert):
            return self.alert_schema.dump(value)
        if isinstance(value, Meeting):
            return self.meeting_schema.dump(value)

    def _deserialize(self, value, attr, data, **kwargs):
        if value['type'] == 'alert':
            return self.alert_schema.load(value)
        if value['type'] == 'meeting':
            return self.meeting_schema.load(value)
        raise ValidationError("Invalid event")

It would be nice if making a custom schema was this easy, which brings me back to the original proposal:

Avoid overriding load and dump, so that hooks work

I think this is an indication that Schema needs to be easier to customize, not that we need to bake more custom functionality into the base schema.

Built-in support in apispec This is possible without bringing it into marshmallow, but harder to justify. We could boil the whole thing down to something like this:

I think the core could support this use case better, but I don't think it should implement it.

deckar01 avatar Feb 20 '21 17:02 deckar01

Not field-level, a custom field to wrap the abstract class's schema and encapsulate the type checking logic.

Ah, sorry for misunderstanding.

Your example is a good one; it's what I wish marshmallow-oneofschema looked like at its core.

If marshmallow isn't going to have support for this, and the reasoning is that we want writing a custom Schema subclass for the job very easy, then we should aspire to similar simplicity for such a use-case.

Ideally, I could just write

class OneOfSchema(Schema):
    type_map = {}
    discriminator = "type"

    @concrete_load  # <--- imaginary new hook
    def do_load(self, data, many, **kwargs):
        if self.discriminator not in data: ...  # ValidationError
        type_val = self.type_map.get(data[self.discriminator])
        if not type_val: ...  # ValidationError
        return self.type_map[type_val].load(data, **kwargs)

class FancySchema(OneOfSchema):
    type_map = {"foo": FooSchema(), "bar": BarSchema()}

If I could do that, maybe decorate it as

# note: apispec support for me doesn't really hinge on `OneOfSchema` being in the core
# good core support would let me stop using the `marshmallow-oneofschema` library and just roll my own
@apispec.oneof(discriminator="type", schemas=(FooSchema, BarSchema))
class FancySchema(OneOfSchema):
    type_map = {"foo": FooSchema(), "bar": BarSchema()}

I'd be thrilled.

Obviously the interface doesn't need to look like that exactly. But somehow it's necessary for marshmallow to take care of pre- and post-load hooks, the validate hook, and handling of many. Like existing methods, pass_many or other options might be needed for some use-cases.

I'd suggest putting the final, itty-bitty example into the docs as "this is how you could handle oneOf".


I'm not sure what to do with this issue.

On the one hand, I'm ready to be convinced that marshmallow.OneOfSchema isn't needed if there's going to be some approach that makes doing it yourself super-easy.

On the other hand, without any kind of plan for what that should look like, I'm reluctant to call this settled.

Should we use this space to discuss what an improved interface for extending Schema would look like?

sirosen avatar Feb 20 '21 19:02 sirosen

I think it would be a nice idea to handle polymorphism in marshmallow core.

However, I don't really like the above FancySchema approach that requires a schema to be derived from a special class to be able to handle polymorphism in one field. I think this should be handled on the field level.

Moreover, I think an envelope should be considered in addition to a discriminator field. Whenever possible, an envelope should be preferred, as this approach is used widely in YAML configuration files, because it is more intuitive for the human reader.

I.e. the following would be nice:

input:
  input_a:
    <input_a configuration>
  # alternatively:
  input_b:
    <input_b configuration>
  # but not both, of cause
class ConfigSchema(Schema):
  input = OneOf({"input_a": InputASchema, "input_b": InputBSchema})

ConfigSchema.load({"input": {"input_a": {...}}}) == {"input": InputA(...)}

moi90 avatar May 06 '21 20:05 moi90