marshmallow
Serialize multiple attributes using Pluck
Can Pluck be used to deserialize multiple source attributes to a single field and then serialize back to multiple fields? I read in #1315 that the intent of Pluck is to go from flat -> nested -> flat, but I couldn't figure out how to get from the input:
{'baz': 'blue', 'qux': 'orange'}
by deserializing to
{'bar': {'baz': 'blue', 'qux': 'orange'}}
and then serializing back to the above.
I could get either baz or qux, but not both.
I attempted something like
from marshmallow import Schema, fields

class Bar(Schema):
    baz = fields.String()
    qux = fields.String()

class Foo(Schema):
    # Both fields point at attribute='bar'
    bar_1 = fields.Pluck(Bar, 'baz', data_key='baz', attribute='bar')
    bar_2 = fields.Pluck(Bar, 'qux', data_key='qux', attribute='bar')
But this throws the error:
ValueError: The attribute argument for one or more fields collides with another field's name or attribute argument. Check the following field names and attribute arguments: ['bar']
Is there something I'm missing, or is this use case not supported by Pluck?
Pluck is essentially a single field transplanted from one schema to another. Given Pluck(SomeSchema, 'fieldname'), what happens is roughly:
- on serialization, it uses SomeSchema().dump(value_to_serialize)['fieldname'].
- on deserialization, it uses SomeSchema(partial=True).load({'fieldname': value_to_deserialize}).
When many=True it wraps / unwraps to and from lists as necessary.
That all means that if you want to produce {'baz': 'blue', 'qux': 'orange'} as the output for the field, you need to pluck a field that outputs a dictionary. Perhaps fields.Dict() and a pre_dump hook can do what you want to achieve, or use fields.Nested() on Bar to produce that dictionary for you.
Thank you for the details. Leaving aside the deserialization case for a moment, it's still not clear to me how I could Pluck multiple fields; it seems like the above would output a dict as the value of the field, i.e. output something like:
{'bar': {'baz': 'blue', 'qux': 'orange'}}
What I'm really looking to do is more like, given e.g. a source object:
{'foo': 'green', 'bar': {'baz': 'blue', 'qux': 'orange'}}
I want to use a schema to serialize this to:
{"foo": "green", "baz": "blue", "qux": "orange"}
Basically, my objects are nested, and the schema is flat - can I use Pluck to do this if there are multiple fields I want to pluck?
it seems like the above would output a dict as the value of the field, i.e. output something like:
{'bar': {'baz': 'blue', 'qux': 'orange'}}
You are completely right, and I should have thought this through. You can't use Pluck alone for this; you'd have to use additional features like @post_dump and @pre_load to merge and extract that dictionary.
attribute='bar' ... attribute='bar'... fields collides
Is there something I'm missing, or is this use case not supported by Pluck?
It is not currently supported. Pluck takes a top level value, replaces it with a dictionary, and assigns the value to a new key in the dictionary. If we just skipped the collision check, I suspect the current logic would naively overwrite the first dict, resulting in {'bar': {'qux': 'orange'}} in your example.
To support the syntax you tried we could skip the collision check for fields that can be merged (Nested, Pluck, Dict), probe for collisions when building the output, and update the shared dictionary. Having multiple fields in a schema that actually result in a single nested structure might not be the best API for this feature though. I think schemas are intended to maintain a 1:1 mapping of the deserialized data structure.
Maybe Pluck could be extended to support multiple keys from the nested schema? The current behavior implicitly maps the value at the field name / data_key to a single nested key, but it wouldn't be too much of a stretch to allow explicitly mapping the top level field names to the nested field names. Building on the example from the docs:
from marshmallow import Schema, fields

class ArtistSchema(Schema):
    id = fields.Int()
    name = fields.Str()

class AlbumSchema(Schema):
    artist = fields.Pluck(ArtistSchema, id='artist_id', name='artist_name')  # concept

in_data = {'artist_id': 42, 'artist_name': 'Douglas Adams'}
loaded = AlbumSchema().load(in_data)  # => {'artist': {'id': 42, 'name': 'Douglas Adams'}}
dumped = AlbumSchema().dump(loaded)  # => {'artist_id': 42, 'artist_name': 'Douglas Adams'}
Thanks for the analysis, very helpful. I was thinking that I could do the flattening and nesting in a post_dump and pre_load, respectively, but was getting a little tripped up in whether the philosophy is to have the schema match the input/serialized data or the internalized representation (and not sure it's explicit in the docs).
If the latter is the case, the above approach seems like good syntax for what is probably a common use case.
I also have nested schemas (i.e. two or more levels down) that I'd like to be able to flatten on serialization; fields.Nested supports the dot notation as an argument to only but it doesn't flatten, it only excludes. I wonder if the above syntax could support dotted attributes as well.
@plondino to solve your issue, you could do this:
class Foo(Schema):
    bar_1 = fields.String(attribute='bar.baz', data_key='baz')
    bar_2 = fields.String(attribute='bar.qux', data_key='qux')

Foo().load({'baz': 'blue', 'qux': 'orange'})  # => {'bar': {'baz': 'blue', 'qux': 'orange'}}
Foo().dumps({'bar': {'baz': 'blue', 'qux': 'orange'}})  # => {"baz": "blue", "qux": "orange"}
It makes me wonder a bit why the Pluck field type exists in the first place, but it just works :shrug:
This x100! Not documented and don't understand how it works exactly, but does exactly what I've been trying to do. Thanks!
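On how the dotted attribute works: as far as I can tell, marshmallow 3 walks the dotted path key by key when it reads a value on dump, and creates the intermediate dicts when it writes a value on load. Here is a rough pure-Python sketch of that idea. get_dotted and set_dotted are hypothetical helpers written for illustration, not marshmallow functions, and the real implementation may differ in its details:

```python
def get_dotted(obj, path):
    """Follow a dotted path such as 'bar.baz' through nested dicts."""
    for key in path.split('.'):
        obj = obj[key]
    return obj


def set_dotted(target, path, value):
    """Write value at a dotted path, creating intermediate dicts as needed."""
    *parents, last = path.split('.')
    for key in parents:
        target = target.setdefault(key, {})
    target[last] = value


# (attribute path, flat data key) pairs, mirroring attribute= and data_key=
mapping = [('foo', 'foo'), ('bar.baz', 'baz'), ('bar.qux', 'qux')]

source = {'foo': 'green', 'bar': {'baz': 'blue', 'qux': 'orange'}}

# "dump": read each value through its dotted path into a flat dict
flat = {key: get_dotted(source, path) for path, key in mapping}

# "load": write each flat value back to its dotted path
nested = {}
for path, key in mapping:
    set_dotted(nested, path, flat[key])

print(flat)
print(nested)
```

Because 'bar.baz' and 'bar.qux' are different attribute strings, they would not trip the collision check that attribute='bar' used twice does, yet both end up inside the same bar dict.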