marshmallow
Dumping objects with fields unknown to schema in versions >=3.0.0
I am trying to serialize an object that has some fields in addition to those defined in the schema. While dumping, I am also specifying the fields to include using `only`. With version 2.19.5 I was able to list fields unknown to the schema in the `only` parameter, but this breaks after upgrading to 3.5.1. For clarity, consider the following code:
```python
from marshmallow import Schema, fields

class MySchema(Schema):
    a = fields.Str()
    b = fields.Str()
    c = fields.Str()

class MyClass:
    def __init__(self, a, b, c, d, e):
        self.a = a
        self.b = b
        self.c = c
        self.d = d
        self.e = e

obj = MyClass("one", "two", "three", "four", "five")
data_out = MySchema(only=('a', 'b', 'c', 'd')).dump(obj)
print(data_out.data)  # 2.x returns a MarshalResult; in 3.x dump() returns the dict directly
# Output with 2.19.5 (expected behavior) => {'a': 'one', 'c': 'three', 'b': 'two', 'd': 'four'}
# With 3.5.1 => ValueError: Invalid fields for <MySchema(many=False)>: {'d'}
```
One solution is to add those unknown fields to the Schema, but I was wondering whether there is a proper workaround for this issue when I am using the newer versions (specifically >=3.0.0).
I feel this is similar to #1198 but I was not able to find a solution for this in the thread.
That error occurs during schema construction, not during the dump operation. The stricter validation of field names was introduced to keep typos from passing without warning and silently losing data.
https://marshmallow.readthedocs.io/en/stable/upgrading.html#schemas-raise-validationerror-when-deserializing-data-with-unknown-keys
Marshmallow provides implicit field creation to ease the burden of declaring fields. This may work for your use case depending on the number of additional fields you need.
```python
from marshmallow import Schema, fields

class MyClass:
    def __init__(self, a, b):
        self.a = a
        self.b = b

class MySchema(Schema):
    a = fields.Str()

class MySchemaB(MySchema):
    class Meta:
        # Implicitly create a field for "b" without declaring its type
        additional = ['b']

obj = MyClass("one", "two")
MySchemaB().dump(obj)
# {'a': 'one', 'b': 'two'}
```
https://marshmallow.readthedocs.io/en/stable/quickstart.html#implicit-field-creation
Currently the only way to ingest truly unknown data is during load. If you are planning on dumping that data, it has to be declared explicitly, because there has never been a mechanism for dumping unknown data AFAIK.
Dumping unknown data would be complicated because objects may have attributes you don't want to dump. It only makes sense for dicts or objects where the content to be dumped is clearly identified.
You can also generate schemas dynamically at runtime using `Schema.from_dict`.
```python
from dataclasses import dataclass
from marshmallow import Schema, fields

@dataclass
class MyClass:
    a: str
    b: str

class MySchema(Schema):
    a = fields.Str()

obj = MyClass(a="one", b="two")
MySchema2 = MySchema.from_dict({"b": fields.Str()})
print(MySchema2().dump(obj))
# {'a': 'one', 'b': 'two'}
```
Thanks for the help @sloria, @lafrech and @deckar01.
I was able to replicate the expected behavior by creating a subclass of the original schema and specifying the additional fields in `Meta`, as @deckar01 suggested.
Another workaround I tried was using `unknown=INCLUDE` with `only`. Referring back to my code, if I do the following:

```python
data_out = MySchema(unknown=INCLUDE, only={'a', 'b', 'c', 'e'}).dump(obj)
```

it throws a `ValueError`. Since we are including unknown fields, doesn't it make sense to permit unknown fields in the `only` parameter?
I'm having the same issue. Would be great if it could allow unknown fields.
The best approach I found was the one below, but it is not what we are looking for.
```python
from marshmallow import INCLUDE, Schema

class MySchema(Schema):
    class Meta:
        unknown = INCLUDE  # works for load, but does not work for dump
        additional = ('new_field',)  # works for load and dump, but every field name must be listed
```
I feel it needs to support dumping unknown fields. I want to use this to validate and populate fields before inserting into a DB, but I will have unknown fields. While load includes unknowns, I can't insert the objects into the DB. Implicit loading already exists, so why not do the same for dumping? If an object cannot be inferred, throw an error.
For now I've worked around it with a `post_dump` hook:
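```python
from marshmallow import post_dump

# Method of a Schema subclass; assumes the object being dumped is a dict.
@post_dump(pass_original=True)
def keep_unknowns(self, output, orig, **kwargs):
    # Copy every key from the original input that the declared fields did not emit
    for key in orig:
        if key not in output:
            output[key] = orig[key]
    return output
```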
You're a lifesaver. I just spent so many hours trying to include an arbitrary key in my API response and could not for the life of me figure out how to include unknowns the way `load` does.