Dumping objects with fields unknown to schema in versions >=3.0.0

Open arjunKay opened this issue 4 years ago • 8 comments

I am trying to serialize an object with some fields additional to that defined in the schema. While dumping, I am also specifying fields to be included using only. I was able to specify fields unknown to the schema in the only parameter when I was using version 2.19.5, but it's breaking when I upgraded to 3.5.1. For clarity, consider the following code:

from marshmallow import Schema, fields


class MySchema(Schema):
    a = fields.Str()
    b = fields.Str()
    c = fields.Str()

class MyClass:
    def __init__(self, a, b, c, d, e):
        self.a = a
        self.b = b
        self.c = c
        self.d = d
        self.e = e


obj = MyClass("one", "two", "three", "four", "five")

data_out = MySchema(only=('a', 'b', 'c', 'd')).dump(obj)

print(data_out.data)  # 2.x: dump() returns a result with .data; 3.x returns the dict itself

# Output with 2.19.5 (expected behavior) => {'a': 'one', 'c': 'three', 'b': 'two', 'd': 'four'}
# With 3.5.1 => ValueError: Invalid fields for <MySchema(many=False)>: {'d'}

One solution is to add those unknown fields to the Schema, but I was wondering whether there is a proper workaround for this issue when I am using the newer versions (specifically >=3.0.0).

I feel this is similar to #1198 but I was not able to find a solution for this in the thread.

arjunKay avatar Mar 20 '20 12:03 arjunKay

That error occurs during schema construction, not the dump operation. Field names became stricter in 3.x so that a typo no longer runs without warning and silently loses data.

https://marshmallow.readthedocs.io/en/stable/upgrading.html#schemas-raise-validationerror-when-deserializing-data-with-unknown-keys

Marshmallow provides implicit field creation to ease the burden of declaring fields. This may work for your use case depending on the number of additional fields you need.

from marshmallow import Schema, fields


class MyClass:
    def __init__(self, a, b):
        self.a = a
        self.b = b

class MySchema(Schema):
    a = fields.Str()

class MySchemaB(MySchema):
    class Meta:
        additional = ['b']

obj = MyClass("one", "two")
MySchemaB().dump(obj)
# {'a': 'one', 'b': 'two'}

https://marshmallow.readthedocs.io/en/stable/quickstart.html#implicit-field-creation

Currently the only way to ingest truly unknown data is during load. If you are planning on dumping that data, it has to be declared explicitly, because there has never been a mechanism for dumping unknown data AFAIK.

deckar01 avatar Mar 20 '20 13:03 deckar01

Dumping unknown data would be complicated because objects may have attributes you don't want to dump. It only makes sense for dicts or objects where the content to be dumped is clearly identified.

lafrech avatar Mar 20 '20 13:03 lafrech

You can also generate schemas dynamically at runtime using Schema.from_dict.

from dataclasses import dataclass

from marshmallow import Schema, fields


@dataclass
class MyClass:
    a: str
    b: str


class MySchema(Schema):
    a = fields.Str()


obj = MyClass(a="one", b="two")
MySchema2 = MySchema.from_dict({"b": fields.Str()})
print(MySchema2().dump(obj))
# {'a': 'one', 'b': 'two'}

sloria avatar Mar 20 '20 14:03 sloria

Thanks for the help @sloria @lafrech and @deckar01.

I was able to replicate the expected behavior by creating a subclass from the original schema and specifying the additional fields in Meta as @deckar01 suggested.

arjunKay avatar Mar 24 '20 06:03 arjunKay

Another workaround I tried was using unknown=INCLUDE with only. Referring back to my code, if I do the following:

data_out = MySchema(unknown=INCLUDE, only={'a', 'b', 'c', 'e'}).dump(obj)

it raises a ValueError. Since we are including unknown fields, wouldn't it make sense to permit unknown fields in the only parameter?

arjunKay avatar Mar 24 '20 06:03 arjunKay

I'm having the same issue. Would be great if it could allow unknown fields.

The best approach I found was the one below, but it is not what we are looking for.

from marshmallow import INCLUDE, Schema


class MySchema(Schema):
    class Meta:
        unknown = INCLUDE  # applies to load only; has no effect on dump
        additional = ('new_field',)  # applies to load and dump, but every field must be listed by name

eddpascoal avatar Aug 01 '21 21:08 eddpascoal

I feel it needs to support dumping unknown fields. I want to use this to validate and populate fields before inserting into a DB, but I will have unknown fields. While the load includes unknowns, I can't insert objects into the DB. Implicit loading already exists, so why not do the same for dumping? If an object cannot be inferred, throw an error.

For now I've worked around it with a post dump:

    @post_dump(pass_original=True)
    def keep_unknowns(self, output, orig, **kwargs):
        for key in orig:
            if key not in output:
                output[key] = orig[key]
        return output

greggmi avatar Oct 20 '21 01:10 greggmi

You're a lifesaver. I just spent so many hours trying to include an arbitrary key in my API response and could not for the life of me figure out how to include unknowns like load does.

martinmckenna avatar Feb 26 '22 10:02 martinmckenna