cattrs icon indicating copy to clipboard operation
cattrs copied to clipboard

Converter ignores fields from child class when unstructuring (inheritance)

Open phiresky opened this issue 2 years ago • 12 comments

  • cattrs version: current master (a10ad1c638dd16308c9aa73c649fa92282503c00)
  • Python version: 3.10
  • Operating System: Linux

Description

I'm serializing a dataclass with a field. If the field has a more generic type than the actual instance, all the information from the child class is lost when unstructuring.

What I Did

from dataclasses import dataclass

import cattr


@dataclass(kw_only=True, frozen=True)
class Bar:
    x: int


@dataclass(kw_only=True, frozen=True)
class Foo(Bar):
    y: int


@dataclass(kw_only=True, frozen=True)
class Baz:
    foo: Bar

converter = cattr.Converter()
foo = Baz(foo=Foo(x=1, y=2))
print("unstructured", converter.unstructure(foo))

Expectation vs Reality

I'd expect it to output unstructured {'foo': {'x': 1, 'y': 2}} but it outputs unstructured {'foo': {'x': 1}}

Is this expected behaviour due to the way Converter generates unstructuring functions? If so, how can you change that behaviour?

phiresky avatar Jun 13 '22 11:06 phiresky

Interesting, what happens if you just unstructure an instance of Foo?

Tinche avatar Jun 13 '22 11:06 Tinche

That works fine:

converter.unstructure(baz)={'foo': {'x': 1}}
converter.unstructure(baz.foo)={'x': 1, 'y': 2}

phiresky avatar Jun 13 '22 11:06 phiresky

Something's definitely fishy here. Will look into it.

Tinche avatar Jun 13 '22 11:06 Tinche

I am seeing the same behavior with my subclass that has extra fields. They are lost in the unstructuring.

matmel avatar Aug 03 '22 19:08 matmel

Actually there is a workaround.

Using a similar example as the original one with attrs instead of dataclass (I did not test with the later), with latest commit in main branch:

The subclass problem with structuring and unstructuring

BaseConverter doesn't fail for the unstructuring, but fails for the structuring. Meanwhile, Converter fails at both.

from typing import Union
from attrs import define
from cattrs import Converter, BaseConverter

@define
class Parent:
    i: int

@define
class Child(Parent):
    j: int

@define
class A:
    a: Parent

c = Converter()
bc = BaseConverter()

# Trying unstructuring:
# ---------------------
print(c.unstructure(A(Child(1, 2))))  # {"a": {"i": 1}}, here Converter is wrong, {"a": {"i": 1, "j": 2}} is expected.
print(bc.unstructure(A(Child(1, 2))))  # {"a": {"i": 1, "j": 2}}, here BaseConverter is right!

# Trying structuring:
# -------------------
print(c.structure({"a": {"i": 1, "j": 2}}, C))  # A(a=Parent(i=1)), here Converter is wrong, A(a=Child(i=1, j=2)) is expected.
print(bc.unstructure({"a": {"i": 1, "j": 2}}, C))  # A(a=Parent(i=1)), here BaseConverter is wrong too!

Workaround

In the above example, if you replace the type definition of attribute a in class A by explicitly listing all subclasses,

@define
class A:
    a: Union[Parent, Child]

Then all structuring and unstructuring tests with both converters above produce the expected results. I guess this workaround is only valid for python>=3.7 as according to the typing.Union documentation explicit subclasses in Union are kept since that version. My tests were made with python 3.10.

Path forward

I tried to change the cattrs code before realizing that workaround and the documentation on typing.Union. It makes me think that the situation should be handled with the creation of automatic disambiguation functions where attrs defined types that have subclasses are automatically replaced with a Union type.

matmel avatar Aug 05 '22 15:08 matmel

@Tinche I am seeing that you plan to cut a new release soon. I would like to have this issue fixed in this new release. Could you provide some guidance on how to tackle the problem. Is my idea of an automatic disambiguation of a Union type containing all subclass types is valid? I may be able to create a PR.

matmel avatar Sep 29 '22 20:09 matmel

@matmel I can take a look at this before the next release. But in any case, cattrs is designed so you can come up with a strategy without me having to include it in a release ;)

Tinche avatar Sep 29 '22 21:09 Tinche

Alright, I've reviewed the issue in depth. It doesn't matter whether attrs classes or dataclasses are used, or whether the classes are kwonly or frozen.

Referencing the issue in the OP: on the surface, the issue is when cattrs encounters the Baz class for the first time, it'll generate an unstructuring function for it. This involved generating an unstructuring function for each of its fields. The foo field is typed as Bar, so the generated function will only produce the x field.

On the face of it, you can solve the issue (with a small efficiency loss) by inserting the line

converter.register_unstructure_hook(Bar, converter.unstructure_attrs_asdict)

This will, in effect, use the runtime value of fields typed as Bar when unstructuring them. The BaseConverter works here because it uses the runtime value always.

I would advise against this, however. From a data modelling perspective this approach is not sound.

If you have an instance of Baz, it's not safe to do baz.foo.y - any type checker will yell at you (and an IDE will not autocomplete it), and for good reason. And that reason is - you can't safely use subclasses in this way to add fields. The reasoning is exactly the same as for why you can't subclass enum.Enums and add members (see https://docs.python.org/3/library/enum.html#restricted-enum-subclassing).

Another way of looking at it: if you continued with this approach, a subclass in the future can add any field what-so-ever, thus defeating the purpose of structured data. At that point you might as well go with a dictionary. The point of structured data is to know in advance the shape of your data.

@matmel 's solution with a union is much better. (We haven't supported 3.6 for ~2 years now, so that's ok.) Using a union makes it very explicit about what the possible shapes of the data are, and in these cases it works out of the box.

On a theoretical level it's a little funny to me to see a Union[Parent, Child], since - logically speaking - it should be equivalent to just Parent, since every Child is a Parent. But it works, and I think it's the best we can do in Python at this time.

As for @matmel 's proposal for detecting subclasses, it's not really workable. Any class can gain subclasses after the unstructuring function is generated, __subclassess__ is global state anyway (and global state is dangerous), and the approach is not really sound from a data modelling perspective in the first place. So I don't see any way forward for cattrs here, except for some documentation which I'd gladly merge.

TLDR; use a union for data-bearing classes.

Tinche avatar Sep 30 '22 22:09 Tinche

I've created PR #312

At my $job, I really need the ability to allow subclasses to be structured and unstructured in places where only the parent class is declared. After some time I realized that the manual union approach is not very maintanable in the long run.

Subclassing and inheritance is a supported feature of python and as such I think it also deserves to be well supported in cattrs too. I find that accepting a subclass when the parent class is asked is well in the spirit of inheritance and Liskov substitution principle. In my case I know all the subclasses when I start to use the converter. I understand that using that idea with a plugin system based on inheritance is dangerous, but when you control all the class tree, I really think this is not a misfeature even in the face of data modeling perspective.

matmel avatar Oct 21 '22 21:10 matmel

Ok, I'm fine with adding this as long as it's opt-in (I think it'll be a speed loss in the general case). I'll look over your PR and let's see how to get this in.

Tinche avatar Oct 21 '22 21:10 Tinche

Thanks, convincing you was not too hard 😄

I think there is indeed a speed loss, mainly due to an additional test and optional function call in the case of unstructuring (but you would go through the union disambiguation in case of the "manual" subclass support), and for the structuring there is probably indeed a speed loss as you always go through the union disambiguation mechanism even if you are trying to structure the parent class and you were already in the parent class structuring function.

matmel avatar Oct 21 '22 22:10 matmel