Class-based decoder/encoder method for custom types

Open gsakkis opened this issue 1 year ago • 2 comments

Description

The current extension callbacks mechanism for non-native types is fine for a single custom type used in one msgspec.Struct but doesn't quite scale when there are many custom types used by (sometimes arbitrarily) many Structs. Consider the following toy example:

from typing import List

import msgspec


class Double:
    def __init__(self, i: int):
        self.d = 2 * i

    def __repr__(self):
        return f"Double(d={self.d})"


class Total:
    def __init__(self, *args: int):
        self.s = sum(args)

    def __repr__(self):
        return f"Total(s={self.s})"


class Model1(msgspec.Struct):
    double: Double
    total: Total


class Model2(msgspec.Struct):
    doubles: List[Double]
    totals: List[Total]


def dec_hook(cls, obj):
    if cls is Double:
        return Double(obj)
    if cls is Total:
        return Total(*obj)
    raise TypeError(cls)


print(
    msgspec.convert({"double": 3, "total": [2, 3]}, Model1, dec_hook=dec_hook)
)
print(
    msgspec.convert(
        {"doubles": [3, 4], "totals": [[2, 3], [5, 6, 1]]},
        Model2,
        dec_hook=dec_hook,
    )
)

This works in principle but it has two major issues:

  1. It couples the decode logic for unrelated types in the same dec_hook callback.
  2. It does not couple dec_hook with the Struct, requiring the user to pass it on every convert call.

Both issues could be addressed by introducing special methods (say __msgspec_decode__ / __msgspec_encode__). When implemented on a type that is not natively supported, they would define the default decoding/encoding logic for that type, used whenever dec_hook/enc_hook are not given. Something like this:

class Double:
    ...

    @classmethod
    def __msgspec_decode__(cls, arg):
        return cls(arg)


class Total:
    ...

    @classmethod
    def __msgspec_decode__(cls, arg):
        return cls(*arg)


# no need for dec_hook
print(msgspec.convert({"double": 3, "total": [2, 3]}, Model1))

Alternatively, there could be a global registry mapping each type to its decoder and encoder callables, in case it is not possible or desirable to add special __msgspec_*__ methods to existing classes.
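
A minimal sketch of the registry idea, written in user code rather than inside msgspec. The names `_DECODERS`, `register_decoder`, and `registry_dec_hook` are hypothetical and not part of msgspec; the only assumption about the library is that a dec_hook is called as `(type, obj)`, as in the example above.

```python
from typing import Any, Callable, Dict

# Hypothetical module-level registry mapping a type to its decoder callable.
_DECODERS: Dict[type, Callable[[Any], Any]] = {}


def register_decoder(cls: type):
    """Decorator that records the decorated function as the decoder for `cls`."""
    def wrap(func: Callable[[Any], Any]) -> Callable[[Any], Any]:
        _DECODERS[cls] = func
        return func
    return wrap


def registry_dec_hook(cls: type, obj: Any) -> Any:
    """A dec_hook-shaped callable that dispatches through the registry."""
    try:
        decoder = _DECODERS[cls]
    except KeyError:
        raise TypeError(f"no decoder registered for {cls!r}") from None
    return decoder(obj)


class Double:
    def __init__(self, i: int):
        self.d = 2 * i


class Total:
    def __init__(self, *args: int):
        self.s = sum(args)


# Each type's decode logic is registered separately, so unrelated types
# no longer share one if/elif chain.
@register_decoder(Double)
def _decode_double(obj: Any) -> Double:
    return Double(obj)


@register_decoder(Total)
def _decode_total(obj: Any) -> Total:
    return Total(*obj)
```

Passing `dec_hook=registry_dec_hook` to msgspec.convert would address the first issue (coupling of unrelated types) but not the second, since the hook still has to be supplied on every call.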

For reference here is how Pydantic (v2) handles custom types.

gsakkis, Aug 21 '23 08:08

I just wanted to chime in to ask, is the value here that the magic methods -- __msgspec_decode__ and __msgspec_encode__ -- are common across all projects that use msgspec?

Otherwise, a generic hook could be written as:

def dec_hook(cls, obj):
    if hasattr(cls, '__msgspec_decode__'):
        return cls.__msgspec_decode__(obj)
    raise TypeError(cls)
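
The encode side could presumably be handled the same way. This is a sketch only: `__msgspec_encode__` is the method name proposed in this issue, not something msgspec looks for, and the `Double` class here is a cut-down copy of the one in the description with an encode method added for illustration.

```python
from typing import Any


def enc_hook(obj: Any) -> Any:
    # Look the method up on the type, mirroring how the decode hook checks the class.
    encode = getattr(type(obj), "__msgspec_encode__", None)
    if encode is None:
        raise TypeError(type(obj))
    return encode(obj)


class Double:
    def __init__(self, i: int):
        self.d = 2 * i

    def __msgspec_encode__(self) -> int:
        # Return the original integer so that decoding via Double(obj) round-trips.
        return self.d // 2
```

An enc_hook in msgspec receives only the object being encoded, so no class argument is needed here.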

dmckeone, Sep 18 '23 22:09

Most libraries take the class-method approach, sometimes providing mixins for use in custom types. Libraries to look at are:

  • pydantic
  • attrs
  • mashumaro
  • apischema

IMO it's good to couple serialization and deserialization (and also validation, but that's harder) tightly with custom types, because that makes it easier to build "type libraries" of common data structures that can also depend on each other for serialization.
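
As a rough illustration of types depending on each other for deserialization, here is a sketch using the `__msgspec_decode__` name proposed in this issue. `Point` and `Segment` are hypothetical types made up for this example, and nothing here relies on msgspec itself.

```python
class Point:
    def __init__(self, x: float, y: float):
        self.x = x
        self.y = y

    @classmethod
    def __msgspec_decode__(cls, arg):
        # Expects a two-element sequence: [x, y].
        return cls(*arg)


class Segment:
    def __init__(self, start: Point, end: Point):
        self.start = start
        self.end = end

    @classmethod
    def __msgspec_decode__(cls, arg):
        # Expects [[x0, y0], [x1, y1]] and delegates each endpoint to Point.
        start, end = arg
        return cls(
            Point.__msgspec_decode__(start),
            Point.__msgspec_decode__(end),
        )
```

Because the decode logic lives on each type, `Segment` can reuse `Point`'s decoder without either of them knowing about a shared hook.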

fungs, Dec 14 '23 21:12