msgspec
Class-based decoder/encoder method for custom types
Description
The current extension callbacks mechanism for non-native types is fine for a single custom type used in one msgspec.Struct
but doesn't quite scale when there are many custom types used by (sometimes arbitrarily) many Structs. Consider the following toy example:
from typing import List

import msgspec


class Double:
    def __init__(self, i: int):
        self.d = 2 * i

    def __repr__(self):
        return f"Double(d={self.d})"


class Total:
    def __init__(self, *args: int):
        self.s = sum(args)

    def __repr__(self):
        return f"Total(s={self.s})"


class Model1(msgspec.Struct):
    double: Double
    total: Total


class Model2(msgspec.Struct):
    doubles: List[Double]
    totals: List[Total]


def dec_hook(cls, obj):
    if cls is Double:
        return Double(obj)
    if cls is Total:
        return Total(*obj)
    raise TypeError(cls)


print(
    msgspec.convert({"double": 3, "total": [2, 3]}, Model1, dec_hook=dec_hook)
)
print(
    msgspec.convert(
        {"doubles": [3, 4], "totals": [[2, 3], [5, 6, 1]]},
        Model2,
        dec_hook=dec_hook,
    )
)
This works in principle but it has two major issues:
- It couples the decode logic for unrelated types in the same dec_hook callback.
- It does not couple dec_hook with the Struct, requiring the user to pass it on every convert call.
Both issues could be addressed by introducing special methods (say __msgspec_decode__ / __msgspec_encode__) that, if implemented on a type that is not natively supported, define the default decoding/encoding logic for that particular type (i.e. when dec_hook/enc_hook are not given), something like this:
class Double:
    ...

    @classmethod
    def __msgspec_decode__(cls, arg):
        return cls(arg)


class Total:
    ...

    @classmethod
    def __msgspec_decode__(cls, arg):
        return cls(*arg)


# no need for dec_hook
print(msgspec.convert({"double": 3, "total": [2, 3]}, Model1))
Alternatively, there could be a global registry (type -> callable) of decoders & encoders, in case it is not possible or desirable to add special __msgspec__ methods to existing classes.
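To make the registry idea concrete, here is a rough user-side sketch of what it might look like today. The names `_DECODERS`, `register_decoder` and `registry_dec_hook` are made up for illustration and are not part of msgspec; the only thing assumed about msgspec is the existing `dec_hook(type, obj)` callback signature used in the example above.

```python
from typing import Any, Callable, Dict

# Hypothetical module-level registry: type -> decoder callable.
_DECODERS: Dict[type, Callable[[Any], Any]] = {}


def register_decoder(cls: type):
    """Return a decorator that registers a function as the decoder for `cls`."""
    def wrap(func: Callable[[Any], Any]) -> Callable[[Any], Any]:
        _DECODERS[cls] = func
        return func
    return wrap


def registry_dec_hook(cls: type, obj: Any) -> Any:
    """dec_hook-compatible callback that dispatches through the registry."""
    try:
        decoder = _DECODERS[cls]
    except KeyError:
        raise TypeError(f"No decoder registered for {cls!r}") from None
    return decoder(obj)


class Double:
    def __init__(self, i: int):
        self.d = 2 * i


class Total:
    def __init__(self, *args: int):
        self.s = sum(args)


# Each registration sits next to its type instead of in one shared if-chain.
register_decoder(Double)(Double)
register_decoder(Total)(lambda obj: Total(*obj))

# The callback would then be passed as dec_hook=registry_dec_hook to
# msgspec.convert, in place of the hand-written dec_hook.
```

This removes the coupling between unrelated types, but the hook still has to be passed on every convert call, so it only addresses the first of the two issues.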
For reference here is how Pydantic (v2) handles custom types.
I just wanted to chime in to ask, is the value here that the magic methods -- __msgspec_decode__ and __msgspec_encode__ -- are common across all projects that use msgspec?
Otherwise the hook could be generically done with:
def dec_hook(cls, obj):
    if hasattr(cls, '__msgspec_decode__'):
        return cls.__msgspec_decode__(obj)
    raise TypeError(cls)
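The hook above only covers decoding. An encoding counterpart could probably be sketched the same way, assuming the proposed `__msgspec_encode__` method takes no arguments besides `self` and returns a natively supported value (that convention is my assumption, the proposal does not spell it out). The `enc_hook(obj)` shape follows msgspec's existing single-argument callback.

```python
from typing import Any


def dec_hook(cls: type, obj: Any) -> Any:
    # Delegate to the type's own decode method if it defines one.
    if hasattr(cls, "__msgspec_decode__"):
        return cls.__msgspec_decode__(obj)
    raise TypeError(cls)


def enc_hook(obj: Any) -> Any:
    # Look the method up on the type, mirroring the decode side.
    encode = getattr(type(obj), "__msgspec_encode__", None)
    if encode is not None:
        return encode(obj)
    raise NotImplementedError(f"Objects of type {type(obj)} are not supported")


class Double:
    def __init__(self, i: int):
        self.d = 2 * i

    @classmethod
    def __msgspec_decode__(cls, arg: int) -> "Double":
        return cls(arg)

    def __msgspec_encode__(self) -> int:
        # Invert the doubling so that decode(encode(x)) round-trips.
        return self.d // 2


# Round trip through both hooks without touching msgspec itself.
restored = enc_hook(dec_hook(Double, 3))
```

Both hooks are generic, so they could be defined once per project and reused for every type that implements the methods.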
Most libraries take the class method approach, sometimes via mixins that custom types inherit from. Libraries to look at are:
- pydantic
- attrs
- mashumaro
- apischema
IMO it's good to couple serialization-deserialization (and also validation, but that's harder) tightly with custom types because that makes it easier to build "type libraries" of common data structures which can also depend on each other for serialization.
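As a small illustration of types depending on each other for serialization, a composite type could delegate to the decode method of the type it is built from. This is only a sketch: `DoubleRange` is a made-up type, and `__msgspec_decode__` is the method proposed in this issue, not something msgspec calls today.

```python
from typing import Sequence


class Double:
    def __init__(self, i: int):
        self.d = 2 * i

    @classmethod
    def __msgspec_decode__(cls, arg: int) -> "Double":
        return cls(arg)


class DoubleRange:
    """Hypothetical composite type built from two Double values."""

    def __init__(self, low: Double, high: Double):
        self.low = low
        self.high = high

    @classmethod
    def __msgspec_decode__(cls, arg: Sequence[int]) -> "DoubleRange":
        low, high = arg
        # Reuse Double's own decoding logic instead of duplicating it here.
        return cls(
            Double.__msgspec_decode__(low),
            Double.__msgspec_decode__(high),
        )


decoded = DoubleRange.__msgspec_decode__([1, 4])
```

If Double's wire format ever changes, DoubleRange picks up the change without being edited, which is the kind of composition a shared dec_hook makes awkward.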