msgspec
Support unions of custom types
Hi there, I was trying out msgspec as an alternative to attrs with cattrs, but I ran into a roadblock with a union type that I have not had a problem with before. Specifically, I have a message type with a field for IP addresses. Python differentiates between IPv4 and IPv6 addresses but I do not, so I need to use a union like so:
import ipaddress
import msgspec

IPAddress = ipaddress.IPv4Address | ipaddress.IPv6Address

class Message(msgspec.Struct):
    ip: IPAddress

msg = Message(ipaddress.ip_address("127.0.0.1"))

def enc_hook(obj):
    match obj:
        case ipaddress.IPv4Address() as value:
            return {"__ipaddress__": str(value)}
        case ipaddress.IPv6Address() as value:
            return {"__ipaddress__": str(value)}
        case _:
            raise TypeError(f"Cannot encode objects of type {type(obj)}")

def dec_hook(type, obj):
    # not quite done, I think msgspec raises before reaching this.
    if type in (ipaddress.IPv4Address, ipaddress.IPv6Address):
        return ipaddress.ip_address(obj)
When I then use this code, I can encode:
>>> d = msgspec.msgpack.encode(msg, enc_hook=enc_hook)
>>> d
b'\x81\xa2ip\x81\xad__ipaddress__\xa9127.0.0.1'
But trying to decode the encoded result back again results in the following error message:
>>> msgspec.msgpack.decode(d, type=Message)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Type unions may not contain more than one custom type - type `ipaddress.IPv4Address | ipaddress.IPv6Address` is not supported
Having read the docs about how to create tagged unions, I can only see it working for subclasses using msgspec and not for classes that already exist.
In this specific case, both types of the union can share the same logic for serialization and deserialization:
- Serialization: str
- Deserialization: ipaddress.ip_address
As an aside: I also have the same need for dealing with IP networks, where deserialization becomes ipaddress.ip_network:
IPNetwork = ipaddress.IPv4Network | ipaddress.IPv6Network
It could be that the shared logic makes it a little easier to construct something that makes it possible to handle unions like these.
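The shared logic described above can be checked with the standard library alone. This is only a minimal sketch: the helper names `to_wire`, `address_from_wire` and `network_from_wire` are made up for illustration and are not part of msgspec.

```python
import ipaddress

def to_wire(value):
    """Serialize any address or network to its string form."""
    return str(value)

def address_from_wire(text):
    """Parse a string; the factory picks IPv4Address or IPv6Address."""
    return ipaddress.ip_address(text)

def network_from_wire(text):
    """Parse a string; the factory picks IPv4Network or IPv6Network."""
    return ipaddress.ip_network(text)

# Both address families round-trip through the same pair of functions.
for text in ("127.0.0.1", "2001:db8::"):
    addr = address_from_wire(text)
    assert address_from_wire(to_wire(addr)) == addr

# The same holds for networks.
for text in ("10.0.0.0/8", "2001:db8::/32"):
    net = network_from_wire(text)
    assert network_from_wire(to_wire(net)) == net
```

Because the string form alone is enough for the factory functions to pick the right class, no extra tag is needed in the serialized data for this particular union.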
Thanks for opening this issue.
It's unfortunate that the base type of IPv4Address and IPv6Address isn't technically public. It's mentioned in the original ipaddress PEP, and other validation libraries like Pydantic make use of it, but it has a leading underscore so I hesitate to recommend using it. Still, it's currently the only way to get what you want to work:
import ipaddress
import msgspec

# Use the (undocumented) base type for IPv4Address & IPv6Address
IPAddress = ipaddress._BaseAddress

class Message(msgspec.Struct):
    ip: IPAddress

def enc_hook(obj):
    if isinstance(obj, IPAddress):
        return str(obj)
    raise TypeError(f"Cannot encode objects of type {type(obj)}")

def dec_hook(type, obj):
    if type is IPAddress:
        return ipaddress.ip_address(obj)
    raise TypeError(f"Cannot decode objects of type {type}")

enc = msgspec.msgpack.Encoder(enc_hook=enc_hook)
dec = msgspec.msgpack.Decoder(Message, dec_hook=dec_hook)

msg1 = Message(ipaddress.ip_address("127.0.0.1"))
msg2 = Message(ipaddress.ip_address("2001:db8::"))

s1 = enc.encode(msg1)
s2 = enc.encode(msg2)

assert dec.decode(s1) == msg1
assert dec.decode(s2) == msg2
For network types this would be ipaddress._BaseNetwork. These private classes are used by several prominent libraries, and are unlikely to change, but are technically private so you may not want to use them. Up to you.
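For networks, the hooks would presumably mirror the address example above. The sketch below only exercises the hook functions directly, without msgspec; the assumption (not verified here) is that a Struct field annotated with `IPNetwork` would reach them in the same way as the address field does.

```python
import ipaddress

# Undocumented base class shared by IPv4Network and IPv6Network (private API).
IPNetwork = ipaddress._BaseNetwork

def enc_hook(obj):
    # Both network classes serialize to their string form, e.g. "192.0.2.0/24".
    if isinstance(obj, IPNetwork):
        return str(obj)
    raise TypeError(f"Cannot encode objects of type {type(obj)}")

def dec_hook(type, obj):
    # ip_network() picks IPv4Network or IPv6Network from the string itself.
    if type is IPNetwork:
        return ipaddress.ip_network(obj)
    raise TypeError(f"Cannot decode objects of type {type}")

# The hooks are plain functions, so they can be checked on their own.
net4 = ipaddress.ip_network("192.0.2.0/24")
net6 = ipaddress.ip_network("2001:db8::/32")
assert dec_hook(IPNetwork, enc_hook(net4)) == net4
assert dec_hook(IPNetwork, enc_hook(net6)) == net6
```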
The reason we don't support unions of custom types on decode is that the serialized form will need some way to distinguish between the decoded types (tagged unions do this for the builtin struct types). We could remove this check, but then I'm not sure what the dec_hook interface should be?
Currently this hook has the signature dec_hook(typ: Type, obj: Any) -> Any, where typ is the type to decode into, and obj is the "unstructured" representation. If we support unions of custom types, what should be passed in to typ?
- A typing.Union?
- A types.UnionType? (this is the output of int | float, as opposed to Union[int, float])
- A tuple of the types?
- A frozenset of the types?
Right now I'm leaning towards types.UnionType for 3.10+ and typing.Union for the rest. The downside is that most users are likely to be unfamiliar with how to interact with types.UnionType at runtime, while a tuple would be a much more familiar interface. None of these options seem that great to me; this is a bit of a messy problem to solve.
Thanks for the input, it works, kind of inconvenient for me that the baseclass is "private", but that is just how it is :shrug:
With regards to how to potentially support unions, I think I have personally been hitting my head against some reasonably sophisticated typing problems that I would not have a problem with using types.Union, but then again that might also disqualify my opinion a bit.
Perhaps it would be possible to experiment with the API, for example by adding a new argument to control how unions get passed to the decoder? That would allow experimenting with which option is more ergonomic to work with. While pretty ugly, it could be possible to have something like:
dec = msgspec.msgpack.Decoder(Message, dec_hook=dec_hook, _union_as="tuple")
The kind of ugly leading underscore could be used to communicate that this is experimental.
Alternatively, maybe some variants of the Decoder class could also work, perhaps a UnionDecoder? Again, that would provide a space to figure out the better API without breaking too much existing code.
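To make the tuple option concrete, here is a purely hypothetical sketch of what a `dec_hook` might look like if the union's members were passed as a tuple. msgspec does not call `dec_hook` this way today; the tuple-valued `typ` argument is an assumption made only for illustration.

```python
import ipaddress

ADDRESS_TYPES = (ipaddress.IPv4Address, ipaddress.IPv6Address)

def dec_hook(typ, obj):
    # ASSUMPTION: typ arrives as a tuple of the union's member types.
    # This is not how msgspec currently invokes dec_hook.
    if isinstance(typ, tuple) and set(typ) <= set(ADDRESS_TYPES):
        value = ipaddress.ip_address(obj)
        # A tuple can be handed straight to isinstance().
        if isinstance(value, typ):
            return value
        raise ValueError(f"{obj!r} is not one of {typ}")
    raise TypeError(f"Cannot decode objects of type {typ}")

assert dec_hook(ADDRESS_TYPES, "127.0.0.1") == ipaddress.ip_address("127.0.0.1")
assert dec_hook(ADDRESS_TYPES, "2001:db8::") == ipaddress.ip_address("2001:db8::")
```

The appeal of the tuple form is that it works with `isinstance` and set operations without any `typing` introspection.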
> I think I have personally been hitting my head against some reasonably sophisticated typing problems that I would not have a problem with using types.Union
Can you provide some examples of problems you've run into/types that you think should be easier to handle here?
I've been thinking about ripping out and replacing the current extension mechanisms (enc_hook/dec_hook/ext_hook) with a more powerful dispatch mechanism, so now would be the time to collect examples of behaviors we'd want to support.
> Can you provide some examples of problems you've run into/types that you think should be easier to handle here?
Nothing specific to msgspec, but just generally using unions at runtime to determine which types an encoded value could be decoded as.
So not many cases to collect there (sorry), except for the original problem with IP addresses.
Are there other custom types/combinations of custom types that you're likely to want to use beyond just IPv4Address/IPv6Address/IPv4Address | IPv6Address? I have a plan that would make IPv4Address | IPv6Address just work transparently, if this is all you're after then that's covered.
> Are there other custom types/combinations of custom types that you're likely to want to use beyond just IPv4Address/IPv6Address/IPv4Address | IPv6Address? I have a plan that would make IPv4Address | IPv6Address just work transparently, if this is all you're after then that's covered.
I have a similar issue here:
import msgspec
import numpy as np

class Response(msgspec.Struct):
    vector: np.ndarray | list[float]
dec_hook doesn't work here. Decoders like msgspec.json.Decoder(type=Response) will raise "TypeError: Type unions containing a custom type may not contain any additional types other than None - type list[float] | numpy.ndarray is not supported."
NumPy arrays are a common type in machine learning applications. I know it's not a good idea to convert one to a list[float] and encode it as JSON; I'm just mentioning it here in case you want more examples.