msgspec
msgspec copied to clipboard
Type[X] validation ignores X
Description
Validating a field annotated as Type[X] (or derivatives e.g. List[Type[X]]) ignores X; it just validates that the input is a type, not a subclass of X.
MCVE:
class Foo:
pass
class Bar(Foo):
pass
class Model(msgspec.Struct):
objs: List[Foo]
class Model2(msgspec.Struct):
obj_types: List[Type[Foo]]
# succeeds
try:
msgspec.convert({"objs": [Foo(), Bar(), 3]}, Model)
except msgspec.ValidationError:
pass
else:
raise AssertionError("Should have raised ValidationError")
# succeeds
try:
msgspec.convert({"obj_types": [Foo, Bar, 3]}, Model2)
except msgspec.ValidationError:
pass
else:
raise AssertionError("Should have raised ValidationError")
# fails
try:
msgspec.convert({"obj_types": [Foo, Bar, int]}, Model2)
except msgspec.ValidationError:
pass
else:
raise AssertionError("Should have raised ValidationError")
Yes, this is a known limitation - I thought I'd documented it, but a quick grep through the docs doesn't turn anything up.
typing.Type (or type) is not a builtin supported type. As such, msgspec treats it as a custom type.
Custom types are typically supported through extensions, since otherwise msgspec doesn't know how to encode/decode them at all. Part of the contract of implementing a dec_hook is that the hook should return the specified type or error if invalid. As a safety check, msgspec will check the output type of a dec_hook call to ensure that the output is what it should be. For normal types this is just an isinstance call, but for parametrized generics we can't do that - we make do with isinstance(out, obj.__origin__). For Type[Foo] this would only check that the output is an instance of type, not that it's a subclass of Foo, matching the behavior you're seeing.
You however are using msgspec.convert, which was a later addition. msgspec.convert will still make use of a dec_hook if defined, but if not will fallback to the isinstance checks provided above (since a custom object like Foo may be provided directly to msgspec.convert as input). In many cases this isinstance check is sufficient! But as you pointed out here, in the case of a parametrized generic it may let some errors slip through.
Perhaps we should error in msgspec.convert when converting to a custom parametrized generic and no dec_hook is provided? This would make your case above error unless you passed in an explicit dec_hook to handle it.
FWIW, the following works today:
from typing import List, Type
import msgspec
class Foo:
pass
class Bar(Foo):
pass
class Model(msgspec.Struct):
objs: List[Foo]
class Model2(msgspec.Struct):
obj_types: List[Type[Foo]]
def dec_hook(type_, obj):
cls = getattr(type_, "__origin__", type_)
if cls is not None:
args = getattr(type_, "__args__", ())
else:
args = ()
if cls is type:
# Handle `Type`/`type` and `Type[base_cls]`/`type[base_cls]` annotations
if not isinstance(obj, type):
raise TypeError("Expected `type`")
if args and not issubclass(obj, args[0]):
raise TypeError(f"Expected a subclass of {args[0].__qualname__!r}")
return obj
elif False:
pass
# Could handle more custom types here ...
if not args:
# Fallback/default behavior for converting non-generic/parametrized
# custom types.
if isinstance(obj, cls):
return obj
raise TypeError(f"Expected {cls.__name__!r}")
# Raise NotImplementedError to signal that this dec_hook doesn't know how
# to handle this type.
raise NotImplementedError
# succeeds
try:
msgspec.convert({"objs": [Foo(), Bar(), 3]}, Model, dec_hook=dec_hook)
except msgspec.ValidationError as exc:
print(exc)
else:
raise AssertionError("Should have raised ValidationError")
# succeeds
try:
msgspec.convert({"obj_types": [Foo, Bar, 3]}, Model2, dec_hook=dec_hook)
except msgspec.ValidationError as exc:
print(exc)
else:
raise AssertionError("Should have raised ValidationError")
# fails
try:
msgspec.convert({"obj_types": [Foo, Bar, int]}, Model2, dec_hook=dec_hook)
except msgspec.ValidationError as exc:
print(exc)
else:
raise AssertionError("Should have raised ValidationError")
$ python demo.py
Expected 'Foo' - at `$.objs[2]`
Expected `type` - at `$.obj_types[2]`
Expected a subclass of 'Foo' - at `$.obj_types[2]`
Another option is to add builtin support for type/Type and type[cls]/Type[cls], but to do that I'd need a compelling reason why we should support type/Type instances. What's your reasoning for defining a msgspec.Struct type with a field of type type? Serialization (msgspec's primary purpose)? Or are you trying to use msgspec for runtime validation of other bits of your program, (as suggested in #513).
@jcrist thanks, I came up with a (simplified) version of your dec_hook to support type[X] annotations but it would be nice if there was builtin support.
Or are you trying to use msgspec for runtime validation of other bits of your program
This; the (ambitious) plan would be to completely supplant pydantic now that Litestar doesn't require it.
Can you comment more on your use case here? Where are you getting type instances as unstructured data that need to be converted to structured data using msgspec.convert? Supporting type builtin is doable, but since it won't be supported by any encodings it'd only be checked in msgspec.convert currently. Just trying to better understand the use case and motivation before deciding if this is worth supporting.
See this example from Beanie:
class Parent(UnionDoc): # Union
class Settings:
name = "union_doc_collection" # Collection name
class_id = "_class_id" # _class_id is default beanie internal field used to filter children Documents
class One(Document):
int_field: int = 0
shared: int = 0
class Settings:
name = "One" # Name used to filer union document 'One', default to class name
union_doc = Parent
Without going into too much detail, what happens is that the Settings nested class is converted under the hood to an ItemSettings Pydantic model (or ones of its subclasses). ItemSettings.union_doc is annotated as Optional[Type] (although a more precise annotation would be Optional[Type[UnionDoc]].
So it's not about getting type instances as unstructured data but about runtime validation / type checking.