msgspec icon indicating copy to clipboard operation
msgspec copied to clipboard

Type[X] validation ignores X

Open gsakkis opened this issue 2 years ago • 4 comments

Description

Validating a field annotated as Type[X] (or derivatives e.g. List[Type[X]]) ignores X; it just validates that the input is a type, not a subclass of X.

MCVE:

class Foo:
    pass


class Bar(Foo):
    pass


class Model(msgspec.Struct):
    objs: List[Foo]


class Model2(msgspec.Struct):
    obj_types: List[Type[Foo]]


# succeeds
try:
    msgspec.convert({"objs": [Foo(), Bar(), 3]}, Model)
except msgspec.ValidationError:
    pass
else:
    raise AssertionError("Should have raised ValidationError")

# succeeds
try:
    msgspec.convert({"obj_types": [Foo, Bar, 3]}, Model2)
except msgspec.ValidationError:
    pass
else:
    raise AssertionError("Should have raised ValidationError")

# fails
try:
    msgspec.convert({"obj_types": [Foo, Bar, int]}, Model2)
except msgspec.ValidationError:
    pass
else:
    raise AssertionError("Should have raised ValidationError")

gsakkis avatar Aug 21 '23 11:08 gsakkis

Yes, this is a known limitation - I thought I'd documented it, but a quick grep through the docs doesn't turn anything up.

typing.Type (or type) is not a builtin supported type. As such, msgspec treats it as a custom type.

Custom types are typically supported through extensions, since otherwise msgspec doesn't know how to encode/decode them at all. Part of the contract of implementing a dec_hook is that the hook should return the specified type or error if invalid. As a safety check, msgspec will check the output type of a dec_hook call to ensure that the output is what it should be. For normal types this is just an isinstance call, but for parametrized generics we can't do that - we make do with isinstance(out, obj.__origin__). For Type[Foo] this would only check that the output is an instance of type, not that it's a subclass of Foo, matching the behavior you're seeing.

You however are using msgspec.convert, which was a later addition. msgspec.convert will still make use of a dec_hook if defined, but if not will fallback to the isinstance checks provided above (since a custom object like Foo may be provided directly to msgspec.convert as input). In many cases this isinstance check is sufficient! But as you pointed out here, in the case of a parametrized generic it may let some errors slip through.

Perhaps we should error in msgspec.convert when converting to a custom parametrized generic and no dec_hook is provided? This would make your case above error unless you passed in an explicit dec_hook to handle it.

FWIW, the following works today:

from typing import List, Type

import msgspec


class Foo:
    pass


class Bar(Foo):
    pass


class Model(msgspec.Struct):
    objs: List[Foo]


class Model2(msgspec.Struct):
    obj_types: List[Type[Foo]]


def dec_hook(type_, obj):
    cls = getattr(type_, "__origin__", type_)
    if cls is not None:
        args = getattr(type_, "__args__", ())
    else:
        args = ()

    if cls is type:
        # Handle `Type`/`type` and `Type[base_cls]`/`type[base_cls]` annotations
        if not isinstance(obj, type):
            raise TypeError("Expected `type`")
        if args and not issubclass(obj, args[0]):
            raise TypeError(f"Expected a subclass of {args[0].__qualname__!r}")
        return obj
    elif False:
        pass
        # Could handle more custom types here ...

    if not args:
        # Fallback/default behavior for converting non-generic/parametrized
        # custom types.
        if isinstance(obj, cls):
            return obj
        raise TypeError(f"Expected {cls.__name__!r}")

    # Raise NotImplementedError to signal that this dec_hook doesn't know how
    # to handle this type.
    raise NotImplementedError


# succeeds
try:
    msgspec.convert({"objs": [Foo(), Bar(), 3]}, Model, dec_hook=dec_hook)
except msgspec.ValidationError as exc:
    print(exc)
else:
    raise AssertionError("Should have raised ValidationError")

# succeeds
try:
    msgspec.convert({"obj_types": [Foo, Bar, 3]}, Model2, dec_hook=dec_hook)
except msgspec.ValidationError as exc:
    print(exc)
else:
    raise AssertionError("Should have raised ValidationError")

# fails
try:
    msgspec.convert({"obj_types": [Foo, Bar, int]}, Model2, dec_hook=dec_hook)
except msgspec.ValidationError as exc:
    print(exc)
else:
    raise AssertionError("Should have raised ValidationError")
$ python demo.py
Expected 'Foo' - at `$.objs[2]`
Expected `type` - at `$.obj_types[2]`
Expected a subclass of 'Foo' - at `$.obj_types[2]`

Another option is to add builtin support for type/Type and type[cls]/Type[cls], but to do that I'd need a compelling reason why we should support type/Type instances. What's your reasoning for defining a msgspec.Struct type with a field of type type? Serialization (msgspec's primary purpose)? Or are you trying to use msgspec for runtime validation of other bits of your program, (as suggested in #513).

jcrist avatar Aug 24 '23 06:08 jcrist

@jcrist thanks, I came up with a (simplified) version of your dec_hook to support type[X] annotations but it would be nice if there was builtin support.

Or are you trying to use msgspec for runtime validation of other bits of your program

This; the (ambitious) plan would be to completely supplant pydantic now that Litestar doesn't require it.

gsakkis avatar Aug 24 '23 10:08 gsakkis

Can you comment more on your use case here? Where are you getting type instances as unstructured data that need to be converted to structured data using msgspec.convert? Supporting type builtin is doable, but since it won't be supported by any encodings it'd only be checked in msgspec.convert currently. Just trying to better understand the use case and motivation before deciding if this is worth supporting.

jcrist avatar Aug 24 '23 14:08 jcrist

See this example from Beanie:


class Parent(UnionDoc):  # Union
    class Settings:
        name = "union_doc_collection"  # Collection name
        class_id = "_class_id"  # _class_id is default beanie internal field used to filter children Documents


class One(Document):
    int_field: int = 0
    shared: int = 0        

    class Settings:
        name = "One" # Name used to filer union document 'One', default to class name
        union_doc = Parent

Without going into too much detail, what happens is that the Settings nested class is converted under the hood to an ItemSettings Pydantic model (or ones of its subclasses). ItemSettings.union_doc is annotated as Optional[Type] (although a more precise annotation would be Optional[Type[UnionDoc]].

So it's not about getting type instances as unstructured data but about runtime validation / type checking.

gsakkis avatar Aug 24 '23 17:08 gsakkis