Bug: Pydantic validators don't run for HTTP handler parameters
Description
We recently upgraded all of our Starlite-based services to v2, and in the process we turned up this rather puzzling issue. Specifically, it appears that Pydantic validators are not run for input parameters to handler methods. This is rather unintuitive for us and caused some headaches with regard to data parsing/validating/munging, as exemplified in the MCVE below.
What we expected with a handler like this:
async def mwe_bad(data: MyType) -> dict:
would be that the type would be instantiated from the incoming request body and the validators would run to do potential extra checks and/or manipulate the data slightly.
This seemingly does not happen; instead it seems to be the case that the class is filled with data but bypasses its own validators. Not sure how that can occur, but it's what we're observing. See the MCVE for a detailed reproduction. This same issue also occurs with Pydantic v1, by the way; both versions of Pydantic behave the same in this case.
Let me know if you need any additional information and I'll be happy to oblige.
URL to code causing the issue
No response
MCVE
"""
Run with:
uvicorn main:app --reload
Works:
curl --location 'http://localhost:8000/good' --header 'Content-Type: application/json' --data '{"picture": "test"}'
Doesn't work:
curl --location 'http://localhost:8000/bad' --header 'Content-Type: application/json' --data '{"picture": "test"}'
"""
from dataclasses import field
from litestar import Litestar, post
from pydantic import field_validator
from pydantic.dataclasses import dataclass
@dataclass
class MyType:
    picture: list[str] = field(default_factory=list)

    @field_validator("picture", mode="before")
    @classmethod
    def single_picture_validator(cls, v):
        if isinstance(v, str):
            return [v]
        return v


@post("/good")
async def mwe_good(data: dict) -> dict:
    data = MyType(**data)
    return data


@post("/bad")
async def mwe_bad(data: MyType) -> dict:
    return data


app = Litestar([mwe_good, mwe_bad], debug=True)
Steps to reproduce
1. Run the MCVE above with `uvicorn main:app --reload`.
2. Test the working endpoint: `curl --location 'http://localhost:8000/good' --header 'Content-Type: application/json' --data '{"picture": "test"}'`
3. Test the broken endpoint: `curl --location 'http://localhost:8000/bad' --header 'Content-Type: application/json' --data '{"picture": "test"}'`
4. Observe that Step 3 fails with a validation error that does not occur in Step 2, even though the same type is used.
Screenshots
No response
Logs
No response
Litestar Version
2.2.1
Platform
- [X] Linux
- [X] Mac
- [X] Windows
- [ ] Other (Please specify in the description above)
[!NOTE]
While we are open for sponsoring on GitHub Sponsors and OpenCollective, we also utilize Polar.sh to engage in pledge-based sponsorship. Check out all issues funded or available for funding on our Polar.sh Litestar dashboard
- If you would like to see an issue prioritized, make a pledge towards it!
- We receive the pledge once the issue is completed & verified
- This, along with engagement in the community, helps us know which features are a priority to our users.
Just so I understand correctly: You are expecting the handler /good to perform validation but this does not happen?
Well, the intent here is that both handlers should behave the same way:
- They take in the input {"picture": "test"}
- They propagate the output {"picture": ["test"]}
This would normally be the case if you instantiate these Models yourself; you can even try this for yourself in a Notebook or something of the sort: they will run the validator, adjust the string to be a list of strings, and give you the properly instantiated class back.
The only time where this doesn't work is in the idiomatic Litestar handler, as shown in the example, where it fails with a validation error. At first glance this may seem correct, but in actuality this behavior differs from the 'normal' usage.
To be clear: The 'good' handler does perform validation, as expected. The 'bad' handler, however, does not, which was a surprise to us.
This would normally be the case if you instantiate these Models yourself; you can even try this for yourself in a Notebook or something of the sort: they will run the validator, adjust the string to be a list of strings, and give you the properly instantiated class back.
This does not seem to be the case:
from dataclasses import field
from typing import List

from pydantic import field_validator
from pydantic.dataclasses import dataclass


@dataclass
class MyType:
    picture: List[str] = field(default_factory=list)

    @field_validator("picture", mode="before")
    @classmethod
    def single_picture_validator(cls, v):
        if isinstance(v, str):
            return [v]
        return v


MyType(picture="test")
No error is raised by Pydantic here. It's therefore not surprising that the same doesn't happen if you do this inside of a handler function.
I can see the transformation happening:
assert MyType(picture="test").picture == ["test"]
but no errors being raised.
I think we might be miscommunicating, so let me try to be as clear as possible: We expect the bad handler (which is currently broken and returns a validation error) to not return that validation error. In fact, it should work just fine, just like the good handler that I included for reference.
It is unsurprising that the good example works for you; that was the plan all along, after all. The weird behavior only shows up when used as shown in the 'bad' handler.
I think we might be miscommunicating
Yes, we were, sorry for the misunderstanding! I'll check this out again :)
Okay, so the reason for this is rather simple: Pydantic is not as strict with its validation as Litestar is.
Litestar will throw a validation error before the data even reaches the model. You can confirm this with this example:
from dataclasses import field
from typing import List

from pydantic.dataclasses import dataclass

from litestar.serialization import decode_json


@dataclass
class MyType:
    picture: List[str] = field(default_factory=list)


# This raises a validation error before the data ever reaches the model,
# so the assertion below is never actually evaluated:
assert decode_json('{"picture": "test"}', target_type=MyType) == {"picture": ["test"]}
Why does this happen?
As far as Litestar (and therefore msgspec, which Litestar uses under the hood to perform JSON deserialization and some parts of the validation) is concerned, MyType is a dataclass, and it is therefore validated the same way as a plain dataclass. Pydantic kinda "hides" the validation magic it performs there and requires special knowledge to apply it correctly: the type looks like a regular dataclass, but it doesn't behave like one. This is why, when it's treated like a regular dataclass, the validation doesn't happen.
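To make the "looks like a regular dataclass but doesn't behave like one" point concrete, here is a small sketch (the final comment reflects my understanding of how msgspec constructs dataclass instances, so treat that part as an assumption):

import dataclasses
from dataclasses import field
from typing import List

from pydantic import field_validator
from pydantic.dataclasses import dataclass


@dataclass
class MyType:
    picture: List[str] = field(default_factory=list)

    @field_validator("picture", mode="before")
    @classmethod
    def single_picture_validator(cls, v):
        if isinstance(v, str):
            return [v]
        return v


# Generic dataclass tooling cannot tell MyType apart from a plain dataclass:
assert dataclasses.is_dataclass(MyType)

# The coercion only happens when Pydantic's generated __init__ runs:
assert MyType(picture="test").picture == ["test"]

# msgspec, by contrast, validates the raw JSON against the annotation (List[str])
# and builds the instance itself, so the Pydantic validator never gets to run.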
Why did this work in Starlite? Because we used Pydantic for all parsing and validation there, and Pydantic gives its own dataclasses special treatment.
Can this be made to work in Litestar?
Unfortunately, due to the fact mentioned above (the types look like a dataclass but don't behave like one), there is no way to make this work out of the box, and I don't think it ever will (or should). The data does not conform to the schema, which means this ValidationError is correct. This strictness allows Litestar to generate a (more) correct OpenAPI schema and to provide more type and therefore runtime safety.
Should this be made possible in Litestar? (Caution, personal opinion incoming) I don't think we should support this, as it goes somewhat against our core design principles of strict validation and adhering to the given schemas as closely as possible. @litestar-org/maintainers @litestar-org/members please give your opinions on this if you disagree.
Workarounds
As a workaround you can:
- Use a pydantic.BaseModel (see the sketch below)
- Embrace the stricter validation :)
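A rough sketch of the first option (untested here, but as far as I know BaseModel payloads go through Pydantic itself in Litestar, so field validators should run):

from litestar import Litestar, post
from pydantic import BaseModel, Field, field_validator


class MyType(BaseModel):
    picture: list[str] = Field(default_factory=list)

    @field_validator("picture", mode="before")
    @classmethod
    def single_picture_validator(cls, v):
        if isinstance(v, str):
            return [v]
        return v


@post("/bad")
async def mwe_bad(data: MyType) -> MyType:
    # With a BaseModel, {"picture": "test"} should be coerced to {"picture": ["test"]}.
    return data


app = Litestar([mwe_bad])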
Thanks for the info; this is very valuable. We have since fixed the problem at the source (as you should when you are able), but there can certainly be cases where you might be dealing with external, fixed systems or the like which would make going this way next to impossible.
As such, while this no longer directly affects us, I would still see it as a decently-sized obstacle to rolling adoption inside any codebase/organization, especially when you are, e.g., refactoring services one at a time, like we did recently, and can't change the other side.
The main problem, really, was that this behavior was unintuitive: The README claims support for Pydantic v1 and v2, so it stood to reason that a given Pydantic type (be it a BaseModel or a pydantic-flavored dataclass) would work as expected. Evidently it does not. If support for this is not desired, would it at least be useful to potentially make a note of this somewhere in the docs or something? Just to avoid future pitfalls.
Also (this is just me hypothesizing), given that this is indeed a pydantic-flavored dataclass, would it not be possible to determine that it's a different 'flavor' than a standard dataclass? In fact, pydantic even supplies a method for this exact purpose. Their docs also specifically state that they should be interchangeable when it comes to validation:
If you don't want to use Pydantic's BaseModel you can instead get the same data validation on standard dataclasses.
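For reference, this is roughly the kind of check I mean (I believe the helper is pydantic.dataclasses.is_pydantic_dataclass in Pydantic v2; if that exact name is off, the general idea still stands):

from dataclasses import field, is_dataclass
from typing import List

from pydantic.dataclasses import dataclass, is_pydantic_dataclass


@dataclass
class MyType:
    picture: List[str] = field(default_factory=list)


# It reports as a regular dataclass...
assert is_dataclass(MyType)
# ...but it is also detectable as a Pydantic-flavored one.
assert is_pydantic_dataclass(MyType)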
Imo if Litestar actively and consciously breaks this promise, there should probably be a note of that in an obvious place somewhere.
Considering the publicly available information on this, how can it be that inheriting from BaseModel works in Litestar but using the pydantic-flavored dataclass does not? Imo it might make more sense to pull the pydantic-flavored dataclass behavior more in line with what the docs claim and what the user might expect, instead of diverging from the standard.
For actual standard dataclasses I can definitely understand the argument, of course, and I don't mean to throw any shade, in case it comes across that way. Appreciate all that you guys do, just curious about this decision and whether it's optimal.
The main problem, really, was that this behavior was unintuitive: The README claims support for Pydantic v1 and v2, so it stood to reason that a given Pydantic type (be it a BaseModel or a pydantic-flavored dataclass) would work as expected. Evidently it does not.
I guess that depends on your perspective? I think it does work as expected and, to me, it works like the Pydantic docs describe it: it's interchangeable with a dataclass, and it works just like one.
If support for this is not desired, would it at least be useful to potentially make a note of this somewhere in the docs or something? Just to avoid future pitfalls.
I think there might be some misunderstanding here, or maybe I've worded it badly. My point was that there isn't really anything to support here on Litestar's side. Pydantic provides a dataclass that is intended to be a drop-in replacement, which means Litestar should be able to not make a distinction between a Pydantic dataclass and a regular dataclass, and indeed it does not. The issue arises because the Pydantic-provided validation can never happen, since the data does not conform to the type's own schema. The field in the example would need to be annotated as list[str] | str for it to pass, and the validation would have to rely on standard dataclass behaviour (e.g. using __post_init__ to perform the validation) in order to actually run.
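To sketch what that would look like using purely standard dataclass behaviour (this assumes __post_init__ is invoked during decoding, which is my understanding of how msgspec handles dataclasses):

from dataclasses import dataclass, field
from typing import List, Union

from litestar.serialization import decode_json


@dataclass
class MyType:
    # The annotation has to admit the incoming shape for the first-layer validation to pass.
    picture: Union[List[str], str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Normalisation happens via standard dataclass machinery instead of a Pydantic validator.
        if isinstance(self.picture, str):
            self.picture = [self.picture]


assert decode_json('{"picture": "test"}', target_type=MyType).picture == ["test"]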
Also (this is just me hypothesizing), given that this is indeed a pydantic-flavored dataclass, would it not be possible to determine that it's a different 'flavor' than a standard dataclass? In fact, pydantic even supplies a method for this exact purpose.
True, but the type also reports itself as a regular dataclass. And as for what the pydantic docs state:
Their docs also specifically state that they should be interchangeable when it comes to validation:
But they're not, really. This is only true as long as you stay within Pydantic. Once you start treating the dataclass like a regular dataclass, which is what msgspec does, you'll notice it behaves differently; there's hidden magic attached that is not apparent if you just check whether it is a dataclass.
To me, these two ideas conflict. Either it is a drop-in replacement, in which case I should never have to worry about it behaving differently and everything should work as expected if I treat it as a regular dataclass (which is not the case), or it is not a drop-in replacement, in which case special treatment would make sense and it would actually be easy for Litestar to support this, since it would not get picked up as a regular dataclass in the first place.
Imo if Litestar actively and consciously breaks this promise, there should probably be a note of that in an obvious place somewhere.
We don't break that promise. It's Pydantic that does. Litestar is just very strict about how it interprets this promise Pydantic makes; Pydantic says you can get the same validation if you treat them like any other dataclass. We treat them like any other dataclass. But if we do that, two things happen:
- This is a validation error, because "test" does not conform to a list[str]
- No special methods or logic are invoked, because standard dataclasses don't have those
I think it would be more confusing if a standard library dataclass raised a validation error but the Pydantic one didn't. IMO that case would be breaking the "drop-in replacement of dataclasses" more than the current behaviour. If Litestar encounters a dataclass, it will validate it using its type annotations. If Pydantic's dataclasses are a drop-in replacement, the same validation should happen in both cases. What you are asking is for Litestar to actually not treat them as regular dataclasses.
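To make that symmetry concrete, a plain standard-library dataclass fails in exactly the same way (a sketch; PlainType is just an illustrative name):

from dataclasses import dataclass, field
from typing import List

from litestar.serialization import decode_json


@dataclass
class PlainType:
    picture: List[str] = field(default_factory=list)


# "test" is not a list of strings, so this raises the same kind of validation error
# that the /bad handler produces for the Pydantic dataclass:
decode_json('{"picture": "test"}', target_type=PlainType)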
I don't mean to throw any shade, in case it comes across that way. Appreciate all that you guys do, just curious about this decision and whether it's optimal.
No worries, we're always open for discussion and you certainly make a good point :)
Fair enough, I suppose we were coming at this from different directions. For example, I read the pydantic docs as 'this thing behaves like a BaseModel' whereas you're coming at it from 'this thing behaves like a dataclass'. Two ways to read, I suppose.
The 'proper' way to work around this issue might've been to split things into an IncomingType and an OutgoingType and do the conversion explicitly (sketched below), instead of fitting everything into the same type; it just seemed annoying to do when we had a working, 'simpler' version before.
The hint with the list[str] | str annotation is good, and it does indeed make the request pass initial validation; this helped me understand the first-layer validation that Litestar/msgspec does, so that's excellent. Alas, it still skips the actual validator method, but I guess that's expected from what you're saying.
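In case it helps anyone else, here is roughly what that split could look like (the type names, route and handler are made up, and this is an untested sketch):

from dataclasses import dataclass, field
from typing import List, Union

from litestar import post


@dataclass
class IncomingType:
    # Wide enough to pass the first-layer msgspec validation.
    picture: Union[List[str], str] = field(default_factory=list)


@dataclass
class OutgoingType:
    picture: List[str] = field(default_factory=list)


@post("/pictures")
async def create_picture(data: IncomingType) -> OutgoingType:
    # The conversion that used to live in a Pydantic validator now happens explicitly.
    picture = [data.picture] if isinstance(data.picture, str) else data.picture
    return OutgoingType(picture=picture)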
What really threw me for a loop before this is just that MyType has to be instantiated at some point: If I do this manually, the validators run just fine. So what irked me here was the fact that something in this pipeline also has to instantiate MyType at some point, and that type has validators attached. Alas, after reading the source code some more and digging around a bit, I think I understand how this stuff interacts a little better now.
Notably, I can directly repro this behavior with just msgspec as well, which does indeed mean that Litestar correctly does not meddle in this much. It's just a shame that msgspec is limited in this regard; I suppose that's part of the price you pay for the performance benefits (and it is technically more correct, if somewhat annoying).
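For reference, the msgspec-only repro looks roughly like this (it should raise msgspec's validation error before the Pydantic validator is ever consulted):

from dataclasses import field
from typing import List

import msgspec
from pydantic import field_validator
from pydantic.dataclasses import dataclass


@dataclass
class MyType:
    picture: List[str] = field(default_factory=list)

    @field_validator("picture", mode="before")
    @classmethod
    def single_picture_validator(cls, v):
        if isinstance(v, str):
            return [v]
        return v


# msgspec sees a dataclass with a List[str] field, so the bare string is rejected here:
msgspec.json.decode(b'{"picture": "test"}', type=MyType)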
In any case, thanks for digging into this a bit. While it's a shame that this use case is no longer supported, this discussion did clear some things up, and this can probably be closed soon, unless some of the other maintainers wanna chime in still.
By the way: I assume that this will then not warrant any kind of note being added to the docs or something as to the limitations of the msgspec-related decoding system? I still feel like the validator-not-running thing could bite someone, even though I now understand why; that reasoning is definitely not immediately obvious and might be well served by being surfaced a bit more.
By the way: I assume that this will then not warrant any kind of note being added to the docs or something as to the limitations of the msgspec-related decoding system?
I'll give a more detailed answer tomorrow but I just wanted to add here that I think it makes sense to have a note on this particular issue in the docs!
@provinzkraut any update here?
I was thinking of refactoring some of my pydantic BaseModels to dataclasses but based on this discussion I'll hold on for now/ever.
The crux of the matter seems to be what "drop in replacement of dataclasses" means. For the record there's no such phrase in the official Pydantic docs - the closest I could find was "... is (mostly) a drop in replacement for dataclasses.dataclass with validation" in a comment. For Litestar (or rather msgspec) it is apparently interpreted more strictly. In fact I can't tell if there's any difference between "drop in replacement of" and "identical to", which makes Pydantic dataclasses pretty much pointless.
Either it is a drop-in replacement, in which case I should never have to worry about it behaving differently and everything should work as expected if I treat it as a regular dataclass (which is not the case), or it is not a drop-in replacement, in which case special treatment would make sense and it would actually be easy for Litestar to support this, since it would not get picked up as a regular dataclass in the first place.
Semantics aside, the latter would definitely be more useful in practice. The only reason to use a Pydantic dataclass instead of a vanilla one is the validation. It is an understandable and defensible position for msgspec to ignore the existence of Pydantic dataclasses (or Pydantic in general). Litestar though already has a pretty decent integration with Pydantic so IMO it would make sense to acknowledge Pydantic dataclasses and treat them separately.
Litestar though already has a pretty decent integration with Pydantic so IMO it would make sense to acknowledge Pydantic dataclasses and treat them separately.
I'm not sure if this would be possible currently. The reason is that there is no way to tell msgspec not to handle Pydantic dataclasses and instead let us handle them. The only time msgspec calls the type_encoders/type_decoders is when it encounters a type that msgspec doesn't know how to serialize/deserialize. I remember @provinzkraut saying that a mechanism to allow us to override the behavior for types that msgspec recognizes is planned, but I don't think it's available yet.
I assume that this will then not warrant any kind of note being added to the docs or something as to the limitations of the msgspec-related decoding system?
I think this should be documented somewhere. Just a simple note that Pydantic dataclasses may not work as expected, with maybe a reference to this issue for more details.