Robyn
Implement easy access query parameters and request body validation based on query parameters
Description
This PR fixes https://github.com/sansyrox/robyn/issues/289 and may help with https://github.com/sansyrox/robyn/issues/351 (since msgspec encoding is apparently much faster than the standard json lib). Query validation is optional; we may find another way to indicate whether to validate (instead of saving a flag in the FunctionInfo), or make validation mandatory once the "full" type checking and schema validation is implemented. The schema validation method itself supports nested classes and does not require the user to use external libraries.
The general idea of the fix: when a request is handled, acquire the Python GIL and run the steps below in the Python interpreter (a possible bottleneck):
- Get the signature of the model using the features in the inspect module. The current PR supports custom user-defined (potentially nested) objects in the handler function signature.
- Generate a msgspec.Struct-based model for schema validation, built either from the member variables of the class (if it does not have a user-defined init constructor) or from the constructor signature itself.
- Compare the model against the received request (by decoding the JSON body into the model object) to see if schema validation fails; if it does, an internal server error is raised.
- If schema validation succeeds, convert the object produced by model validation into a dictionary, pass it to the handler function as kwargs, and get the output of the handler function as usual.
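The steps above can be sketched using only the standard library (the PR itself uses msgspec; `Point`, `handler`, and `validate_body` are hypothetical names for illustration, and the manual `isinstance` loop stands in for msgspec's decoding):

```python
import inspect
import json

class Point:  # hypothetical user-defined model
    x: int
    y: int

def handler(p: Point):
    return {"x": p.x, "y": p.y}

def validate_body(func, raw_body: bytes):
    """Decode a JSON body and check it against the handler's annotated parameter."""
    sig = inspect.signature(func)
    param = next(iter(sig.parameters.values()))
    model = param.annotation  # the user-defined class, e.g. Point
    data = json.loads(raw_body)
    # compare the decoded body against the model's annotated members
    for name, typ in model.__annotations__.items():
        if name not in data or not isinstance(data[name], typ):
            raise ValueError(f"schema validation failed for field {name!r}")
    obj = model()
    for name in model.__annotations__:
        setattr(obj, name, data[name])
    # pass the validated object to the handler as usual
    return func(obj)

print(validate_body(handler, b'{"x": 1, "y": 2}'))
```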
Further extensions:
- Implement process/thread-safe caching of Encoder and Decoder objects for the msgspec encode and decode calls. Since each call to either method requires some setup, repeated calls can be made faster if we reuse an Encoder/Decoder object.
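The caching idea could look like the following stdlib-only sketch, where `functools.lru_cache` (thread-safe in CPython) memoizes one decode function per model type; the hand-rolled decoder is a stand-in for a cached `msgspec.json.Decoder` object, and `decoder_for`/`User` are hypothetical names:

```python
import json
from functools import lru_cache

@lru_cache(maxsize=None)  # thread-safe memoization: one shared decoder per model type
def decoder_for(model: type):
    """Build and cache a decode function for `model` (stand-in for msgspec.json.Decoder)."""
    fields = tuple(model.__annotations__)  # setup cost is paid once per type
    def decode(raw: bytes):
        data = json.loads(raw)
        obj = model()
        for name in fields:
            setattr(obj, name, data[name])
        return obj
    return decode

class User:  # hypothetical model
    name: str
    age: int

# repeated lookups return the same cached decoder
assert decoder_for(User) is decoder_for(User)
```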
Deploy Preview for robyn canceled.
Name | Link |
---|---|
Latest commit | 437b7de9e3fba1f844b7ec70c68efa4b1eb9def6 |
Latest deploy log | https://app.netlify.com/sites/robyn/deploys/63f666736a810c0008d3e9ee |
Also @kliwongan , apologies for these one liner changes. I will do a thorough review either tomorrow or on Saturday 😄
Hi @kliwongan, and thanks for the PR ! I took a look at this too as it aims to bring an important new feature to Robyn. I agree with you that the current architecture isn't the best as it implies that we use Robyn's Python code in Robyn's Rust code. This could be an issue for separating Robyn's web server and Robyn's web framework in the future AFAIK (see #268), and also decrease performances. I would suggest indeed to do the validation in Rust directly, which will probably and unfortunately be much harder 😞 Maybe we could use or be inspired by projects like pydantic-core which aims at migrating pydantic core validation to Rust ? I don't know if @sansyrox agrees with me, but should we rewrite the data validation in Rust, I could try to help you with it 🙂
I have a buggy branch (not sure if it's my machine or not) that somewhat addresses this problem: https://github.com/kliwongan/robyn/tree/buggy-serialize. I essentially store the "validator" function in the FunctionInfo and then call it when necessary. For some reason I am getting a lot of inconsistent connection errors when running tests (pytest and sending requests using Postman); does anyone mind helping me investigate?
EDIT: Buggy branch works now, merged and deleted.
Anyways, if required, I would also love to rewrite the query validation in Rust! I've seen FastAPI's code and I have a good idea of how to apply other kinds of dependency injection as well (at least on the Python side). I agree on taking inspiration from pydantic-core; it might be a good idea. We could perhaps mix it in with the "validator" concept I have, where the user can also use whatever model validation function they want, which would return the **kwargs that the handler needs. However, I'm not even sure whether the query validation should be part of the server or the framework.
Hey @kliwongan , thank you for your great work 😄 Your PR shows a lot of hard work, especially the deep dive into Pydantic's and FastAPI's codebases.
But as @AntoineRR suggested (thanks for the review btw 😄), we would love to have the validation written on the Rust side, as the current approach makes performance slower and makes it harder to segregate responsibilities in the codebase.
The reason we are emphasizing writing the validation on the Rust side is that we believe we should do the hard things so that developers using Robyn don't have to. This way we will be able to provide the best experience.
We are more than happy to help you out with the implementation if need be. 😄
Having said that, great work so far! ✨
Speaking of which, @kliwongan are you on our discord channel?
I don't believe I am.
Would you want to call the Python functions individually from Rust as you previously suggested (i.e. import libraries like inspect and use call_method1 on them), or would you work completely in Rust? I am a little confused.
> or would you work completely in Rust

@kliwongan , apologies for the confusion. I meant this: to write as much as possible in Rust.
I don't believe I am. https://discord.gg/rkERZ5eNU8 , you can join here. For faster feedback if needed 😄
@kliwongan @AntoineRR , I have been thinking about this PR for a long time, and I have a suggestion. Let me know what you folks think of it.
Many people are still debating Pydantic vs something like marshmallow or the now-emerging msgspec. And to be possibly faster than msgspec, we would have to implement data validation ourselves, which is a project of its own, as big as Robyn itself.
I was thinking of adding optional dependencies, like how we implemented jinja2 template support, and allowing the developer to choose their library for data validation. We would create a nice abstraction that works with all of them (or at least the ones we support out of the box).
This way, we won't be slowing down Robyn, and we won't have to maintain and implement something as complicated as Robyn itself.
What do you think about this?
Something like a @validate decorator?
I can try two approaches: first to implement the validate decorator (to some extent, I have some functionality for this implemented already since I pass in a validator function to validate the incoming request body), and two: spend more time trying to integrate pydantic-core with the existing codebase. I can wrap up the first one in a jiffy, and it can be merged into the master branch, while we can leave the second one for now (as I keep working on it). @sansyrox
@kliwongan , this sounds good. As long as we can create an abstraction that will allow the developer to choose or override, then I think we can proceed with it.
Hey @kliwongan 👋
I hope all's well. 😄 Is there something that we can do to help you out with the issue?
Hello, I've just been busy lately. Also, after pulling recently I've had trouble getting the tests to run successfully; something regarding lifecycle handlers?
@kliwongan , feel free to push your changes on this PR and then maybe we can suggest a few things?
Hey @kliwongan ,
I was taking a deep look into msgspec today and realized that it has some limitations with the typing system.
It does not support mixins, and has issues with things like protocols and ABCs. It also doesn't support Generics and TypeVars from Python types. So, I think we should be using `pydantic` for data validation. However, since msgspec is absurdly fast, we should use it in place of the jsonify method.
What do you think?
I think it's a good idea; I can modify the code to use Pydantic similar to how FastAPI does it, although that will also take additional time. Since msgspec is still absurdly fast, we can use it for jsonify, yes. My only problem with that is the over-optimization aspect: caching one or multiple decoders/serializers (since using those objects is faster than calling the functions), and also making this work in a multithreaded context. @sansyrox
Hi, main author of msgspec here! Hope you don't mind me chiming in.
> It does not support mixins

This has been resolved (but not yet released).

> has issues with things like protocols and abcs

To be clear, the limitation is that you can't create a struct that also subclasses from an `abc` or `Protocol` type. However, both of these concepts support "duck-typing" - the defined type doesn't actually need to subclass from the abc/protocol for `isinstance`/`mypy`/`pyright` checking to work properly.
Due to a Python limitation we can't support both `Protocol` and `abc` types as base classes for `msgspec.Struct` classes. We'd have to pick one. `pydantic` chose to support `abc` base classes, meaning that it can't subclass from protocols. I'd rather not pick favorites, so we support neither.
That said, I've yet to see a strong use case for adding either as a base class for a `Struct` type, and would be surprised if this limitation is a blocker for any user. Can you comment more on why this matters to you?
> It also doesn't support Generics and TypeVars from python types.
We currently don't support generics for Struct types. I expect this limitation to be resolved in the next 2 months. Generics for builtin cpython types are already fully supported. As with the above, I suspect that most users won't ever have a use case for this feature.
> So, I think we should be using pydantic for data validation.
Given the current 20-80x decoding speedup when using `msgspec` instead of pydantic, I'd hope you'd at least support msgspec for handling JSON request bodies. You might do as starlite is considering, and use `msgspec` for decoding request bodies if the user declares the body as a `msgspec.Struct` instance (opting in to using `msgspec`, rather than by default). Using `msgspec` for validating other data sources (path/query parameters, headers, cookies, ...) will still currently be faster than pydantic, but the performance difference there is much less (2x speedup at most).
If you're considering using `msgspec` by default for validation, the restrictions I'd think would be most useful to consider are our current restrictions on union types. We take a much different stance on unions than `pydantic`, opting to disallow anything that may be considered ambiguous. This eliminates a wide range of accidental usage bugs, but also means we can't support any arbitrary Python type union.
That all said, please don't feel pressured to use `msgspec` here. If you decide you want to support it in any way, I'm happy to answer usage questions and resolve issues. But I'm not a robyn user or dev, so I'll leave the question of whether `msgspec` makes sense for your use case up to you :).
Thank you for the detailed explanation @jcrist 😄 This explanation has given me some great points to think upon. I really appreciate your explanation 😄
Also, thank you for addressing my queries. It is great to know that the features are in the roadmap ✨
@kliwongan , I spent the last week using both pydantic and msgspec. I believe both of them are amazing libraries(with their own nuances) and we should support both of them but should allow the user to decide. How could we do that?
I have an idea that will give users more choice.
```python
class CustType(BaseModel/Struct):
    ...

@app.get("/")
def route(request: CustType):
    ...
```
If the route has a custom type (be it a msgspec Struct or a pydantic BaseModel), only then do we validate; otherwise we don't do any kind of validation.
We will be able to provide enough options and also avoid what I like to call decorator proliferation.
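The "validate only when a custom type is annotated" rule could be detected roughly like this (a stdlib sketch; `needs_validation`, `CustType`, and the handler names are hypothetical, and a real check would also recognize msgspec and pydantic base classes explicitly):

```python
import inspect

# builtin types that should NOT trigger validation
BUILTINS = (int, float, str, bool, bytes, list, dict, tuple, set, type(None))

def needs_validation(handler) -> bool:
    """True if any handler parameter is annotated with a custom (non-builtin) class."""
    for param in inspect.signature(handler).parameters.values():
        ann = param.annotation
        if ann is inspect.Parameter.empty:
            continue  # unannotated parameters never trigger validation
        if isinstance(ann, type) and ann not in BUILTINS:
            return True
    return False

class CustType:  # stand-in for a pydantic BaseModel or msgspec Struct
    pass

def plain(request):
    ...

def typed(request: CustType):
    ...
```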
I believe I can make the necessary modifications. As it is right now, base classes are turned into msgspec structs for every request (which I think is inefficient unless we do some form of caching), but now what I can do is check which custom class a type is and simply run the schema validation functions for that specific object. More modifications might need to be done to allow for true dependency injection and such, but I'll get started on a rudimentary approach first.
> More modifications might need to be done to allow for true dependency injection and such

Yep. I agree. But we can add them later 😄

> but I'll get started on a rudimentary approach first.

Sounds good. Let me know if you need anything from me 😄
Hi @kliwongan, do you still have time to work on this?
Yes, I'm starting to get a bit more free time now, I will restart working on it soon.
Hey @lecronium#4930 👋
So, I have the comments ready for the PR. I had this idea that we can allow multiple libraries for data validation. And if there are any custom types (not the Python default types), we can set `validate=True` by default unless someone overrides it to `False`.
e.g.
```python
@app.post("/query_validation_complex", validate=True)
async def test_validation_complex(a: int, b: str, c: MsgSpecClass, d: PydanticClass):
    return jsonify({"a": a, "b": b, "c": c})
```
Also, we should maybe make the data validation imports optional, like we do with templating, to keep the dependencies as minimal as possible.
Let me know what you think of it 😄
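A `validate` flag on the route decorator could be sketched as follows (a hypothetical stdlib illustration, not Robyn's actual API; `post` and `handler` are made-up names, and plain `isinstance` checks stand in for real msgspec/pydantic validation):

```python
import functools
import inspect

def post(path: str, validate: bool = False):
    """Hypothetical route decorator: type-checks annotated kwargs only when validate=True."""
    def wrap(func):
        sig = inspect.signature(func)
        @functools.wraps(func)
        def inner(**kwargs):
            if validate:
                for name, param in sig.parameters.items():
                    ann = param.annotation
                    if ann is inspect.Parameter.empty:
                        continue  # unannotated parameters are not validated
                    if isinstance(ann, type) and not isinstance(kwargs.get(name), ann):
                        raise TypeError(f"parameter {name!r} failed validation against {ann.__name__}")
            return func(**kwargs)
        return inner
    return wrap

@post("/query_validation", validate=True)
def handler(a: int, b: str):
    return {"a": a, "b": b}
```

With `validate=False` (the default) the wrapper adds no per-field checks, which keeps the overhead opt-in.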
Hey @kliwongan 👋
Welcome back 😄
We have two updated requirements:
- Allow request members to be accessed through keyword names. e.g. currently you can do

  ```python
  @app.get('/')
  def fx(request):
      ...
  ```

  What I am aiming for is

  ```python
  @app.get('/')
  def fx(request, body, query_params, json, ...):
      ...
  ```

- And then add validation.
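The keyword-access requirement can be sketched by inspecting the handler's signature and injecting only the request members it names (a hypothetical stdlib illustration; `call_handler`, `fx`, and the request-dict shape are assumptions, not Robyn's actual internals):

```python
import inspect

def call_handler(func, request: dict):
    """Pass only the request members that the handler names in its signature."""
    wanted = inspect.signature(func).parameters
    kwargs = {name: value for name, value in request.items() if name in wanted}
    return func(**kwargs)

# assumed shape of the request members available for injection
request = {"body": b"{}", "query_params": {"q": "1"}, "headers": {}, "json": None}

def fx(body, query_params):
    # receives only the two members it asked for
    return (body, query_params)
```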
Ideally, I would want the validator to be pluggable and off by default, i.e. the validation overhead should be present only when the developer explicitly turns it on.
And I would want to support both pydantic and msgspec, which can be configurable.
I believe your API implementation is quite generic, which is a great thing. I would love it if we can fit the above requirements into the implementation.