Robyn icon indicating copy to clipboard operation
Robyn copied to clipboard

Implement easy access query parameters and request body validation based on query parameters

Open kliwongan opened this issue 2 years ago • 26 comments

Description

This PR fixes https://github.com/sansyrox/robyn/issues/289 and may help with https://github.com/sansyrox/robyn/issues/351 (since msgspec encoding is apparently much faster than the standard json lib) Query validation is optional, may find another way to indicate whether to verify (or not) instead of saving it in the FunctionInfo, or force the change in general (when the "full" type-checking and schema validation is implemented) The schema validation method itself supports nested classes and does not require the user to use external libraries.

The general idea of the fix is: When a request is handled, use the Python GIL to run the steps below using the Python interpreter (possible bottleneck):

  1. Get the signature of the model using the features in the inspect module. The current PR supports custom user-defined (potentially nested) objects in the handler function signature.
  2. Generate a msgspec.Struct based model for schema validation based on either the member variables of the class (if it does not have a user-defined init constructor, or the constructor signature itself)
  3. Compare the model against the received request (by decoding the JSON result into the model object) to see if the schema validation fails; if it does, an InternalServer error is raised.
  4. If schema validation succeeds, we convert the object generated by the model validation into a dictionary and pass it to the handler function as kwargs and get the output of the handler function as usual

Further extensions:

  1. Implement process/thread-safe caching of Encoder and Decoder objects for the msgspec encode and decode calls. Since a call to either method requires some setup, multiple calls can be made faster if we use an Encoder/Decoder object for the respective calls.

kliwongan avatar Jan 17 '23 05:01 kliwongan

Deploy Preview for robyn canceled.

Name Link
Latest commit 437b7de9e3fba1f844b7ec70c68efa4b1eb9def6
Latest deploy log https://app.netlify.com/sites/robyn/deploys/63f666736a810c0008d3e9ee

netlify[bot] avatar Jan 17 '23 05:01 netlify[bot]

Also @kliwongan , apologies for these one liner changes. I will do a thorough review either tomorrow or on Saturday 😄

sansyrox avatar Jan 19 '23 23:01 sansyrox

Hi @kliwongan, and thanks for the PR ! I took a look at this too as it aims to bring an important new feature to Robyn. I agree with you that the current architecture isn't the best as it implies that we use Robyn's Python code in Robyn's Rust code. This could be an issue for separating Robyn's web server and Robyn's web framework in the future AFAIK (see #268), and also decrease performances. I would suggest indeed to do the validation in Rust directly, which will probably and unfortunately be much harder :disappointed: Maybe we could use or be inspired by projects like pydantic-core which aims at migrating pydantic core validation to Rust ? I don't know if @sansyrox agrees with me, but should we rewrite the data validation in Rust, I could try to help you with it :slightly_smiling_face:

AntoineRR avatar Jan 20 '23 21:01 AntoineRR

Hi @kliwongan, and thanks for the PR ! I took a look at this too as it aims to bring an important new feature to Robyn. I agree with you that the current architecture isn't the best as it implies that we use Robyn's Python code in Robyn's Rust code. This could be an issue for separating Robyn's web server and Robyn's web framework in the future AFAIK (see #268), and also decrease performances. I would suggest indeed to do the validation in Rust directly, which will probably and unfortunately be much harder 😞 Maybe we could use or be inspired by projects like pydantic-core which aims at migrating pydantic core validation to Rust ? I don't know if @sansyrox agrees with me, but should we rewrite the data validation in Rust, I could try to help you with it 🙂

I have a buggy branch (not sure if it's my machine or not) that somewhat addresses this problem https://github.com/kliwongan/robyn/tree/buggy-serialize; I essentially store the "validator" function in the FunctionInfo and then call it when necessary; for some reason I am getting a lot of connection errors when running tests (pytest and sending requests using Postman) and the errors are inconsistent, does anyone mind helping me investigate?

EDIT: Buggy branch works now, merged and deleted.

Anyways, if required, I would also love to rewrite the query validation in Rust! I've seen FastAPI's code and I have a good idea on how to apply other kinds of dependency injection as well (at least on the Python side). I agree on perhaps taking inspiration from pydantic-core, it might be a good idea. We can perhaps mix it in with the "validator" concept I have, where the user can also use whatever model validation function they want, which would return the **kwargs that the handler needs. However, I'm not even sure if the query validation should be a part of the server or the framework

kliwongan avatar Jan 20 '23 22:01 kliwongan

Hey @kliwongan , thank you for your great work 😄 You PR shows a lot of hard work. Especially needing you to dive deep in Pydantic and FastAPI's codebase.

But as @AntoineRR(thanks for the review btw 😄 ) suggested, we would love to have the validation written in the Rust side. As it makes the performance slower and makes it harder to segregate the responsibilities in the codebase.

Why we are emphasizing this much on writing the validation in Rust side is because we believe that we do the hard things so that the developers using Robyn don't have to. This way we will be to provide the best experience.

We are more than happy to help you out with the implementation if need be. 😄

Having said that, great work so far! ✨

sansyrox avatar Jan 21 '23 14:01 sansyrox

Speaking of which, @kliwongan are you on our discord channel?

sansyrox avatar Jan 21 '23 14:01 sansyrox

Speaking of which, @kliwongan are you on our discord channel?

I don't believe I am.

kliwongan avatar Jan 21 '23 16:01 kliwongan

Hey @kliwongan , thank you for your great work 😄 You PR shows a lot of hard work. Especially needing you to dive deep in Pydantic and FastAPI's codebase.

But as @AntoineRR(thanks for the review btw 😄 ) suggested, we would love to have the validation written in the Rust side. As it makes the performance slower and makes it harder to segregate the responsibilities in the codebase.

Why we are emphasizing this much on writing the validation in Rust side is because we believe that we do the hard things so that the developers using Robyn don't have to. This way we will be to provide the best experience.

We are more than happy to help you out with the implementation if need be. 😄

Having said that, great work so far! ✨

Would you want to call the Python functions individually in Rust as you previously suggested? (i.e. import libraries like inspect and use callmethod1 on them) or would you work completely in Rust? I am a little confused.

kliwongan avatar Jan 21 '23 16:01 kliwongan

@kliwongan , apologies for the confusion. I meant this. To write as much as possible in rust.

or would you work completely in Rust

I don't believe I am. https://discord.gg/rkERZ5eNU8 , you can join here. For faster feedback if needed 😄

sansyrox avatar Jan 21 '23 17:01 sansyrox

@kliwongan @AntoineRR , I have been thinking about this PR for a long time, and I have a suggestion. Let me know what you folks think of it.

Many people are still having debates on Pydantic vs something like marshmallow or now emerging msgspec. And to be possibly faster than msgspec, we have to implement data validation ourselves, which is a project of its own. As big as Robyn itself.

I was thinking of a direction for adding optional dependencies like how we have implemented jinja2 template support and allow the developer to choose their library for data validation. We will create a nice abstraction that would work with all of them(or at least the ones we support out of the box).

This way, we won't be slowing down Robyn and will not be maintaining and implementing something as complicated as Robyn.

What do you think about this?

sansyrox avatar Feb 06 '23 17:02 sansyrox

Something like a @validate decorator?

sansyrox avatar Feb 06 '23 23:02 sansyrox

@kliwongan @AntoineRR , I have been thinking about this PR for a long time, and I have a suggestion. Let me know what you folks think of it.

Many people are still having debates on Pydantic vs something like marshmallow or now emerging msgspec. And to be possibly faster than msgspec, we have to implement data validation ourselves, which is a project of its own. As big as Robyn itself.

I was thinking of a direction for adding optional dependencies like how we have implemented jinja2 template support and allow the developer to choose their library for data validation. We will create a nice abstraction that would work with all of them(or at least the ones we support out of the box).

This way, we won't be slowing down Robyn and will not be maintaining and implementing something as complicated as Robyn.

What do you think about this?

I can try two approaches: first to implement the validate decorator (to some extent, I have some functionality for this implemented already since I pass in a validator function to validate the incoming request body), and two: spend more time trying to integrate pydantic-core with the existing codebase. I can wrap up the first one in a jiffy, and it can be merged into the master branch, while we can leave the second one for now (as I keep working on it). @sansyrox

kliwongan avatar Feb 07 '23 22:02 kliwongan

I can try two approaches: first to implement the validate decorator (to some extent, I have some functionality for this implemented already since I pass in a validator function to validate the incoming request body), and two: spend more time trying to integrate pydantic-core with the existing codebase. I can wrap up the first one in a jiffy, and it can be merged into the master branch, while we can leave the second one for now (as I keep working on it). @sansyrox

@kliwongan , this sounds good. As long as we can create an abstraction that will allow the developer to choose or override, then I think we can proceed with it.

sansyrox avatar Feb 08 '23 22:02 sansyrox

Hey @kliwongan 👋

I hope all's well. 😄 Is there something that we can do to help you out with the issue?

sansyrox avatar Feb 17 '23 19:02 sansyrox

Hey @kliwongan 👋

I hope all's well. 😄 Is there something that we can do to help you out with the issue?

Hello, I've just been busy recently, and also after pulling recently I've had trouble getting the tests to run successfully; something regarding lifecycle handlers?

kliwongan avatar Feb 21 '23 01:02 kliwongan

@kliwongan , feel free to push your changes on this PR and then maybe we can suggest a few things?

sansyrox avatar Feb 21 '23 10:02 sansyrox

Hey @kliwongan ,

I was taking a deep look in msgspec today and realized that it has some limitations with the typing system.

It does not support mixins, has issues with things like protocols and abcs. It also doesn't support Generics and TypeVars from python types. So, I think we should be using pydantic for data validation.

However, since msgspec is absurdly fast. We should be using it in place of jsonify method.

What do you think?

I think it's a good idea; I can modify the code to use Pydantic similar to how FastAPI does it, although that will also take additional time. Since msgspec is still absurdly fast, we can use it for jsonify, yes. My only problem with that is the over-optimizing aspect; caching one or multiple decoders/serializers since using those objects is faster than calling the functions, and also making this work in a multithreaded context. @sansyrox

kliwongan avatar Feb 22 '23 22:02 kliwongan

Hi, main author of msgspec here! Hope you don't mind me chiming in.

It does not support mixins

This has been resolved (but not yet released).

has issues with things like protocols and abcs

To be clear, the limitation is that you can't create a struct that also subclasses from an abc or Protocol type. However, both of these concepts support "duck-typing" - the defined type doesn't actually need to subclass from the abc/protocol for isinstance/mypy/pyright checking to work properly.

Due to a python limitation we can't support both Protocol and abc types as base classes for msgspec.Struct classes. We'd have to pick one. pydantic chose to support abc base classes, meaning that it can't subclass from protocols. I'd rather not pick favorites, so we support neither.

That said, I've yet to see a strong use case for adding either as a base class for a Struct type, and would be surprised if this limitation is a blocker for any user. Can you comment more on why this matters to you?

It also doesn't support Generics and TypeVars from python types.

We currently don't support generics for Struct types. I expect this limitation to be resolved in the next 2 months. Generics for builtin cpython types are already fully supported. As with the above, I suspect that most users won't ever have a use case for this feature.

So, I think we should be using pydantic for data validation.

Given the current 20-80x decoding speedup when using msgspec instead of pydantic, I'd hope you'd at least support msgspec for handling JSON request bodies. You might do as starlite is considering, and use msgspec for decoding request bodies if the user declares the body as a msgspec.Struct instance (opting in to using msgspec, rather than by default). Using msgspec for validating other data sources (path/query parameters, headers, cookies, ...) will still currently be faster than pydantic, but the performance difference there is much less (2x speedup at most).

If you're considering using msgspec by default for validation, the restrictions I'd think would be most useful to consider would be our current restrictions on union types. We take a much different stance of unions than pydantic, opting to disallow anything that may be considered ambiguous. This eliminates a wide range of accidental usage bugs, but also means we can't support any arbitrary Python type union.


That all said, please don't feel pressured to use msgspec here. If you decide you want to support it in any way I'm happy to answer usage questions and resolve issues. But I'm not a robyn user or dev, so I'll leave the question of whether msgspec makes sense for your use case up to you :).

jcrist avatar Feb 24 '23 21:02 jcrist

Thank you for the detailed explanation @jcrist 😄 This explanation has given me some great points to think upon. I really appreciate your explanation 😄

Also, thank you for addressing my queries. It is great to know that the features are in the roadmap ✨

sansyrox avatar Feb 24 '23 23:02 sansyrox

@kliwongan , I spent the last week using both pydantic and msgspec. I believe both of them are amazing libraries(with their own nuances) and we should support both of them but should allow the user to decide. How could we do that?

I have an idea that will give users more choice.

class CustType(BaseModel/Struct):
...


@app.get("/")
def route(request:CustType):
...

If the route has a custom type(be it msgspec struct or pydantic basemodel) only then we validate otherwise we don't do any kind of validation.

We will be able to provide enough options and also avoid what I like to call decorator proliferation.

sansyrox avatar Mar 04 '23 22:03 sansyrox

@kliwongan , I spent the last week using both pydantic and msgspec. I believe both of them are amazing libraries(with their own nuances) and we should support both of them but should allow the user to decide. How could we do that?

I have an idea that will give users more choice.

class CustType(BaseModel/Struct):
...


@app.get("/")
def route(request:CustType):
...

If the route has a custom type(be it msgspec struct or pydantic basemodel) only then we validate otherwise we don't do any kind of validation.

We will be able to provide enough options and also avoid what I like to call decorator proliferation.

I believe I can make the necessary modifications. As it is right now, base classes are turned into msgspec structs for every request (which I think was inefficient unless we did some form of caching), but now what I can do is check which custom class a type is, and simply run the schema validation functions for that specific object. More modifications might need to be done to allow for true dependency injection and such, but I'll get started on a rudimentary approach first.

kliwongan avatar Mar 04 '23 22:03 kliwongan

More modifications might need to be done to allow for true dependency injection and such

Yep. I agree. But we can add them later 😄

, but I'll get started on a rudimentary approach first.

Sounds good. Let me know if you need anything from me 😄

sansyrox avatar Mar 04 '23 23:03 sansyrox

Hi @kliwongan, do you still have time to work on this?

AntoineRR avatar Apr 24 '23 20:04 AntoineRR

Hi @kliwongan, do you still have time to work on this?

Yes, I'm starting to get a bit more free time now, I will restart working on it soon.

kliwongan avatar Apr 26 '23 05:04 kliwongan

Hey @lecronium#4930 :wave:

So, I have the comments ready for the PR. I had this idea that we can allow multiple libraries for data validation. And if the there are any custom types(not the python default types), we can set validate=True by default unless someone overrides it to False.

e.g.

@app.post("/query_validation_complex", validate=True)
async def test_validation_complex(a: int, b: str, c: MsgSpecClass, d:PydanticClass):
    return jsonify({"a": a, "b": b, "c": c})

Also, we should make the data validation imports optional(maybe). Like we do with templating - To keep the dependencies as minimal as possible.

Let me know what you think of it 😄

sansyrox avatar May 15 '23 19:05 sansyrox

Hey @kliwongan 👋

Welcome back 😄

We have two updated requirements:

  • Allow request members to be accessed through keyword names

-e.g. currently you can do

@app.get('/')
def fx(request):
   ....

What I am aiming for is

@app.get('/')
def fx(request, body, query_params, json, ...):
   ....

And then add validation.

Ideally, I would want the validator to be plug-and-playable and off by default. i.e. the validation overhead should be present only when the developer explicitly turns it on.

And I would want to support both pydantic and msgspec, which can be configurable.

I believe your API implementation is quite generic, which is a great thing. Would love if we can fit the above requirements in the implementation.

sansyrox avatar Apr 22 '24 20:04 sansyrox