gino icon indicating copy to clipboard operation
gino copied to clipboard

Any way to validate request data using ORM?

Open bunyk opened this issue 7 years ago • 10 comments

I have a db.Model called Widget with approximately 10 fields of different types. And I create rows in database using

    await Widget.create(**request.json)

Everything is fine while request payload is properly formatted. But then some field has wrong type, I get this exception:

 asyncpg.exceptions.DataError: invalid input for query argument $4: {} (expected str, got dict)

From this is not clear which field had wrong type. Is there any method to compare dictionary to db.Model and check which fields has wrong types? Should I use some external library?

bunyk avatar Nov 07 '18 16:11 bunyk

For such errors, we can put the sql statement arguments in the error output. An easier way is that you can set db echo to True, so all the sql statement and arguments should be printed to output.

BTW, it's usually recommended to validate user input before ask db to do anything, where the input type/value can be checked, because there can be malicious users wanting to harm the system.

wwwjfy avatar Nov 07 '18 17:11 wwwjfy

@bunyk try https://marshmallow.readthedocs.io/en/3.0/ for schema validation

jekel avatar Nov 07 '18 17:11 jekel

So, the recommended way is to define schema twice? Once for database, once for request validation?

bunyk avatar Nov 07 '18 18:11 bunyk

While thinking about this, I came to idea that probably I could use some introspection on db.Model to generate JSON Schema from it.

bunyk avatar Nov 07 '18 18:11 bunyk

@bunyk i think, that will be greate to have PR with this feature @wwwjfy what do you think about it?

jekel avatar Nov 08 '18 18:11 jekel

There is library https://github.com/podhmo/alchemyjsonschema, it gives me error "AttributeError: 'Table' object has no attribute 'mapper'", so I did it like this:

import json
from jsonschema import validate
from gino import Gino

db = Gino()

# Example model
class Page(db.Model):
    __tablename__ = 'pages'

    url_slug = db.Column(db.Unicode(length=20), primary_key=True)
    title = db.Column(db.Unicode(length=255))
    visible = db.Column(db.Boolean())
    config = db.Column(db.JSON())


# Convert model to jsonschema 
def get_json_schema(model):
    properties = {}
    required = []
    for field in model.__table__.get_children():
        if not field.nullable:
            required.append(field.name)
        json_type = python_type_to_json(field.type.python_type)
        field_schema = {
            "type": json_type,
        }
        if hasattr(field.type, 'length'):
            field_schema['maxLength'] = field.type.length
        properties[field.name] = field_schema

    return {
        "type": "object",
        "additionalProperties": False,
        "properties": properties,
        "required": required,
    }

def python_type_to_json(t):
    if t == str:
        return "string"
    if t == int:
        return "number"
    if t == bool:
        return "boolean"
    if t == dict:
        return "object"
    raise ValueError("Unknown type: %s" % t)


# test it
if __name__ == '__main__':
    schema = get_json_schema(Page)
    print(json.dumps(schema, indent='  '))
    # {
    #   "type": "object",
    #   "additionalProperties": false,
    #   "properties": {
    #     "url_slug": {
    #       "type": "string",
    #       "maxLength": 20
    #     },
    #     "title": {
    #       "type": "string",
    #       "maxLength": 255
    #     },
    #     "visible": {
    #       "type": "boolean"
    #     },
    #     "config": {
    #       "type": "object"
    #     }
    #   },
    #   "required": [
    #     "url_slug"
    #   ]
    # }

    # this is ok:
    validate({"title": "test", "url_slug": "test", "visible": True}, schema)

    # this gives ValidationError: 'testtest...est' is too long
    validate({"title": 22, "url_slug": "test" * 20, "visible": True}, schema)

But then, as I have only one model yet, I just added that jsonschema as constant, so this is just example code, not used in production yet, and not supporting all field types.

bunyk avatar Nov 09 '18 12:11 bunyk

Can't say it's not related, but generating json schema is a bit out of scope for an ORM framework, which I think may be more suitable for a web framework. I guess we can have a helper script/cli or something similar for that.

I checked alchemyjsonschema, which has a deep coupling with sqlalchemy ORM, and can't be apply to gino models easily.

wwwjfy avatar Nov 09 '18 13:11 wwwjfy

@bunyk like the others before me, this seems out of scope.

The verification of user sourced data is always done before pushing into a database. The SQL injection attack vector exists because developers are not following industry standards that were created to protect from these things.

Perhaps checkout wtfforms.

Additionally, coupling database schema with data input from the user is almost always a bad idea.

Here are a few things to consider:

How do you, for example, let the user provide a type 4 UUID? a text field? in a postgresql database it is a special field type. what if they get the format wrong? What if the database schema requires field b, c to be non null only when field a is null? How would you validate that?

Perhaps your data domains are really simple and don't require this.

could you get by with a simple method like this (untested / unverified code)?

def validate(data, validation):
    """Validates the data according to validation.  May raise TypeError, KeyError."""
    for value_key, value_type in validation:
        value = data[value_key]  # possible KeyError
        if isinstance(value, value_type) is False:
            raise TypeError(value_key)

validate({'a': 12, 'b': 'test'}, [('a', int), ('b', str)])

The tuple could be easily expanded with a 3rd field for validator method which could return a bool.
If the validator method is not None run it.  If the returned value is not True raise ValueError(value_key).
Watch out for generic classes in `typing` module as they don't evaluate properly with isinstance().

skewty avatar Nov 12 '18 16:11 skewty

Agree with others. User data should always be verified before any DB transactions happen. We are using decorator with simple https://pypi.org/project/jsonschema/ for aiohttp request validation

# schemas.py

patch_schema = {
    'type': 'object',
    'properties': {
        'data': {'type': 'object'},
        'name': {'type': 'string'},
        'thumbnails': {'type': 'array', 'items': {'type': 'string'}}
    },
    'required': ['data', 'name']
}
# validate.py

import ujson

from aiohttp import web
from jsonschema.exceptions import ValidationError
from jsonschema.validators import validate


def validate_dict(data: dict, schema: dict):
    try:
        validate(data, schema)
    except ValidationError as validation_error:
        return web.json_response({'error': f'Invalid json: {validation_error.message}'}, status=400)
    except ValueError:
        return web.json_response({'error': 'Invalid json'}, status=400)


def validate_json(schema):
    def wrapper(func):
        async def func_wrapper(*args):
            request = args[0].request
            body = await request.json(loads=ujson.loads)
            validation_result = validate_dict(body, schema)
            if validation_result is not None:
                return validation_result
            return await func(*args)

        return func_wrapper

    return wrapper
# views.py
from schemas import patch_schema
from validate import validate_json

# and here goes actual view with decorator
@validate_json(schema=patch_schema)
async def smth(self):
    pass

sergeibershadsky avatar Nov 15 '18 15:11 sergeibershadsky

pls also try marshmallow-sqlalchemy

class BookSchema(TableSchema):
    class Meta:
        table = Book.__table__

It works fine for me with gino, but I only tested with plain models (without releationship)

Reskov avatar Nov 27 '18 03:11 Reskov