Any way to validate request data using ORM?
I have a db.Model called Widget with approximately 10 fields of different types. And I create rows in database using
await Widget.create(**request.json)
Everything is fine while request payload is properly formatted. But then some field has wrong type, I get this exception:
asyncpg.exceptions.DataError: invalid input for query argument $4: {} (expected str, got dict)
From this is not clear which field had wrong type. Is there any method to compare dictionary to db.Model and check which fields has wrong types? Should I use some external library?
For such errors, we can put the sql statement arguments in the error output. An easier way is that you can set db echo to True, so all the sql statement and arguments should be printed to output.
BTW, it's usually recommended to validate user input before ask db to do anything, where the input type/value can be checked, because there can be malicious users wanting to harm the system.
@bunyk try https://marshmallow.readthedocs.io/en/3.0/ for schema validation
So, the recommended way is to define schema twice? Once for database, once for request validation?
While thinking about this, I came to idea that probably I could use some introspection on db.Model to generate JSON Schema from it.
@bunyk i think, that will be greate to have PR with this feature @wwwjfy what do you think about it?
There is library https://github.com/podhmo/alchemyjsonschema, it gives me error "AttributeError: 'Table' object has no attribute 'mapper'", so I did it like this:
import json
from jsonschema import validate
from gino import Gino
db = Gino()
# Example model
class Page(db.Model):
__tablename__ = 'pages'
url_slug = db.Column(db.Unicode(length=20), primary_key=True)
title = db.Column(db.Unicode(length=255))
visible = db.Column(db.Boolean())
config = db.Column(db.JSON())
# Convert model to jsonschema
def get_json_schema(model):
properties = {}
required = []
for field in model.__table__.get_children():
if not field.nullable:
required.append(field.name)
json_type = python_type_to_json(field.type.python_type)
field_schema = {
"type": json_type,
}
if hasattr(field.type, 'length'):
field_schema['maxLength'] = field.type.length
properties[field.name] = field_schema
return {
"type": "object",
"additionalProperties": False,
"properties": properties,
"required": required,
}
def python_type_to_json(t):
if t == str:
return "string"
if t == int:
return "number"
if t == bool:
return "boolean"
if t == dict:
return "object"
raise ValueError("Unknown type: %s" % t)
# test it
if __name__ == '__main__':
schema = get_json_schema(Page)
print(json.dumps(schema, indent=' '))
# {
# "type": "object",
# "additionalProperties": false,
# "properties": {
# "url_slug": {
# "type": "string",
# "maxLength": 20
# },
# "title": {
# "type": "string",
# "maxLength": 255
# },
# "visible": {
# "type": "boolean"
# },
# "config": {
# "type": "object"
# }
# },
# "required": [
# "url_slug"
# ]
# }
# this is ok:
validate({"title": "test", "url_slug": "test", "visible": True}, schema)
# this gives ValidationError: 'testtest...est' is too long
validate({"title": 22, "url_slug": "test" * 20, "visible": True}, schema)
But then, as I have only one model yet, I just added that jsonschema as constant, so this is just example code, not used in production yet, and not supporting all field types.
Can't say it's not related, but generating json schema is a bit out of scope for an ORM framework, which I think may be more suitable for a web framework. I guess we can have a helper script/cli or something similar for that.
I checked alchemyjsonschema, which has a deep coupling with sqlalchemy ORM, and can't be apply to gino models easily.
@bunyk like the others before me, this seems out of scope.
The verification of user sourced data is always done before pushing into a database. The SQL injection attack vector exists because developers are not following industry standards that were created to protect from these things.
Perhaps checkout wtfforms.
Additionally, coupling database schema with data input from the user is almost always a bad idea.
Here are a few things to consider:
How do you, for example, let the user provide a type 4 UUID? a text field? in a postgresql database it is a special field type. what if they get the format wrong? What if the database schema requires field b, c to be non null only when field a is null? How would you validate that?
Perhaps your data domains are really simple and don't require this.
could you get by with a simple method like this (untested / unverified code)?
def validate(data, validation):
"""Validates the data according to validation. May raise TypeError, KeyError."""
for value_key, value_type in validation:
value = data[value_key] # possible KeyError
if isinstance(value, value_type) is False:
raise TypeError(value_key)
validate({'a': 12, 'b': 'test'}, [('a', int), ('b', str)])
The tuple could be easily expanded with a 3rd field for validator method which could return a bool.
If the validator method is not None run it. If the returned value is not True raise ValueError(value_key).
Watch out for generic classes in `typing` module as they don't evaluate properly with isinstance().
Agree with others. User data should always be verified before any DB transactions happen. We are using decorator with simple https://pypi.org/project/jsonschema/ for aiohttp request validation
# schemas.py
patch_schema = {
'type': 'object',
'properties': {
'data': {'type': 'object'},
'name': {'type': 'string'},
'thumbnails': {'type': 'array', 'items': {'type': 'string'}}
},
'required': ['data', 'name']
}
# validate.py
import ujson
from aiohttp import web
from jsonschema.exceptions import ValidationError
from jsonschema.validators import validate
def validate_dict(data: dict, schema: dict):
try:
validate(data, schema)
except ValidationError as validation_error:
return web.json_response({'error': f'Invalid json: {validation_error.message}'}, status=400)
except ValueError:
return web.json_response({'error': 'Invalid json'}, status=400)
def validate_json(schema):
def wrapper(func):
async def func_wrapper(*args):
request = args[0].request
body = await request.json(loads=ujson.loads)
validation_result = validate_dict(body, schema)
if validation_result is not None:
return validation_result
return await func(*args)
return func_wrapper
return wrapper
# views.py
from schemas import patch_schema
from validate import validate_json
# and here goes actual view with decorator
@validate_json(schema=patch_schema)
async def smth(self):
pass
pls also try marshmallow-sqlalchemy
class BookSchema(TableSchema):
class Meta:
table = Book.__table__
It works fine for me with gino, but I only tested with plain models (without releationship)