Proposal: Support for dataclasses (PEP-557)
Use-case abstract
Validate instances of dataclasses with Cerberus.
Proposal
@rhettinger concluded his talk at PyCon about dataclasses with the outlook that third-party libraries would take on the task of validating these. inevitably i thought about enabling Cerberus to do so.
hence i propose to add a module `cerberus.dataclasses` that would provide the following features:
- a `Validator` class that extends `cerberus.validator.Validator`
  - takes dataclass instances as input beside mappings
  - derives a schema from:
    - a field's type annotation (*)
    - ~~a field's `default` or `default_factory` setting~~ implemented by a dataclass
    - rules that are provided as a field's `metadata['cerberus']` value
  - validates a mapping representation of a dataclass against that schema
  - fields with dataclass instances as value are handled accordingly (nesting)
- a `validate` function
  - follows the paradigm of the stdlib's `dataclasses` module
  - a customized validator can be provided as a keyword argument
  - it should be capable of returning the used validator for error inspection (when a keyword argument is given, or by default?)
- as i assume there'll be no way not to support normalization, the functions `normalized` and `validated` as well
*: as type information is provided as annotations in a dataclass' metadata, Cerberus' lack of capability to validate against such constraints is a showstopper for this feature. while this feature could be postponed to a post-`2.0` release, the outlook on this feature rather increases my drive to get started with the `2.x` branch at all.
reminder: there's a backport for Python 3.6.
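for illustration, a rough sketch of how using this could look from a user's perspective (nothing of it exists yet; the module path, the `validate` function and the rule placement just mirror the list above):

```python
from dataclasses import dataclass, field

# hypothetical import: neither the module nor the function exists yet
from cerberus.dataclasses import validate


@dataclass
class Actor:
    # the 'type' rule would be derived from the annotation (str -> string, int -> integer),
    # additional rules from the field's metadata['cerberus'] value
    name: str = field(default="", metadata={"cerberus": {"maxlength": 64}})
    born: int = field(default=0, metadata={"cerberus": {"min": 1850}})


assert validate(Actor(name="Gloria Foster", born=1934))
```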
Let's do this.
@nicolaiarocci @funkyfuture Has this been started yet? If not, I would love to help move this forward.
Does anyone have strong preferences as to how this should be handled internally? I was looking at the `dataclasses` module and there is an `asdict` function that converts a dataclass instance into a dictionary. My initial thoughts for the first run-through are:
- Accept a dataclass instance and an optional schema into the `DCValidator` subclass
- Convert the dataclass into a dict (using `asdict`)
- Generate a schema based on the dataclass type hints
- Combine the user-provided schema with the automatically generated one
- Use that merged schema as the true schema for validations
I will also add an appropriate `TypeDefinition` for the dataclass as well.
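A minimal sketch of that flow; the `DCValidator` name, the `validate_dataclass` method and the annotation-to-rule mapping are placeholders, not existing Cerberus API:

```python
from dataclasses import asdict, fields, is_dataclass

from cerberus import Validator

# placeholder mapping from annotations to Cerberus type names; deliberately incomplete
TYPE_RULES = {bool: "boolean", int: "integer", float: "float", str: "string",
              list: "list", dict: "dict", set: "set"}


class DCValidator(Validator):
    """Sketch: validate a dataclass instance against a schema derived from its fields."""

    def validate_dataclass(self, instance, extra_schema=None):
        assert is_dataclass(instance)
        derived = {}
        for f in fields(instance):
            rules = {}
            if f.type in TYPE_RULES:
                rules["type"] = TYPE_RULES[f.type]
            rules.update(f.metadata.get("cerberus", {}))  # per-field rules take precedence
            derived[f.name] = rules
        if extra_schema:  # merge an explicitly passed schema on top
            for name, extra in extra_schema.items():
                derived.setdefault(name, {}).update(extra)
        return self.validate(asdict(instance), derived)
```

Usage would then be something like `DCValidator().validate_dataclass(some_instance, {'votes': {'min': 0}})`.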
i think it's quite obvious to approach this as you describe.

> Accept […] an optional schema

why would you put the rules in an additional data structure when dataclasses have a way to annotate fields? to quote myself:

> rules that are provided as a field's `metadata['cerberus']` value
however, this is planned to be included after some refactoring for Cerberus 2.0. see the roadmap and some initial progress.
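the difference in a nutshell (illustrative only, no particular API implied):

```python
from dataclasses import dataclass, field

# rules live next to the field they constrain ...
@dataclass
class Book:
    title: str = field(default="", metadata={"cerberus": {"maxlength": 30}})

# ... instead of in a parallel, hand-maintained structure like
external_schema = {"title": {"type": "string", "maxlength": 30}}
```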
@nicolaiarocci @funkyfuture @Flargebla Hi, since I was thinking about this combination (cerberus/dataclasses) as well, I stumbled upon this proposal and made a first approach that:
- is a `Validator` class that extends `cerberus.validator.Validator`
- takes dataclass instances as input beside mappings
- derives a schema from:
  - a field's type annotation (*)
  - a field's `default` or `default_factory` setting
  - rules that are provided as a field's `metadata['cerberus']` value
- converts the dataclass into a dict (using `asdict`)
- generates a schema based on the dataclass type hints
- combines the user-provided schema with the automatically generated one
- uses that merged schema as the true schema for validations
I made a gist here: https://gist.github.com/pythononwheels/0568496595d0f79caafb1b4466c0e821
I added a `validate_test()` method that prints a lot of info, to show the generated schema and value dicts as well as the given `TestData` class field definitions. The normal `validate()` method just returns the result of calling `super().validate(values, schema)`.
I think a decorator for the dataclass that already injects a schema would be a nice option, e.g. `@cerberus_dataclass()`, so you can use the validator more easily.
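A rough sketch of how such a decorator could work (the `cerberus_dataclass` name is made up, nothing like it exists in Cerberus): derive the schema once at class-creation time and attach it to the class.

```python
from dataclasses import dataclass, fields

# as in the earlier sketch: a deliberately incomplete annotation-to-rule mapping
TYPE_RULES = {bool: "boolean", int: "integer", float: "float", str: "string"}


def cerberus_dataclass(cls=None, **dataclass_kwargs):
    """Hypothetical decorator: applies @dataclass and attaches a derived schema."""
    def wrap(klass):
        klass = dataclass(klass, **dataclass_kwargs)
        schema = {}
        for f in fields(klass):
            rules = {"type": TYPE_RULES[f.type]} if f.type in TYPE_RULES else {}
            rules.update(f.metadata.get("cerberus", {}))
            schema[f.name] = rules
        klass.__cerberus_schema__ = schema  # a validator could pick this up later
        return klass
    return wrap if cls is None else wrap(cls)
```

Applied as `@cerberus_dataclass()` above a class definition, a validator could then read `SomeClass.__cerberus_schema__` instead of deriving the schema per instance.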
TestDataclass and output here (to save one click ;).
This is the test dataclass:
```python
# imports and default factories are not shown in the excerpt; roughly (assumed
# to resemble the gist's definitions):
import datetime
import random
from dataclasses import dataclass, field
from typing import Dict

def set_date():
    return datetime.date.today()

def set_datetime():
    return datetime.datetime.now()

def set_random():
    return random.uniform(1, 2)  # some random float; the exact formula lives in the gist


@dataclass(init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False)
class TestData():
    """
    Test python dataclass for this module.
    Representing the needed fields for cerberus schemas and validation.
    See: http://docs.python-cerberus.org/en/stable/validation-rules.html
    """
    active: bool
    bindat: bytes = b''
    bin_arr: bytearray = field(default_factory=bytearray)
    tdate: datetime.date = field(default_factory=set_date)
    tstamp: datetime.datetime = field(default_factory=set_datetime)
    props: dict = field(default_factory=dict)
    factor: float = field(default_factory=set_random)
    votes: int = 0
    tags: list = field(default_factory=list)
    # number ??
    unique_tags: set = field(default_factory=set)
    connectionOptions: Dict[str, str] = field(default_factory=dict)
    title: str = field(default="",
                       metadata={"cerberus": {"maxlength": 30}})
```
This is the output of the `validate_test()` method, which intentionally prints a lot of info to stdout.
----------------------------------------------------------------------
| dataclass fields
----------------------------------------------------------------------
Field name: active, type is: <class 'bool'>
Field name: bindat, type is: <class 'bytes'>
Field name: bin_arr, type is: <class 'bytearray'>
Field name: tdate, type is: <class 'datetime.date'>
Field name: tstamp, type is: <class 'datetime.datetime'>
Field name: props, type is: <class 'dict'>
Field name: factor, type is: <class 'float'>
Field name: votes, type is: <class 'int'>
Field name: tags, type is: <class 'list'>
Field name: unique_tags, type is: <class 'set'>
Field name: connectionOptions, type is: typing.Dict[str, str]
Field name: title, type is: <class 'str'>
... cerberus schema metadata: {'maxlength': 30}
----------------------------------------------------------------------
| cerberus schema
----------------------------------------------------------------------
"active" {'type': 'boolean'}
"bindat" {'type': 'binary'}
"bin_arr" {'type': 'binary'}
"tdate" {'type': 'date'}
"tstamp" {'type': 'datetime'}
"props" {'type': 'dict'}
"factor" {'type': 'float'}
"votes" {'type': 'integer'}
"tags" {'type': 'list'}
"unique_tags" {'type': 'set'}
"connectionOptions" {'type': 'dict'}
"title" {'type': 'string', 'maxlength': 30}
----------------------------------------------------------------------
| current values of dataclass instance
----------------------------------------------------------------------
{ 'active': True,
'bin_arr': bytearray(b''),
'bindat': b'',
'connectionOptions': {},
'factor': 1.4276983175256361,
'props': {},
'tags': [],
'tdate': datetime.date(2018, 11, 14),
'title': '',
'tstamp': datetime.datetime(2018, 11, 14, 20, 25, 24, 53223),
'unique_tags': set(),
'votes': 0}
----------------------------------------------------------------------
| cerberus validation result
----------------------------------------------------------------------
True
----------------------------------------------------------------------
| changed title to length 32 => validation result
----------------------------------------------------------------------
False
{'title': ['max length is 30']}
Just a first approach ... glad for any replies. Regards, klaas
thank you very much. this is a good impulse to figure out design details. i want to point out that i only have brief practical experience with the `dataclasses` module. some thoughts about your gist:
- i see that the docs hardly mention the introspective properties of dataclasses. the `__dataclass_fields__` property should be used as the sole source to derive from.
- i'd really prefer to use a field's `type` property as the constraint for the `type` rule in the derived schema. that way the code is short, abstract and it can handle any input.
- what about `typing.Union` and `typing.Optional`?
- at least these rules should be dropped if present in a provided rules set, and a critical warning be emitted: `required`, `default`, `default_setter`. oh, that should be implemented with a proper schema that validates the schema a validator is about to use. i'm too tired for details, that stuff lives in `cerberus.schema`. it's kinda messy to get into it; that refactor for Python 3 is more convenient to read to get a general idea. maybe the dataclass validator could specify a schema for these rules that unconditionally raises an exception in a `check_with` handler, as rules can't be removed with a subclass so far (with a stacked metaclass?!)
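regarding `typing.Optional`: with Python 3.8's `typing.get_origin`/`typing.get_args`, such annotations could be unpacked before a rule is derived; a rough, untested sketch:

```python
import typing

def unpack_annotation(annotation):
    """Illustrative only: split Optional[X] / Union[X, None] into (type(s), nullable)."""
    if typing.get_origin(annotation) is typing.Union:
        args = typing.get_args(annotation)
        types = [a for a in args if a is not type(None)]
        nullable = len(types) < len(args)
        # a single remaining type maps to the 'type' rule; a real Union would
        # rather need something like 'anyof'
        return (types[0] if len(types) == 1 else tuple(types)), nullable
    return annotation, False

# e.g. unpack_annotation(typing.Optional[int]) -> (int, True)
```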
have a good night.
closing this issue as there are currently no intentions to continue with a next major release.